OCR Output Support and Formats

The LEADTOOLS OCR Engine can deliver precise coordinate, confidence, and attribute data for each recognized character. This gives the application fine-grained control over the formatting of the output text: at one extreme it can mirror the input document, at the other it can apply a unique user-defined style.
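
As an illustration of how an application might use per-character data of this kind, the following Python sketch flags low-confidence characters and rebuilds a line of text from coordinates. The `RecognizedChar` class, its field names, and the 0-100 confidence scale are assumptions made for this example; they are not LEADTOOLS types or API calls.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecognizedChar:
    """Hypothetical per-character record (not a LEADTOOLS type)."""
    text: str            # the recognized character
    left: int            # bounding box in pixels (assumed unit)
    top: int
    right: int
    bottom: int
    confidence: int      # assumed scale: 0 (worst) to 100 (best)
    bold: bool = False   # example of an attribute flag


def low_confidence(chars: List[RecognizedChar], threshold: int = 80) -> List[RecognizedChar]:
    """Return the characters whose confidence is below the threshold."""
    return [c for c in chars if c.confidence < threshold]


def to_line(chars: List[RecognizedChar]) -> str:
    """Join characters in left-to-right order, ignoring vertical position."""
    return "".join(c.text for c in sorted(chars, key=lambda c: c.left))
```

An application could, for example, highlight the characters returned by `low_confidence` for manual review while writing the rest to the output unchanged.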

LEADTOOLS OCR supports the output of many different file formats, including:

 

Adobe PDF *

When displayed in a PDF reader, the generated PDF file looks very similar to the original document. The text can be searched, and the recognized characters are placed in the same positions as in the original.

Text - Standard

Text output with a line break after each line. If a table is present, its cells are positioned by TABs.

Text - Smart

Text output with a line break after each line. The left margin is taken into account (with SPACEs). If a table is present, its cells are positioned by SPACEs.

Text - Stripped

Text output with a line break after each paragraph. If a table is present, its cells are separated by TABs.

Text - Plain

Text output with a line break after each line. The left and upper margins are taken into account (with SPACEs and NEWLINEs). If a table is present, its cells are positioned by TABs.

Text - Comma Delimited

Comma delimited text output. Line/cell contents are surrounded by quotes (""). The default delimiter (comma) can be overridden.

Text - Tab Delimited

TAB-separated text output. Line/cell contents are surrounded by quotes ("").

Rec ASCII (Formatted)

Text output that retains the layout by mimicking it with SPACEs. Line/cell contents are surrounded by quotes ("").

Rec ASCII (Standard)

Text output allowing quick text conversion.

Rec ASCII (StandardEx)

Text output allowing quick text conversion, with a line break after each line and after each zone.

General Word Processor

Text output allowing quick text conversion, with a line break after each paragraph.

HTML 3.2

HTML output. HTML 3.2 is useful for exporting with partial formatting. The output files are compatible with both Internet Explorer and Netscape.

HTML 4.0

HTML output. HTML 4.0 can set the exact position and size of objects; use this output format for full formatting.

Word 97, 2000, XP

Microsoft Word 97, Word 2000 and Word XP output format.

Excel 97, 2000

Microsoft Excel 97 and Excel 2000 output format.

WordPerfect 8

WordPerfect 8 format.

Rich Text Format

Quick conversion to Rich Text Format.

PowerPoint 97 (RTF)

Rich Text Format for PowerPoint 97.

Publisher 98 (RTF)

Rich Text Format for Publisher 98.

WordPad (RTF)

Rich Text Format for WordPad.

RTF Word 2000

Rich Text Format for Word 2000.

RTF Word 97

Rich Text Format for Word 97.

RTF Word 6.0/95

Rich Text Format for Word 6.0/95.

Open eBook 1.0

Open eBook 1.0 format.

XML

XML output format conforming to ScanSoft's schema file SSDOC-SCHEMA2.xml: http://www.scansoft.com/omnipage/xml/SSDOC-SCHEMA2.xml

2G Type 2

Binary output of the recognition results, with a 16-byte structure for each recognized character (2G Type 2 structure output).

2G Type 3

Binary output of the recognition results, with a 16-byte structure for each recognized character (2G Type 3 structure output).
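
Because the 2G Type 2 and 2G Type 3 formats use a fixed 16-byte structure per character, a reader can walk such a file in fixed-size steps. The Python sketch below only splits a buffer into raw 16-byte records. The field layout inside each record is not described here, so no fields are decoded. The sketch also assumes that the data consists solely of such records with no file header, which may not hold for real output files.

```python
from typing import Iterator

RECORD_SIZE = 16  # one 16-byte structure per recognized character


def iter_records(data: bytes, record_size: int = RECORD_SIZE) -> Iterator[bytes]:
    """Yield consecutive fixed-size records from a binary buffer.

    Assumes the buffer holds only records (no header or trailer).
    """
    if len(data) % record_size != 0:
        raise ValueError("buffer length is not a multiple of the record size")
    for offset in range(0, len(data), record_size):
        yield data[offset:offset + record_size]
```

With the structure definition for the chosen 2G type at hand, each yielded record could then be decoded with the standard `struct` module.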

 

* The LEADTOOLS PDF OCR Plug-in is required to output PDF.
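
The Comma Delimited and Tab Delimited text formats listed above surround line/cell contents with quotes, so a standard CSV reader should be able to load them. The following Python sketch uses the standard `csv` module. The function name `read_delimited` is hypothetical, and the sketch assumes the output follows conventional double-quote quoting, which has not been verified against actual LEADTOOLS output.

```python
import csv
import io
from typing import List


def read_delimited(text: str, delimiter: str = ",") -> List[List[str]]:
    """Parse quoted, delimiter-separated text into rows of cell strings."""
    # newline="" leaves line endings untouched so the csv module can handle them.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter, quotechar='"')
    return [row for row in reader]
```

Passing `delimiter="\t"` reads the tab-delimited variant, and any other single character can be passed if the default comma was overridden when the file was written.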

More:

PDF OCR Plug-in