OCR

The LEADTOOLS OCR engine is based on the time-tested, industry-standard ScanSoft Capture Development System v12 engine. In addition to recognition, it accepts all 150+ LEAD-supported file formats as input.

Key Features:

Import and export all image file formats supported by LEADTOOLS.

Recognition of the entire page area as one zone. You can also specify and recognize zones within each page.

Specification of a different recognition module for each zone on the same page.

With the optional PDF Output kit, support for five PDF converters that produce different types of PDF output, including image-only, searchable, and others.

Custom recognition dictionary creation.

Specification of dictionary and spell-checking languages for subsystem checking.

OCR Recognition:

The LEADTOOLS OCR engine brings together a considerable number of recognition modules. Through the MOR, PLUS2W and PLUS3W multi-lingual recognition modules, the LEADTOOLS OCR engine supports over 110 languages, including languages that use the Latin, Cyrillic, Greek and related alphabets. Basic support includes:

MTX, FRX, MOR and the PLUS2W and PLUS3W omnifont modules for the recognition of different machine-generated texts.

HNR and RER modules for handprint recognition.

DOT module for 9-pin draft dot-matrix printouts.

OMR module for optical mark recognition.

MAT module for fixed-font texts (MICR or E-13B, OCR-A, OCR-B, etc.).
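
As an illustration only, assigning a different recognition module to each zone of a page can be modeled as a small data structure. The sketch below is plain Python, not the LEADTOOLS API: the `Zone` class, its fields, and `plan_page` are hypothetical names, and only the module abbreviations are taken from the list above.

```python
from dataclasses import dataclass

# Module abbreviations come from the list above; everything else is hypothetical.
KNOWN_MODULES = {"MTX", "FRX", "MOR", "PLUS2W", "PLUS3W",
                 "HNR", "RER", "DOT", "OMR", "MAT"}


@dataclass(frozen=True)
class Zone:
    """A rectangular page area paired with the module meant to recognize it."""
    left: int
    top: int
    right: int
    bottom: int
    module: str

    def __post_init__(self):
        # Reject module names that are not in the documented list.
        if self.module not in KNOWN_MODULES:
            raise ValueError(f"unknown recognition module: {self.module}")
        # Reject empty or inverted rectangles.
        if self.right <= self.left or self.bottom <= self.top:
            raise ValueError("zone rectangle must have positive width and height")


def plan_page(zones):
    """Group zones by module name, preserving the order zones were given in."""
    plan = {}
    for zone in zones:
        plan.setdefault(zone.module, []).append(zone)
    return plan


page = [
    Zone(0, 0, 800, 200, "PLUS3W"),    # machine-printed header text
    Zone(0, 200, 800, 600, "HNR"),     # handprinted form fields
    Zone(0, 600, 800, 700, "OMR"),     # check boxes
]
plan = plan_page(page)
```

Validating the module name when the zone is created means a misspelled module is reported before any recognition work would start.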

OCR Output support and formats:

The LEADTOOLS OCR Engine can deliver precise coordinate, confidence and attribute data for each recognized character, giving the application great control over the formatting of the output text - at one extreme mirroring the input document, at the other permitting a unique user-defined style.
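
To make the idea concrete, here is a minimal sketch of how an application might use per-character coordinates and confidence values to shape its own output. It is not LEADTOOLS code: the `RecognizedChar` record, its field names, and the 0 to 100 confidence scale are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class RecognizedChar:
    """Hypothetical per-character record: text, bounding box, confidence (0..100)."""
    char: str
    left: int
    top: int
    right: int
    bottom: int
    confidence: int


def mark_uncertain(chars, threshold=80, marker="?"):
    """Join characters in the given order, replacing low-confidence ones with a marker."""
    return "".join(c.char if c.confidence >= threshold else marker for c in chars)


def split_lines(chars, tolerance=5):
    """Group characters into text lines by their top coordinate, then order by left."""
    lines = []
    for c in sorted(chars, key=lambda c: (c.top, c.left)):
        # A character joins the current line if its top is close to the line's first character.
        if lines and abs(lines[-1][0].top - c.top) <= tolerance:
            lines[-1].append(c)
        else:
            lines.append([c])
    return ["".join(c.char for c in sorted(line, key=lambda c: c.left)) for line in lines]


chars = [
    RecognizedChar("H", 0, 10, 8, 22, 95),
    RecognizedChar("i", 10, 12, 14, 22, 60),
    RecognizedChar("O", 0, 40, 8, 52, 90),
    RecognizedChar("K", 10, 41, 18, 52, 99),
]
flagged = mark_uncertain(chars)
lines = split_lines(chars)
```

`split_lines` corresponds to mirroring the input layout, while `mark_uncertain` is one example of a user-defined style that highlights characters needing review.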

LEADTOOLS OCR supports the output of many different file formats, including:

 

Adobe PDF*: When displayed in a PDF reader, the generated PDF file looks very similar to the original document. The text can be searched. The PDF file contains the recognized characters in the same positions as in the original.
Text - Standard: Text output with a line break after each line. If a table is present, its cells are positioned by TABs.
Text - Smart: Text output with a line break after each line. The left margin is taken into account (with SPACEs). If a table is present, its cells are positioned by SPACEs.
Text - Stripped: Text output with a line break after each paragraph. If a table is present, its cells are separated by TABs.
Text - Plain: Text output with a line break after each line. The left and upper margins are taken into account (with SPACEs and NEWLINEs). If a table is present, its cells are positioned by TABs.
Text - Comma Delimited: Comma-delimited text output. Line/cell contents are surrounded by quotes (""). The default delimiter (comma) can be overridden.
Text - Tab Delimited: TAB-separated text output. Line/cell contents are surrounded by quotes ("").
Rec ASCII (Formatted): Text output with layout retention mimicked by SPACEs. Line/cell contents are surrounded by quotes ("").
Rec ASCII (Standard): Text output allowing quick text conversion.
Rec ASCII (StandardEx): Text output allowing quick text conversion. Line break after each line and after each zone.
General Word Processor: Text output allowing quick text conversion. Line break after each paragraph.
HTML 3.2: HTML output. HTML 3.2 is useful for exporting with partial formatting. The output files work in both IE and Netscape.
HTML 4.0: HTML output. HTML 4.0 can set the exact position and size of objects; use this output format for full formatting.
Word 97, 2000, XP: Microsoft Word 97, Word 2000 and Word XP output format.
Excel 97, 2000: Microsoft Excel 97 and Excel 2000 output format.
WordPerfect 8: WordPerfect 8 format.
Rich Text Format: Quick conversion to Rich Text Format.
PowerPoint 97 (RTF): Rich Text Format for PowerPoint 97.
Publisher 98 (RTF): Rich Text Format for Publisher 98.
WordPad (RTF): Rich Text Format for WordPad.
RTF Word 2000: Rich Text Format for Word 2000.
RTF Word 97: Rich Text Format for Word 97.
RTF Word 6.0/95: Rich Text Format for Word 6.0/95.
Open eBook 1.0: Open eBook 1.0 format.
XML: XML output format conforming to ScanSoft's schema file SSDOC-SCHEMA2.xml (http://www.scansoft.com/omnipage/xml/SSDOC-SCHEMA2.xml).
2G Type 2: Binary output of the recognition results, with a 16-byte structure for each recognized character (2G Type 2 structure output).
2G Type 3: Binary output of the recognition results, with a 16-byte structure for each recognized character (2G Type 3 structure output).

* The LEADTOOLS PDF OCR Plug-in is required to output PDF.
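
For readers unfamiliar with the delimited text formats in the list above, the following sketch shows the general shape of such output: every cell wrapped in double quotes, with a delimiter that defaults to a comma and can be overridden (for example with a TAB). It uses Python's standard `csv` module and is only an approximation for illustration; the helper name `to_delimited` is hypothetical, and the engine's exact handling of details such as embedded quotes may differ.

```python
import csv
import io


def to_delimited(rows, delimiter=","):
    """Serialize rows of cell text with every field wrapped in double quotes."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=delimiter,
        quoting=csv.QUOTE_ALL,   # quote every field, not only those that need it
        lineterminator="\n",
    )
    writer.writerows(rows)
    return buffer.getvalue()


rows = [["Item", "Price"], ["Widget, large", "4.50"]]
comma_text = to_delimited(rows)                 # default delimiter: comma
tab_text = to_delimited(rows, delimiter="\t")   # overridden delimiter: TAB
```

Quoting every cell keeps a value such as `Widget, large` intact even though it contains the delimiter itself.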

Supported Platforms

OCR API

OCR .NET and COM