The LEADTOOLS OCR Module provides programming tools for quickly and easily adding document optical character recognition (OCR) technology into software applications. Using the LEADTOOLS OCR Module, programmers can perform character recognition on document images and output recognized text to over 20 file formats. The PDF OCR Plug-in extends the LEADTOOLS OCR Module to add PDF output support.
LEADTOOLS makes OCR development easier with auto-zone detection, manual zone creation, auto-orientation, document image cleanup, and preset values for common document images that improve recognition results. The LEADTOOLS OCR Module supports over 100 languages and provides output document options such as document margins and paragraph options.
Supported output formats include:
- DOC, RTF, TXT, and XLS
- Adobe PDF edited
- HTML and XML
- Open eBook 1.0
- 2G Type 2 and 2G Type 3
Key Features:
- Add page(s) to the internal OCR list of pages.
- Select the language to use in recognizing the OCR pages.
- Recognize a variety of documents, including facsimiles, photocopies, and documents with complex layouts.
- Save the document in any of several text output formats.
- Correct document characteristics such as noise, darkness, and lightness to achieve the best possible character recognition.
- Manually or automatically detect and select zones for recognition.
- Use dictionaries to improve OCR results.
- Display document pages, with or without their zones.
Additional Features:
- Recognize text from 5 to 72 points in virtually any typeface.
- Automatically detect available zones in the document pages.
- Recognize multiple document pages at once and save the recognition results to a single file.
- Recognize multiple languages within one document.
- Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
- Process documents in two-page mode for open-faced books and magazines.
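To give a feel for what the 5 to 72 point range means in a scanned image, the following minimal sketch converts a nominal font size to pixels using the standard relationship of 72 points per inch. The function name `point_size_to_pixels` is hypothetical and is not part of the LEADTOOLS API; the result is the nominal type size, not the measured height of any particular glyph.

```python
POINTS_PER_INCH = 72  # typographic points per inch


def point_size_to_pixels(points: float, dpi: int) -> float:
    """Convert a nominal font size in points to pixels at a scan resolution.

    Hypothetical helper for illustration only; not a LEADTOOLS function.
    """
    return points / POINTS_PER_INCH * dpi


# Show the documented 5 pt and 72 pt bounds at two common scan resolutions.
for dpi in (200, 300):
    smallest = point_size_to_pixels(5, dpi)
    largest = point_size_to_pixels(72, dpi)
    print(f"{dpi} dpi: 5 pt = {smallest:.1f} px, 72 pt = {largest:.1f} px")
```

At lower scan resolutions the smallest supported type occupies very few pixels, which is one reason a resolution such as 300 dpi is commonly used for OCR input.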
Three specialized OCR engines are supported. Each document may contain multiple OCR zones, and each zone may use any of the following engines:
MOR OCR Engine
This module can safely handle A3 size (11.69" x 16.54") portrait and landscape images with 300 dpi resolution. It recognizes machine printed text from printed publications, laser or ink-jet printers, and electric typewriters. Output from mechanical typewriters in good condition, and from letter- or near-letter quality (LQ, NLQ) dot-matrix printers is also acceptable.
- Supports up to 500 zones on one image
- Supports Omnifont, Draftdot24 and OCR-A filling methods
- Provides three page-level accuracy and speed trade-off settings: Accurate, Balanced, and Fast
- Provides Checking Subsystem based correction
MTX (Mtext) OCR Engine
This module can safely handle A3 size (11.69" x 16.54") portrait and landscape images with 300 dpi resolution. It recognizes machine printed text from printed publications, laser or ink-jet printers, and electric typewriters. Output from mechanical typewriters in good condition, and from draft-quality, letter-quality, or near-letter quality dot-matrix printers is also acceptable. Only images with resolutions in the following ranges (in dpi) are supported: 90-110, 160-240, 280-320, 400, 600. This module does not process images larger than 6600 pixels in either width or height.
- The fastest of the selectable OCR engines
- Supports up to 64 zones on one image
- Supports Omnifont, Draftdot9 and Draftdot24 filling methods
- Provides two page-level accuracy and speed trade-off settings: a combined Accurate & Balanced setting, and Fast
- Provides Checking Subsystem based correction
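The MTX resolution ranges and the 6600-pixel limit stated above can be checked before an image is handed to the engine. The sketch below is one way to do that, under these assumptions: the resolution values are in dpi, the range endpoints are inclusive, and the names `MTX_DPI_RANGES`, `mtx_problems`, and `page_pixels` are hypothetical helpers that are not part of the LEADTOOLS API.

```python
# Resolution ranges (assumed to be dpi, endpoints inclusive) listed for the
# MTX engine. A single value such as 400 is stored as the range (400, 400).
MTX_DPI_RANGES = [(90, 110), (160, 240), (280, 320), (400, 400), (600, 600)]
MTX_MAX_SIDE_PX = 6600  # documented maximum width or height in pixels


def mtx_problems(width_px, height_px, dpi):
    """Return the reasons an image falls outside the documented MTX limits."""
    problems = []
    if not any(low <= dpi <= high for low, high in MTX_DPI_RANGES):
        problems.append(f"resolution {dpi} dpi is outside the supported ranges")
    if width_px > MTX_MAX_SIDE_PX or height_px > MTX_MAX_SIDE_PX:
        problems.append(
            f"image is {width_px} x {height_px} px; "
            f"limit is {MTX_MAX_SIDE_PX} px per side"
        )
    return problems


def page_pixels(width_in, height_in, dpi):
    """Pixel dimensions of a page of the given size in inches at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


A3_INCHES = (11.69, 16.54)
for dpi in (300, 600):
    size = page_pixels(*A3_INCHES, dpi)
    print(dpi, size, mtx_problems(*size, dpi) or "within documented limits")
```

An A3 page at 300 dpi is 3507 x 4962 pixels, which fits within both limits. The same page at 600 dpi is 7014 x 9924 pixels, so it exceeds the 6600-pixel limit even though 600 dpi is a listed resolution.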
FireWorX OCR Engine
This module recognizes machine printed text from printed publications, laser or ink-jet printers, and electric typewriters. Output from mechanical typewriters in good condition, and from letter- or near-letter quality (LQ, NLQ) dot-matrix printers is also acceptable.
- Optimized for speed
- Supports up to 2,500 zones on one image
- Supports the Omnifont filling method
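The per-image zone limits of the three engines can be compared with a small lookup, as in the sketch below. It assumes each limit applies to the total number of zones on the image; the dictionary keys and the function `engines_supporting` are labels chosen for this illustration, not LEADTOOLS identifiers.

```python
# Per-image zone limits as listed in the engine descriptions above.
# Keys are labels chosen for this sketch, not LEADTOOLS identifiers.
MAX_ZONES = {"MOR": 500, "MTX": 64, "FireWorX": 2500}


def engines_supporting(zone_count):
    """Return the engines whose documented zone limit can hold zone_count zones."""
    return [name for name, limit in MAX_ZONES.items() if zone_count <= limit]


print(engines_supporting(64))    # all three engines
print(engines_supporting(120))   # MOR and FireWorX
print(engines_supporting(3000))  # none
```

A check like this is most useful for densely zoned pages, such as forms, where the 64-zone MTX limit is the first one reached.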
Supported Environments
The toolkit comes in Win32 and x64 editions that support development of software applications for any of the following environments:
- Windows Vista
- Windows XP
- Windows 2000
For more information, refer to:
- Getting Started (Guide to Example Programs)
- LEADTOOLS OCR .Net Assemblies
- Programming with LEADTOOLS .NET OCR
- An Overview of Recognition Modules
- Starting and Shutting Down the Engine
- Language Dictionary List
- Working with Languages
- Working With A Dictionary
- Working with Pages
- Working with Zones
- Drawing Pages and Zones
- Recognizing Document Pages
- Output Converter Formatting Properties
- Output Text Format List
- Confidence Reporting