The LEADTOOLS OCR .NET class library provides programming tools for quickly and easily adding document optical character recognition (OCR) to software applications. Using the LEADTOOLS OCR Module, programmers can perform character recognition on document images and save the recognized text in over 20 file formats. The PDF OCR Plug-in extends the LEADTOOLS OCR Module with PDF output support.

LEADTOOLS makes OCR development easier with automatic zone detection, manual zone creation, auto-orientation, document image cleanup, and preset values for common document images that improve recognition results. The LEADTOOLS OCR Module supports over 100 languages, as well as output document options such as margins and paragraph settings.

Supported output formats include:

Adobe Portable Document Format (PDF and PDF/A)
Microsoft Word (doc)
Hypertext Markup Language (HTML)
Text (ASCII and Unicode)
Microsoft Rich Text Format (RTF)
Windows Enhanced Metafile (EMF)

New Features

Starting with version 16, the Leadtools.Document namespace is deprecated and replaced by the new Leadtools.Forms.Ocr namespace. It represents a new design, featuring:

High-level design
Increased ease of use
Choice of multiple OCR Engines

In addition, OCR output can be saved as PDF/A.

OCR Engines

LEADTOOLS OCR .NET class library offers support for multiple OCR engines. Currently, LEADTOOLS ships with the following:

LEADTOOLS OCR Advantage includes automatic and manual zone detection, formatted output, and auto-orientation support. PDF and PDF/A output and OMR support are also available.
LEADTOOLS OCR Plus provides functions, properties, methods, and events for easily incorporating automatic and manual zone detection, formatted output, auto-orientation, custom spelling dictionaries, and MICR (magnetic ink character recognition) support into your applications. PDF and PDF/A output and ICR support are also available.
LEADTOOLS OCR Professional enables you to easily incorporate the fastest and most accurate OCR support possible into your applications. It also supports Asian languages.

OMR (Optical Mark Recognition), automatic and manual zoning, formatted output, and PDF output are standard options available in all OCR engines. Additional OCR engines are in the planning and development stages.

With the LEADTOOLS OCR .NET class library, the internal workings of the various engines are hidden behind a uniform .NET class library. If desired, you should be able to switch among any of the supported OCR engines without changing your application code or logic.
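The engine independence described above follows a familiar interface-plus-factory pattern. The following Python sketch illustrates only the idea. It is not the LEADTOOLS API, which is a .NET library. Every class and function name in it (`OcrEngine`, `AdvantageEngine`, `create_engine`, `ocr_document`) is hypothetical, and the stub engines return placeholder strings instead of performing OCR.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List


class OcrEngine(ABC):
    """Hypothetical uniform interface shared by every engine."""

    name = "base"

    @abstractmethod
    def recognize(self, image_path: str) -> str:
        """Return the recognized text for one page image."""


class _StubEngine(OcrEngine):
    # Placeholder behavior: a real engine would run OCR on the image here.
    def recognize(self, image_path: str) -> str:
        return f"[{self.name}] text from {image_path}"


class AdvantageEngine(_StubEngine):
    name = "Advantage"


class PlusEngine(_StubEngine):
    name = "Plus"


class ProfessionalEngine(_StubEngine):
    name = "Professional"


# Engine selection is confined to this one registry.
ENGINES: Dict[str, Callable[[], OcrEngine]] = {
    "Advantage": AdvantageEngine,
    "Plus": PlusEngine,
    "Professional": ProfessionalEngine,
}


def create_engine(kind: str) -> OcrEngine:
    """Instantiate the named engine, or raise ValueError for unknown names."""
    if kind not in ENGINES:
        raise ValueError(f"unknown OCR engine: {kind}")
    return ENGINES[kind]()


def ocr_document(engine: OcrEngine, pages: List[str]) -> List[str]:
    # Application logic sees only the OcrEngine interface, so it does not
    # change when a different engine is passed in.
    return [engine.recognize(page) for page in pages]
```

In this sketch, switching engines changes only the string passed to `create_engine`. For example, `ocr_document(create_engine("Plus"), ["page1.tif"])` and `ocr_document(create_engine("Professional"), ["page1.tif"])` run the same application code.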

Key Features

Support for multi-threading and server-based OCR operations
Create multiple OCR documents in your application, each containing its own list of pages
Select the language to use in recognizing the OCR pages
Use dictionaries for improving OCR results
Recognize a variety of documents, including facsimiles, photocopies and documents with complex layouts
Save the document in any of several output formats, including PDF, MS Word, and plain text
Correct document characteristics such as noise, darkness, and lightness to achieve the best possible character recognition
Use artificial intelligence to improve recognition on documents of the same type.
- The software learns as a result of normal recognition, and acquires additional information by using the OCR's text verification system.
- Learns, saves and loads character recognition data for similar documents.
Segment complex pages manually or automatically into text, image and table recognition zones. Powerful zone recognition tools include:
- Whole page recognition as one zone.
- Manual specification and recognition for multiple zones within each page.
- Automatic area segmentation for creating multi-layered zones and recognizing areas such as tables, rulers, images and text.
- Specifying a different specialized recognition module for each zone, including OMR, MOR, MTX, and FireWorX.
- Displaying document pages with or without their zones.
- Importing and exporting zones from and to files.
Recognize text and colors within tables.

Additional Features

Recognize text from 5 to 72 points in virtually any typeface
Recognize multiple languages within one document
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats
Process documents in two-page mode for open-faced books and magazines
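To relate the 5 to 72 point range to a scanned image, recall that a typographic point is 1/72 inch. The nominal pixel height of a font size is therefore points / 72 * dpi. The Python helpers below show only this calculation and are not part of LEADTOOLS. Their names are hypothetical. The result is the nominal em size, so actual character heights are somewhat smaller.

```python
POINTS_PER_INCH = 72  # 1 typographic (DTP) point = 1/72 inch


def points_to_pixels(points: float, dpi: float) -> float:
    """Nominal pixel height of a font size at the given scan resolution."""
    return points / POINTS_PER_INCH * dpi


def in_quoted_size_range(points: float) -> bool:
    """True if the size falls within the 5-72 pt range quoted above."""
    return 5 <= points <= 72
```

At 300 dpi, for example, `points_to_pixels(5, 300)` gives about 20.8 pixels and `points_to_pixels(72, 300)` gives 300 pixels. Small text therefore leaves far fewer pixels per character at low scan resolutions.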

Recognition Modules

Four specialized OCR recognition modules are supported. Each document may contain multiple OCR zones, and each zone may use any of the following modules:

OMR OCR Module

OMR (Optical Mark Recognition) is used to capture human-marked data such as surveys or tests. An OMR field can be a square, a circle, or even just a check mark on the form.
- Unlimited number of OMR zones
- Accuracy reporting
- Auto-detection of frames

MOR OCR Module

This module can safely handle A3-size images. It recognizes machine-printed text from printed publications, laser or ink-jet printers, and electric typewriters. Output from mechanical typewriters in good condition, and from letter- or near-letter-quality (LQ, NLQ) printers, is also acceptable.
- Supports up to 500 zones on one image
- Supports OmniFont, Draftdot24 and OCR-A filling methods
- Provides three page-level accuracy/speed trade-off settings: Accurate, Balanced, and Fast
- Provides correction based on the Checking Subsystem

MTX (Mtext) OCR Module

This module can safely handle A3 size (11.69" x 16.54") portrait and landscape images with 300 dpi resolution. It recognizes machine printed text from printed publications, laser or ink-jet printers, and electric typewriters. Output from mechanical typewriters in good condition, and from draft-quality, letter-quality, or near-letter quality dot-matrix printers is also acceptable.

Only images with resolutions in the following ranges are supported: 90-110, 160-240, or 280-320 dpi, or exactly 400 or 600 dpi. This module does not process images larger than 6600 pixels in either width or height.
- The fastest of the selectable OCR modules
- Supports up to 64 zones on one image
- Supports OmniFont, Draftdot9 and Draftdot24 filling methods
- Provides two page-level accuracy/speed trade-off settings: a combined Accurate/Balanced setting and Fast
- Provides correction based on the Checking Subsystem
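The MTX input constraints can be expressed as a simple pre-check before an image is passed to the module. This Python sketch is hypothetical and is not a LEADTOOLS call. It assumes the following:

- The resolution ranges are inclusive.
- The horizontal and vertical resolutions are equal.
- "Larger than 6600 pixels" means that any dimension above 6600 is rejected.

The sketch also shows why the A3 figure above is tied to 300 dpi. An A3 page is about 3507 x 4962 pixels at 300 dpi, but 7014 x 9924 pixels at 600 dpi, which exceeds the 6600-pixel limit.

```python
from typing import Tuple

# Supported MTX resolutions in dpi, as inclusive ranges; 400 and 600 are single values.
MTX_DPI_RANGES: Tuple[Tuple[int, int], ...] = (
    (90, 110), (160, 240), (280, 320), (400, 400), (600, 600),
)
MTX_MAX_DIMENSION = 6600  # pixels, applied to width and height separately


def page_pixels(width_in: float, height_in: float, dpi: int) -> Tuple[int, int]:
    """Pixel size of a page of the given size in inches, scanned at dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def mtx_accepts(width_px: int, height_px: int, dpi: int) -> bool:
    """Hypothetical pre-check against the MTX limits quoted above."""
    dpi_ok = any(low <= dpi <= high for low, high in MTX_DPI_RANGES)
    size_ok = width_px <= MTX_MAX_DIMENSION and height_px <= MTX_MAX_DIMENSION
    return dpi_ok and size_ok
```

For example, `mtx_accepts(*page_pixels(11.69, 16.54, 300), 300)` returns `True`, while the same page at 600 dpi returns `False`. A 150 dpi scan is rejected regardless of its size, because 150 falls between the supported ranges.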

FireWorX OCR Engine

This module recognizes machine-printed text from printed publications, laser or ink-jet printers, and electric typewriters. Output from mechanical typewriters in good condition, and from letter- or near-letter-quality (LQ, NLQ) dot-matrix printers, is also acceptable. This module is optimized for speed.
- Supports up to 2,500 zones on one image
- Supports the OmniFont filling method
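Each zone can be assigned its own module, and the modules have different per-image zone limits. An application may therefore want to validate a page's zone list before recognition. The Python sketch below is hypothetical: `Zone` and `zone_limit_violations` are not LEADTOOLS types. It assumes that each limit applies to the zones assigned to that module on one image. OMR is treated as unlimited, as stated above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional

# Per-image zone limits quoted above; None means no stated limit (OMR).
MAX_ZONES: Dict[str, Optional[int]] = {
    "OMR": None, "MOR": 500, "MTX": 64, "FireWorX": 2500,
}


@dataclass
class Zone:
    left: int
    top: int
    width: int
    height: int
    module: str  # recognition module assigned to this zone


def zone_limit_violations(zones: List[Zone]) -> List[str]:
    """Return a message for each module whose zone count is invalid."""
    counts = Counter(zone.module for zone in zones)
    problems: List[str] = []
    for module, n in counts.items():
        if module not in MAX_ZONES:
            problems.append(f"{module}: unknown module")
            continue
        limit = MAX_ZONES[module]
        if limit is not None and n > limit:
            problems.append(f"{module}: {n} zones exceeds limit of {limit}")
    return problems
```

For example, a page with 65 MTX zones and 3000 OMR zones produces a single message, for MTX, because OMR has no stated limit.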

Supported Environments

LEADTOOLS OCR .NET class library comes in Win32 and x64 editions that can support development of software applications for any of the following environments:

Windows Vista (32 and 64-bit editions)
Windows Server 2008 (32 and 64-bit editions)
Windows XP (32 and 64-bit editions)
Windows 2000

For more information, refer to:

Getting Started

Assembly Overview

See Also

Reference

Leadtools.Forms.Ocr (requires a Document/Medical product license)

Help Version 16.5.9.25