Leadtools.Forms.Ocr Introduction

LEADTOOLS OCR Class Library provides programming tools for quickly and easily adding document optical character recognition (OCR) technology to software applications. Using the LEADTOOLS OCR Module, programmers can perform character recognition on document images and output the recognized text to over 20 file formats.

LEADTOOLS makes OCR development easier with auto-zone detection, manual zone creation, auto-orientation, document image cleanup, and preset values for common document images that improve recognition results. The LEADTOOLS OCR Module supports over 100 languages, as well as output document options such as margins and paragraph settings.

Supported output formats include:

Adobe Portable Document Format (PDF and PDF/A)
Microsoft Word (DOC)
Hypertext Markup Language (HTML)
Text (ASCII and Unicode)
Microsoft Rich Text Format (RTF)
Windows Enhanced Metafile (EMF)
LEAD Temporary Document (LTD)
Open XML Paper Specification (XPS)
Microsoft Word 2007 (DOCX)
Microsoft Excel (XLS)

New Features

With version 16, the Leadtools.Document namespace has been deprecated and replaced by the new Leadtools.Forms.Ocr namespace. The new namespace features:

High-level design
Increased ease of use
Choice of multiple OCR engines

In addition, OCR output can be saved as PDF/A.

OCR Engines

The LEADTOOLS OCR class library offers support for multiple OCR engines. Currently, LEADTOOLS ships with the following:

LEADTOOLS OCR Advantage includes automatic and manual zone detection, formatted output, and auto-orientation support. PDF and PDF/A output and OMR support are also available.
LEADTOOLS OCR Plus provides functions, properties, methods, and events for easily incorporating automatic and manual zone detection, formatted output, auto-orientation, custom spelling dictionaries, and MICR (magnetic ink character recognition) support into your applications. PDF and PDF/A output and ICR support are also available.
LEADTOOLS OCR Professional lets you incorporate high-speed, high-accuracy OCR into your applications, and also supports Asian languages.
LEADTOOLS OCR Arabic includes automatic and manual zone detection for Arabic documents and images, and formatted output. PDF and PDF/A output (among other output document formats) and OMR support are also available.

With the LEADTOOLS OCR class library, the internal workings of the various engines are hidden behind a uniform class library. If desired, you should be able to switch among any of the supported OCR engines without changing your application code or logic.
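
To illustrate the kind of engine-independent design described above, here is a minimal Python sketch of the general pattern: application code talks only to an abstract engine interface, and a single factory call decides which engine implementation is used. Every name in it (`OcrEngine`, `create_engine`, `ocr_page`, and the stub engine) is hypothetical and is not the LEADTOOLS API; the actual .NET entry points are covered in "Creating an OCR Engine Instance" and "Starting and Shutting Down the OCR Engine".

```python
from abc import ABC, abstractmethod


class OcrEngine(ABC):
    """Hypothetical engine interface: the application only ever sees this type."""

    @abstractmethod
    def startup(self) -> None: ...

    @abstractmethod
    def recognize(self, image: bytes) -> str: ...

    @abstractmethod
    def shutdown(self) -> None: ...


class _StubEngine(OcrEngine):
    """Stand-in for a vendor engine; returns a tagged string instead of real OCR."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.started = False

    def startup(self) -> None:
        self.started = True

    def recognize(self, image: bytes) -> str:
        if not self.started:
            raise RuntimeError(f"{self.name} engine not started")
        return f"[{self.name}] recognized {len(image)} bytes"

    def shutdown(self) -> None:
        self.started = False


# Hypothetical set of engine names, mirroring the engines listed above.
_KNOWN_ENGINES = {"advantage", "plus", "professional", "arabic"}


def create_engine(kind: str) -> OcrEngine:
    """Factory: the only place where the concrete engine is chosen."""
    if kind not in _KNOWN_ENGINES:
        raise ValueError(f"unknown OCR engine: {kind}")
    return _StubEngine(kind)


def ocr_page(kind: str, image: bytes) -> str:
    """Application logic: identical no matter which engine is selected."""
    engine = create_engine(kind)
    engine.startup()
    try:
        return engine.recognize(image)
    finally:
        engine.shutdown()
```

With this structure, changing `ocr_page("advantage", image)` to `ocr_page("professional", image)` swaps the engine without touching the recognition logic, which is the property the uniform class library aims to provide.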

Standard OCR Engine Options

The following standard options are available in all OCR engines:

OMR (Optical Mark Recognition)
Auto/manual zoning
Formatted output
PDF output

Additional OCR engines are in the planning and development stages.

Key Features

Support for multi-threading and server-based OCR operations
Create multiple OCR documents in your application; each document contains its own list of pages
Select the languages to use when recognizing OCR pages
Use dictionaries for improving OCR results (Advantage engine only)
Recognize a variety of documents, including facsimiles, photocopies and documents with complex layouts
Save documents in any of several output formats, including PDF, Microsoft Word, and plain text
Correct document characteristics such as noise, darkness, and lightness to achieve the best possible character recognition
Use artificial intelligence to improve recognition on documents of the same type.
Segment complex pages manually or automatically into text, image and table recognition zones. Powerful zone recognition tools include:
- Whole page recognition as one zone
- Manual specification and recognition for multiple zones within each page
- Automatic area segmentation for creating multi-layered zones and recognizing areas such as tables, rulers, images and text
- Specifying a different specialized recognition module for each zone, including OMR, MOR, MTX, FireWorX and Asian
- Displaying document pages with or without their zones
- Importing and exporting zones from and to files
Recognize text and colors within tables
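
The document, page, and zone relationships listed above can be pictured as a simple containment hierarchy. The following Python sketch assumes a hypothetical object model (`Document`, `Page`, `Zone`, `ZoneType`) that is not the LEADTOOLS API: each document owns its own page list, each page owns its zones, a zone may name a specialized recognition module, and zones can be exported to and imported from a JSON string as a stand-in for zone files.

```python
# Hypothetical object model for illustration only; not the LEADTOOLS API.
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from enum import Enum


class ZoneType(str, Enum):
    TEXT = "text"
    IMAGE = "image"
    TABLE = "table"


@dataclass
class Zone:
    # Bounding rectangle in page pixels: left, top, width, height.
    bounds: tuple[int, int, int, int]
    zone_type: ZoneType = ZoneType.TEXT
    # Optional recognition module name (an opaque string here); None = engine default.
    module: str | None = None


@dataclass
class Page:
    width: int
    height: int
    zones: list[Zone] = field(default_factory=list)

    def add_whole_page_zone(self) -> None:
        """Whole-page recognition: replace all zones with one covering the page."""
        self.zones = [Zone((0, 0, self.width, self.height))]

    def export_zones(self) -> str:
        """Serialize this page's zones to a JSON string."""
        return json.dumps([asdict(z) for z in self.zones])

    def import_zones(self, data: str) -> None:
        """Replace this page's zones with those decoded from a JSON string."""
        self.zones = [
            Zone(tuple(d["bounds"]), ZoneType(d["zone_type"]), d["module"])
            for d in json.loads(data)
        ]


@dataclass
class Document:
    """Each document owns its own, independent list of pages."""
    pages: list[Page] = field(default_factory=list)
```

Keeping zones on the page, rather than on the document, lets each page carry its own manual or automatic zoning while still allowing a zone layout to be saved once and reapplied to similar pages.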

Additional Features

Recognize text from 5 to 72 points in virtually any typeface
Recognize multiple languages within one document
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats
Process documents in two-page mode for open-faced books and magazines

Supported Environments

Reference

Introduction
Getting Started (Guide to Example Programs)
Programming with LEADTOOLS .NET OCR
An Overview of OCR Recognition Modules
Creating an OCR Engine Instance
Starting and Shutting Down the OCR Engine
OCR Spell Language Dictionaries
Working with OCR Languages
Working with OCR Pages
Working with OCR Zones
Using Text Recognizing OCR Pages
OCR Confidence Reporting
OCR Engine Specific Settings
OCR Tutorial - Working with Pages
OCR Tutorial - Recognizing Pages
OCR Tutorial - Adding and Painting Zones
OCR Tutorial - Working with Recognition Results
OCR Tutorial - Scanning to Searchable PDF
OCR Languages and Spell Checking
Using OMR in LEADTOOLS .NET OCR
Multi-Threading with LEADTOOLS OCR

Leadtools.Forms.Ocr requires an OCR module license and unlock key. For more information, refer to: Imaging Pro/Document/Medical Features