Programming with LEADTOOLS OCR Functions

The LEADTOOLS OCR features provide functions for incorporating optical character recognition (OCR) technology into an application. OCR converts bitmap images of documents into text. The LEADTOOLS OCR features are built on the Nuance TextBridge® OCR Engine.

Once the LEADTOOLS OCR API toolkit is installed on the system, the user is ready to begin programming with LEADTOOLS OCR. Note that the OCR features must be unlocked before the user can use the OCR properties, methods, and events. For more information on unlocking LEAD features, refer to Unlocking Special LEAD Features.

LEADTOOLS provides functions to:

- recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.

- select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.

- segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers, and footers.

- set accuracy thresholds before recognition to control recognition accuracy.

- learn, save, and load character recognition data for similar documents. The software learns as a result of normal recognition, and acquires additional information by using the OCR’s text verification system.

- recognize text from 5 to 72 points in virtually any typeface.

- increase recognition accuracy with built-in and user dictionaries.

- automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.

- process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.

- save the document in any of 40 formats, including MS Word, MS Excel, dBASE, PDF, and WordPerfect.

LEADTOOLS uses an OCR handle to interact with the OCR engine and the internal OCR list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.

The following is an outline of the general steps involved in recognizing one or more pages.

1. Start up the OCR engine with the L_DocStartUp function. For more information, refer to Starting and Shutting Down the Engine.

2. Establish an internal OCR document with one or more pages. For more information, refer to Working with Pages.

3. Establish zones on the page(s), either manually or automatically. (This is optional. A page can be recognized with or without zones.) For more information, refer to Working with Zones.

4. Set the active languages to be used by the OCR engine. (Optional. The default is English.) For more information, refer to Working with Languages.

5. Set the spell checking language. (Optional. The default is English.) For more information, refer to Working with Languages.

6. Set any special recognition module options. This is required only if the page contains zones, created either automatically or manually. For more information, refer to Recognizing Document Pages and An Overview of Recognition Modules.

7. Provide code for the VERIFICATIONCALLBACK function, if it will be used. (Optional)

8. Recognize. For more information, refer to Recognizing Document Pages.

9. Save recognition results, if desired. The results can be saved to either a file or to memory. For more information, refer to Recognizing Document Pages.

10. Shut down the OCR engine when finished. For more information, refer to Starting and Shutting Down the Engine.
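
The ordering of the steps above, and the requirement that shutdown always follows a successful startup, can be sketched in C. This is only an illustration of the control flow: the OcrSession type and every ocr_* function below are hypothetical stand-ins invented for this sketch, not LEADTOOLS functions, and they do nothing except record which step ran. In a real application each stub would presumably be replaced by the corresponding LEADTOOLS call (such as L_DocStartUp for the first step), with signatures taken from the function reference.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins, NOT the LEADTOOLS API. Each stub only records
   that its step ran so the ordering and cleanup logic can be followed. */
typedef struct {
    int  started;     /* 1 while the engine session is open */
    int  page_count;  /* pages in the internal OCR document */
    char log[256];    /* ordered trace of the steps that ran */
} OcrSession;

/* Append "step;" to the session trace, truncating if the buffer is full. */
static void trace(OcrSession *s, const char *step)
{
    size_t used = strlen(s->log);
    snprintf(s->log + used, sizeof s->log - used, "%s;", step);
}

static int ocr_startup(OcrSession *s)          /* step 1 */
{
    memset(s, 0, sizeof *s);
    s->started = 1;
    trace(s, "startup");
    return 0;
}

static int ocr_add_page(OcrSession *s)         /* step 2 */
{
    if (!s->started) return -1;
    s->page_count++;
    trace(s, "add_page");
    return 0;
}

static int ocr_find_zones(OcrSession *s)       /* step 3, optional */
{
    if (s->page_count == 0) return -1;
    trace(s, "zones");
    return 0;
}

static int ocr_set_languages(OcrSession *s)    /* steps 4 and 5, optional */
{
    if (!s->started) return -1;
    trace(s, "languages");
    return 0;
}

static int ocr_recognize(OcrSession *s)        /* step 8 */
{
    if (s->page_count == 0) return -1;         /* nothing to recognize */
    trace(s, "recognize");
    return 0;
}

static int ocr_save_results(OcrSession *s)     /* step 9 */
{
    if (!s->started) return -1;
    trace(s, "save");
    return 0;
}

static void ocr_shutdown(OcrSession *s)        /* step 10 */
{
    if (s->started) {
        s->started = 0;
        trace(s, "shutdown");
    }
}

/* Run the steps in order. After a successful startup, shutdown always
   runs, even when an intermediate step fails. Returns 0 on success. */
int run_ocr_workflow(OcrSession *s, int pages, int use_zones)
{
    assert(s != NULL);
    int rc = ocr_startup(s);
    if (rc != 0) return rc;

    for (int i = 0; i < pages && rc == 0; i++) rc = ocr_add_page(s);
    if (rc == 0 && use_zones) rc = ocr_find_zones(s);
    if (rc == 0) rc = ocr_set_languages(s);
    if (rc == 0) rc = ocr_recognize(s);
    if (rc == 0) rc = ocr_save_results(s);

    ocr_shutdown(s);
    return rc;
}
```

Calling run_ocr_workflow(&session, 1, 1) on a local OcrSession walks every step for a single page with zones; passing 0 pages makes the recognize step fail, and the trace in session.log shows that shutdown still ran. Steps 6 and 7 (recognition module options and the verification callback) are omitted from the sketch for brevity.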

For more information, refer to:

An Overview of Recognition Modules

Demo Programs

Tutorials

LEADTOOLS OCR Support Forum

See Also:

Introduction
