Recognizing Document Pages

Each zone on a page is associated with a recognition module through the ZONEDATA2.RecogModule member. The recognition module indicates the type of information contained in the zone and how that information is to be recognized. Depending on the recognition module being used, additional options may be available during recognition.

After recognition is complete, the recognized characters can be obtained, and the recognition results can be saved to a file or to memory.

LEADTOOLS OCR Module - OmniPage Engine

For example, if a zone is associated with a Multi-lingual Omnifont Recognition module (MOR), additional recognition options for this module can be set using L_Doc2SetMOROptions / L_Doc2SetMOROptionsExt. To get the current MOR options, use L_Doc2GetMOROptions / L_Doc2GetMOROptionsExt.

Similarly, if a zone is associated with a Hand-Printed Numeral Recognition module (ICR-HNR), additional recognition options can be set using L_Doc2SetHandPrintOptions / L_Doc2SetHandPrintOptionsExt, and retrieved using L_Doc2GetHandPrintOptions / L_Doc2GetHandPrintOptionsExt. If the zone is associated with an Optical Mark Recognition module (OMR), additional recognition options can be set using L_Doc2SetOMROptions / L_Doc2SetOMROptionsExt, and retrieved using L_Doc2GetOMROptions / L_Doc2GetOMROptionsExt.

For general information about the available recognition modules, refer to An Overview of Recognition Modules.

Depending on the type of recognition module associated with a zone, it may be beneficial to trade recognition accuracy against recognition speed. Call L_Doc2SetRecognizeModuleTradeOff / L_Doc2SetRecognizeModuleTradeOffExt to tell the OCR engine to perform the most accurate recognition, the fastest recognition, or a balance between the two. To get the current trade-off setting for the OCR engine, call L_Doc2GetRecognizeModuleTradeOff / L_Doc2GetRecognizeModuleTradeOffExt.

To get the status of the OCR engine at any time, call L_Doc2GetStatus / L_Doc2GetStatusExt.

Pre-Processing

Pre-processing documents (for example, rotating, inverting, brightening, controlling the threshold, or changing the binarization mode) can increase accuracy and improve performance. Get the engine's current pre-processing options by calling L_Doc2GetPreProcessingOptions / L_Doc2GetPreProcessingOptionsExt. Update these options by calling L_Doc2SetPreProcessingOptions / L_Doc2SetPreProcessingOptionsExt.

Be sure to call L_Doc2SetPreProcessingOptions / L_Doc2SetPreProcessingOptionsExt before calling L_Doc2FindZones / L_Doc2FindZonesExt or L_Doc2Recognize / L_Doc2RecognizeExt, because the pre-processing options affect both auto-zoning and recognition results.

The type of material exported to a file, the method by which the material is stored, and the file type in which it is stored can all be controlled using L_Doc2SetRecognitionResultOptions / L_Doc2SetRecognitionResultOptionsExt. To get the current recognition result settings, call L_Doc2GetRecognitionResultOptions / L_Doc2GetRecognitionResultOptionsExt.

Setting Recognition Options

Call L_Doc2GetSupportedEngineFormats to obtain a list of all supported native engine formats. Retrieve the friendly name for each of the retrieved formats by calling L_Doc2GetEngineFormatFriendlyName. Call L_Doc2FreeEngineFormats when the format list returned by the L_Doc2GetSupportedEngineFormats function is no longer needed.

When calling L_Doc2SetRecognitionResultOptions / L_Doc2SetRecognitionResultOptionsExt, be sure to specify the document writer format. Set the options for the document writer format by calling L_Doc2SetDocumentWriterOptions / L_Doc2SetDocumentWriterOptionsExt, and get them by calling L_Doc2GetDocumentWriterOptions / L_Doc2GetDocumentWriterOptionsExt.

The OCR engine supports different settings for each output format. These settings affect the output file. To get specific format settings, call L_Doc2GetOutputFormatSettings / L_Doc2GetOutputFormatSettingsExt. Update the settings, then call L_Doc2SetOutputFormatSettings.

To get or set special characters for missing or unknown characters, call L_Doc2GetSpecialChar / L_Doc2GetSpecialCharExt and L_Doc2SetSpecialChar / L_Doc2SetSpecialCharExt.

Recognizing

After all necessary recognition options have been set, recognize the page(s) by calling L_Doc2Recognize / L_Doc2RecognizeExt. To get information about the status of the recognition process during recognition, pass a valid pointer to a RECOGNIZESTATUSCALLBACK2 function to the L_Doc2Recognize function.
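The status-callback pattern described above can be sketched in isolation. The following is a minimal, self-contained model and does not call LEADTOOLS: StatusCallback, Progress, on_page_status, and fake_recognize are hypothetical names, and the callback's parameter list and its "return 0 to continue" convention are assumptions made for illustration. Consult the RECOGNIZESTATUSCALLBACK2 reference for the actual prototype.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback shape: page index, error code for that page, user data. */
typedef int (*StatusCallback)(int page, int error, void *user);

/* Progress record owned by the caller and updated by the callback. */
typedef struct {
    int pages_seen;
    int pages_failed;
} Progress;

/* Callback: tally the page; returning 0 means "continue" (assumed convention). */
static int on_page_status(int page, int error, void *user)
{
    Progress *p = (Progress *)user;
    (void)page;
    p->pages_seen++;
    if (error != 0)
        p->pages_failed++;
    return 0;
}

/* Stand-in for the engine: reports each page to the callback, if one was given. */
static int fake_recognize(const int *page_errors, int page_count,
                          StatusCallback cb, void *user)
{
    assert(page_errors != NULL || page_count == 0);
    for (int i = 0; i < page_count; i++) {
        if (cb != NULL && cb(i, page_errors[i], user) != 0)
            return -1;  /* the callback asked to stop */
    }
    return 0;
}
```

The user-data pointer lets the callback record progress without global state, which is the usual reason such callbacks take one.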

The collection of characters recognized for a specific page can be obtained using L_Doc2GetRecognizedCharacters / L_Doc2GetRecognizedCharactersExt. To add any characters to this collection, call L_Doc2SetRecognizedCharacters / L_Doc2SetRecognizedCharactersExt. When the collection is no longer needed, free it by calling L_Doc2FreeRecognizedCharacters.

L_Doc2GetRecognizedCharacters / L_Doc2GetRecognizedCharactersExt fill in the color indices RECOGCHARS2.nFGColorIndex and RECOGCHARS2.nBGColorIndex. To get the color associated with one of these indices, first get the colors table by calling L_Doc2GetRecognizedCharactersColors / L_Doc2GetRecognizedCharactersColorsExt, and then look up the index in that table.
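The index-to-color step can be sketched as follows. This is a minimal, self-contained model, not LEADTOOLS code: CharColorInfo, ColorTable, and lookup_color are hypothetical stand-ins, and only the member names nFGColorIndex and nBGColorIndex are taken from RECOGCHARS2. The packed 32-bit color values are likewise an assumption for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two color members of RECOGCHARS2. */
typedef struct {
    int nFGColorIndex;  /* index of the foreground color in the table */
    int nBGColorIndex;  /* index of the background color in the table */
} CharColorInfo;

/* Hypothetical stand-in for the colors table obtained from the engine. */
typedef struct {
    const uint32_t *colors;  /* packed color values (layout assumed) */
    size_t count;            /* number of entries in the table */
} ColorTable;

/* Resolve an index: returns 1 and stores the color, or 0 if out of range. */
static int lookup_color(const ColorTable *table, int index, uint32_t *out)
{
    assert(table != NULL && out != NULL);
    if (index < 0 || (size_t)index >= table->count)
        return 0;
    *out = table->colors[index];
    return 1;
}
```

Checking the index against the table size before dereferencing guards against a stale table, for example one obtained for a different page.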

To get character choices after a previous call to L_Doc2GetRecognizedCharacters / L_Doc2GetRecognizedCharactersExt, call L_Doc2GetCharacterChoices / L_Doc2GetCharacterChoicesExt. To free the memory allocated by this function, call L_Doc2FreeCharacterChoices.

After the characters for a specific page have been determined using L_Doc2GetRecognizedCharacters / L_Doc2GetRecognizedCharactersExt, call L_Doc2GetRecognizedWords / L_Doc2GetRecognizedWordsExt to combine the recognized characters into words. To change the contents of the recognized words, change the set of recognized characters by calling L_Doc2SetRecognizedCharacters / L_Doc2SetRecognizedCharactersExt. Save the updated recognized characters to a file by calling L_Doc2SaveResultsToFile / L_Doc2SaveResultsToFileExt or L_Doc2SaveResultsToFile2. When the collection of recognized words is no longer needed, free it by calling L_Doc2FreeRecognizedWords.

To get word suggestions after calling L_Doc2GetRecognizedWords / L_Doc2GetRecognizedWordsExt, call L_Doc2GetWordSuggestions / L_Doc2GetWordSuggestionsExt. Free the memory allocated by this function by calling L_Doc2FreeWordSuggestions.

Saving

When saving recognition results to a file, use L_Doc2EnumOutputFileFormats to enumerate all available output file formats supported by the OCR engine. L_Doc2EnumOutputFileFormats reports each file format to an ENUMOUTPUTFILEFORMATS2 callback function. To get specific information about a particular output file format, call L_Doc2GetTextFormatInfo.

Save recognition results to a file by calling L_Doc2SaveResultsToFile / L_Doc2SaveResultsToFileExt, or L_Doc2SaveResultsToFile2.

Use L_Doc2SaveResultsToFile2 to save the recognition results to different formats using the same recognition results and at the same time maintain quality. However, it consumes more memory than L_Doc2SaveResultsToFile / L_Doc2SaveResultsToFileExt.

If memory is a constraint, use L_Doc2SaveResultsToFile / L_Doc2SaveResultsToFileExt instead. However, note that L_Doc2SaveResultsToFile / L_Doc2SaveResultsToFileExt requires OCR to be performed separately for each file format in order to maintain quality.
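The trade-off between the two saving approaches can be modeled with a small sketch that only counts operations. It does not call LEADTOOLS: Tally, stub_recognize, stub_save, and the two save_* helpers are hypothetical, and the format names passed in are placeholders rather than engine format identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how much work each strategy performs. */
typedef struct {
    int recognize_passes;  /* how many times OCR ran */
    int files_saved;       /* how many output files were written */
} Tally;

/* Stand-ins for the engine calls; they only count invocations. */
static void stub_recognize(Tally *t) { t->recognize_passes++; }
static void stub_save(Tally *t, const char *format) { (void)format; t->files_saved++; }

/* One recognition pass; the results stay in memory and are reused per format. */
static void save_reusing_results(Tally *t, const char *const *formats, size_t n)
{
    assert(t != NULL);
    stub_recognize(t);
    for (size_t i = 0; i < n; i++)
        stub_save(t, formats[i]);
}

/* Lower memory use: recognize again before each format is written. */
static void save_recognizing_per_format(Tally *t, const char *const *formats, size_t n)
{
    assert(t != NULL);
    for (size_t i = 0; i < n; i++) {
        stub_recognize(t);
        stub_save(t, formats[i]);
    }
}
```

For three output formats, the first strategy runs recognition once and the second runs it three times, which is the time cost paid for the lower memory footprint.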

The difference between L_Doc2SaveResultsToFile and L_Doc2SaveResultsToFileExt is that L_Doc2SaveResultsToFileExt includes the document ID parameter (nDocId).

For more information, refer to:

An Overview of Recognition Modules

Recognizing Multiple Documents

Drawing Pages and Zones

Working with Pages

Working with Zones

Help Version 21.0.2021.7.2
© 1991-2021 LEAD Technologies, Inc. All Rights Reserved.
