Recognizing Document Pages
Before Recognition
Before recognizing the pages of an OCR document, complete the steps outlined below:
Start and initialize the OCR engine: For more information, refer to Starting and Shutting Down the Engine.
Populate the internal OCR document with pages: For more information on adding pages, refer to Working with Pages.
Apply zones to pages if needed: For more detailed information on adding zones to a page and the options available based on the recognition module associated with a zone, refer to Working with Zones, Using the ILTZoneData Properties, Using the HandPrint Options and Using the OMR Options.
Depending on the type of recognition module associated with a zone, it may be beneficial to trade recognition accuracy against recognition speed. The RecognizeModuleTradeoff property gets or sets a value that tells the OCR engine to perform the most accurate recognition, the fastest recognition, or a balanced recognition.
If the host PC has two processors or a hyper-threaded processor, using the Parallel Recognition Mode can speed up the recognition process by allowing two recognition engines to run in parallel. The Parallel Recognition Mode may be used when a zone is associated with any of the following recognition modules: RECOGMODULE_MTEXT_OMNIFONT, RECOGMODULE_OMNIFONT_FRX, or RECOGMODULE_OMNIFONT_PLUS3W. To enable or disable the Parallel Recognition Mode, set the EnableParallelRecognition property accordingly.
Set the desired engine settings: Before recognizing pages, the user can set general engine settings to suit the type of document to be recognized. These include language settings and checking subsystem settings. The checking subsystem checks spelling, checks the user dictionary for acceptable words during recognition (provided a user dictionary has been set), and uses the Verification event. Many of the settings that control this subsystem must be set before recognition. To enable the use of the subsystem, set the EnableSubSystem property to TRUE. To enable the correction mode of the subsystem, set the EnableCorrection property to TRUE.
The SpellLanguageID property indicates the language the subsystem uses to check spelling. For further information on languages in general, refer to Working with Languages. For more information on preparing a user dictionary, refer to Working with a Dictionary.
To have LEADTOOLS OCR generate system error events, in addition to error return codes, set the EnableMethodErrors property to TRUE.
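The pre-recognition settings described above can be gathered in one place before Recognize is called. The following Python sketch is only an illustration and is not LEADTOOLS code: the EngineSettings class, the configure and parallel_mode_applies helpers, and the placeholder values such as "balanced" are hypothetical. Only the property names and the three module names are taken from the text. It also assumes that the correction mode is meaningful only when the checking subsystem is enabled.

```python
from dataclasses import dataclass

# Module names copied from the text above: the Parallel Recognition Mode
# may be used only with zones associated with one of these modules.
PARALLEL_CAPABLE_MODULES = frozenset({
    "RECOGMODULE_MTEXT_OMNIFONT",
    "RECOGMODULE_OMNIFONT_FRX",
    "RECOGMODULE_OMNIFONT_PLUS3W",
})


@dataclass
class EngineSettings:
    """Hypothetical stand-in that mirrors the property names in the text.

    It is NOT the LEADTOOLS object; the value types are placeholders.
    """
    RecognizeModuleTradeoff: str = "balanced"  # placeholder value
    EnableParallelRecognition: bool = False
    EnableSubSystem: bool = False
    EnableCorrection: bool = False
    EnableMethodErrors: bool = False


def parallel_mode_applies(zone_module: str) -> bool:
    """Return True if the zone's module is one of the three listed above."""
    return zone_module in PARALLEL_CAPABLE_MODULES


def configure(zone_modules, cpu_count, use_checking=False, tradeoff="balanced"):
    """Build settings for a set of zones (hypothetical helper).

    cpu_count is the number of logical processors, so a single
    hyper-threaded processor also reports 2 or more.
    """
    settings = EngineSettings(RecognizeModuleTradeoff=tradeoff)
    settings.EnableParallelRecognition = (
        cpu_count >= 2 and any(parallel_mode_applies(m) for m in zone_modules)
    )
    # Assumption: correction is switched on together with the subsystem.
    settings.EnableSubSystem = use_checking
    settings.EnableCorrection = use_checking
    return settings
```

In a real application, the resulting values would be assigned to the corresponding properties of the OCR object before recognition starts.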
Recognizing a Document
Call the Recognize method: When all necessary recognition options have been set, the page(s) can be recognized by calling the Recognize method.
Generate the RecognitionStatus event: The RecognitionStatus event is generated during the recognition process to provide the user with information about any errors that occur. To enable the generation of the RecognitionStatus event, set the EnableFireRecognizeStatus property to TRUE. To stop the current RecognitionStatus event, set the EnableStopRecognizeStatus property to TRUE within the RecognitionStatus event. The generation of the RecognitionStatus event can also be disabled outside the RecognitionStatus event by setting the EnableFireRecognizeStatus property to FALSE.
Generate ProgressStatus events: The ProgressStatus event is generated during the recognition process to provide the user with information about the progress of a specific operation, the number of accepted characters seen so far, the number of rejected characters seen so far, and other aspects of the recognition process. To enable the generation of this event, set the EnableProgressStatusEvent property to TRUE. To stop the current ProgressStatus event, set the StopProgressStatusEvent property to TRUE. The generation of the ProgressStatus event can also be disabled outside the ProgressStatus event by setting the EnableProgressStatusEvent property to FALSE.
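The enable and stop flags described above follow a common callback pattern, which the Python sketch below illustrates with a toy engine. Everything in it is hypothetical: FakeOcrEngine, the on_progress hook, and the rule that "?" counts as a rejected character are stand-ins, not the LEADTOOLS API. The sketch also assumes one possible reading of the stop flag, namely that setting StopProgressStatusEvent to TRUE inside the handler suppresses further ProgressStatus notifications for the current run.

```python
class FakeOcrEngine:
    """Toy stand-in that only mimics the two flags named in the text."""

    def __init__(self):
        self.EnableProgressStatusEvent = False
        self.StopProgressStatusEvent = False
        self.on_progress = None  # hypothetical hook: handler(engine, accepted, rejected)

    def Recognize(self, chars):
        """Pretend to recognize: '?' is rejected, any other character is accepted."""
        accepted = rejected = 0
        for ch in chars:
            if ch == "?":
                rejected += 1
            else:
                accepted += 1
            # Fire the event only while it is enabled and has not been stopped.
            if (self.EnableProgressStatusEvent
                    and self.on_progress is not None
                    and not self.StopProgressStatusEvent):
                self.on_progress(self, accepted, rejected)
        return accepted, rejected


def make_handler(log, max_rejected):
    """Build a handler that records progress and stops after too many rejects."""
    def handler(engine, accepted, rejected):
        log.append((accepted, rejected))
        if rejected >= max_rejected:
            engine.StopProgressStatusEvent = True  # set from inside the event
    return handler


if __name__ == "__main__":
    events = []
    engine = FakeOcrEngine()
    engine.EnableProgressStatusEvent = True
    engine.on_progress = make_handler(events, max_rejected=2)
    totals = engine.Recognize("ab?c?de")
    print(len(events), totals)
```

Setting EnableProgressStatusEvent back to FALSE before calling Recognize would silence the handler entirely, which corresponds to disabling the event from outside the handler.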
After Recognition
Get the status of the last recognition process: Accuracy and timing data for the last recognition process can be obtained using the GetStatus method. For more information, refer to Getting Status Updates.
Save the recognition results: The recognition results can be saved to a file or saved to memory. For more detailed information on saving recognition results and the material to export, refer to Handling the Results of the Recognition Process.
Get recognized character sets: Once a page has been recognized, the set of recognized characters for that page can be obtained by calling the GetRecognizedCharacters method. This updates the RecognizedCharacter property with the set of recognized characters for that page. This property is a pointer to an array of ILTRecognizedCharacters objects, where each object represents one of the recognized characters. While the RecognizedCharacter property is read only, the ILTRecognizedCharacters properties accessed through it can be set. For more information on the ILTRecognizedCharacters properties accessed through the RecognizedCharacters property, refer to Using Recognized Characters.
Get special characters or symbols: During recognition, special characters or symbols are used to denote missing characters or rejected characters. The characters or symbols used in the recognition process can be found in the SpecialMissingSymbol property and the SpecialRejectedCharacter property.
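One use for the two special symbols is to locate the places in the recognized output that need manual review. The Python sketch below is a hypothetical helper, not part of LEADTOOLS: find_flagged_positions and the sample symbols "^" and "~" are assumptions for illustration. In practice, the symbols would be read from the SpecialMissingSymbol and SpecialRejectedCharacter properties.

```python
def find_flagged_positions(text, missing_symbol, rejected_symbol):
    """Return the indexes in text where each special symbol occurs.

    missing_symbol and rejected_symbol are single characters; the caller
    is expected to supply the values the engine actually used.
    """
    flagged = {"missing": [], "rejected": []}
    for index, ch in enumerate(text):
        if ch == missing_symbol:
            flagged["missing"].append(index)
        elif ch == rejected_symbol:
            flagged["rejected"].append(index)
    return flagged


if __name__ == "__main__":
    # "^" and "~" are placeholder symbols chosen only for this example.
    print(find_flagged_positions("inv^ice n~mber", "^", "~"))
```

A review tool could use the returned positions to highlight the uncertain characters for the user.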