Each zone on a page has a recognition module associated with it through RasterDocumentZoneData.RecognizeModule. The recognition module identifies the type of information the zone contains and how that data should be recognized. Depending on the type of recognition module, additional options may be available during recognition. For example, if a zone is associated with a Multi-lingual Omni font Recognition module (MOR), an additional option for this module can be read or set using the MorEnableFaxMode property.
Similarly, if a zone is associated with a Hand Printed Numeral Recognition module, additional recognition options can be set using the HandPrintOptions property. If the zone is associated with an Optical Mark Recognition module (OMR), additional recognition options can be set using the OmrOptions property.
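The pairing of module type and option property described above can be summarized as a simple lookup. The following is a minimal sketch in Python, not toolkit code: the ModuleKind enum and the options_property_for helper are hypothetical names introduced only for illustration, and only the three property names are taken from the text.

```python
from enum import Enum


class ModuleKind(Enum):
    """Hypothetical labels for the three module families named in the text."""
    MOR = "Multi-lingual Omni font Recognition"
    HAND_PRINTED_NUMERAL = "Hand Printed Numeral Recognition"
    OMR = "Optical Mark Recognition"


# Option property named in the text for each module family.
OPTIONS_PROPERTY = {
    ModuleKind.MOR: "MorEnableFaxMode",
    ModuleKind.HAND_PRINTED_NUMERAL: "HandPrintOptions",
    ModuleKind.OMR: "OmrOptions",
}


def options_property_for(kind: ModuleKind) -> str:
    """Return the name of the option property that applies to a zone's module."""
    return OPTIONS_PROPERTY[kind]
```

For instance, options_property_for(ModuleKind.OMR) yields the string "OmrOptions". Other module types may have their own options; this table covers only the ones mentioned here.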
For general information about the available recognition modules, refer to An Overview of Recognition Modules.
Depending on the type of recognition module associated with a zone, it may be beneficial to trade off recognition accuracy against recognition speed. Using the RecognizeModuleTradeoff property, you can tell the OCR engine to perform the most accurate recognition, the fastest recognition, or a balanced recognition. To get the current trade-off setting for the OCR engine, read the same RecognizeModuleTradeoff property.
Use the following properties before starting the recognition process:
The EnableSubsystem property and the EnableCorrection property enable or disable the checking subsystem, which is used for verification.
When all necessary recognition options have been set, the page(s) can be recognized by calling Recognize.
Call the EnableEvents method to have the engine invoke the RasterDocumentRecognizeStatusCallback callback. To stop firing the recognition status callback, call the DisableEvents method.
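The enable/disable behavior of the status callback can be pictured with a small stand-in. This is a sketch of the general pattern only, written in Python under stated assumptions: RecognizerSketch, its method names, and the (page index, percent done) callback arguments are all hypothetical and do not reflect the actual signatures of EnableEvents, DisableEvents, Recognize, or RasterDocumentRecognizeStatusCallback.

```python
from typing import Callable, List, Optional

# Hypothetical callback shape: (page_index, percent_done).
StatusCallback = Callable[[int, int], None]


class RecognizerSketch:
    """Toy stand-in for an engine that reports status only while events are enabled."""

    def __init__(self) -> None:
        self._callback: Optional[StatusCallback] = None
        self._events_enabled = False

    def enable_events(self, callback: StatusCallback) -> None:
        """Register the status callback and start firing it."""
        self._callback = callback
        self._events_enabled = True

    def disable_events(self) -> None:
        """Stop firing the status callback."""
        self._events_enabled = False

    def recognize(self, page_count: int) -> List[str]:
        """Pretend to recognize pages, reporting progress after each one."""
        results: List[str] = []
        for page in range(page_count):
            results.append(f"page {page} text")  # placeholder result
            if self._events_enabled and self._callback is not None:
                self._callback(page, (page + 1) * 100 // page_count)
        return results
```

A caller would pass a function to enable_events before calling recognize, and call disable_events once progress reports are no longer wanted; recognition itself proceeds the same way in both cases.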
After recognition is complete, the recognized characters can be obtained and the recognition results can be saved to a file or to memory.
The collection of characters recognized for a specific page can be obtained using GetRecognizedCharacters. To add characters to this collection, call SetRecognizedCharacters.
The recognition results can be saved to a file by calling SaveResultsToFile. The type of material exported, how it is stored, and the file type in which it is stored can all be controlled using the SaveResultOptions property.
When saving recognition results to a file, you can use the AvailableOutputFileFormats method to obtain all output file formats supported by the OCR engine. To get specific information about a particular output file format, call GetTextFormatInfo.
The recognition results can also be saved to memory by calling SaveResultsToMemory.
To get or set special characters used in the recognition process, use the SpecialRejectedCharacter property.
Finally, to get the status of the OCR engine at any time, use the RasterDocumentRecognizeStatusCallback.