Recognizing Document Pages
Before Recognition
Before recognizing the pages of an OCR document, several steps must be taken. These are outlined below:
Start and initialize the OCR engine: For more information, refer to Starting and Shutting Down the Engine.
Populate the internal OCR document with pages: For more information on adding pages, refer to Working with Pages.
Apply zones to pages if needed: For more detailed information on adding zones to a page and the options available based on the recognition module associated with a zone, refer to Working with Zones, Using the ILTZoneData Properties, Using the HandPrint Options, and Using the OMR Options.
Depending on the type of recognition module associated with a zone, it may be beneficial to trade recognition accuracy against recognition speed. The RecognizeModuleTradeoff property gets or sets a value that tells the OCR engine to perform the most accurate recognition, the fastest recognition, or a balanced recognition.
If the host PC has two processors or a hyper-threaded processor, using the Parallel Recognition Mode can speed up the recognition process by allowing the two recognition engines to run in parallel. The Parallel Recognition Mode may be used when a zone is associated with any of the following recognition modules: RECOGMODULE_MTEXT_OMNIFONT, RECOGMODULE_OMNIFONT_FRX, or RECOGMODULE_OMNIFONT_PLUS3W. To enable or disable the Parallel Recognition Mode, set the EnableParallelRecognition property accordingly.
Set the desired engine settings: Before recognizing pages, the user can set general engine settings to suit the type of document to be recognized. These include language information, checking subsystem information, etc. The checking subsystem is responsible for checking spelling, checking the user dictionary for acceptable words during recognition (provided a user dictionary has been set), and using the Verification event. Many of the settings that control this subsystem must be set before recognition. To enable the use of the subsystem, set the EnableSubSystem property to TRUE. To enable the correction mode of the subsystem, set the EnableCorrection property to TRUE.
The SpellLanguageID property indicates the language the subsystem uses to check spelling. For further information on languages in general, refer to Working with Languages. For more information on preparing a user dictionary, refer to Working with a Dictionary.
To have LEADTOOLS OCR generate system error events, in addition to error return codes, set the EnableMethodErrors property to TRUE.
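The pre-recognition settings above can be pictured with a small model. The following Python sketch is illustrative only: `EngineSettings`, `Tradeoff`, and `parallel_mode_applies` are hypothetical names invented for this example and are not part of the LEADTOOLS OCR interface. The default values are assumptions as well. Only the three module names are taken from the text.

```python
from dataclasses import dataclass
from enum import Enum


class Tradeoff(Enum):
    # Hypothetical labels for the three choices the text describes.
    ACCURATE = "accurate"
    BALANCED = "balanced"
    FAST = "fast"


# Module names copied from the text: zones associated with one of these
# modules may use the Parallel Recognition Mode.
PARALLEL_CAPABLE_MODULES = frozenset({
    "RECOGMODULE_MTEXT_OMNIFONT",
    "RECOGMODULE_OMNIFONT_FRX",
    "RECOGMODULE_OMNIFONT_PLUS3W",
})


@dataclass
class EngineSettings:
    """Stand-in for the properties set before recognition (not the real control).

    The defaults below are assumptions made for this sketch.
    """
    recognize_module_tradeoff: Tradeoff = Tradeoff.BALANCED
    enable_parallel_recognition: bool = False
    enable_subsystem: bool = False
    enable_correction: bool = False
    enable_method_errors: bool = False


def parallel_mode_applies(settings: EngineSettings, zone_module: str) -> bool:
    """Return True when parallel mode is enabled and the zone's module supports it."""
    return (settings.enable_parallel_recognition
            and zone_module in PARALLEL_CAPABLE_MODULES)


# Configure everything first, then recognize: the text notes that many
# checking-subsystem settings must be in place before recognition starts.
settings = EngineSettings(
    recognize_module_tradeoff=Tradeoff.ACCURATE,
    enable_parallel_recognition=True,
    enable_subsystem=True,
    enable_correction=True,
)
```

In a real application, these values would be assigned to the corresponding properties of the OCR object before the Recognize method is called.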
Recognizing a Document
Call the Recognize method: When all necessary recognition options have been set, the page(s) can be recognized by calling the Recognize method.
Generate the RecognitionStatus event: The RecognitionStatus event is generated during the recognition process to provide the user with information about any errors that occur. To enable the generation of the RecognitionStatus event, set the EnableFireRecognizeStatus property to TRUE. To stop the current RecognitionStatus event, set the EnableStopRecognizeStatus property to TRUE within the RecognitionStatus event. The generation of the RecognitionStatus event can also be disabled outside the RecognitionStatus event by setting the EnableFireRecognizeStatus property to FALSE.
Generate ProgressStatus events: The ProgressStatus event is generated during the recognition process to provide the user with information about the progress of a specific operation, the number of accepted characters seen so far, the number of rejected characters seen so far, and other aspects of the recognition process. To enable the generation of this event, set the EnableProgressStatusEvent property to TRUE. To stop the current ProgressStatus event, set the StopProgressStatusEvent property value to TRUE. The generation of the ProgressStatus event can also be disabled outside the ProgressStatus event by setting the EnableProgressStatusEvent property to FALSE.
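The enable/stop flag pattern used by both events can be sketched as follows. This Python example is a simplified mock, not the LEADTOOLS control: `MockRecognizer`, its `on_progress` callback, and the per-page loop are hypothetical. It also assumes that setting the stop flag inside a handler suppresses further notifications for the current run. The text does not spell out the exact stop semantics, so treat that as an assumption.

```python
from typing import Callable, List, Optional


class MockRecognizer:
    """Hypothetical stand-in that only imitates the enable/stop flag pattern."""

    def __init__(self) -> None:
        self.enable_progress_status_event = False
        self.stop_progress_status_event = False
        self.on_progress: Optional[Callable[["MockRecognizer", int], None]] = None

    def recognize(self, page_count: int) -> int:
        """Walk through the pages and return how many progress events fired."""
        fired = 0
        for page in range(page_count):
            if (self.enable_progress_status_event
                    and not self.stop_progress_status_event
                    and self.on_progress is not None):
                self.on_progress(self, page)
                fired += 1
        return fired


seen: List[int] = []


def handler(engine: MockRecognizer, page: int) -> None:
    """Record each notification and request a stop from inside the event."""
    seen.append(page)
    if page == 1:
        engine.stop_progress_status_event = True


engine = MockRecognizer()
engine.on_progress = handler
engine.enable_progress_status_event = True
fired = engine.recognize(page_count=5)
```

Here the handler is notified for the first two pages and then sets the stop flag, so no further notifications are delivered. Leaving the enable flag at its FALSE setting would suppress notifications entirely.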
After Recognition
Get the status of the last recognition process: Accuracy and timing data for the last recognition process can be obtained using the GetStatus method. For more information, refer to Getting Status Updates.
Save the recognition results: The recognition results can be saved to a file or saved to memory. For more detailed information on saving recognition results and the material to export, refer to Handling the Results of the Recognition Process.
Get recognized character sets: Once a page has been recognized, the set of recognized characters for that page can be obtained by calling the GetRecognizedCharacters method. This updates the RecognizedCharacter property with the set of recognized characters for that page. This property is a pointer to an array of ILTRecognizedCharacters objects, where each object represents one of the recognized characters. While the RecognizedCharacter property is read only, the ILTRecognizedCharacters properties accessed through it can be set. For more information on the ILTRecognizedCharacters properties accessed through the RecognizedCharacters property, refer to Using Recognized Characters.
Get special characters or symbols: During recognition, special characters or symbols are used to denote missing characters or rejected characters. The characters or symbols used in the recognition process can be found in the SpecialMissingSymbol property and the SpecialRejectedCharacter property.
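As an illustration of how the special symbols might be used after recognition, the following Python sketch counts missing and rejected characters in a piece of recognized text. The function `count_flagged`, the sample text, and the symbols `^` and `~` are hypothetical values chosen for this example. In practice the symbols would be read from the SpecialMissingSymbol and SpecialRejectedCharacter properties.

```python
from typing import Dict


def count_flagged(text: str, missing_symbol: str, rejected_symbol: str) -> Dict[str, int]:
    """Count occurrences of the missing and rejected markers in recognized text."""
    return {
        "missing": text.count(missing_symbol),
        "rejected": text.count(rejected_symbol),
    }


# Hypothetical recognized text: "^" marks a missing character and
# "~" marks a rejected character (both symbols are assumptions).
sample = "Inv^ice ~o. 42"
counts = count_flagged(sample, missing_symbol="^", rejected_symbol="~")
```

A count of this kind could help decide whether a page needs manual review, provided the chosen symbols do not also appear as legitimate characters in the document.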