Some applications need to know how reliable the recognized text generated by the engine is. Such applications may require confidence information for the recognized characters, the recognized words, or both.
The engine can deliver confidence information for the recognized text directly into application memory through a call to L_OcrPage_GetRecognizedCharacters, made just after calling L_OcrPage_Recognize. The L_OcrPage_GetRecognizedCharacters call provides the most detailed information about the recognized data: it yields an L_OcrCharacter structure for each recognized character.
Three properties in the L_OcrCharacter structure provide character recognition confidence information: Confidence, WordIsCertain, and LeadingSpacesConfidence.
The WordIsCertain property expresses the certainty or uncertainty of the word that this character is part of.
The Confidence property expresses the certainty of the recognition of the character and ranges between 0 and 100. A value of 100 means that the Engine recognized the character with the highest confidence.
The LeadingSpacesConfidence property ranges between 0 and 100 and expresses the confidence of the value in the LeadingSpaces property of the structure, that is, how certain the Engine is about its estimate of the spacing in front of the recognized character.
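The three properties described above can be inspected together once the characters have been retrieved. The following is a minimal sketch only: SketchOcrCharacter is a hypothetical stand-in that carries just the fields named in this topic, with assumed int types, and sketch_summarize is an illustrative helper, not part of the LEADTOOLS API. In a real application the values would come from the L_OcrCharacter structures returned by L_OcrPage_GetRecognizedCharacters, whose actual declaration is in the LEADTOOLS headers.

```c
#include <stddef.h>

/* Hypothetical stand-in for L_OcrCharacter. Only the fields discussed in
   this topic are modeled; the field types are assumptions. */
typedef struct {
    int Confidence;               /* character confidence, 0..100 */
    int WordIsCertain;            /* nonzero if the containing word is certain */
    int LeadingSpaces;            /* estimated spacing before the character */
    int LeadingSpacesConfidence;  /* confidence of LeadingSpaces, 0..100 */
} SketchOcrCharacter;

/* Illustrative summary of the confidence data for a run of characters. */
typedef struct {
    size_t uncertain_word_chars;        /* characters belonging to uncertain words */
    int min_confidence;                 /* lowest Confidence seen (100 if no characters) */
    int min_leading_spaces_confidence;  /* lowest LeadingSpacesConfidence seen (100 if no characters) */
} SketchConfidenceSummary;

/* Walk the characters once and collect the three kinds of confidence data. */
SketchConfidenceSummary sketch_summarize(const SketchOcrCharacter *chars, size_t count)
{
    SketchConfidenceSummary summary = { 0, 100, 100 };
    size_t i;

    for (i = 0; i < count; i++) {
        if (!chars[i].WordIsCertain)
            summary.uncertain_word_chars++;
        if (chars[i].Confidence < summary.min_confidence)
            summary.min_confidence = chars[i].Confidence;
        if (chars[i].LeadingSpacesConfidence < summary.min_leading_spaces_confidence)
            summary.min_leading_spaces_confidence = chars[i].LeadingSpacesConfidence;
    }
    return summary;
}
```

A summary of this kind could be used, for example, to decide whether a page should be routed to manual review before its text is accepted.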
Applications that examine the character confidence information can use a threshold value, below which the character is treated as a suspicious result. A value of 64 is recommended for this purpose. A confidence equal to or greater than 64 indicates that the character was recognized with high confidence. A confidence less than 64 indicates that the character is suspicious.
Figure 1. Confidence Threshold with a Specified Value of 64
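The threshold test can be sketched in a few lines of C. This is an illustrative example under stated assumptions: the function names and the SKETCH_CONFIDENCE_THRESHOLD macro are hypothetical and not part of the LEADTOOLS API, and the confidences array is assumed to hold Confidence values already copied out of the L_OcrCharacter structures.

```c
#include <stddef.h>

/* Recommended threshold from this topic; the macro name is hypothetical. */
#define SKETCH_CONFIDENCE_THRESHOLD 64

/* Returns 1 if the confidence falls below the threshold (suspicious),
   0 if it is equal to or greater than the threshold (high confidence). */
int sketch_is_suspicious(int confidence, int threshold)
{
    return confidence < threshold ? 1 : 0;
}

/* Counts how many of the given confidence values are suspicious. */
size_t sketch_count_suspicious(const int *confidences, size_t count, int threshold)
{
    size_t suspicious = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        if (sketch_is_suspicious(confidences[i], threshold))
            suspicious++;
    }
    return suspicious;
}
```

Note that a confidence of exactly 64 is treated as high confidence, matching the "equal to or larger than" rule above. Passing the threshold as a parameter lets an application tighten or relax the cutoff without changing the helper.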
A confidence level is also reported for OMR zones. For more information, refer to Using OMR in LEADTOOLS C API OCR.