Confidence Reporting: OCR Professional
For some applications, it is important to know how reliable the recognized text generated by the engine is. Such applications may need additional confidence information for the recognized characters and/or words.
The RECOGCHARS2.nConfidence field is a combined value. Its most significant bit expresses the certainty/uncertainty of the word. (If this bit is set to 1, the word is uncertain.) The remaining bits express the certainty of the character recognition as a value ranging from 0 to 100.
A zero value (0) means that the engine recognized the character with high confidence. In some cases, some or all characters of a word may be individually suspicious even though the word bit does not mark the word as uncertain. This is usually a result of language or user dictionary checking, meaning that the word was validated by the checking module.
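The combined layout described above can be unpacked with two bit masks. The following is a minimal sketch, not the SDK's own code: the text does not state the width of the field, so the sketch assumes the combined value fits in 8 bits (most significant bit = 0x80). The names WORD_UNCERTAIN_BIT, CHAR_LEVEL_MASK, DecodedConfidence, and decode_confidence are hypothetical and introduced only for illustration; check the declared type of nConfidence in your SDK headers before relying on the mask values.

```c
#include <assert.h>
#include <stdbool.h>

/* ASSUMPTION: the combined value fits in 8 bits, so the "most significant
   bit" is 0x80. Adjust both masks if the field is declared wider. */
#define WORD_UNCERTAIN_BIT 0x80u
#define CHAR_LEVEL_MASK    0x7Fu

/* Compile-time sanity check: the word bit and character bits are disjoint. */
static_assert((WORD_UNCERTAIN_BIT & CHAR_LEVEL_MASK) == 0,
              "word bit and character mask must not overlap");

/* Hypothetical helper type holding the two parts of one combined value. */
typedef struct {
    bool     word_uncertain; /* true when the word bit is set to 1 */
    unsigned char_level;     /* 0 (high confidence) up to 100 */
} DecodedConfidence;

/* Split a combined confidence value into its word flag and character level. */
static DecodedConfidence decode_confidence(unsigned combined)
{
    DecodedConfidence d;
    d.word_uncertain = (combined & WORD_UNCERTAIN_BIT) != 0;
    d.char_level     = combined & CHAR_LEVEL_MASK;
    return d;
}
```

An application would typically pass each character's nConfidence value to such a helper and handle the word flag and the character level separately, since a dictionary-validated word can still contain characters with a high (suspicious) level.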
If only the User-written checking or the User Dictionary is enabled on a zone and the section name is specified, the characters of the non-dictionary words get a value of 100 in their RECOGCHARS2.nConfidence field.
If only the User Dictionary is enabled on a zone and the section name is specified, the non-dictionary words are replaced with similar dictionary words.
Applications that examine the character confidence information can use a threshold value, at or above which the character is treated as a suspicious result. A value of 64 is recommended for this purpose. A value less than 64 indicates that the character was recognized with high confidence. A value of 64 or greater indicates that the character is suspicious.
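The threshold test can be sketched as follows. This is an illustration under stated assumptions, not the SDK's implementation: it assumes the combined value fits in 8 bits (word bit = 0x80) and that the threshold is compared against the character part only, after the word bit has been masked off. The names SUSPECT_THRESHOLD, is_suspicious_char, and count_suspicious are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* ASSUMPTION: 8-bit combined value; the top bit is the word flag. */
#define CHAR_LEVEL_MASK   0x7Fu
/* Threshold recommended in the text: 64 or greater is suspicious. */
#define SUSPECT_THRESHOLD 64u

/* Compile-time sanity check: the threshold must fit in the character bits. */
static_assert(SUSPECT_THRESHOLD <= CHAR_LEVEL_MASK,
              "threshold must be representable in the character bits");

/* Return true when the character part of a combined value is at or above
   the threshold. The word bit is masked off first so that it cannot push
   a confident character over the threshold. */
static bool is_suspicious_char(unsigned combined)
{
    return (combined & CHAR_LEVEL_MASK) >= SUSPECT_THRESHOLD;
}

/* Count the suspicious characters in an array of combined values. */
static size_t count_suspicious(const unsigned *values, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (is_suspicious_char(values[i])) {
            ++count;
        }
    }
    return count;
}
```

Masking before the comparison matters here: without it, any character in a word flagged as uncertain would compare as 128 or greater and always appear suspicious, regardless of its own level.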
NOTE: The confidence reporting system works best when all three recognition modules are used in the voting scheme (DOC2_RECOGMODULE_OMNIFONT_PLUS3W), but this is not the default value. If other machine print recognition modules are used (DOC2_RECOGMODULE_OMNIFONT_PLUS2W, DOC2_RECOGMODULE_MTEXT_OMNIFONT, etc.), then confidence information is still available, but the ability of the system to properly report confidence will be reduced. This will result in a higher level of false negative and false positive reporting of suspicious recognition results.