Using Recognized Characters

Following recognition, call the GetRecognizedCharacters method to obtain the set of recognized characters for a specific page. This method updates the RecognizedCharactersCount property with the number of characters that were recognized, and updates the RecognizedCharacter property with the set of recognized characters for the specified page. The RecognizedCharacter property is a pointer to an array of ILTRecognizedCharacters objects, each of which represents one recognized character. Although the RecognizedCharacter property is read-only, the ILTRecognizedCharacters properties accessed through it can be set. To change the recognized character information, set the relevant ILTRecognizedCharacters properties (for example, RasterDoc.RecognizedCharacter(i).FontSize = 20), and then apply the new information by calling the SetRecognizedCharacters method.
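The get, edit, set sequence described above can be sketched with a small stand-in. This is a minimal Python model, not the LEADTOOLS object itself: MockRasterDoc, RecognizedChar, engine_pages, and the method signatures are all hypothetical and exist only to show the order of operations.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RecognizedChar:
    """Stand-in for one ILTRecognizedCharacters object (two fields only)."""
    guess_code: int
    font_size: int


@dataclass
class MockRasterDoc:
    """Hypothetical stand-in for the control; signatures are assumptions."""
    engine_pages: Dict[int, List[RecognizedChar]]
    recognized_character: List[RecognizedChar] = field(default_factory=list)
    recognized_characters_count: int = 0

    def get_recognized_characters(self, page: int) -> None:
        # Copy the engine's results for the page into the two properties.
        self.recognized_character = [
            RecognizedChar(c.guess_code, c.font_size)
            for c in self.engine_pages[page]
        ]
        self.recognized_characters_count = len(self.recognized_character)

    def set_recognized_characters(self, page: int) -> None:
        # Push the (possibly edited) working copy back to the engine.
        self.engine_pages[page] = [
            RecognizedChar(c.guess_code, c.font_size)
            for c in self.recognized_character
        ]


doc = MockRasterDoc({0: [RecognizedChar(ord("H"), 12),
                         RecognizedChar(ord("i"), 12)]})
doc.get_recognized_characters(0)
for i in range(doc.recognized_characters_count):
    doc.recognized_character[i].font_size = 20  # edit the working copy
doc.set_recognized_characters(0)                # apply the edits
```

In this model the edits do not reach the engine until the set call is made, which mirrors the two-step description in the text.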

Various types of character information can be obtained or set using the ILTRecognizedCharacters properties. The Left and Top properties indicate the location of the recognized character, while the Height and Width properties give the character's dimensions. The YOffset property contains the distance along the Y axis from the baseline to the top of the character's bounding rectangle.
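As an illustration of how these values relate, the following Python sketch derives a bounding rectangle and a baseline position from them. CharBox is a hypothetical stand-in whose fields mirror Left, Top, Width, Height, and YOffset. The sketch assumes a coordinate system in which y grows downward from the top of the page, which this topic does not state.

```python
from dataclasses import dataclass


@dataclass
class CharBox:
    """Hypothetical stand-in mirroring the position and size properties."""
    left: int
    top: int
    width: int
    height: int
    y_offset: int  # distance from the baseline to the top of the rectangle


def bounding_rect(c: CharBox):
    """Return (left, top, right, bottom); right and bottom are exclusive."""
    return (c.left, c.top, c.left + c.width, c.top + c.height)


def baseline_y(c: CharBox) -> int:
    # Assumption: y grows downward, so the baseline lies below the top edge.
    return c.top + c.y_offset


g = CharBox(left=100, top=40, width=12, height=24, y_offset=18)
print(bounding_rect(g))
print(baseline_y(g))
```

If the engine reports coordinates with y growing upward, the baseline would instead be the top minus the offset.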

The font and the font size used for the recognized character are given in the Font property and the FontSize property, respectively. The Flags property contains further information on the formatting attributes of the recognized character.

The zone in which the recognized character is located is given in the ZoneIndex property. If that zone is a TABLE zone, the cell in which the recognized character is located is given in the CellIndex property.
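One possible use of these two indexes is to group characters by zone and, inside table zones, by cell. The Python sketch below is illustrative only. Char is a hypothetical stand-in, and the zone_types mapping (zone index to zone type) is assumed to be available to the caller from elsewhere. It also assumes that GuessCode holds a character code that can be converted directly to text.

```python
from collections import defaultdict
from dataclasses import dataclass

TABLE = "TABLE"  # hypothetical marker for a table zone


@dataclass
class Char:
    """Hypothetical stand-in mirroring GuessCode, ZoneIndex and CellIndex."""
    guess_code: int
    zone_index: int
    cell_index: int


def group_text(chars, zone_types):
    """Group characters by (zone, cell); the cell is used only in table zones."""
    groups = defaultdict(str)
    for c in chars:
        in_table = zone_types.get(c.zone_index) == TABLE
        key = (c.zone_index, c.cell_index if in_table else None)
        groups[key] += chr(c.guess_code)
    return dict(groups)


chars = [
    Char(ord("H"), 0, 0),
    Char(ord("i"), 0, 0),
    Char(ord("A"), 1, 0),
    Char(ord("B"), 1, 1),
]
grouped = group_text(chars, {0: "TEXT", 1: TABLE})
print(grouped)
```

Ignoring the cell index outside table zones keeps all the text of an ordinary zone under a single key.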

The OCR engine may provide several "guesses" as to the identification of a recognized character. The GuessCode property contains the first guess for the character being recognized. If the engine rejects the character, this property contains 0. If GuessCode is not 0, the GuessCode2 and GuessCode3 properties contain the second and third guesses for the identification of the recognized character, respectively. The Confidence property contains a value that represents the certainty of the first guess.
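The rejection rule above can be sketched as follows. Guess is a hypothetical Python stand-in for the three guess properties. The sketch assumes that an unused second or third guess is reported as 0 and that guess codes are character codes, neither of which this topic states. The Confidence value is left out because its scale is not described here.

```python
from dataclasses import dataclass


@dataclass
class Guess:
    """Hypothetical stand-in mirroring GuessCode, GuessCode2 and GuessCode3."""
    guess_code: int
    guess_code2: int = 0
    guess_code3: int = 0


def candidates(c: Guess):
    """Return the candidate characters in order, or [] if rejected."""
    if c.guess_code == 0:
        return []  # the engine rejected this character
    codes = (c.guess_code, c.guess_code2, c.guess_code3)
    # Assumption: an alternate guess of 0 means "no further guess".
    return [chr(code) for code in codes if code != 0]


def best_text(chars, placeholder="~"):
    """Join the first guesses, marking rejected characters with a placeholder."""
    return "".join(chr(c.guess_code) if c.guess_code else placeholder
                   for c in chars)


word = [Guess(ord("c"), ord("e")), Guess(0), Guess(ord("t"))]
print(candidates(word[0]))
print(best_text(word))
```

Marking rejected characters with a visible placeholder makes it easy to find the positions that need manual review.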

A recognized character may be found in more than one language. The LanguageId property contains the first language in which the recognized character was found. The LanguageId2 property contains the second language in which the recognized character was found.

The Space property contains the number of spaces in front of the recognized character, and the SpaceErr property contains a value indicating the certainty or reliability of the value in the Space property.
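As an illustration, the Space value can be used to rebuild a line of text from a sequence of recognized characters. SpacedChar below is a hypothetical Python stand-in mirroring GuessCode and Space. The sketch assumes that every character was accepted (GuessCode is not 0) and that guess codes are character codes. SpaceErr is not used because its scale is not described in this topic.

```python
from dataclasses import dataclass


@dataclass
class SpacedChar:
    """Hypothetical stand-in mirroring GuessCode and Space."""
    guess_code: int
    space: int = 0  # number of spaces in front of this character


def rebuild_line(chars):
    """Emit each character preceded by its reported number of spaces."""
    parts = []
    for c in chars:
        parts.append(" " * c.space)
        parts.append(chr(c.guess_code))
    return "".join(parts)


line = [
    SpacedChar(ord("H")),
    SpacedChar(ord("i")),
    SpacedChar(ord("y"), space=1),
    SpacedChar(ord("o")),
]
print(rebuild_line(line))
```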