Character set filter.
Members
Value | Member | Description |
---|---|---|
0x00000000 | None | No character filters. |
0x00000001 | Digit | Recognition of numerals only. For example: "3" (Digit Three). |
0x00000002 | Uppercase | Recognition of uppercase letters only, including accented ones. For example: "A" (Capital A). |
0x00000004 | Lowercase | Recognition of lowercase letters only, including accented ones. For example: "a" (Lowercase a). |
0x00000006 | Alpha | Upper and lowercase letters only. This is a combination of (Uppercase | Lowercase). |
0x00000008 | Punctuation | Recognition of punctuation signs only. For example: "!" (Exclamation Mark). |
0x00000010 | Miscellaneous | Recognition of other miscellaneous characters only. For example: "+" (Plus sign). |
0x0000001F | All | All characters. Since all elements are enabled, there is no filtering. This is a combination of (Digit | Uppercase | Lowercase | Punctuation | Miscellaneous). |
0x00000020 | Plus | Enables the use of the "FilterPlus" characters. The FilterPlus characters are added after any filtering. For more information, refer to LEADTOOLS OCR Engine Settings and LEADTOOLS OCR Professional Engine Settings. |
0x00000021 | Numbers | Digits plus the "FilterPlus" characters. This is a combination of (Digit | Plus). |
This enumeration lists the available character set filter elements. The Language environment can be narrowed by specifying Character Set filters. The name of each filter element indicates which category of characters it validates. This enumeration is marked with the System.FlagsAttribute, so its members can be combined (OR-ed) together.
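The flag arithmetic in the Members table can be checked with a small model. The sketch below is not the LEADTOOLS API: it mirrors the documented values in a Python `enum.IntFlag` (an assumption made purely for illustration, with `None_` standing in for the `None` member because `None` is a reserved word in Python) and confirms that the composite members are the OR of their parts.

```python
from enum import IntFlag


class OcrZoneCharacterFilters(IntFlag):
    """Illustrative model of the documented values (not the LEADTOOLS type)."""
    None_ = 0x00000000          # "None" in the table; renamed for Python
    Digit = 0x00000001
    Uppercase = 0x00000002
    Lowercase = 0x00000004
    Alpha = 0x00000006          # Uppercase | Lowercase
    Punctuation = 0x00000008
    Miscellaneous = 0x00000010
    All = 0x0000001F            # the five basic categories combined
    Plus = 0x00000020
    Numbers = 0x00000021        # Digit | Plus


F = OcrZoneCharacterFilters

# Composite members equal the OR of their parts.
alpha = F.Uppercase | F.Lowercase
everything = (F.Digit | F.Uppercase | F.Lowercase
              | F.Punctuation | F.Miscellaneous)
numbers = F.Digit | F.Plus

# Custom combinations work the same way, e.g. digits and punctuation only.
digits_and_punctuation = F.Digit | F.Punctuation

# Membership test: is a given category enabled in a combined filter?
has_digit = F.Digit in digits_and_punctuation
has_alpha = F.Alpha in digits_and_punctuation
```

Note that `All` (0x1F) does not include the `Plus` bit (0x20), so `All | Plus` is a distinct value from `All`.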
Filters can be applied either at zone level (by setting the zone's OcrZone.CharacterFilters property) or globally, at image level (through the "Recognition.DefaultCharacterFilter" setting).
To disable filtering, specify the value OcrZoneCharacterFilters.All.
Characters of the document that are not part of the specified character set will either be rejected or be recognized as a validated character with a similar shape. For instance, if only the English language has been selected and the document contains a letter "Capital A with acute", then the recognized output will be a letter "Capital A".
The recognition module selected for recognition can also impose restrictions, e.g. the OcrZoneRecognitionModule.IcrNumeral module is restricted to numerals and four other characters.
Not all recognition modules support all filter elements:
Recognition module | Character filters supported |
---|---|
OcrZoneRecognitionModule.OmniFontMText | OcrZoneCharacterFilters.All, OcrZoneCharacterFilters.Digit and OcrZoneCharacterFilters.Alpha |
OcrZoneRecognitionModule.OmniFontMor | All filters |
OcrZoneRecognitionModule.DotMatrix | All filters |
OcrZoneRecognitionModule.Omr | None (All ignored) |
OcrZoneRecognitionModule.IcrNumeral | OcrZoneCharacterFilters.All, OcrZoneCharacterFilters.Digit, OcrZoneCharacterFilters.Punctuation and OcrZoneCharacterFilters.Miscellaneous |
OcrZoneRecognitionModule.IcrCharacter | All filters |
OcrZoneRecognitionModule.MatrixMatching | All filters |
OcrZoneRecognitionModule.OmniFontPlus2WayVoting | All filters |
OcrZoneRecognitionModule.OmniFontFireWorx | All filters |
OcrZoneRecognitionModule.OmniFontPlus3WayVoting | All filters |
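The support table above can be encoded as a lookup so that an unsupported module and filter pairing is caught before recognition. This is a minimal sketch under stated assumptions: the names are plain strings taken from the table, `is_supported` is a hypothetical helper rather than a LEADTOOLS call, "All filters" is read as every member of the enumeration, and "None (All ignored)" is read as the Omr module honouring no filter at all.

```python
# Every member name listed in the Members table.
ALL_FILTERS = frozenset({
    "None", "Digit", "Uppercase", "Lowercase", "Alpha",
    "Punctuation", "Miscellaneous", "All", "Plus", "Numbers",
})

# Filters each recognition module supports, transcribed from the table.
SUPPORTED_FILTERS = {
    "OmniFontMText": frozenset({"All", "Digit", "Alpha"}),
    "OmniFontMor": ALL_FILTERS,
    "DotMatrix": ALL_FILTERS,
    "Omr": frozenset(),  # assumption: filters are ignored entirely
    "IcrNumeral": frozenset({"All", "Digit", "Punctuation", "Miscellaneous"}),
    "IcrCharacter": ALL_FILTERS,
    "MatrixMatching": ALL_FILTERS,
    "OmniFontPlus2WayVoting": ALL_FILTERS,
    "OmniFontFireWorx": ALL_FILTERS,
    "OmniFontPlus3WayVoting": ALL_FILTERS,
}


def is_supported(module: str, filter_name: str) -> bool:
    """Return True if the table lists filter_name for the given module.

    Hypothetical helper; raises KeyError for a module not in the table.
    """
    return filter_name in SUPPORTED_FILTERS[module]


mtext_alpha = is_supported("OmniFontMText", "Alpha")
mtext_punct = is_supported("OmniFontMText", "Punctuation")
omr_digit = is_supported("Omr", "Digit")
```

The table does not say whether OR-ed combinations of the listed filters are accepted by the restricted modules, so this sketch only checks single named members.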
For an example, refer to DocumentConvertOptions.PagesZones.