
An Overview of OCR Recognition Modules for the LEADTOOLS OCR Module - OmniPage Engine

LEADTOOLS provides fast and highly accurate Optical Character Recognition SDK technology for .NET (C# & VB), C/C++, iOS, macOS, Java, and web. The following sections describe this capability in more detail.

Automatic recognition module

NativeOcrZoneRecognitionModule.Auto

Specifies that the Engine will try to automatically select the most suitable recognition module for the zone. This is determined just before recognition, according to the zone's filling method and, if necessary, other settings, most typically the Character Set.

MTX OmniFont recognition module

NativeOcrZoneRecognitionModule.OmniFontMText

This recognition module recognizes machine-printed text; i.e. text from printed publications, laser or ink-jet printers and electric typewriters. Output from mechanical typewriters in good condition may also be acceptable. It should also be used for Letter or Near Letter Quality output from dot-matrix printers, and can also be used for Draft Quality.

Only images with a resolution in one of the following ranges are supported: 90-110, 160-240, 280-320, 400 and 600 dpi.

This module does not process images larger than 6600 pixels in either width or height. In other words, it can safely handle A3 size (11.69" x 16.54") (both portrait and landscape) images with 300 dpi resolution. At larger resolution the supported image size is smaller.

The module can handle a maximum of 64 zones defined on an image.
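The limits above can be checked before a page is sent to recognition. The following is a minimal sketch in Python, not LEADTOOLS API code: the helper names `check_mtx_constraints` and `page_size_px` are hypothetical, and the resolution values are taken here to be dpi.

```python
# Hypothetical pre-flight check; none of these names belong to the LEADTOOLS API.
MTX_MAX_SIDE_PX = 6600   # maximum width or height stated above
MTX_MAX_ZONES = 64       # maximum number of zones stated above
# Supported resolutions (assumed dpi); single values are stored as (n, n).
MTX_DPI_RANGES = [(90, 110), (160, 240), (280, 320), (400, 400), (600, 600)]


def check_mtx_constraints(width_px, height_px, dpi, zone_count):
    """Return a list of problems; an empty list means the page fits the limits."""
    problems = []
    if width_px > MTX_MAX_SIDE_PX or height_px > MTX_MAX_SIDE_PX:
        problems.append(f"image {width_px}x{height_px} exceeds {MTX_MAX_SIDE_PX} px")
    if not any(low <= dpi <= high for low, high in MTX_DPI_RANGES):
        problems.append(f"resolution {dpi} is outside the supported ranges")
    if zone_count > MTX_MAX_ZONES:
        problems.append(f"{zone_count} zones exceeds the maximum of {MTX_MAX_ZONES}")
    return problems


def page_size_px(width_in, height_in, dpi):
    """Convert a page size in inches to whole pixels at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


# A3 (11.69" x 16.54") at 300 dpi stays under the 6600-pixel limit.
a3_300 = page_size_px(11.69, 16.54, 300)
print(a3_300, check_mtx_constraints(*a3_300, 300, 10))

# The same page at 600 dpi is too large, although 600 dpi itself is supported.
a3_600 = page_size_px(11.69, 16.54, 600)
print(a3_600, check_mtx_constraints(*a3_600, 600, 10))
```

The second call illustrates why the supported page size shrinks as the resolution grows: the pixel limit is fixed, so doubling the resolution doubles both pixel dimensions.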

MOR multi-lingual OmniFont recognition module

NativeOcrZoneRecognitionModule.OmniFontMor

This module recognizes machine-printed text; i.e. text from printed publications, laser or ink-jet printers and electric typewriters. Output from mechanical typewriters in good condition may also be acceptable. It should also be used for Letter or Near Letter Quality (LQ, NLQ) output from dot-matrix printers. There are two ways of modifying incoming images to make them more suitable for the module:

Standard mode Fax output (200 x 100 dpi). This switch doubles the pixels in the image's vertical direction. Faxes sent in Fine Mode (200 x 200 dpi) are preferable. For best results, send faxes in Fine mode and do not enable this feature.
Draft 24-pin dot-matrix output. Use the NativeOcrZoneFillMethod.DraftDotMatrix24 filling method to have the character contours smoothed. Again, NLQ or LQ quality output can usually be better recognized without using NativeOcrZoneFillMethod.DraftDotMatrix24.
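To illustrate what doubling the pixels in the vertical direction means for a Standard mode fax, the sketch below repeats every row of a small bitmap once. It is only an illustration in Python: the function name `double_rows` is hypothetical, and simple row duplication is an assumption, since the engine's actual resampling method is not described here.

```python
def double_rows(bitmap):
    """Repeat every row once so the image height doubles; width is unchanged."""
    doubled = []
    for row in bitmap:
        doubled.append(list(row))
        doubled.append(list(row))
    return doubled


# A tiny 4 x 2 bitmap standing in for a 200 x 100 dpi fax page.
fax = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]
stretched = double_rows(fax)
for row in stretched:
    print(row)
```

After doubling, the vertical sampling matches the horizontal one, which is the aspect ratio a Fine mode (200 x 200 dpi) fax already has.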

This module can safely handle A3 size (11.69" x 16.54") (both portrait and landscape) images with 300 dpi resolution.

The module can handle a maximum of 500 zones defined on an image.

DOT 9-pin draft dot-matrix recognition module

NativeOcrZoneRecognitionModule.DotMatrix

This module is designed only for draft-quality 9-pin dot-matrix text. For NLQ or LQ text, the NativeOcrZoneRecognitionModule.OmniFontPlus2WayVoting, NativeOcrZoneRecognitionModule.OmniFontPlus3WayVoting, NativeOcrZoneRecognitionModule.OmniFontMText or NativeOcrZoneRecognitionModule.OmniFontMor modules give better results.

OMR optical mark recognition module

NativeOcrZoneRecognitionModule.Omr

LEADTOOLS OMR (Optical Mark Recognition) Module extends the functionality of LEADTOOLS SDKs by providing properties, methods, and events for easily incorporating fast, automated, and accurate optical mark recognition into your application. Optical Mark Recognition is used in surveys, polls, academic exams and official applications, to recognize the bubbles that the applicant fills in to indicate their selections. Supported marks include tick marks, X's, lines, checkmarks, and scribbles. Supported shapes (or frames) include boxes, circles and ellipses. For more information, refer to Using OMR in LEADTOOLS .NET OCR.

ICR hand-printed numeral recognition module

NativeOcrZoneRecognitionModule.IcrNumeral

This recognition module can be used for hand-printed numerals and four additional signs. If more hand-printed characters are to be recognized, use the NativeOcrZoneRecognitionModule.IcrCharacter recognition module.

ICR hand-printed recognition module

NativeOcrZoneRecognitionModule.IcrCharacter

This recognition module can be used for hand-printed characters. If only numerals need to be recognized, use the NativeOcrZoneRecognitionModule.IcrNumeral recognition module.

MAT matrix matching recognition module

NativeOcrZoneRecognitionModule.MatrixMatching

This module is designed to read certain groups of fixed-font characters specially designed for OCR or imaging applications, in which no two characters have similar shapes. Each character group has its own filling method. Application areas are in banking, check or waybill handling, product distribution and document validation, where high accuracy can be vital. It also handles some non-fixed print styles.

FRX multi-lingual OmniFont recognition module

NativeOcrZoneRecognitionModule.OmniFontFireWorx

PLUS2WAY and PLUS3WAY Voting OmniFont recognition modules

NativeOcrZoneRecognitionModule.OmniFontPlus2WayVoting

NativeOcrZoneRecognitionModule.OmniFontPlus3WayVoting

These recognition modules recognize machine-printed text; i.e. text from printed publications, laser or ink-jet printers and electric typewriters. Output from mechanical typewriters in good condition may also be acceptable.

The PLUS2WAY and PLUS3WAY modules use voting technology to improve recognition results.

The PLUS2WAY voting module combines the results from the MOR and MTX modules.

The PLUS3WAY voting module combines the results from the MOR, MTX and FRX modules.

With either of these two voting modules, the accuracy is considerably better, but recognition may take significantly more time than any single module.

Note: Apart from the module-specific limits given above, the image handling algorithms for all modules of the OmniPage engine were designed to deal with practically unlimited image sizes: up to 32000 pixels in both directions.

Asian recognition module

NativeOcrZoneRecognitionModule.Asian

This module provides recognition services for four Asian languages with horizontal or vertical text direction: Japanese, Korean, Traditional Chinese and Simplified Chinese. It can also handle short embedded texts in English.

The Asian language handling differs somewhat from that for Western languages. Spell checking, editor display and verification are not available for Asian languages. Only one Asian language should be set for recognition and Western languages should not be set alongside an Asian language. However, the Asian OCR Engine can recognize short English texts embedded in Asian text, without English needing to be set. If embedded texts are in other Latin-alphabet languages, these similarly do not need to be set; however, accented characters may not always be handled correctly.
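The language selection rules above can be expressed as a small check. This is a minimal sketch in Python under stated assumptions: the function `validate_language_selection` and the plain-string language names are hypothetical and do not correspond to LEADTOOLS identifiers.

```python
# Hypothetical names for the four Asian languages listed above.
ASIAN_LANGUAGES = {
    "Japanese",
    "Korean",
    "Chinese Traditional",
    "Chinese Simplified",
}


def validate_language_selection(languages):
    """Return a list of rule violations for a set of selected languages."""
    selected = set(languages)
    asian = selected & ASIAN_LANGUAGES
    western = selected - ASIAN_LANGUAGES
    problems = []
    if len(asian) > 1:
        problems.append("only one Asian language should be set")
    if asian and western:
        problems.append(
            "Western languages should not be set alongside an Asian language"
        )
    return problems


print(validate_language_selection(["Japanese"]))             # no violations
print(validate_language_selection(["Japanese", "English"]))  # one violation
print(validate_language_selection(["Japanese", "Korean"]))   # one violation
```

Note that selecting Japanese alone is enough for short embedded English text, so the second selection is unnecessary as well as discouraged.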

The following table shows the supported recognition modules for each OCR engine:

Recognition Module	OmniPage	LEAD	Arabic
Auto	Yes	Yes	Yes
OmniFontMText	Yes	Yes	No
OmniFontMor	Yes	Yes	No
DotMatrix	Yes	Yes	No
Omr	Yes	Yes	Yes
IcrNumeral	Yes	Yes	No
IcrCharacter	Yes	Yes	No
MatrixMatching	Yes	Yes	No
OmniFontFireWorx	Yes	Yes	No
OmniFontPlus2WayVoting	Yes	Yes	No
OmniFontPlus3WayVoting	Yes	Yes	No
Asian	Yes	Yes	No
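The table above can be encoded as a lookup so that an application can reject an unsupported combination early. The following Python sketch only mirrors the table; the names `MODULES`, `SUPPORT` and `is_supported` are hypothetical and are not part of any LEADTOOLS API.

```python
# Module names as they appear in the first column of the table above.
MODULES = (
    "Auto", "OmniFontMText", "OmniFontMor", "DotMatrix", "Omr",
    "IcrNumeral", "IcrCharacter", "MatrixMatching", "OmniFontFireWorx",
    "OmniFontPlus2WayVoting", "OmniFontPlus3WayVoting", "Asian",
)

# OmniPage and LEAD list every module as "Yes"; Arabic lists only Auto and Omr.
SUPPORT = {
    "OmniPage": set(MODULES),
    "LEAD": set(MODULES),
    "Arabic": {"Auto", "Omr"},
}


def is_supported(engine, module):
    """Return True if the table marks the module as supported by the engine."""
    if engine not in SUPPORT:
        raise ValueError(f"unknown engine: {engine}")
    if module not in MODULES:
        raise ValueError(f"unknown recognition module: {module}")
    return module in SUPPORT[engine]


print(is_supported("Arabic", "Omr"))            # True
print(is_supported("Arabic", "OmniFontMor"))    # False
print(is_supported("LEAD", "IcrCharacter"))     # True
```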

The following table shows the text recognition module support for each of the 119 languages in the LEADTOOLS OCR Module - OmniPage Engine/LEAD engine:

Language	Omr	OmniFontMText	OmniFontFireWorx	OmniFontPlus2WayVoting	OmniFontPlus3WayVoting	DotMatrix	Icr
Afrikaans	Yes	No	Yes	Yes	Yes	Yes C	Yes
Albanian	Yes	No	Yes	Yes	Yes	Yes C	Yes
Aymara	Yes	No	Yes	Yes	Yes	Yes	Yes
Basque	Yes	No	Yes	Yes	Yes	No	Yes
Bemba	Yes	Yes EN	No	Yes	Yes	Yes	Yes
Blackfoot	Yes	Yes EN	No	Yes	Yes	Yes	Yes
Brazilian B	Yes B	Yes	Yes	Yes	Yes	Yes	Yes
Breton	Yes	No	Yes	Yes	Yes	Yes C	Yes
Bugotu	Yes	Yes EN	No	Yes	Yes	Yes	Yes
Bulgarian	Yes	No	Yes	Yes	Yes	No	No
Byelorussian	Yes	No	Yes	Yes	Yes	No	No
Catalan	Yes	No	Yes	Yes	Yes	Yes C	Yes
Chamorro	Yes	No	No	Yes	Yes	Yes	Yes
Chechen	Yes	No	No	Yes	Yes	No	No
Corsican	Yes	No	No	Yes	Yes	Yes	Yes
Croatian	Yes	No	Yes	Yes	Yes	No	Yes
Crow	Yes	Yes EN	No	Yes	Yes	Yes	Yes
Czech	Yes	No	Yes	Yes	Yes	No	Yes
Danish	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Dutch	Yes	Yes	Yes	Yes	Yes	Yes C	Yes
English	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Eskimo (Inuit)	Yes	No	Yes	Yes	Yes	No	Yes
Esperanto	Yes	No	No	Yes	Yes	No	No
Estonian	Yes	No	Yes	Yes	Yes	Yes	Yes
Faroese	Yes	No	Yes	Yes	Yes	No	No
Fijian	Yes	No	No	Yes	Yes	No	Yes
Finnish	Yes	Yes	Yes	Yes	Yes	Yes	Yes
French	Yes	Yes	Yes	Yes	Yes	Yes C	Yes
Frisian	Yes	No	Yes	Yes	Yes	Yes C	Yes
Friulian	Yes	No	Yes	Yes	Yes	Yes C	Yes
Gaelic (Irish)	Yes	No	Yes	Yes	Yes	Yes	Yes
Gaelic (Scottish)	Yes	No	Yes	Yes	Yes	Yes C	Yes
Galician	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Ganda	Yes	No	No	Yes	Yes	No	Yes
German	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Greek	Yes	No	Yes	Yes	Yes	Yes	No
Guarani	Yes	No	No	Yes	Yes	Yes C	Yes
Hani	Yes	Yes EN	No	Yes	Yes	Yes	Yes
Hawaiian	Yes	Yes EN	Yes	Yes	Yes	Yes	Yes
Hungarian	Yes	No	Yes	Yes	Yes	Yes	Yes
Icelandic	Yes	No	Yes	Yes	Yes	No	No
Ido	Yes	Yes EN	No	Yes	Yes	Yes	Yes
Indonesian	Yes	Yes EN	Yes	Yes	Yes	Yes	Yes
Interlingua	Yes	Yes EN	No	Yes	Yes	Yes	Yes
Italian	Yes	Yes	Yes	Yes	Yes	Yes C	Yes
Kabardian	Yes	No	No	Yes	Yes	No	No
Kasub	Yes	No	No	Yes	Yes	No	Yes
Kawa	Yes	Yes EN	No	Yes	Yes	Yes	Yes
Kikuyu	Yes	No	No	Yes	Yes	No	No
Kongo	Yes	Yes EN	No	Yes	Yes	Yes	Yes
Kpelle	Yes	Yes EN	No	Yes	Yes	Yes	Yes
Kurdish	Yes	No	Yes	Yes	Yes	No	Yes
Latin L	Yes	Yes L	Yes	Yes	Yes	Yes L	Yes
Latvian	Yes	No	Yes	Yes	Yes	No	Yes
Lithuanian	Yes	No	Yes	Yes	Yes	No	Yes
Luba	Yes	No	No	Yes	Yes	No	Yes
Luxembourgian	Yes	No	No	Yes	Yes	Yes C	Yes
Macedonian	Yes	No	Yes	Yes	Yes	No	No
Malagasy	Yes	Yes EN/M	No	Yes	Yes	Yes C	Yes
Malay	Yes	No	Yes	Yes	Yes	No	Yes
Malinke	Yes	No	No	Yes	Yes	Yes C	Yes
Maltese	Yes	No	No	Yes	Yes	No	No
Maori	Yes	Yes EN	No	Yes	Yes	Yes	Yes
Mayan	Yes	No	No	Yes	Yes	Yes	Yes
Miao	Yes	Yes EN	No	Yes	Yes	Yes	Yes
Minankabaw	Yes	No	No	Yes	Yes	No	Yes
Mohawk	Yes	Yes EN	No	Yes	Yes	Yes	Yes
Moldavian	Yes	No	No	Yes	Yes	No	No
Nahuatl	Yes	Yes EN	No	Yes	Yes	Yes	Yes
Norwegian	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Nyanja	Yes	Yes EN	No	Yes	Yes	Yes	Yes
Occidental	Yes	No	No	Yes	Yes	Yes	Yes
Ojibway	Yes	No	No	Yes	Yes	No	Yes
Papiamento	Yes	No	No	Yes	Yes	Yes	Yes
Pidgin English	Yes	Yes EN	Yes	Yes	Yes	Yes	Yes
Polish	Yes	No	Yes	Yes	Yes	No	Yes
Portuguese	Yes	Yes	Yes	Yes	Yes	Yes C	Yes
Provençal	Yes	No	No	Yes	Yes	Yes C	Yes
Quechua	Yes	No	No	Yes	Yes	Yes	Yes
Rhaetic	Yes	No	No	Yes	Yes	Yes C	Yes
Romanian	Yes	No	Yes	Yes	Yes	No	No
Romany	Yes	No	No	Yes	Yes	No	Yes
Ruanda	Yes	Yes EN	No	Yes	Yes	Yes	Yes
Rundi	Yes	Yes EN	No	Yes	Yes	Yes	Yes
Russian	Yes	No	Yes	Yes	Yes	No	No
Sami	Yes	No	No	Yes	Yes	No	Yes
Sami, Lule	Yes	No	No	Yes	Yes	No	Yes
Sami, Northern	Yes	No	No	Yes	Yes	No	Yes
Sami, Southern	Yes	No	No	Yes	Yes	No	Yes
Samoan	Yes	No	No	Yes	Yes	Yes C	Yes
Sardinian	Yes	No	No	Yes	Yes	Yes C	Yes
Serbian	Yes	No	Yes	Yes	Yes	No	No
Serbian, Latinic	Yes	No	Yes	Yes	Yes	No	Yes
Shona S	Yes	Yes S	No	Yes	Yes	Yes S	Yes
Sioux	Yes	Yes EN	No	Yes	Yes	Yes	Yes
Slovak	Yes	No	Yes	Yes	Yes	No	Yes
Slovenian	Yes	No	Yes	Yes	Yes	No	Yes
Somali	Yes	Yes EN	No	Yes	Yes	Yes	Yes
Sorbian (Wend)	Yes	No	Yes	Yes	Yes	No	Yes
Sotho	Yes	No	No	Yes	Yes	Yes	Yes
Spanish	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Sundanese SN	Yes	No	No	Yes	Yes	Yes SN	Yes SN
Swahili	Yes	Yes EN	Yes	Yes	Yes	Yes	Yes
Swazi	Yes	No	No	Yes	Yes	No	Yes
Swedish	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Tagalog	Yes	Yes EN	No	Yes	Yes	Yes	Yes
Tahitian	Yes	No	Yes	Yes	Yes	Yes C	Yes
Tinpo	Yes	Yes EN	No	Yes	Yes	Yes	Yes
Tongan	Yes	Yes EN	No	Yes	Yes	Yes	Yes
Tswana (Chuana)	Yes	No	No	Yes	Yes	Yes C	Yes
Tun *	Yes	Yes EN	No	Yes	Yes	Yes	Yes
Turkish	Yes	No	Yes	Yes	Yes	No	Yes T
Ukrainian	Yes	No	Yes	Yes	Yes	No	No
Visayan	Yes	Yes EN	No	Yes	Yes	Yes	Yes
Welsh	Yes	No	Yes	Yes	Yes	Yes W	Yes W
Wolof	Yes	No	No	Yes	Yes	Yes C	Yes
Xhosa	Yes	Yes EN	No	Yes	Yes	Yes	Yes
Zapotec	Yes	Yes EN	No	Yes	Yes	Yes	Yes
Zulu	Yes	No	Yes	Yes	Yes	No	Yes

* = This language can be handled only if it is written in the Latin alphabet.

B = Brazilian has a separate dictionary from Portuguese in the OmniFontMText and OmniFontFireWorx modules. Other modules treat Brazilian as Portuguese. Brazilian is available for language marking in the output document.

L = Latin is usually written without accented letters, but sometimes breves or macrons are placed over vowels. In these cases, the indicated modules do not provide support.

M = Some dialects of Malagasy are written without accents. In these cases, OmniFontMText provides support.

S = Shona may be written without accents, but sometimes uses acutes and graves on vowels. In these cases the indicated modules do not provide full support.

SN = Sundanese uses only one accented letter; sometimes this is E-breve, sometimes E-acute. The indicated modules support E-acute but not E-breve.

W = Welsh contains two rarely used characters: W-circumflex and Y-circumflex. These modules can handle Welsh with the exception of these two characters.

Footnotes on OmniFontMText:

The twelve selectable languages are those marked Yes with no footnote letter. For these languages this module uses its own language dictionaries.

EN = Languages denoted are thought to contain no accented letters. To read them, select English and disable spell checking from a main dictionary.

Footnotes on DotMatrix modules:

C = Not all uppercase letters are supported. See the module specification for a precise listing. This is probably not a serious restriction, since many 9-pin dot-matrix printers cannot print all the accented uppercase characters.

Footnotes on Icr modules:

T = The module cannot handle the lowercase dotless-i.
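Each cell of the language table combines a Yes/No value with optional footnote codes, for example "Yes C" or "Yes EN/M". The Python sketch below shows one way to split a tab-separated row into those parts. The names `COLUMNS`, `parse_cell` and `parse_row` are hypothetical, and the sketch assumes the row layout shown in the table header.

```python
# Column order taken from the header row of the language table.
COLUMNS = [
    "Omr", "OmniFontMText", "OmniFontFireWorx", "OmniFontPlus2WayVoting",
    "OmniFontPlus3WayVoting", "DotMatrix", "Icr",
]


def parse_cell(cell):
    """Split a cell such as 'Yes EN/M' into (supported, [footnote codes])."""
    parts = cell.split()
    supported = parts[0] == "Yes"
    footnotes = parts[1].split("/") if len(parts) > 1 else []
    return supported, footnotes


def parse_row(line):
    """Split a tab-separated table row into the language and its cells."""
    fields = line.split("\t")
    language, cells = fields[0], fields[1:]
    return language, {col: parse_cell(cell) for col, cell in zip(COLUMNS, cells)}


language, support = parse_row("Malagasy\tYes\tYes EN/M\tNo\tYes\tYes\tYes C\tYes")
print(language)
for column, (supported, footnotes) in support.items():
    print(column, supported, footnotes)
```

Keeping the footnote codes separate from the Yes/No value lets an application warn about partial support, such as the C restriction on uppercase letters, instead of treating every Yes as unconditional.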