The following table describes the settings supported by the LEADTOOLS OCR Advantage Engine:
Name | Type | Range and values | Description |
---|---|---|---|
Recognition | BeginCategory | N/A | Beginning of the recognition settings category. |
Recognition.RecognitionModuleTradeoff | Enum | Accurate, Balanced, Fast | Recognition module tradeoff between speed and accuracy. Default value is Balanced. |
Recognition.ModifyProcessingImage | Boolean | N/A | True to modify the processing image after recognition; otherwise, False. It is best to set the value of this setting to True if L_OcrPage_Recognize is called only once per page. L_OcrAutoRecognizeManager will temporarily set the value of this setting to True while performing a recognition job. |
Recognition.DetectColors | Boolean | N/A | Automatically detect the foreground and background colors of each character. Default value is False. If this value is True, the engine will try to automatically detect the colors of the zones when L_OcrPage_AutoZone is called and set the values in the ForeColor and BackColor members of the L_OcrZone structure. |
Recognition.AutoSecondPass | Boolean | N/A | Automatically perform a second image processing cleanup pass on the internal B/W image if the first pass did not provide satisfactory results. Default value is True. |
Recognition.MaximumPageConventionalMemorySize | Integer | 0 to 2147483647 | L_OcrAutoRecognizeManager has support for loading bitmap handle objects directly from disk files. The loaded bitmap handle holds the original image, which is only useful when saving graphics zones or image over text overlays. If this image is large and was created using conventional memory, the process will use a large amount of its physical memory to hold the image instead of using that memory for other purposes such as auto-zoning or recognizing. This is more noticeable in multi-threaded applications, where loading several large images in conventional memory will cause out of memory errors on operations that should normally succeed. L_OcrEngine can automatically switch to the disk memory feature of BITMAPHANDLE if the size of the image in memory would exceed the value set in "MaximumPageConventionalMemorySize". "MaximumPageConventionalMemorySize" is in KBytes and the default value is 42984 (42 MBytes) for x86 and 429840 (420 MBytes) for x64. This value allows a typical OCR image of 8.5 by 11 inches at 300 DPI and 32 bits per pixel to be in conventional memory, but anything significantly larger than that to use disk memory mode. Naturally, using disk memory is slower than using conventional memory. The exact ratio depends on the speed of the machine's hard drive. However, using disk memory might speed up the overall process: freeing physical memory improves the performance of other operations such as auto-zoning and recognition, and the load operation, although certainly slower, might not account for a large share of the overall time. The exact value to set depends on the system hardware configuration, number of cores and application type. You should experiment with changing this value if you get out of memory errors in your application. |
Recognition.Threading | BeginCategory | N/A | Beginning of the recognition thread settings category. |
Recognition.Threading.MaximumThreads | Integer | 0 to 2147483647 | Gets or sets the maximum number of threads to use in recognition. The LEADTOOLS Advantage OCR engine provides support for recognizing document zones in separate threads. This can improve the performance of the L_OcrPage_Recognize method. The default value of 0 (zero) instructs LEADTOOLS to use the system thread pool. If you do not wish to use multi-threading inside the L_OcrPage_Recognize method then set the value of the Recognition.Threading.MaximumThreads to 1. Any other value is treated as 0 (use the thread pool). |
End:Recognition.Threading | EndCategory | N/A | End of the recognition thread settings category. |
Recognition.PreProcessing | BeginCategory | N/A | Beginning of the pre-processing settings category. |
Recognition.Preprocess.BlackWhiteImageConversionMethod | Enum | Default, Dynamic, User | This setting influences how a non-B/W image stored in the engine is converted to a B/W one. Each method affects grayscale or 24-bit color images and creates a B/W image in the engine's memory. Default: image binarization applies an automatic adaptive thresholding algorithm. Dynamic: each pixel is compared to a dynamically calculated threshold; if the pixel intensity is higher, it is set to white, otherwise it is set to black. User: thresholding uses a user-defined threshold value, set by the Recognition.Preprocess.BlackWhiteImageConversionThreshold setting. |
Recognition.Preprocess.BlackWhiteImageConversionThreshold | Integer | 0 to 255 | The threshold to use when converting colored images to bitonal (black/white) in preparation for recognizing the text on the image. The conversion is done to separate the text intensities from the background intensities. This is the equivalent of calling L_IntensityDetectBitmap on the image with crInColor equal to the detected foreground (text) color, crOutColor equal to the detected background color, uChannel set to IDB_CHANNEL_MASTER, uHigh equal to 255, and uLow equal to the value of this setting. Default value is 185. |
Recognition.Preprocess.MobileImagePreprocess | Boolean | N/A | True to enable mobile image processing mode; otherwise, false. By default, the OCR engine will try to upscale images with a low resolution (DPI). However, on most mobile devices, the camera takes pictures with a low resolution (for example, 72 DPI) and a large size in pixels. Therefore, having the OCR engine upscale the images will result in undesired consumption of memory. If you are using the OCR engine to process images from a mobile camera, set the value of this setting to true. |
Recognition.Preprocess.DownSampleLargeImage | Boolean | N/A | True to down sample large images prior to recognition; otherwise, false. Set the value of this setting to true to force the OCR engine to not create processing images (the image used for recognition) larger than 4000 by 4000 pixels to preserve memory and resources. This value is ignored if the value of the MobileImagePreprocess setting is true. |
Recognition.Preprocess.UseZoningEngine | Boolean | N/A | True to use the zoning engine to exclude graphics areas from preprocessing calculations such as deskew and auto-rotate; otherwise, false. |
Recognition.Preprocess.MinimumAutoRotateConfidence | Integer | 0 to 100 | Used by L_OcrPage_AutoPreprocess to determine the minimum confidence percentage threshold to use when orienting pages. Default value is 26. |
Recognition.Preprocess.ModifyOriginalImageOptions | Enum | None, Deskew, Rotate, Invert | Specifies how the original image is modified when IOcrPage.AutoPreprocess is called. Default value is the combination of Deskew, Rotate, and Invert. These options are useful when saving a document with the image over text option (such as the one supported by PDF). In this scenario, it may be preferable to overlay the original image without any modification that might affect the size. The only option that should be left in this case is Rotate. Leadtools.Forms.Ocr.IOcrAutoRecognizeManager will automatically set the value of this setting to "Rotate" if the final document format has image over text support. |
End:Recognition.PreProcessing | EndCategory | N/A | End of the pre-processing settings category. |
Recognition.Zoning | BeginCategory | N/A | Beginning of the zoning settings category. |
Recognition.Zoning.DisableMultiThreading | Boolean | N/A | True to disable multi-threading when performing auto-zoning; otherwise multi-threading is enabled. Multi-threading enhances the performance of the auto-zoning algorithm. However, it may be undesirable if the OCR engine is hosted in a server. |
Recognition.Zoning.CropZoneImage | Boolean | N/A | If this flag is set to true then the Advantage engine will crop each zone from the original image and recognize it. This can improve the performance of the L_OcrPage_Recognize method. |
Recognition.Zoning.DetectZoneRotationAngle | Boolean | N/A | If this value is set to True, then the engine will try to detect a separate rotation angle for each zone. Default value is False. |
Recognition.Zoning.Options | Enum | None, Detect Text, Detect Graphics, Detect Table, Allow Overlap, Detect Accurate Zones, Use Text Extractor, Detect Checkbox | These flags affect the way the IOcrPage.AutoZone method works. Values can be OR-ed. The possible values are the ones listed in the "Range and values" column. |
Recognition.Zoning.EnableDoubleZoning | Boolean | N/A | If this flag is set to true then the Advantage engine will perform a second internal autozoning on each text zone to generate more homogenous zones for recognition. This can improve the performance of the L_OcrPage_Recognize method. |
End:Recognition.Zoning | EndCategory | N/A | End of the zoning settings category. |
Recognition.Words | BeginCategory | N/A | Beginning of the word recognition settings category. |
Recognition.Words.DiscardLowConfidenceWords | Boolean | N/A | This setting controls the output. If True, words/characters with a low rating (rubbish words/characters) will not be included when saving the recognition results to any of the document formats supported by LEADTOOLS. |
Recognition.Words.DiscardLowConfidenceZones | Boolean | N/A | This setting controls the output. If True, the engine will check all the words/characters in a zone. If it determines that the overall confidence and type of characters constitute noise, then the recognition results of the whole zone will be discarded. Default value is False. |
Recognition.Words.LowWordConfidence | Integer | 0 to 100 | Discard any word with a confidence value less than this value. This setting only takes effect when DiscardLowConfidenceWords is set to true. |
End:Recognition.Words | EndCategory | N/A | End of the words recognition settings category. |
Recognition.Adaption | BeginCategory | N/A | Beginning of the recognition adaption settings category. |
Recognition.Adaption.AdaptedDataFilePath | Boolean | N/A | Not used in this version of LEADTOOLS. |
End:Recognition.Adaption | EndCategory | N/A | End of the recognition adaption settings category. |
Recognition.CharacterFilter | BeginCategory | N/A | Beginning of the recognition character filters category. |
Recognition.CharacterFilter.MinimumPixelWidth | Integer | 0 to 2147483647 | Minimum width of a recognized character in pixels. |
Recognition.CharacterFilter.MinimumPixelHeight | Integer | 0 to 2147483647 | Minimum height of a recognized character in pixels. |
Recognition.CharacterFilter.MinimumPixelSizeExcludeCharacters | String | No maximum and can be null | Characters to exclude from the minimum pixel width and height rule. |
Recognition.CharacterFilter.DiscardNoiseLikeCharacters | Boolean | N/A | Ignore recognized characters that have features similar to noise. |
Recognition.CharacterFilter.PostprocessMICR | Boolean | N/A | If the value of this setting is True, then the engine will post process any MICR zones by discarding all the characters, numbers and symbols that do not belong to the MICR character set as well as performing basic checking on the validity of the data. Default value is True. |
End:Recognition.CharacterFilter | EndCategory | N/A | End of the recognition character filters category. |
Recognition.Fonts | BeginCategory | N/A | Beginning of the fonts category. |
Recognition.Fonts.EnableCapsCaps | Boolean | N/A | Enable Caps/Caps font recognition enhancements. |
Recognition.Fonts.DetectFontStyles | Enum | None, Bold, Italic, Underline, SansSerif, Serif, Proportional, Superscript, Subscript, Strikeout | Enable or disable the detection of specific font properties. These flags affect the final generated document if the format supports fonts, such as PDF or DOC. Values can be OR-ed. The possible values are the ones listed in the "Range and values" column. |
Recognition.Fonts.RecognizeFontAttributes | Boolean | N/A | Enable font attributes recognition. Disabling it can improve the speed of the L_OcrPage_Recognize method. |
End:Recognition.Fonts | EndCategory | N/A | End of the fonts category. |
Recognition.AutoRecognizeManager | BeginCategory | N/A | Beginning of the auto-recognize manager category. |
Recognition.AutoRecognizeManager.FormatSpeedOptimized | Boolean | N/A | Enable optimizing the recognition speed based on the final document format. For example, the OCR engine will not recognize font attributes such as italic or bold if the final format is Text. |
Recognition.AutoRecognizeManager.DefaultDocumentOrientation | Enum | None, Portrait, Landscape | Default orientation for the generated document if a page is blank or contains graphics only. The possible values are the ones listed in the "Range and values" column. |
End:Recognition.AutoRecognizeManager | EndCategory | N/A | End of the auto-recognize manager category. |
End:Recognition | EndCategory | N/A | End of the recognition settings category. |
SpellChecker | BeginCategory | N/A | Beginning of the spell checker category. |
SpellChecker.MaximumDictionaries | Integer | 0 to 255 | Gets or sets the maximum number of spell checkers to use at the same time. Default value is the number of available dictionaries found in the system. |
End:SpellChecker | EndCategory | N/A | End of the spell checker category. |
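
The description of Recognition.MaximumPageConventionalMemorySize states that the default x86 limit of 42984 KBytes leaves room for an 8.5 by 11 inch page at 300 DPI and 32 bits per pixel. The following sketch checks that arithmetic. It is not LEADTOOLS code: `estimate_page_kb` and `exceeds_limit` are hypothetical helpers, and the sketch assumes an uncompressed image with rows padded to 4 bytes and 1 KByte = 1024 bytes. The engine's own size calculation may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Default values of Recognition.MaximumPageConventionalMemorySize, in KBytes,
   as documented in the table above. */
#define DEFAULT_MAX_PAGE_KB_X86 42984u
#define DEFAULT_MAX_PAGE_KB_X64 429840u

/* Hypothetical helper: estimated size in KBytes of an uncompressed image.
   Assumes each row is padded to a 4-byte boundary; the result is rounded
   down to whole KBytes. */
uint64_t estimate_page_kb(uint32_t width_px, uint32_t height_px,
                          uint32_t bits_per_pixel)
{
    uint64_t bytes_per_row =
        (((uint64_t)width_px * bits_per_pixel + 31u) / 32u) * 4u;
    return (bytes_per_row * height_px) / 1024u;
}

/* Hypothetical helper: true if an image of page_kb KBytes is larger than
   the configured limit, which is the condition the table describes for
   switching to disk memory. */
bool exceeds_limit(uint64_t page_kb, uint64_t max_kb)
{
    return page_kb > max_kb;
}
```

Under these assumptions, a 2550 by 3300 pixel page (8.5 by 11 inches at 300 DPI) at 32 bits per pixel comes to about 32871 KBytes, which is below the x86 default. The same page scanned at 600 DPI (5100 by 6600 pixels) comes to about 131484 KBytes, which is above the x86 default but still below the x64 default.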