Indicates how to treat the image elements encountered in the input SVG page during text extraction.
[SerializableAttribute()]
[DataContractAttribute()]
public enum DocumentTextImagesRecognitionMode
<SerializableAttribute(),
DataContractAttribute()>
Public Enum DocumentTextImagesRecognitionMode
public:
[SerializableAttribute,
DataContractAttribute]
enum class DocumentTextImagesRecognitionMode sealed
Value | Member | Description |
---|---|---|
0 | Auto | Use the SVG engine to extract text unless the page contains only raster elements, in which case OCR is used. |
1 | Disabled | Do not use OCR on the image elements. Instead, ignore the image elements. |
2 | Always | Use OCR on the image elements. Add the recognized text to the final document page text along with the text from the other SVG elements of the page. Requires a valid IOcrEngine instance. |
Set DocumentText.ImagesRecognitionMode to a DocumentTextImagesRecognitionMode value to determine how image elements are treated during text extraction from an SVG page. This value has no effect on raster pages, for which OCR is always used.
SVG elements can also contain glyphs (paths) that may or may not be considered images and can also be recognized using the OCR engine. This is controlled by the DocumentText.RecognizeGlyphs property.
The following table helps determine what would occur during DocumentPage.GetText, depending on the type of the page:
Value | Page Type | Behavior |
---|---|---|
Auto | SVG with only text or mixed image and text elements | Only the text elements are extracted |
Auto | SVG with raster elements only | The image elements are recognized and text extracted using the OCR engine |
Disabled | SVG with only text or mixed image and text elements | Only the text elements are extracted |
Disabled | SVG with raster elements only | No text is extracted |
Always | SVG with only text or mixed image and text elements | The text elements are extracted and the image elements are recognized and text extracted using the OCR engine |
Always | SVG with raster elements only | The image elements are recognized and text extracted using the OCR engine |
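The table above can be restated as a small decision function. The following is a minimal sketch for illustration only: RecognitionMode, GetTextBehaviorSketch, and Decide are hypothetical local stand-ins and are not LEADTOOLS types or methods. It assumes the page type is reduced to a single flag indicating whether the SVG contains any text elements.

```csharp
using System;

// Local stand-in for illustration only; this is NOT the LEADTOOLS enum type.
public enum RecognitionMode { Auto = 0, Disabled = 1, Always = 2 }

public static class GetTextBehaviorSketch
{
    // hasTextElements: true for "SVG with only text or mixed image and text elements",
    //                  false for "SVG with raster elements only".
    // Returns whether text elements are extracted and whether image elements are sent to OCR.
    public static (bool ExtractText, bool OcrImages) Decide(RecognitionMode mode, bool hasTextElements)
    {
        bool rasterOnly = !hasTextElements;
        switch (mode)
        {
            case RecognitionMode.Disabled:
                return (hasTextElements, false);      // image elements are ignored
            case RecognitionMode.Always:
                return (hasTextElements, true);       // image elements always go to OCR
            default:                                  // Auto
                return (hasTextElements, rasterOnly); // OCR only when the page is all raster
        }
    }

    public static void Main()
    {
        // Print every row of the table.
        foreach (RecognitionMode mode in Enum.GetValues(typeof(RecognitionMode)))
        {
            foreach (bool hasText in new[] { true, false })
            {
                var (extractText, ocrImages) = Decide(mode, hasText);
                Console.WriteLine($"{mode,-8} hasText={hasText,-5} extractText={extractText,-5} ocrImages={ocrImages}");
            }
        }
    }
}
```

A result of (false, false), which occurs only for Disabled on a raster-only SVG page, corresponds to the "No text is extracted" row.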
The engine uses DocumentPage.IsSvgSupported and DocumentPage.IsSvgConversionPreferred, and checks the SVG elements of the page (returned by DocumentPage.GetSvg), to perform the actions described above.
When Always is used, a valid (started) IOcrEngine instance must be set in DocumentText.OcrEngine.
When Auto is used, a valid (started) IOcrEngine instance should be set in DocumentText.OcrEngine. If this value is null, then the framework will behave as if Disabled were used.
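The OCR engine requirements for Always and Auto can be sketched as a caller-side check performed before text extraction. This is an illustrative model only: RecognitionMode, OcrEngineRequirementSketch, and Resolve are hypothetical names, not LEADTOOLS APIs, and the availability of a started engine is reduced to a boolean. The source states the fallback for Auto only; throwing for Always without an engine is an assumption of this sketch, not documented framework behavior.

```csharp
using System;

// Local stand-in for illustration only; this is NOT the LEADTOOLS enum type.
public enum RecognitionMode { Auto = 0, Disabled = 1, Always = 2 }

public static class OcrEngineRequirementSketch
{
    // ocrEngineReady: true when a valid (started) OCR engine has been assigned.
    // Returns the mode that effectively applies.
    public static RecognitionMode Resolve(RecognitionMode requested, bool ocrEngineReady)
    {
        if (ocrEngineReady)
            return requested;

        switch (requested)
        {
            case RecognitionMode.Always:
                // Assumption of this sketch: fail early in caller code instead of
                // relying on behavior the documentation does not specify.
                throw new InvalidOperationException("Always requires a started OCR engine.");
            case RecognitionMode.Auto:
                return RecognitionMode.Disabled; // documented fallback when no engine is set
            default:
                return requested;                // Disabled never needs an engine
        }
    }

    public static void Main()
    {
        Console.WriteLine(Resolve(RecognitionMode.Auto, false));  // prints "Disabled"
        Console.WriteLine(Resolve(RecognitionMode.Always, true)); // prints "Always"
    }
}
```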
Note: When using the OcrEngineType.LEAD engine, DocumentPage.GetText will try to optimize the speed of OCR recognition for text format output (for instance, it will not try to recognize font decorations such as bold or italic). This is done by checking if Recognition.AutoRecognizeManager.FormatSpeedOptimized is true (the default value). This optimization can result in DocumentPage.GetText producing slightly different recognition on complex input raster images than IOcrPage.GetText, which does not use the value of the setting. Therefore, if producing the same exact results from the two methods is important, set the value of the Recognition.AutoRecognizeManager.FormatSpeedOptimized setting to false in the IOcrEngine used with the document. Refer to LEADTOOLS OCR Module - LEAD Engine Settings for more information.