Visual Basic (Declaration) | |
---|---|
<ObsoleteAttribute("Use Leadtools.Forms.DocumentWriters.DocumentFormat instead")> <SerializableAttribute()> Public Enum OcrDocumentFormat Inherits Enum |
C# | |
---|---|
[ObsoleteAttribute("Use Leadtools.Forms.DocumentWriters.DocumentFormat instead")] [SerializableAttribute()] public enum OcrDocumentFormat : Enum |
C++/CLI | |
---|---|
[ObsoleteAttribute("Use Leadtools.Forms.DocumentWriters.DocumentFormat instead")] [SerializableAttribute()] public enum class OcrDocumentFormat : public Enum |
Member | Description |
---|---|
AsciiText | ASCII Text. This is the most basic format: the document is a text file with a line break after each line. If a table is present, its cells are positioned by tabs. The text returned by RecognizeText uses this format. Note: Use DocumentFormat.Text instead. |
AsciiTextLayoutRetained | ASCII Text output, layout retention with mimicked spaces. Line/cell contents are surrounded by quotes (""). Note: Use DocumentFormat.Text instead. |
AsciiTextCommaDelimited | ASCII Comma delimited text output. Line/cell contents are surrounded by quotes (""). Note: Use DocumentFormat.Text instead. |
AsciiTextFormatted | ASCII Text output allowing quick text conversion. Line break after each line and after each zone. Note: Use DocumentFormat.Text instead. |
UnicodeText | UNICODE Text with line break after each line. If a table is present, its cells are positioned by tabs. Note: Use DocumentFormat.Text instead. |
UnicodeTextLayoutRetained | UNICODE Text output, layout retention with mimicked spaces. Line/cell contents are surrounded by quotes (""). Note: Use DocumentFormat.Text instead. |
UnicodeTextCommaDelimited | UNICODE Comma delimited text output. Line/cell contents are surrounded by quotes (""). Note: Use DocumentFormat.Text instead. |
UnicodeTextFormatted | UNICODE Text output allowing quick text conversion. Line break after each line and after each zone. Note: Use DocumentFormat.Text instead. |
Html32 | HTML output. HTML 3.2 is useful for exporting with partial formatting. The output files are supported by all major browsers. Note: Use DocumentFormat.Html instead. |
Html40 | HTML output. HTML 4.0 can set the exact position and size of objects; use this output format for full formatting. Note: Use DocumentFormat.Html instead. |
Word97 | Microsoft Word 97 (doc) output format. Note: Use DocumentFormat.Doc instead. |
Word2000 | Microsoft Word 2000 (doc) output format. Note: Use DocumentFormat.Doc instead. |
Word2003 | Microsoft Word 2003 (doc) output format. Note: Use DocumentFormat.Doc instead. |
WordML | Microsoft Office Open XML (docx) output format. Note: The LEADTOOLS Document Writers does not currently support an equivalent to this format. |
Excel97 | Microsoft Excel 97 (xls) output format. Note: The LEADTOOLS Document Writers does not currently support an equivalent to this format. |
Excel2000 | Microsoft Excel 2000 (xls) output format. Note: The LEADTOOLS Document Writers does not currently support an equivalent to this format. |
Rtf | Rich Text Format for Word 97 and later. Note: Use DocumentFormat.Rtf instead. |
RtfWordPad | Rich Text Format for Microsoft WordPad. Note: Use DocumentFormat.Rtf instead. |
InfoPath | Microsoft InfoPath XML document output format. Note: The LEADTOOLS Document Writers does not currently support an equivalent to this format. |
Pdf | Adobe PDF. Displaying the generated PDF file in a PDF reader results in a very similar look to the original document. The text can be searched. The PDF file contains the recognized characters in the same positions as in the original. The original page image is overlaid on top of the PDF document. Note: Use DocumentFormat.Pdf instead. |
PdfImage | Adobe PDF with raster image only. Note: Use DocumentFormat.Pdf instead. |
PdfText | Adobe PDF with text only. The text can be searched. The PDF file contains the recognized characters in the same positions as in the original. The original page image is not overlaid on top of the PDF document. Note: Use DocumentFormat.Pdf instead. |
PdfEdited | Adobe PDF with text and image. Use this format if you have used IOcrPage.SetRecognizedCharacters to insert or delete characters in the recognized data. The engine will re-arrange the character boxes before saving the resulting PDF file. Note: Use DocumentFormat.Pdf instead. |
PdfWithImageSubstitutes | Adobe PDF with text only. Missing and rejected characters are replaced by small images from the original page, resulting in a better-looking document than PdfText. The text can be searched. The PDF file contains the recognized characters in the same positions as in the original. Note: Use DocumentFormat.Pdf instead. |
PdfA | Adobe PDF/A format. The original page image is overlaid on top of the PDF document. This format is optimized for the long-term archiving of electronic documents and is based on the PDF Reference Version 1.4 from Adobe Systems Inc. (implemented in Adobe Acrobat 5). Note: Use DocumentFormat.Pdf instead. |
PdfAText | Adobe PDF/A format with text only. This format is optimized for the long-term archiving of electronic documents and is based on the PDF Reference Version 1.4 from Adobe Systems Inc. (implemented in Adobe Acrobat 5). Note: Use DocumentFormat.Pdf instead. |
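
When migrating away from this obsolete enumeration, the "Note:" entries in the member table can be collected into a single lookup. The following is a minimal sketch only: `OcrFormatMigration` and `GetReplacement` are hypothetical names, not LEADTOOLS API, and member names are held as plain strings so the example compiles without any LEADTOOLS assemblies.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of LEADTOOLS. It maps obsolete OcrDocumentFormat
// member names to the DocumentFormat member named in the table's "Note:" text.
public static class OcrFormatMigration
{
    private static readonly Dictionary<string, string> Replacements = BuildMap();

    private static Dictionary<string, string> BuildMap()
    {
        var map = new Dictionary<string, string>();
        Add(map, "Text", "AsciiText", "AsciiTextLayoutRetained", "AsciiTextCommaDelimited",
            "AsciiTextFormatted", "UnicodeText", "UnicodeTextLayoutRetained",
            "UnicodeTextCommaDelimited", "UnicodeTextFormatted");
        Add(map, "Html", "Html32", "Html40");
        Add(map, "Doc", "Word97", "Word2000", "Word2003");
        Add(map, "Rtf", "Rtf", "RtfWordPad");
        Add(map, "Pdf", "Pdf", "PdfImage", "PdfText", "PdfEdited",
            "PdfWithImageSubstitutes", "PdfA", "PdfAText");
        // WordML, Excel97, Excel2000 and InfoPath are left out on purpose:
        // the table lists no Document Writers equivalent for them.
        return map;
    }

    private static void Add(Dictionary<string, string> map, string replacement, params string[] oldNames)
    {
        foreach (string oldName in oldNames)
        {
            map[oldName] = replacement;
        }
    }

    // Returns the DocumentFormat member name, or null when the table lists no equivalent.
    public static string GetReplacement(string ocrFormatName)
    {
        string replacement;
        return Replacements.TryGetValue(ocrFormatName, out replacement) ? replacement : null;
    }

    public static void Main()
    {
        foreach (string name in new[] { "Html40", "PdfA", "Excel97" })
        {
            string replacement = GetReplacement(name);
            Console.WriteLine(replacement != null
                ? name + " -> DocumentFormat." + replacement
                : name + " -> no Document Writers equivalent listed");
        }
    }
}
```

Returning null for the four members without an equivalent keeps the gap visible to the caller instead of silently picking a substitute format.
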
(Deprecated) All formats supported by Leadtools.Forms.DocumentWriters can be used from OCR now. For a list of the formats supported by LEADTOOLS OCR, refer to DocumentFormat. To get the engine native formats (if any), use GetEngineSupportedFormats.
The IOcrDocument interface contains the IOcrDocument.Save methods which allow you to save the recognized pages data to a final document format such as PDF, DOC and HTML (or XML through IOcrDocument.SaveXml).
Not all of the formats are supported by an IOcrEngine. To get the formats supported by a particular engine, use the IOcrDocumentManager.GetSupportedFormats or IOcrDocumentManager.IsFormatSupported methods.
To get the file extension for an OcrDocumentFormat, use IOcrDocumentManager.GetFormatFileExtension.
To get the friendly name of an OcrDocumentFormat, use IOcrDocumentManager.GetFormatFriendlyName.
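
The check-then-name pattern described above might look like the sketch below. It does not call LEADTOOLS: `IFormatInfo` is a local stand-in whose method names mirror the IOcrDocumentManager methods mentioned in the text, while the parameter types, return types, and the sample extensions and friendly names are assumptions made for illustration. Consult the IOcrDocumentManager reference for the real signatures.

```csharp
using System;
using System.Collections.Generic;

// Local stand-in for the IOcrDocumentManager methods named in the text.
// The signatures here are assumed, not taken from LEADTOOLS.
public interface IFormatInfo
{
    bool IsFormatSupported(string format);
    string GetFormatFileExtension(string format);   // assumed to return no leading dot
    string GetFormatFriendlyName(string format);
}

// Fake engine that supports two formats; the values are illustrative only.
public sealed class FakeFormatInfo : IFormatInfo
{
    private readonly Dictionary<string, string[]> formats = new Dictionary<string, string[]>
    {
        { "PdfText", new[] { "pdf", "Adobe PDF (text only)" } },
        { "Html40", new[] { "htm", "HTML 4.0" } }
    };

    public bool IsFormatSupported(string format) { return formats.ContainsKey(format); }
    public string GetFormatFileExtension(string format) { return formats[format][0]; }
    public string GetFormatFriendlyName(string format) { return formats[format][1]; }
}

public static class OutputNaming
{
    // Returns "baseName.extension" for a supported format, or null when the
    // engine does not support the format.
    public static string BuildFileName(IFormatInfo info, string baseName, string format)
    {
        if (!info.IsFormatSupported(format))
        {
            return null;
        }
        return baseName + "." + info.GetFormatFileExtension(format);
    }

    public static void Main()
    {
        IFormatInfo info = new FakeFormatInfo();
        foreach (string format in new[] { "PdfText", "Excel97" })
        {
            string fileName = BuildFileName(info, "result", format);
            Console.WriteLine(fileName != null
                ? info.GetFormatFriendlyName(format) + ": " + fileName
                : format + ": not supported by this engine");
        }
    }
}
```

Checking support before asking for the extension avoids requesting metadata for a format the engine cannot produce.
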
Some of the document formats require a special key to unlock. Before using these formats, you must first unlock the required support using the RasterSupport class.
The following table lists the document formats and the support type required to be unlocked before using them:
Document Format | Support Type |
---|---|
Pdf, PdfImage, PdfText, PdfEdited and PdfWithImageSubstitutes | RasterSupportType.OcrPlusPdfOutput when using the OcrEngineType.Plus engine, RasterSupportType.OcrProfessionalPdfOutput when using the OcrEngineType.Professional engine and RasterSupportType.OcrAdvantagePdfLeadOutput when using the OcrEngineType.Advantage engine |
PdfA and PdfAText | RasterSupportType.OcrPlusPdfLeadOutput when using the OcrEngineType.Plus engine, RasterSupportType.OcrProfessionalPdfLeadOutput when using the OcrEngineType.Professional engine and RasterSupportType.OcrAdvantagePdfLeadOutput when using the OcrEngineType.Advantage engine |
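
The table above can also be expressed as a lookup from engine type and document format to the required support type. This is a sketch under stated assumptions: `PdfUnlockLookup` and `GetRequiredSupportType` are hypothetical names, not LEADTOOLS API, and the engine, format and support type members are represented as strings so the example runs without LEADTOOLS. The returned name is what you would then unlock through the RasterSupport class.

```csharp
using System;

// Hypothetical helper, not part of LEADTOOLS. It encodes the two table rows above.
public static class PdfUnlockLookup
{
    // Returns the RasterSupportType member name listed in the table, or null
    // when the table lists no special unlock for the given format.
    public static string GetRequiredSupportType(string engineType, string format)
    {
        bool isPdfA = format == "PdfA" || format == "PdfAText";
        bool isPdf = format == "Pdf" || format == "PdfImage" || format == "PdfText"
                  || format == "PdfEdited" || format == "PdfWithImageSubstitutes";
        if (!isPdf && !isPdfA)
        {
            return null;
        }

        switch (engineType)
        {
            case "Plus":
                return isPdfA ? "OcrPlusPdfLeadOutput" : "OcrPlusPdfOutput";
            case "Professional":
                return isPdfA ? "OcrProfessionalPdfLeadOutput" : "OcrProfessionalPdfOutput";
            case "Advantage":
                // The table lists the same support type in both rows for this engine.
                return "OcrAdvantagePdfLeadOutput";
            default:
                throw new ArgumentException("Unknown engine type: " + engineType, "engineType");
        }
    }

    public static void Main()
    {
        Console.WriteLine(GetRequiredSupportType("Plus", "PdfText"));
        Console.WriteLine(GetRequiredSupportType("Professional", "PdfA"));
        Console.WriteLine(GetRequiredSupportType("Advantage", "Html40") ?? "(no special unlock listed)");
    }
}
```
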
System.Object
System.ValueType
System.Enum
Leadtools.Forms.Ocr.OcrDocumentFormat
Target Platforms: Microsoft .NET Framework 3.0, Windows XP, Windows Server 2003 family, Windows Server 2008 family
Reference
Leadtools.Forms.Ocr Namespace
IOcrDocumentManager Interface
IOcrDocument Interface
IOcrDocument.Save
IOcrDocument.SaveXml
IOcrEngine Interface
OcrEngineManager Class
OcrEngineType Enumeration
Programming with Leadtools .NET OCR
Files to be Included with Your Application
Recognizing OCR Pages
Unlocking Special LEAD Features