The document formats supported by the LEADTOOLS OCR toolkit.
Members
Value | Member | Description |
---|---|---|
0 | TextAnsi | The output text document type is ANSI (contains 8-bit ANSI characters only). |
1 | TextUnicode | The output text document type is Unicode (contains 16-bit Unicode characters). |
2 | Pdf | The target document should be PDF v1.4. PDF is generally not suited for long-term preservation. The PDF format may contain resources (such as fonts) that may not exist on the viewing machine. Hence, font substitution may occur, resulting in a document that may not look exactly like the original version. |
3 | PdfA | The target document should be PDF/A. PDF/A is a subset of PDF obtained by leaving out PDF features not suited to long-term archiving. The resulting document is 100 percent self-contained: all of the information necessary for displaying the document in the same manner every time is embedded in the file. Saving with the PDF/A document type may result in larger output file sizes. |
4 | PdfImageOverText | The target document should be PDF v1.4. PDF is generally not suited for long-term preservation. The PDF format may contain resources (such as fonts) that may not exist on the viewing machine. Hence, font substitution may occur, resulting in a document that may not look exactly like the original version. The raster image is overlaid on top of the resulting PDF document. |
5 | PdfAImageOverText | The target document should be PDF/A. PDF/A is a subset of PDF obtained by leaving out PDF features not suited to long-term archiving. The resulting document is guaranteed to look exactly like the original version when viewed on the target machine. Saving with the PDF/A document type may result in larger output file sizes. The raster image is overlaid on top of the resulting PDF document. |
6 | Doc | Microsoft Word 2003 document format (DOC). |
7 | Rtf | Microsoft Rich Text Format (RTF). |
8 | Html | HTML output. HTML 4.0 can set the exact position and size of objects. Use this output format with full formatting. |
9 | Emf | Windows Enhanced Metafile (EMF). The EMF format does not support multi-page documents. Therefore, only the last page will be used in the final document. |
10 | Docx | Microsoft Word 2007/2010 document format (DOCX). |
11 | Pdf12 | The target document should be PDF v1.2. PDF is generally not suited for long-term preservation. The PDF format may contain resources (such as fonts) that may not exist on the viewing machine. Hence, font substitution may occur, resulting in a document that may not look exactly like the original version. |
12 | Pdf12ImageOverText | The target document should be PDF v1.2. PDF is generally not suited for long-term preservation. The PDF format may contain resources (such as fonts) that may not exist on the viewing machine. Hence, font substitution may occur, resulting in a document that may not look exactly like the original version. The raster image is overlaid on top of the resulting PDF document. |
13 | Pdf13 | The target document should be PDF v1.3. PDF is generally not suited for long-term preservation. The PDF format may contain resources (such as fonts) that may not exist on the viewing machine. Hence, font substitution may occur, resulting in a document that may not look exactly like the original version. |
14 | Pdf13ImageOverText | The target document should be PDF v1.3. PDF is generally not suited for long-term preservation. The PDF format may contain resources (such as fonts) that may not exist on the viewing machine. Hence, font substitution may occur, resulting in a document that may not look exactly like the original version. The raster image is overlaid on top of the resulting PDF document. |
15 | Pdf15 | The target document should be PDF v1.5. PDF is generally not suited for long-term preservation. The PDF format may contain resources (such as fonts) that may not exist on the viewing machine. Hence, font substitution may occur, resulting in a document that may not look exactly like the original version. |
16 | Pdf15ImageOverText | The target document should be PDF v1.5. PDF is generally not suited for long-term preservation. The PDF format may contain resources (such as fonts) that may not exist on the viewing machine. Hence, font substitution may occur, resulting in a document that may not look exactly like the original version. The raster image is overlaid on top of the resulting PDF document. |
17 | Xps | Microsoft XML Paper Specification (XPS). |
18 | Xls | Microsoft Excel 2003 document format (XLS). |
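
The numeric values in the member table can be modeled as a plain enumeration, which is convenient when a value has to be translated to or from its member name. The following is only an illustrative sketch in Python, not the LEADTOOLS type itself: the class name `DocumentFormat` and both helper functions are hypothetical, and the name `Pdf` for value 2 is assumed from the unlock table further down this page.

```python
from enum import IntEnum


class DocumentFormat(IntEnum):
    """Hypothetical mirror of the member table; not the LEADTOOLS type."""
    TextAnsi = 0
    TextUnicode = 1
    Pdf = 2  # member name assumed from the unlock table on this page
    PdfA = 3
    PdfImageOverText = 4
    PdfAImageOverText = 5
    Doc = 6
    Rtf = 7
    Html = 8
    Emf = 9
    Docx = 10
    Pdf12 = 11
    Pdf12ImageOverText = 12
    Pdf13 = 13
    Pdf13ImageOverText = 14
    Pdf15 = 15
    Pdf15ImageOverText = 16
    Xps = 17
    Xls = 18


def has_image_overlay(fmt: DocumentFormat) -> bool:
    """True for the members whose raster image is overlaid on the PDF."""
    return fmt.name.endswith("ImageOverText")


def keeps_only_last_page(fmt: DocumentFormat) -> bool:
    """True for EMF, which the table describes as single-page output."""
    return fmt is DocumentFormat.Emf


if __name__ == "__main__":
    fmt = DocumentFormat(12)  # look a member up by its numeric value
    print(fmt.name, has_image_overlay(fmt), keeps_only_last_page(fmt))
```

Looking a member up by value (`DocumentFormat(12)`) or by name (`DocumentFormat["Xls"]`) raises an error for anything outside the table, which makes typos in a format value fail early.
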
The Leadtools.Services.Forms.ServiceContracts.IOcrService.Recognize method allows you to save the recognized page data to a final document format.
Some of the document formats require a special key to unlock. To use such a format, you must first unlock the corresponding support through the configuration files shipped with our services.
The following table lists the document formats and the corresponding support types that must be unlocked before the formats can be used:
Document Format | Support Type |
---|---|
Pdf, PdfImageOverText, Pdf12, Pdf12ImageOverText, Pdf13, Pdf13ImageOverText, Pdf15, Pdf15ImageOverText, Xps | You need to set the value for OcrProfessionalPdfOutputKey when using the OcrEngineType.Professional engine |
PdfA | You need to set the value for OcrProfessionalPdfLeadOutputKey when using the OcrEngineType.Professional engine |
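
The table above amounts to a lookup from format name to the configuration key that must be set. A minimal sketch of that lookup in Python follows. The function `required_unlock_key` and the dictionary are hypothetical helpers written for illustration, not part of the LEADTOOLS services. Only the two key names and the format names come from the table, and the result applies to the OcrEngineType.Professional engine only.

```python
from typing import Optional

PDF_OUTPUT_KEY = "OcrProfessionalPdfOutputKey"
PDF_LEAD_OUTPUT_KEY = "OcrProfessionalPdfLeadOutputKey"

# Formats listed in the first row of the unlock table.
_PDF_OUTPUT_FORMATS = (
    "Pdf", "PdfImageOverText",
    "Pdf12", "Pdf12ImageOverText",
    "Pdf13", "Pdf13ImageOverText",
    "Pdf15", "Pdf15ImageOverText",
    "Xps",
)

# Hypothetical lookup built only from the rows of the unlock table.
REQUIRED_KEY = {name: PDF_OUTPUT_KEY for name in _PDF_OUTPUT_FORMATS}
REQUIRED_KEY["PdfA"] = PDF_LEAD_OUTPUT_KEY


def required_unlock_key(format_name: str) -> Optional[str]:
    """Return the configuration key the table lists for a format.

    None means the table has no row for that format; it does not prove
    that the format needs no key.
    """
    return REQUIRED_KEY.get(format_name)


if __name__ == "__main__":
    for name in ("Pdf13", "PdfA", "Docx"):
        print(name, "->", required_unlock_key(name))
```

Formats that the table does not mention, such as `Docx` or `PdfAImageOverText`, return `None` here, so a caller should treat `None` as "not listed" rather than "no key required".
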
For an example, refer to DocumentConvertOptions.