Help Version 19.0.6.22
The Document Converter framework converts any supported document type to another with a minimal amount of code.
The input and output documents can be in any file format supported by LEADTOOLS, including but not limited to:
Adobe Acrobat PDF and PDF/A
Microsoft Office DOCX/DOC, XLSX/XLS and PPTX/PPT
CAD formats such as DXF, DWG and DWF
TIFF, JPEG, PNG, EXIF, BMP and hundreds more raster image formats
AFP, MODCA and PTOCA
The DocumentConverter class analyzes the input and output document types and then automatically uses a combination of the LEADTOOLS Raster, SVG and OCR engines to convert the data with the best possible balance of accuracy and speed. Each conversion operation is called a Document Converter Job in the framework.
DocumentConverter uses the LEADTOOLS Documents Library to obtain information on the input file. The Document class encapsulates the file format details and exposes a uniform set of functionality for reading the pages and parsing the data needed for the conversion job. This includes loading page data as RasterImage or SvgDocument objects, reading the table of contents and internal page links, and reading any annotation objects embedded in the file or stored in an associated file.
The output document file format is divided into two categories:
Document File Formats. The generated output file will have all the text, images, shapes and any other objects found in the input document converted as is. Examples of these documents are searchable PDF and Microsoft Word DOCX files.
In this mode, the converter engine will use SvgDocument or IOcrEngine technologies to parse the text and objects from the input document regardless of its type. For example, SVG is used if the input file is also a document format (such as PDF) and OCR is used if the input file is a raster format (such as TIFF).
The main object used for creating these types of documents is DocumentWriter. Before running any conversion operations, you must set a new DocumentWriter instance in the converter using SetDocumentWriterInstance. You can get this value at any time using DocumentWriterInstance. Use this value to set up any extra document format options with DocumentWriter.SetOptions.
Raster File Formats. The generated output file will only have the raster image representation of the text, images, shapes and any other objects found in the input document. Examples of these documents are TIFF, JPEG, PNG or raster PDF files.
In this mode, the converter engine uses neither SVG nor OCR. Instead, it relies on the extensive file format support provided by LEADTOOLS to load the pages of the input document as raster images and save them directly into the output file.
The main object used for creating these types of documents is RasterCodecs.
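The choice between SVG, OCR and plain raster described above can be summarized as a small decision function. The following is a minimal Python sketch of that logic only. It is not LEADTOOLS code, and the names `Kind` and `choose_engine` are hypothetical.

```python
from enum import Enum


class Kind(Enum):
    """Hypothetical classification of a file format."""
    DOCUMENT = "document"  # e.g. PDF, DOCX
    RASTER = "raster"      # e.g. TIFF, JPEG, PNG


def choose_engine(input_kind: Kind, output_kind: Kind) -> str:
    """Return the technology the text above associates with a conversion."""
    if output_kind is Kind.RASTER:
        # Raster output: pages are loaded and saved as images, no SVG or OCR.
        return "raster"
    if input_kind is Kind.DOCUMENT:
        # Document input to document output: text and objects come from SVG.
        return "svg"
    # Raster input to document output: text must be recognized with OCR.
    return "ocr"
```

Under this model a PDF to DOCX conversion maps to `"svg"`, a TIFF to PDF conversion maps to `"ocr"`, and any conversion to TIFF maps to `"raster"`.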
The document conversion is designed to run unattended. However, the DocumentConverter provides many options to monitor and modify the operation and to customize the output document as needed. This includes:
Built-in multi-threading support
Diagnostics and logging through standard .NET tracing
Extensive events to report job status and progress as well as to allow modification of the data on the fly
Pre-processing to clean up the images loaded from the input documents
Annotations support for both input and output documents
Error recovery and quarantine
Page numbering template
The DocumentConverter class is the main entry point to the framework. Initialize an instance of this class to be used for converting one or more documents and then set these options:
Member | Description |
---|---|
SetOcrEngineInstance | IOcrEngine to use for parsing text and objects when SVG is not available in the input document. |
SetDocumentWriterInstance | DocumentWriter to use when creating the output file when document format output is selected. |
SetAnnRenderingEngineInstance | Optional rendering engine to use when the annotations are overlaid on top of images. |
LoadDocumentOptions | Options to use when loading the input document. |
Preprocessor | The pre-processing options to use for cleaning up the images of the input document. |
Options | Additional options to use during the conversion, such as error recovery mode and page number template. |
Diagnostics | Options for logging such as enabling standard .NET tracing. |
Once the DocumentConverter class is initialized, use the DocumentConverterJobs class (accessed through DocumentConverter.Jobs property) to create new conversion jobs.
The parameters for a job are set in a DocumentConverterJobData structure. This contains the following members:
Member | Description |
---|---|
Document | Document object to be used as the input of the conversion. Either this or InputDocumentFileName is used. |
InputDocumentFileName | Path to the input file for the conversion. Either this or Document is used. |
InputAnnotationsFileName | Path to the file containing the annotations to be added to the output document. Optional. |
InputDocumentFirstPageNumber | The number of the first page to be converted from the input document. Optional. |
InputDocumentLastPageNumber | The number of the last page to be converted from the input document. Optional. |
DocumentFormat | The output format when document conversion is used. |
RasterImageFormat | The output format when raster conversion is used. |
RasterImageBitsPerPixel | The bits per pixel of the output file when raster conversion is used. |
OutputDocumentFileName | Name of the output file to be generated by this conversion. |
OutputAnnotationsFileName | Name of the file that will contain the annotations parsed from the input document. Optional. |
AnnotationsMode | Customizes how the annotations are saved in the output document. |
JobName | Optional name of this job. Useful when tracing is enabled. |
UserData | Optional user-defined object that can be used alongside the job events to pass application-specific data. |
The DocumentConverterJobs.CreateJobData overloaded methods can also be used to quickly create jobs from common input and output options.
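To illustrate the "either Document or InputDocumentFileName" rule from the table above, here is a minimal Python sketch of a job data record. It mirrors only a few of the members, all names are hypothetical stand-ins rather than the LEADTOOLS types, and the precedence given to the document object when both inputs are set is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class JobDataSketch:
    """Hypothetical stand-in for a subset of the job data members."""
    document: Optional[Any] = None
    input_document_file_name: Optional[str] = None
    output_document_file_name: Optional[str] = None
    job_name: Optional[str] = None
    user_data: Any = None


def input_source(data: JobDataSketch) -> str:
    """Report which input a job would use, or fail if none is set."""
    if data.document is not None:
        # Assumption: an already loaded document wins over a file name.
        return "document"
    if data.input_document_file_name:
        return "file"
    raise ValueError("set either document or input_document_file_name")
```

A record such as `JobDataSketch(input_document_file_name="in.pdf", output_document_file_name="out.docx")` resolves to the file input, while a record with neither input set is rejected before any work starts.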
When all the options are set, the DocumentConverterJobs.CreateJob method is used to create an instance of the DocumentConverterJob class that holds the job options as well as its status. This object is then passed to DocumentConverterJobs.RunJob or DocumentConverterJobs.RunJobAsync to run the operation.
DocumentConverterJobs.RunJob or DocumentConverterJobs.RunJobAsync runs the job created in the previous step. While the job is running, the DocumentConverterJobs.JobStarted event fires once, DocumentConverterJobs.JobOperation fires multiple times, and DocumentConverterJobs.JobCompleted fires once to indicate the job progress.
The event data is of type DocumentConverterJobEventArgs and contains all the necessary information on the current job and its status:
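The event order described above (started once, operation repeatedly, completed once) can be modeled with a plain callback. This is a minimal Python sketch, not the LEADTOOLS event mechanism. `run_job_sketch` is a hypothetical name, and firing exactly one operation event per page is a simplifying assumption.

```python
from typing import Callable, List, Tuple


def run_job_sketch(page_count: int, on_event: Callable[[str, int], None]) -> None:
    """Fire events in the order the text describes for a running job."""
    on_event("JobStarted", 0)                # fired once, before any page
    for page in range(1, page_count + 1):
        on_event("JobOperation", page)       # assumption: one per page
    on_event("JobCompleted", page_count)     # fired once, after the last page


# Usage: collect the events of a three-page job into a list.
events: List[Tuple[str, int]] = []
run_job_sketch(3, lambda name, page: events.append((name, page)))
```

A handler written this way can rely on seeing the started event first and the completed event last, with every operation event in between.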
Member | Description |
---|---|
Job | The actual job object that was passed to RunJob or RunJobAsync. |
Status | The current status of the job and whether it is still running or has been aborted. The user can abort any running jobs by modifying this property. |
Operation | Current operation being performed by the converter. |
IsPostOperation | Whether this event is being fired before or after Operation. |
InputDocumentPageNumber | Current page number in the input document. |
OutputDocumentPageNumber | Current page number in the output document. |
Document | The Document object being used by this conversion. |
DocumentWriter | The DocumentWriter object being used by this operation if document conversion is used. |
OcrDocument | The OCR document object being used if this operation is using OCR conversion. |
OcrPage | The OCR page object being used if this operation is using OCR conversion. |
SvgDocument | The SVG document being used if this operation is using SVG conversion. |
OcrPageImage | The raster image object for the current page if this operation is using OCR conversion. |
RasterImage | The raster image being used if this operation is using raster conversion. |
AnnContainer | Annotation container being used if annotation conversion is used. |
AnnotationsMode | Current annotations conversion mode. |
For more information on these members and how they can be used or modified, refer to DocumentConverterJobOperation.
The InputDocumentPageNumber property can be used to show a progress bar indicator of the current conversion operation.
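As an illustration of the progress bar idea, the following minimal Python sketch turns a current page number and a page range into a percentage. The function name and parameters are hypothetical, and counting the current page as already processed is an assumption.

```python
def progress_percent(current_page: int, first_page: int, last_page: int) -> int:
    """Map the current input page to a 0-100 progress value."""
    total = last_page - first_page + 1
    if total <= 0:
        raise ValueError("last_page must not be before first_page")
    # Clamp so pages outside the converted range cannot leave the 0-100 span.
    done = min(max(current_page - first_page + 1, 0), total)
    return 100 * done // total
```

For a job converting pages 1 to 10, `progress_percent(5, 1, 10)` gives 50. For a job converting pages 3 to 6, `progress_percent(3, 3, 6)` gives 25.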
The job is completed when the RunJob method returns. If RunJobAsync was used, then the JobCompleted event should be used to detect when the job is completed. In both cases, the DocumentConverterJob object passed will contain information on the status of this operation as follows:
Member | Description |
---|---|
Status | The job status. This can be success, success with errors, or aborted. |
Errors | A list of any errors that might have occurred during the conversion. |
JobData | The original options used to create this job. |
DocumentConverter | The document converter object used to run the job. |
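The three status outcomes in the table above suggest a simple way to report the result of a finished job. This is a minimal Python sketch with a hypothetical `JobStatus` enum and `summarize` helper. It does not use the LEADTOOLS types.

```python
from enum import Enum
from typing import List


class JobStatus(Enum):
    """Hypothetical mirror of the three outcomes listed in the table."""
    SUCCESS = "success"
    SUCCESS_WITH_ERRORS = "success_with_errors"
    ABORTED = "aborted"


def summarize(status: JobStatus, errors: List[str]) -> str:
    """Build a one-line report from a job status and its error list."""
    if status is JobStatus.SUCCESS:
        return "converted"
    if status is JobStatus.SUCCESS_WITH_ERRORS:
        # Output was produced, but the recorded errors deserve a review.
        return f"converted with {len(errors)} error(s)"
    return "aborted"
```

The middle case is the one worth surfacing to users: the output file exists, but the error list should be inspected before the result is trusted.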
DocumentConverter is thread-safe. The RunJobAsync method can be used to run multiple jobs at the same time, each in a separate thread. Internally, the converter uses the .NET Thread Pool exclusively for creating and managing threads.
RunJobAsync performs a sanity check on the options, then starts the job and returns control to the user immediately. The JobStarted, JobOperation and JobCompleted events can be used to monitor the status of the jobs and to be notified when a job is completed. AbortAllJobs can be used at any time to abort all running jobs and cancel any pending jobs.
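The pattern of running several jobs on a thread pool with a single abort switch can be modeled as follows. This is a minimal Python sketch using the standard library, not the .NET Thread Pool or the LEADTOOLS API. `convert_sketch` and `abort_all` are hypothetical names, and checking for an abort once per page is an assumption.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Tuple

# Shared flag that plays the role of an "abort all jobs" request.
abort_all = threading.Event()


def convert_sketch(name: str, pages: int) -> Tuple[str, str, int]:
    """Pretend to convert a job page by page, honoring the abort flag."""
    for page in range(1, pages + 1):
        if abort_all.is_set():
            # Stop early and report how many pages were finished.
            return (name, "aborted", page - 1)
        # Real per-page conversion work would happen here.
    return (name, "success", pages)


# Submit three jobs; each call returns immediately with a future.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(convert_sketch, f"job{i}", 3) for i in range(3)]
    results = [f.result() for f in futures]
```

Setting the flag with `abort_all.set()` makes any job that has not finished stop at its next page check, which is the cooperative style of cancellation this sketch assumes.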