Using LEADTOOLS Document Converter

The Document Converter converts a document from one supported format to another with a minimal amount of code.

Both input and output document types can be any file format supported by LEADTOOLS.

The DocumentConverter class will analyze the input and output document types and then automatically use a combination of the LEADTOOLS Raster, SVG, and OCR engines to convert the data using the best possible combination of accuracy and speed. Each conversion operation is called a Document Converter Job in the framework.

Input Document

The DocumentConverter uses the LEADTOOLS Document Library to obtain information about the input file. The LEADDocument class encapsulates the file format details and exposes a uniform set of functionality for reading the pages and parsing the data needed for the conversion job. This includes loading page data as RasterImage or SvgDocument objects, reading the table of contents and internal page links, and reading any annotation objects embedded in the file or stored in an associated file.

Output Document

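There are two types of output file formats: document formats, selected with the DocumentFormat member of the job data and created with the DocumentWriter, and raster image formats, selected with the RasterImageFormat member of the job data.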

Conversion Options

Document conversion is designed to run unattended. However, the DocumentConverter provides many options to monitor and modify the operation and to customize the output document as needed. These options are set through the DocumentConverter class and the job data described in the following sections.

Starting Up: DocumentConverter class

The DocumentConverter class is the main entry point to the framework. Create an instance of this class to convert one or more documents, and then set the following options:

Member                         Description
SetOcrEngineInstance           IOcrEngine to use for parsing text and objects when SVG is not available in the input document.
SetDocumentWriterInstance      DocumentWriter to use when creating the output file when document format output is selected.
SetAnnRenderingEngineInstance  Optional rendering engine to use when the annotations are overlaid on top of images.
LoadDocumentOptions            Options to use when loading the input document.
Preprocessor                   The pre-processing options to use for cleaning up the images of the input document.
Options                        (Optional) Extra options to use during the conversion such as error recovery mode and page number template.
Diagnostics                    Options for logging such as enabling standard .NET tracing.

Creating Jobs

Once the DocumentConverter class is initialized, use the DocumentConverterJobs class (accessed through the DocumentConverter.Jobs property) to create new conversion jobs.

The parameters for a job are set in a DocumentConverterJobData structure. This contains the following members:

Member                        Description
Document                      The LEADDocument object to be used as the input of the conversion. Either this or InputDocumentFileName is used.
InputDocumentFileName         The path to the input file for the conversion. Either this or Document is used.
InputAnnotationsFileName      (Optional) The path to the annotations file to be added to the output document.
InputDocumentFirstPageNumber  (Optional) The number of the first page to be converted from the input document.
InputDocumentLastPageNumber   (Optional) The number of the last page to be converted from the input document.
DocumentFormat                The output format when document conversion is used.
RasterImageFormat             The output format when raster conversion is used.
RasterImageBitsPerPixel       The bits per pixel of the output file when raster conversion is used.
OutputDocumentFileName        The name of the output file to be generated by this conversion.
OutputAnnotationsFileName     (Optional) The name of the file that will contain the annotations parsed from the input document.
AnnotationsMode               Customizes how the annotations are saved in the output document.
JobName                       (Optional) The name of this job. Useful when tracing is enabled.
UserData                      (Optional) A user-defined object that can be used alongside the job events to pass application-specific data.

The DocumentConverterJobs.CreateJobData overloaded methods can also be used to quickly create jobs from common input and output options.

Once all the options are set, call the DocumentConverterJobs.CreateJob method to create an instance of the DocumentConverterJob class, which holds the job options as well as the job status. This object is then passed to DocumentConverterJobs.RunJob or DocumentConverterJobs.RunJobAsync to run the operation.

Running Jobs

The DocumentConverterJobs.RunJob or DocumentConverterJobs.RunJobAsync methods are used to run the job from the data created in the previous section. While the job is running, the DocumentConverterJobs.JobStarted (once), DocumentConverterJobs.JobOperation (more than once), and the DocumentConverterJobs.JobCompleted (once) events will fire to indicate the job progress.

The DocumentConverterJobEventArgs event data contains all the necessary information about the current job and its status:

Member                    Description
Job                       The actual job object that was passed to RunJob or RunJobAsync.
Status                    The current status of the job and whether it is still running or has been aborted. Use this property to abort any running jobs.
Operation                 The current operation being performed by the converter.
IsPostOperation           A value that indicates whether this event is being fired before or after Operation.
InputDocumentPageNumber   The current page number in the input document.
OutputDocumentPageNumber  The current page number in the output document.
Document                  The LEADDocument object being used by this conversion.
DocumentWriter            The DocumentWriter object being used by this operation if document conversion is used.
OcrDocument               The OCR document object being used if this operation is using OCR conversion.
OcrPage                   The OCR page object being used if this operation is using OCR conversion.
SvgDocument               The SVG document being used if this operation is using SVG conversion.
OcrPageImage              The raster image object for the current page if this operation is using OCR conversion.
RasterImage               The raster image being used if this operation is using raster conversion.
AnnContainer              The annotation container being used if annotation conversion is used.
AnnotationsMode           The current annotations conversion mode.

For more information about these members and how they can be used or modified, refer to DocumentConverterJobOperation.

The InputDocumentPageNumber property can be used to show a progress bar indicator of the current conversion operation.
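The progress calculation can be sketched in a few lines. The following is a minimal, language-neutral illustration written in Python, not LEADTOOLS code: the function name conversion_progress and its parameters are hypothetical, and it assumes that page numbers are 1-based and that the caller has already resolved the first and last page of the job to concrete numbers.

```python
def conversion_progress(current_page: int, first_page: int, last_page: int) -> int:
    """Return the percentage (0-100) of pages reached in an inclusive page range.

    current_page stands in for the value an event handler would read from
    InputDocumentPageNumber; first_page and last_page stand in for the
    resolved page range of the job. All three are assumed to be 1-based.
    """
    if last_page < first_page:
        raise ValueError("last_page must not be smaller than first_page")
    total = last_page - first_page + 1
    # Clamp so that out-of-range values never produce less than 0% or more than 100%.
    done = min(max(current_page - first_page + 1, 0), total)
    return done * 100 // total
```

A handler for the job operation event would call a function like this with the reported page number and pass the result to the progress bar.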

Completing Jobs

The job is completed when the RunJob method returns. If RunJobAsync was used, then the JobCompleted event should be used to detect when the job is completed. In both cases, the DocumentConverterJob object that was passed contains information about the status of the operation as follows:

Member             Description
Status             The job status. This can be Success, SuccessWithErrors or Aborted.
Errors             A list of any errors that might have occurred during the conversion.
JobData            The original options used to create this job.
DocumentConverter  The document converter object used to run the job.
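One way an application might act on the three status values is sketched below. This is a stand-in model in Python rather than LEADTOOLS code: the JobStatus enumeration only mirrors the status names listed above, and summarize_job and the plain-string error list are hypothetical simplifications of the Status and Errors members.

```python
from enum import Enum
from typing import List


class JobStatus(Enum):
    """Stand-in for the three job status values named in the table above."""
    SUCCESS = "Success"
    SUCCESS_WITH_ERRORS = "SuccessWithErrors"
    ABORTED = "Aborted"


def summarize_job(status: JobStatus, errors: List[str]) -> str:
    """Turn a finished job's status and error list into a one-line report."""
    if status is JobStatus.SUCCESS:
        return "converted without errors"
    if status is JobStatus.SUCCESS_WITH_ERRORS:
        # The output exists, but the error list should still be reviewed.
        return f"converted with {len(errors)} error(s): " + "; ".join(errors)
    # Aborted: report the last recorded error, if there is one.
    return "aborted" + (f" after error: {errors[-1]}" if errors else "")
```

The point of the sketch is that SuccessWithErrors is handled separately from Success: the output file was produced, but the error list still needs to be inspected.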

Multi-Threading

The DocumentConverter is thread-safe. The RunJobAsync method can be used to run multiple jobs at the same time, each in a separate thread. Internally, the converter uses the .NET Thread Pool exclusively for creating and managing threads.

RunJobAsync performs a sanity check on the options, starts the job, and returns control to the caller immediately. The JobOperation and JobCompleted events can be used to monitor the job's status and to send notifications when a job is completed. AbortAllJobs can be called at any time to abort all running jobs and cancel any pending ones.
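The pattern described here (validate, queue on a thread pool, return immediately, report completion through a callback, and abort everything through one shared switch) can be modeled generically. The sketch below uses Python's standard thread pool purely as an illustration. JobRunner, run_job_async, abort_all_jobs and the convert_page callback are hypothetical names that echo the concepts above; they are not the LEADTOOLS API and say nothing about how the converter is implemented internally.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Sequence


class JobRunner:
    """Toy model of running conversion jobs on a shared thread pool."""

    def __init__(self, convert_page: Callable[[str, int], None], max_workers: int = 4):
        self._convert_page = convert_page
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._abort = threading.Event()  # shared switch checked by every job

    def run_job_async(self, name: str, pages: Sequence[int],
                      on_completed: Callable[[str, str], None]) -> Future:
        # Sanity check happens on the caller's thread, before anything is queued.
        if not pages:
            raise ValueError("a job needs at least one page")
        # submit() returns immediately; the work runs on a pool thread.
        return self._pool.submit(self._run, name, pages, on_completed)

    def _run(self, name: str, pages: Sequence[int],
             on_completed: Callable[[str, str], None]) -> str:
        status = "Success"
        for page in pages:
            if self._abort.is_set():  # honor an abort request between pages
                status = "Aborted"
                break
            self._convert_page(name, page)
        on_completed(name, status)  # completion notification, fired once per job
        return status

    def abort_all_jobs(self) -> None:
        """Ask running jobs to stop and make pending jobs finish as aborted."""
        self._abort.set()

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```

In this model the completion callback runs on a pool thread, so an application that updates a user interface from it would need to marshal the call back to the UI thread.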

Status Document Job Converter

The DocumentConverter also supports document conversion with status updates. Refer to Status Document Job Converter for more information.

Help Version 21.0.2021.11.1
© 1991-2021 LEAD Technologies, Inc. All Rights Reserved.
