Programming with the LEADTOOLS Document Readers

The DocumentReader class reads images, thumbnails, text data and metadata from any supported document type through a uniform set of methods and properties.

The current implementation of the LEADTOOLS Document Readers supports reading the following document types:

DocumentReaderType.Pdf: This is the document reader responsible for parsing PDF documents. PDF document text is parsed without the need for an OCR engine. PDF support is provided through the Leadtools.Forms.DocumentReaders.Pdf assembly.
DocumentReaderType.Xps: This is the document reader responsible for parsing XPS documents. XPS document text is parsed without the need for an OCR engine. XPS support is provided through the Leadtools.Forms.DocumentReaders.Xps assembly.
DocumentReaderType.Raster: This is the document reader responsible for parsing everything else, such as TIFF and JPEG documents. An OCR engine is required to parse the text of the document: pass a started object of type Leadtools.Forms.Ocr.IOcrEngine to DocumentObjectManager.BeginParse. Raster support is provided through the Leadtools.Forms.DocumentReaders.Raster assembly.

LEADTOOLS will add more document readers and functionality in the near future for documents such as DICOM, DOC/DOCX (2007/2010), XLS/XLSX (2007/2010) and RTF, along with more object types such as images, bookmarks, hyperlinks and annotations. Currently, these formats are supported through the Raster document reader, with text parsing provided by an external OCR engine.

DocumentReader is an abstract class and cannot be instantiated directly. The derived classes that support PDF, XPS and the various other formats are internal to LEADTOOLS. Instead, obtain a DocumentReader object by calling the static (Shared in Visual Basic) DocumentReader.Create method. This method attempts to load the document with the supported readers and, if successful, returns a DocumentReader instance that is ready to use.
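The pattern described above (an abstract public class, internal derived classes, and a static factory that picks one of them) can be illustrated with a minimal, self-contained sketch. Every name in it (SketchReader, Kind, CanLoad and the three derived classes) is a hypothetical stand-in invented for illustration and is not part of the LEADTOOLS API. Detecting the type by file extension is also an assumption made only to keep the sketch short.

```csharp
using System;

// Hypothetical stand-in for the factory pattern; not part of the LEADTOOLS API.
public abstract class SketchReader : IDisposable
{
    // Name of the reader that accepted the file, for example "Pdf".
    public abstract string Kind { get; }

    // Each derived reader decides whether it can handle the file.
    // Assumption: detection by file extension, purely for illustration.
    protected abstract bool CanLoad(string fileName);

    // Static factory: the only way for callers to obtain a reader.
    public static SketchReader Create(string fileName)
    {
        // Specific readers come first; the raster reader is the catch-all.
        Func<SketchReader>[] factories =
        {
            () => new PdfSketchReader(),
            () => new XpsSketchReader(),
            () => new RasterSketchReader()
        };

        foreach (Func<SketchReader> factory in factories)
        {
            SketchReader candidate = factory();
            if (candidate.CanLoad(fileName))
                return candidate;
            candidate.Dispose(); // release a candidate that rejected the file
        }

        // Not reached in this sketch because the raster reader accepts everything.
        throw new NotSupportedException("No reader accepted " + fileName);
    }

    public virtual void Dispose() { }
}

// The derived classes are internal, so callers only ever see SketchReader.
internal sealed class PdfSketchReader : SketchReader
{
    public override string Kind { get { return "Pdf"; } }
    protected override bool CanLoad(string fileName)
    {
        return fileName.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase);
    }
}

internal sealed class XpsSketchReader : SketchReader
{
    public override string Kind { get { return "Xps"; } }
    protected override bool CanLoad(string fileName)
    {
        return fileName.EndsWith(".xps", StringComparison.OrdinalIgnoreCase);
    }
}

internal sealed class RasterSketchReader : SketchReader
{
    public override string Kind { get { return "Raster"; } }
    protected override bool CanLoad(string fileName) { return true; }
}

public static class Program
{
    public static void Main()
    {
        using (SketchReader reader = SketchReader.Create("report.pdf"))
            Console.WriteLine(reader.Kind); // prints "Pdf"

        using (SketchReader reader = SketchReader.Create("scan.tif"))
            Console.WriteLine(reader.Kind); // prints "Raster"
    }
}
```

The caller never names a concrete reader type, which is the point of the design: new readers can be added behind the factory without changing calling code. For the actual parameters of DocumentReader.Create, consult the LEADTOOLS reference for that method.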

Once you obtain a valid instance of a DocumentReader object with a document loaded into it, you can use the following features:

Use the DocumentReader.Pages property to access the pages of the document.
The DocumentReader.MimeType property and DocumentReader.GetProperties method can be used to obtain the metadata of the document.
The methods of the DocumentReader.ImageManager property can be used to get a rendered raster image or a thumbnail of any page in the document.
The methods of the DocumentReader.ObjectManager property can be used to parse the objects found in any page of the document, such as text items and font properties.

The DocumentReader class implements the System.IDisposable interface. You must call the System.IDisposable.Dispose method when the reader is no longer needed.
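In C#, the usual way to guarantee the Dispose call is a using statement, which disposes the object when the block exits, even if an exception is thrown. The sketch below shows this with a hypothetical stand-in class (SketchResource, with a made-up PageCount value); it is not the LEADTOOLS DocumentReader, and only the IDisposable mechanics carry over.

```csharp
using System;

// Hypothetical stand-in for any IDisposable reader; not the LEADTOOLS class.
public sealed class SketchResource : IDisposable
{
    public bool IsDisposed { get; private set; }

    public int PageCount
    {
        get
        {
            // Using a disposed object is a programming error.
            if (IsDisposed) throw new ObjectDisposedException("SketchResource");
            return 3; // made-up value for illustration
        }
    }

    public void Dispose() { IsDisposed = true; }
}

public static class Program
{
    public static void Main()
    {
        SketchResource kept;

        // The using statement calls Dispose when the block exits,
        // whether it completes normally or throws.
        using (SketchResource reader = new SketchResource())
        {
            kept = reader;
            Console.WriteLine(reader.PageCount); // prints 3
        }

        Console.WriteLine(kept.IsDisposed); // prints True
    }
}
```

Visual Basic offers the equivalent Using ... End Using block. Where a using statement does not fit, a try/finally block that calls Dispose in the finally clause has the same effect.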