Provides the main class for reading documents.
The DocumentReader class allows reading images, thumbnails, text data and metadata from any of the supported types using a uniform set of methods and properties, regardless of the document type.
The current implementation of the LEADTOOLS Document Readers supports reading the following document types:
DocumentReaderType.Pdf: This is the document reader responsible for parsing PDF documents. PDF document text is parsed without the need for an OCR engine. PDF support is provided through the Leadtools.Forms.DocumentReaders.Pdf assembly.
DocumentReaderType.Xps: This is the document reader responsible for parsing XPS documents. XPS document text is parsed without the need for an OCR engine. XPS support is provided through the Leadtools.Forms.DocumentReaders.Xps assembly.
DocumentReaderType.Raster: This is the document reader responsible for parsing everything else, such as TIFF and JPEG documents. An OCR engine is required to parse the text of the document (by passing a started object of type Leadtools.Forms.Ocr.IOcrEngine to BeginParse). Raster support is provided through the Leadtools.Forms.DocumentReaders.Raster assembly.
LEADTOOLS will add more document readers and functionality in the near future for documents such as DICOM, DOC/DOCX (2007/2010), XLS/XLSX (2007/2010) and RTF. More object types, such as images, bookmarks, hyperlinks and annotations, will also be added. Until then, these formats are handled by the Raster document reader (with text parsing supported by an external OCR engine).
DocumentReader is an abstract class and cannot be instantiated directly. The derived classes that support PDF, XPS and the various other formats are internal to LEADTOOLS. Instead, obtain a DocumentReader object by calling the static (Shared in VB) DocumentReader.Create method. This method tries to load the document with the supported readers and, if successful, returns a DocumentReader instance that is ready to use.
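The factory arrangement described above can be pictured with a small, self-contained C# sketch. None of the types below are LEADTOOLS types: SketchReader and its nested readers are hypothetical stand-ins, and selecting a reader by file extension is a simplifying assumption made only to keep the example short. It is not a description of how DocumentReader.Create actually probes a document.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for an abstract reader; NOT a LEADTOOLS type.
public abstract class SketchReader : IDisposable
{
    // Name of the concrete reader that was selected.
    public abstract string Kind { get; }

    // Static factory: callers never name a concrete reader type.
    // Assumption: selection by file extension, for illustration only.
    public static SketchReader Create(string fileName)
    {
        string ext = Path.GetExtension(fileName).ToLowerInvariant();
        if (ext == ".pdf") return new PdfSketchReader();
        if (ext == ".xps") return new XpsSketchReader();
        // Everything else falls through to the raster reader.
        return new RasterSketchReader();
    }

    public virtual void Dispose() { }

    // The concrete readers are hidden from callers.
    private sealed class PdfSketchReader : SketchReader
    {
        public override string Kind { get { return "Pdf"; } }
    }

    private sealed class XpsSketchReader : SketchReader
    {
        public override string Kind { get { return "Xps"; } }
    }

    private sealed class RasterSketchReader : SketchReader
    {
        public override string Kind { get { return "Raster"; } }
    }
}

public static class Program
{
    public static void Main()
    {
        foreach (string name in new[] { "report.pdf", "form.xps", "scan.tif" })
        {
            using (SketchReader reader = SketchReader.Create(name))
            {
                Console.WriteLine(name + " -> " + reader.Kind);
            }
        }
    }
}
```

Because the concrete classes are private, calling code can only depend on the abstract type, which is the property the abstract-class-plus-factory design is meant to provide.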
Once you obtain a valid instance of a DocumentReader object with a document loaded into it, you can use the following features:
Use the Pages property to access the pages of the document.
The MimeType property and GetProperties method can be used to obtain the metadata of the document.
The methods of the ImageManager property can be used to get a rendered raster image or a thumbnail of any page in the document.
The methods of the ObjectManager property can be used to parse the objects found in any page in the document such as text items and font properties.
The DocumentReader class implements the System.IDisposable interface. You must call the System.IDisposable.Dispose method when the reader is no longer used.
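In C#, the usual way to guarantee the Dispose call is a using statement. The sketch below shows the pattern with a hypothetical stand-in class, SketchResource, which is not a LEADTOOLS type; its PageCount property and the value it returns are invented for illustration. The same structure should apply to any System.IDisposable object, including a reader obtained from DocumentReader.Create.

```csharp
using System;

// Hypothetical stand-in for an object that holds resources; NOT a LEADTOOLS type.
public sealed class SketchResource : IDisposable
{
    public bool IsDisposed { get; private set; }

    // Invented member: refuses to work once the object has been disposed.
    public int PageCount
    {
        get
        {
            if (IsDisposed) throw new ObjectDisposedException("SketchResource");
            return 3;
        }
    }

    public void Dispose()
    {
        // A real implementation would release whatever it holds here.
        IsDisposed = true;
    }
}

public static class Program
{
    public static void Main()
    {
        SketchResource observed;
        using (SketchResource reader = new SketchResource())
        {
            observed = reader;
            Console.WriteLine("Pages: " + reader.PageCount);
        } // Dispose runs here, even if the block throws.

        Console.WriteLine("Disposed: " + observed.IsDisposed);
    }
}
```

The using statement compiles to a try/finally block, so the object is disposed on both the normal and the exceptional path without an explicit Dispose call.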
Target Platforms: Windows 7, Windows Vista SP1 or later, Windows XP SP3, Windows Server 2008 (Server Core not supported), Windows Server 2008 R2 (Server Core supported with SP1 or later), Windows Server 2003 SP2