Gets the object used to parse the objects on the document's pages, such as text items.
A DocumentObjectManager object that can be used to parse the objects on the document's pages, such as text items. The default value is null (Nothing in VB).
Instances of DocumentObjectManager are not created directly. Instead, access the instance that is automatically created inside a DocumentReader through the ObjectManager property.
The DocumentObjectManager contains the ParsePageText method that allows you to parse the text of the document. You can use ParsePageText to parse the text of all the pages in a document by using it in a loop. However, you must call BeginParse before parsing starts and EndParse when parsing is done.
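The call order described above (BeginParse, then ParsePageText for each page, then EndParse) can be sketched as follows. This is a minimal illustration only, not the LEADTOOLS API: IPageTextParser, FakeParser and ParseLoopSketch are hypothetical stand-ins that borrow the three method names, and the parameter and return types shown are assumptions. Refer to the DocumentObjectManager reference for the actual signatures.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in that mirrors the three DocumentObjectManager method
// names mentioned above. Parameter and return types are assumptions.
interface IPageTextParser
{
    void BeginParse(object ocrEngine);      // ocrEngine stands in for a started IOcrEngine
    string ParsePageText(int pageNumber);   // assumed here to return the page text as a string
    void EndParse();
}

// Toy implementation so the sketch runs without LEADTOOLS installed.
// It ignores the OCR engine argument and returns placeholder text.
sealed class FakeParser : IPageTextParser
{
    private bool _parsing;

    public bool IsParsing { get { return _parsing; } }

    public void BeginParse(object ocrEngine) { _parsing = true; }

    public string ParsePageText(int pageNumber)
    {
        // Enforce the documented rule: BeginParse must come first.
        if (!_parsing)
            throw new InvalidOperationException("Call BeginParse before ParsePageText.");
        return "text of page " + pageNumber;
    }

    public void EndParse() { _parsing = false; }
}

static class ParseLoopSketch
{
    // Parses pages 1..pageCount and returns the text of each page in order.
    public static List<string> ParseAllPages(IPageTextParser parser, int pageCount, object ocrEngine)
    {
        var result = new List<string>();
        parser.BeginParse(ocrEngine);
        try
        {
            for (int page = 1; page <= pageCount; page++)
                result.Add(parser.ParsePageText(page));
        }
        finally
        {
            // EndParse runs even if a page fails to parse.
            parser.EndParse();
        }
        return result;
    }
}
```

Wrapping the loop in try/finally is a design choice of this sketch: it keeps the BeginParse and EndParse calls paired when parsing a page throws.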
The various document readers parse text differently. Currently, LEADTOOLS ships with the following document readers:
DocumentReaderType.Pdf: This is the document reader responsible for parsing PDF documents. The text of PDF documents is parsed without the need for an OCR engine.
DocumentReaderType.Xps: This is the document reader responsible for parsing XPS documents. The text of XPS documents is parsed without the need for an OCR engine.
DocumentReaderType.Raster: This is the document reader responsible for parsing everything else, such as TIFF and JPEG documents. An OCR engine is required to parse the text of the document (by passing a started object of type Leadtools.Forms.Ocr.IOcrEngine to BeginParse).
LEADTOOLS will add more document readers and functionality in the near future for documents such as DICOM, DOC/DOCX (2007/2010), XLS/XLSX (2007/2010) and RTF.
More object types, such as images, bookmarks, hyperlinks and annotations, will also be added in the near future.
For an example, refer to DocumentObjectManager.
Target Platforms: Windows 7, Windows Vista SP1 or later, Windows XP SP3, Windows Server 2008 (Server Core not supported), Windows Server 2008 R2 (Server Core supported with SP1 or later), Windows Server 2003 SP2