Provides the object parsing functionality, such as text parsing, for a DocumentReader.
Instances of the DocumentObjectManager aren't created directly. Instead, access the instance that is automatically created inside a DocumentReader using the DocumentReader.ObjectManager property.
The DocumentObjectManager contains the ParsePageText method, which parses the text of a page in the document. Call ParsePageText in a loop to parse the text of all the pages in a document. You must call BeginParse before parsing starts and EndParse when parsing is finished.
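The call order described above (BeginParse, then ParsePageText for each page, then EndParse) can be modeled with a minimal, language-neutral sketch. The StubObjectManager class, its method names, the 1-based page numbers, and the optional ocr_engine argument are all hypothetical stand-ins chosen for illustration. They are not the LEADTOOLS API, whose actual signatures should be taken from the member reference pages.

```python
class StubObjectManager:
    """Hypothetical stand-in for DocumentObjectManager (not the LEADTOOLS API)."""

    def __init__(self, pages):
        self._pages = pages      # pre-baked page text, for illustration only
        self._parsing = False

    def begin_parse(self, ocr_engine=None):
        # Assumption: an OCR engine is only needed by the raster reader.
        self._parsing = True

    def parse_page_text(self, page_number):
        # Parsing outside a begin/end pair is treated as an error in this model.
        if not self._parsing:
            raise RuntimeError("begin_parse must be called before parse_page_text")
        return self._pages[page_number - 1]   # assumed 1-based page numbers

    def end_parse(self):
        self._parsing = False


def parse_all_pages(manager, page_count, ocr_engine=None):
    """Parse every page, making sure end_parse runs even if a page fails."""
    manager.begin_parse(ocr_engine)
    try:
        return [manager.parse_page_text(n) for n in range(1, page_count + 1)]
    finally:
        manager.end_parse()
```

Wrapping the loop in try/finally is a design choice of this sketch: it keeps the begin/end calls paired even when parsing a page raises an error.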
The various document readers will parse text differently. Currently, LEADTOOLS ships with the following document readers:
DocumentReaderType.Pdf: This is the document reader responsible for parsing PDF documents. PDF document text is parsed without the need for an OCR engine.
DocumentReaderType.Xps: This is the document reader responsible for parsing XPS documents. XPS document text is parsed without the need for an OCR engine.
DocumentReaderType.Raster: This is the document reader responsible for parsing all other document types. An OCR engine is required to parse the text of the document (by passing a started object of type Leadtools.Forms.Ocr.IOcrEngine to BeginParse).
LEADTOOLS will add more document readers and functionality in the near future for document types such as DICOM, DOC/DOCX (2007/2010), XLS/XLSX (2007/2010), and RTF.
More object types, such as images, bookmarks, hyperlinks, and annotations, will also be added in the near future.
For an example, refer to DocumentReader.
Target Platforms: Windows 7, Windows Vista SP1 or later, Windows XP SP3, Windows Server 2008 (Server Core not supported), Windows Server 2008 R2 (Server Core supported with SP1 or later), Windows Server 2003 SP2
Help Version 19.0.2017.3.22