Encapsulates a multi-page document with support for raster and SVG images, bookmarks, annotations and text data.
The Document class provides uniform support for any type of document. The underlying data can be a PDF document, a Microsoft Word document, a TIFF image, an AutoCAD DWG drawing or any other of the hundreds of raster, document or vector file formats supported by LEADTOOLS. Document encapsulates the common functionality needed to access this data in a uniform manner with the same properties, methods and data structures.
A Document is used as input to DocumentViewer, which can be used to view the document and its pages with thumbnails, virtualization, text search and annotation support.
Documents can also be used as an input to DocumentConverter to convert the document to any other file format with or without using OCR technology.
A Document instance can be obtained using the following:
Method | Description |
---|---|
DocumentFactory.LoadFromFile | Creates a new instance from an existing document file on disk or network share. |
DocumentFactory.LoadFromUri | Creates a new instance from a document stored in a remote URL. |
DocumentFactory.LoadFromUriAsync | Creates a new instance asynchronously from a document stored in a remote URL or disk. |
DocumentFactory.LoadFromUriAsync | Creates a new instance asynchronously from a document stored in a remote URL. |
DocumentFactory.LoadFromStream | Creates a new instance from an existing document stored in a stream. |
DocumentFactory.LoadFromCache | Loads a previously saved document from the cache. |
DocumentFactory.Create | Creates a new empty document. |
After the document is obtained, InternalObject will be set to the internal LEADTOOLS object used with the document.
In most cases, the Document is ready to use after it has been obtained. However, some documents, such as PDF files, can be encrypted and require a password before they can be parsed and used. Most of the properties and methods of Document will throw an error if the document has not been decrypted. IsEncrypted can be used to check whether the document is encrypted and, if so, Decrypt must be called with a password obtained from the user to unlock the document. When that happens, the value of IsDecrypted becomes true and the document is ready to be used. Note that IsEncrypted will stay true to indicate the original state of the document.
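The encryption states described above can be pictured as a small state machine. The following Python sketch is a language-neutral illustration only and is not LEADTOOLS code: the class `EncryptedDocumentModel`, its attributes and the `DocumentNotDecryptedError` exception are hypothetical names that mirror the described behavior of IsEncrypted, IsDecrypted and Decrypt.

```python
class DocumentNotDecryptedError(Exception):
    """Raised when an encrypted document is used before it is decrypted."""


class EncryptedDocumentModel:
    """Hypothetical model of the IsEncrypted / IsDecrypted / Decrypt states."""

    def __init__(self, password=None):
        # In this sketch, a document created with a password is encrypted.
        self._password = password
        self.is_encrypted = password is not None
        self.is_decrypted = False

    def decrypt(self, password):
        """Try to unlock the document; return True once it is decrypted."""
        if self.is_encrypted and password == self._password:
            self.is_decrypted = True
        return self.is_decrypted

    def page_count(self):
        """Stand-in for any member that needs the parsed document data."""
        if self.is_encrypted and not self.is_decrypted:
            raise DocumentNotDecryptedError("call decrypt() first")
        return 1  # placeholder value for the sketch
```

Note that `is_encrypted` never changes after construction, matching the statement that IsEncrypted keeps reporting the original state of the document.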
The SaveToFile and SaveToUri methods can be used to save the document to a disk file or remote URL. These methods save the document in a raster image format, not a document format. In most cases, converting a document should be performed with more options and control using the DocumentConverter class.
Each document has a unique identifier that is set at creation time by the framework. This is stored in the DocumentId property.
The ID is important when using the document with the cache system and is the only value needed to completely reconstruct the document from the cache. The document ID can be set manually by the user through the LoadDocumentOptions.DocumentId, CreateDocumentOptions.DocumentId or UploadDocumentOptions.DocumentId options used when loading, creating or uploading the document. If the value is left as null, the factory will generate a new random ID using a GUID generator and associate it with the document.
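The fallback rule for the document ID can be sketched as follows. This is a minimal Python illustration of the described logic, not the factory's implementation: the function name `resolve_document_id` is hypothetical, and the exact format of the GUID string produced by LEADTOOLS is not stated here, so the 32-character hex form is an assumption.

```python
import uuid


def resolve_document_id(requested_id=None):
    """Return the caller-supplied ID, or a new random GUID-based one."""
    if requested_id is not None:
        return requested_id
    # uuid4() yields a random 128-bit identifier; .hex is its 32-character form.
    return uuid.uuid4().hex
```

Supplying an explicit ID is useful when the application needs to find the document in the cache later under a known key.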
Documents can contain a large number of pages and a huge amount of data. Storing all this data in physical memory is not feasible in most situations. Therefore, the Document class was designed to use an external caching system to store the modified data. Refer to DocumentFactory.Cache for more information.
HasCache determines whether this document is using the cache system. SaveToCache can be used to save a document to the cache so that it can be reloaded using DocumentFactory.LoadFromCache. AutoDeleteFromCache and AutoSaveToCache can be used to determine what happens to the cache data associated with the document when it is disposed.
DocumentStructure manages the structure of the document. This includes the bookmarks that represent the table of contents. It can be accessed through the Structure property of Document.
DocumentPages manages the pages of the document. It can be accessed through the Pages property of Document.
DocumentPages derives from LeadCollection<T> and thus implements [System.Collections.ObjectModel.Collection<T>](https://msdn.microsoft.com/en-us/library/System.Collections.ObjectModel.Collection`1.aspx). You can use any of the collection methods to add, remove, insert, get, set and iterate through the pages.
DocumentPages is a collection of DocumentPage objects, each of which contains the data for a single page in the document. The page item is the main entry point for using the documents in a viewer or converter application. It contains functions to retrieve or update the raster or SVG image of the page, text data, annotations and hyperlinks. Refer to DocumentPage for more information.
DocumentDocuments manages the child documents of the document. It can be accessed through the Documents property of Document.
DocumentDocuments derives from LeadCollection<T> and thus implements [System.Collections.ObjectModel.Collection<T>](https://msdn.microsoft.com/en-us/library/System.Collections.ObjectModel.Collection`1.aspx). You can use any of the collection methods to iterate through the documents. This collection is read-only, however, and you cannot add, remove or change the items. Instead, use Pages to add or remove pages that belong to a separate document. The Document.Documents collection is automatically updated to reflect which child documents are currently held in the document.
The document metadata includes default values added by the DocumentFactory when the document is loaded or created, as well as any other data extracted from the document file itself, such as author, subject and any keywords stored by other applications.
The following properties are part of Document and contain useful information:
DocumentId: The unique identifier of this document.
Name: The name of this document.
DocumentType: The document type.
MimeType: The MIME type of the document.
Uri: The URL to the original document physical location. If this is a newly created document then Uri will be null.
CacheUri: The URL to the original document's image data if it was stored in the cache.
IsDownloaded: Determines if the document was downloaded into the cache or a temp file.
IsReadOnly: Determines if the document is read-only and cannot be changed.
UserData: User-defined data associated with this document.
GetDocumentFileName: Gets the path to the file holding the original document.
GetDocumentStream: Gets a stream to the original data of the document.
GetAnnotationsFileName: Gets the path to the file holding the original annotations.
GetAnnotationsStream: Gets a stream to the original annotations.
FileLength: The length of the original document file or URL in bytes.
CacheStatus: The status of this document in the cache.
Access to the original document data depends on how the document was created and its cache status, as follows:
Result of LoadFromFile: GetDocumentFileName will return the name of the original file passed to the method. GetDocumentStream will return null.
Result of LoadFromUri and caching is not used: GetDocumentFileName will return the name of the temporary file used to store the data.
Result of LoadFromUri and caching is used: if the cache supports external resources (disk access), then GetDocumentFileName will return the name of the file holding the cache item on disk. If the cache does not support external resources (disk access), then GetDocumentFileName will return null and GetDocumentStream will return a valid read-only stream that can be used to access the data.
Result of LoadFromCache(ObjectCache,string): Caching is used naturally and the same rules as LoadFromUri apply.
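The access rules above can be summarized as a small decision function. The Python sketch below only restates the rules as written and is not LEADTOOLS code: the function name `original_data_access` and its parameters are hypothetical, and `None` marks cases where the text does not state what an accessor returns.

```python
def original_data_access(load_method, uses_cache=False, cache_on_disk=False):
    """Return (file_name_available, stream_available) per the rules above.

    None means the rules do not state the behavior for that accessor.
    """
    if load_method == "LoadFromFile":
        return (True, False)      # original file name; stream is null
    if load_method in ("LoadFromUri", "LoadFromCache"):
        if load_method == "LoadFromCache":
            uses_cache = True     # loading from the cache implies caching
        if not uses_cache:
            return (True, None)   # temporary file; stream not stated
        if cache_on_disk:
            return (True, None)   # cache item file on disk; stream not stated
        return (False, True)      # no file name; read-only stream instead
    raise ValueError(f"unknown load method: {load_method}")
```

In practice this means calling code should be prepared for either accessor to return null and fall back to the other.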
The Document class contains the following members to manage global settings used throughout the document:
Images: Manages the raster and SVG image settings of the document.
RasterCodecs: The RasterCodecs object to use when loading and saving RasterImage and SvgDocument objects.
Annotations: Manages the annotations settings of the document.
Text: Manages the text and OCR recognition settings of the document.
Barcodes: Manages the barcode settings of the document.
Document uses independent units of 1/720 of an inch for all items. This value is stored in the UnitsPerInch constant (720). Refer to Documents Library Coordinate System for more information.
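Converting between document units and pixels is a simple ratio once a resolution is chosen. The following Python sketch shows the arithmetic. The helper names `units_to_pixels` and `pixels_to_units` are hypothetical, and only the 720 units-per-inch value comes from the text above.

```python
UNITS_PER_INCH = 720  # mirrors the UnitsPerInch constant described above


def units_to_pixels(units, dpi):
    """Convert document units (1/720 inch) to pixels at the given resolution."""
    return units * dpi / UNITS_PER_INCH


def pixels_to_units(pixels, dpi):
    """Convert pixels at the given resolution back to document units."""
    return pixels * UNITS_PER_INCH / dpi
```

For example, a page that is 8.5 inches wide measures 8.5 × 720 = 6120 document units, and `units_to_pixels(6120, 96)` gives 816 pixels at 96 DPI.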
Document implements System.IDisposable and must be disposed after it has been used. Refer to System.IDisposable in .NET for more information. The document can be reconstructed as-is after it has been disposed if it was saved into the cache, either because AutoSaveToCache was set to true or because SaveToCache was called.
This example loads a document and shows all its information.
Loading Using LEADTOOLS Documents Library
Creating Documents with LEADTOOLS Documents Library
Documents Library Coordinate System
Loading Encrypted Files Using the Documents Library
Parsing Text with the Documents Library
Barcode processing with the Documents Library