The LEADTOOLS Documents library supports creating a Document object from data that resides in a disk file, at a remote URL, in a stream, or in the cache system after a previous upload.
To load a Document object from a disk file, create an instance of LoadDocumentOptions then pass it along with the file name to DocumentFactory.LoadFromFile:
var loadDocumentOptions = new LoadDocumentOptions();
// Initialize loadDocumentOptions as needed
Document document = DocumentFactory.LoadFromFile(fileName, loadDocumentOptions);
The following steps explain how this method works:
LoadDocumentOptions.UseCache is checked. If the value is true, then the application must have set a valid cache object in either LoadDocumentOptions.Cache or DocumentFactory.Cache. Otherwise, an exception is thrown. If the value is false, then the new document will not use caching.
Caching is optional in this mode and not required. It can be used to speed up obtaining document image data or text if the pages are revisited by the application or to save the document to the cache before it is disposed. Otherwise, all the data will be parsed from the original file as needed.
If the value of LoadDocumentOptions.AnnotationsUri is not null, then it must contain the URL to a disk file as well. You can create a new Uri object from the physical path to the annotation file on disk and set it in this property. This will create a Uri object with the file:/// scheme. Any other scheme (such as http) will fail when using LoadFromFile.
The factory will obtain information on the file format in fileName using RasterCodecs.GetInformation. If this fails (invalid file format or the required LEADTOOLS file format assembly is not found) then an exception is thrown.
A Document object is created and the following members are initialized:
Member | Value |
---|---|
DocumentId | A unique identifier created for this document that can be used if the document is saved to the cache. |
Uri | new Uri(fileName). |
IsReadOnly | true. |
CacheUri | null since the document has direct access to the physical file. |
Stream | null. |
HasStream | false. |
IsDownloaded | false since the document was not downloaded. |
GetDocumentFileName | Will return the same fileName passed to LoadFromFile. |
GetDocumentStream | null. |
GetAnnotationsFileName | Will return the same file name passed to LoadDocumentOptions.AnnotationsUri. |
GetAnnotationsStream | null. |
HasAnnotationsStream | false. |
DocumentType | The document type. |
MimeType | The MIME type of the document file format set during load. |
HasCache | The same value as LoadDocumentOptions.UseCache. If the value is true, then GetDocumentFileName or GetDocumentStream can be used to obtain the original data. Otherwise, it will return the path to the temporary file. |
LastCacheSyncTime | Random old date since the document has not been saved to the cache yet. |
CacheStatus | DocumentCacheStatus.NotSynced since the document has not been saved to the cache yet. |
AutoDeleteFromCache | true. Can be changed to false if the application will re-load this document from the cache at a later time using DocumentFactory.LoadFromCache. |
AutoSaveToCache | false. |
InternalObject | The internal LEADTOOLS object that is being used to parse the document data. |
UserData | null |
IsEncrypted | false unless the document is encrypted. In this case most of the document properties cannot be used before the document is decrypted. Refer to Loading Encrypted Files Using the Documents Library for more information. |
IsDecrypted | false |
IsStructureSupported | true or false based on the MIME type of the document. |
Metadata | Ready to be used. |
Structure | Ready to be used. |
Images | Ready to be used. |
Text | Ready to be used. |
Pages | Ready to be used. |
Documents | Empty collection since this is not a virtual document. |
HasDocuments | false. |
AutoDisposeDocuments | false. |
Annotations | Ready to be used. |
LoadFromFile returns with this Document object ready to be used.
Document will parse data from the original file on disk on demand, therefore the original fileName passed to LoadFromFile must not be deleted while Document is alive. Otherwise, errors will occur when accessing the document data.
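The cache check described in the first step above can be illustrated with a small sketch. It does not use the LEADTOOLS types: OptionsSketch and CacheCheckSketch are hypothetical stand-ins, and the order in which the two cache locations are consulted is an assumption, since the text only says that either one must be set.

```csharp
using System;

// Hypothetical stand-in for LoadDocumentOptions; NOT the LEADTOOLS type.
public sealed class OptionsSketch
{
    public bool UseCache { get; set; }
    public object Cache { get; set; }
}

public static class CacheCheckSketch
{
    // Returns the cache object to use, or null when caching is disabled.
    public static object ResolveCache(OptionsSketch options, object factoryCache)
    {
        if (!options.UseCache)
            return null; // the new document will not use caching

        // Either location is acceptable per the text; checking the options first is an assumption.
        object cache = options.Cache ?? factoryCache;
        if (cache == null)
            throw new InvalidOperationException("UseCache is true but no cache object was set.");

        return cache;
    }
}
```

In practice this means an application that sets UseCache to true should assign a cache object before calling any of the load methods.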
For an example, refer to DocumentFactory.LoadFromFile.
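As a minimal sketch of the annotations requirement above, the following helper builds a file-scheme Uri from a physical path using only standard .NET calls. AnnotationsUriSketch and FromPath are hypothetical names introduced for illustration; the result would be assigned to LoadDocumentOptions.AnnotationsUri.

```csharp
using System;
using System.IO;

public static class AnnotationsUriSketch
{
    // Builds a file-scheme Uri from a path to an annotations file on disk.
    public static Uri FromPath(string annotationsPath)
    {
        // Resolve relative paths first so that the Uri is absolute.
        string fullPath = Path.GetFullPath(annotationsPath);
        var uri = new Uri(fullPath);

        // Guard against anything that did not resolve to a local file Uri.
        if (!uri.IsFile)
            throw new ArgumentException("Expected a path to a file on disk.", nameof(annotationsPath));

        return uri;
    }
}
```

Resolving the full path first avoids handing a relative path to the Uri constructor, which would throw.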
To create a Document object from a remote URL, create an instance of LoadDocumentOptions then pass it along with a URL object pointing to the remote location of the document file to DocumentFactory.LoadFromUri:
var loadDocumentOptions = new LoadDocumentOptions();
// Initialize loadDocumentOptions as needed
Document document = DocumentFactory.LoadFromUri(uri, loadDocumentOptions);
The following steps explain how this method works:
LoadDocumentOptions.UseCache is checked. If the value is true, then the application must have set a valid cache object in either LoadDocumentOptions.Cache or DocumentFactory.Cache. Otherwise, an exception is thrown. If the value is false, then the new document will not use caching.
The cache is also optional in this mode. In addition to speeding up retrieval of document image data or text from previously visited pages, the cache can be used as the download destination for the document file in uri, as explained below.
If the uri passed to LoadFromUri has the special LEAD cache scheme (detected using IsUploadDocumentUri), then the factory assumes this is the URI of a document previously uploaded to the cache using DocumentFactory.BeginUpload. In that case the steps below are not performed and no data is downloaded: the data is already in the cache and the factory skips to step 8 below.
If the value of LoadDocumentOptions.AnnotationsUri is not null, then it will be treated as a remote URL and the data is downloaded by the factory in the same manner used for the document file as explained below.
The factory will check the value of LoadDocumentOptions.UseCache:
If the value is true, then the document data is downloaded from uri into the cache system.
If the value is false, then the document data is downloaded from uri to a temporary file name created on the machine.
Similarly, if LoadDocumentOptions.AnnotationsUri is not null, it will be downloaded either to the cache system or to a temporary file based on cache usage.
When downloading the data, the factory will use the WebClient object in LoadDocumentOptions if not null. Otherwise, it will create a new instance and dispose it after it has been used. This allows the application to pass a custom WebClient with specific proxy or credential settings or to monitor the download progress.
The factory will obtain information on the file format using RasterCodecs.GetInformation on the downloaded or temporary file or cache data. If this fails (invalid file format or the required LEADTOOLS file format assembly is not found), then the cache or downloaded data is deleted and an exception is thrown.
A Document object is created and the following members are initialized:
Member | Value |
---|---|
DocumentId | A unique identifier created for this document that can be used if the document is saved to the cache. |
Uri | Same uri passed to LoadFromUri. |
IsReadOnly | true. |
CacheUri | If the document was downloaded to the cache and if the cache system has virtual directory capabilities, then this property will contain a URI to the original document data (PDF, TIFF, DOCX, etc.). Otherwise, it is null. |
Stream | null. |
HasStream | false. |
IsDownloaded | true since the document was downloaded. |
GetDocumentFileName | Will return the path to the cache item or temporary file containing the downloaded data of the original document. If the cache does not have direct access to the file system then this will be null. |
GetDocumentStream | If the cache does not have direct access to the file system then this will return a stream containing the original document. Otherwise, null. |
GetAnnotationsFileName | Will return the path to the cache item or temporary file containing the downloaded data of the annotations. If the cache does not have direct access to the file system then this will be null. |
GetAnnotationsStream | If the cache does not have direct access to the file system then this will return a stream containing the annotations. Otherwise, null. |
HasAnnotationsStream | true or false depending on the above. |
DocumentType | The document type. |
MimeType | The MIME type of the document file format set during load. |
HasCache | The same value as LoadDocumentOptions.UseCache. If the value is true, then GetDocumentFileName or GetDocumentStream can be used to obtain the original data. Otherwise, it will return the path to the temporary file. |
LastCacheSyncTime | Random old date since the document has not been saved to the cache yet. |
CacheStatus | DocumentCacheStatus.NotSynced since the document has not been saved to the cache yet. |
AutoDeleteFromCache | true. Can be changed to false if the application will re-load this document from the cache at a later time using DocumentFactory.LoadFromCache. |
AutoSaveToCache | false. |
InternalObject | The internal LEADTOOLS object that is being used to parse the document data. |
UserData | null |
IsEncrypted | false unless the document is encrypted. In this case most of the document properties cannot be used before the document is decrypted. Refer to Loading Encrypted Files Using the Documents Library for more information. |
IsDecrypted | false |
IsStructureSupported | true or false based on the MIME type of the document. |
Metadata | Ready to be used. |
Structure | Ready to be used. |
Images | Ready to be used. |
Text | Ready to be used. |
Pages | Ready to be used. |
Documents | Empty collection since this is not a virtual document. |
HasDocuments | false. |
AutoDisposeDocuments | false. |
Annotations | Ready to be used. |
LoadFromUri returns with this Document object ready to be used.
Document will parse data from the downloaded data, therefore the original URL passed to LoadFromUri is never used again and the data it points to can be deleted right away.
When the Document is disposed, the temporary files and cache items will be deleted unless it is saved to the cache first.
For an example, refer to DocumentFactory.LoadFromUri.
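The custom WebClient mentioned in the download step could be prepared along these lines. This is a sketch using only standard System.Net types: WebClientSketch and Create are hypothetical names, and the credential and proxy values are placeholders. The resulting object would be set in the WebClient member of LoadDocumentOptions that the text refers to. Note that WebClient is marked obsolete in recent .NET versions, so the compiler may emit a warning.

```csharp
using System;
using System.Net;

public static class WebClientSketch
{
    // Creates a WebClient with credentials, an optional proxy, and a progress handler.
    // The caller owns the returned client and is responsible for disposing it.
    public static WebClient Create(string userName, string password, Uri proxyAddress)
    {
        var client = new WebClient();
        client.Credentials = new NetworkCredential(userName, password);

        if (proxyAddress != null)
            client.Proxy = new WebProxy(proxyAddress);

        // Raised by WebClient during asynchronous downloads.
        client.DownloadProgressChanged += (sender, e) =>
            Console.WriteLine("Downloaded " + e.ProgressPercentage + "%");

        return client;
    }
}
```

Because the application supplies the client in this case, the factory is described as disposing only the instances it creates itself.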
LoadFromUri does not return control to the application until the document is downloaded and parsed. To create a Document object from a remote URL asynchronously, create an instance of LoadDocumentAsyncOptions and pass it along with a URL object pointing to the remote location of the document file to DocumentFactory.LoadFromUriAsync:
var loadDocumentAsyncOptions = new LoadDocumentAsyncOptions();
// Initialize loadDocumentAsyncOptions as needed. A Completed event handler is required:
loadDocumentAsyncOptions.Completed += (sender, e) => {
    // Completed: check e.Error, then use e.Document
};
DocumentFactory.LoadFromUriAsync(uri, loadDocumentAsyncOptions);
The following steps explain how this method works:
LoadDocumentOptions.UseCache is checked. If the value is true, then the application must have set a valid cache object in LoadDocumentOptions.Cache or DocumentFactory.Cache. Otherwise, an exception is thrown. If the value is false, then the new document will not use caching.
The cache is also optional in this mode. In addition to speeding up retrieval of document image data or text from previously visited pages, the cache can be used as the download destination for the document file in uri, as explained below.
If the uri value passed to LoadFromUriAsync has the LEAD cache scheme, then the factory assumes this is the URI of a document previously uploaded to the cache using DocumentFactory.BeginUpload. In that case the steps below are not performed and no data is downloaded: the data is already in the cache and the factory skips to step 10 below.
A thread is created to handle loading the document, control is returned to the application and the rest of these steps are performed in the thread procedure.
If the value of LoadDocumentOptions.AnnotationsUri is not null, then it will be treated as a remote URL and the data is downloaded by the factory in the same manner used for the document file as explained below.
The factory will check the value of LoadDocumentOptions.UseCache:
If the value is true, then the document data is downloaded from uri into the cache system.
If the value is false, then the document data is downloaded from uri to a temporary file created on the machine.
Similarly, if LoadDocumentOptions.AnnotationsUri is not null, it will be downloaded either to the cache system or to a temporary file based on cache usage.
When downloading the data, the factory will use the WebClient object in LoadDocumentOptions if not null. Otherwise, it will create a new instance and dispose it after it has been used. This allows the application to pass a custom WebClient with specific proxy or credential settings.
The WebClient.DownloadProgressChanged event is mapped to LoadDocumentAsyncOptions.Progress if the value is not null to allow the user to monitor the progress of the download.
When WebClient.DownloadFileCompleted occurs, the factory will obtain information on the file format using RasterCodecs.GetInformation on the downloaded or temporary file. If this fails (invalid file format or the required LEADTOOLS file format assembly is not found), then the cache or downloaded data is deleted and LoadDocumentAsyncOptions.Completed is fired with the error object in LoadAsyncCompletedEventArgs.Error.
Otherwise, a Document object is created and the following members are initialized:
Member | Value |
---|---|
DocumentId | A unique identifier created for this document that can be used if the document is saved to the cache. |
Uri | Same uri passed to LoadFromUriAsync. |
Stream | null. |
HasStream | false. |
IsDownloaded | true since the document was downloaded. |
GetDocumentFileName | Will return the path to the cache item or temporary file containing the downloaded data of the original document. If the cache does not have direct access to the file system then this will be null. |
GetDocumentStream | If the cache does not have direct access to the file system then this will return a stream containing the original document. Otherwise, null. |
GetAnnotationsFileName | Will return the path to the cache item or temporary file containing the downloaded data of the annotations. If the cache does not have direct access to the file system then this will be null. |
GetAnnotationsStream | If the cache does not have direct access to the file system then this will return a stream containing the annotations. Otherwise, null. |
HasAnnotationsStream | true or false depending on the above. |
DocumentType | The document type. |
MimeType | The MIME type of the document file format set during load. |
HasCache | The same value as LoadDocumentOptions.UseCache. If the value is true, then GetDocumentFileName or GetDocumentStream can be used to obtain the original data. Otherwise, it will return the path to the temporary file. |
LastCacheSyncTime | Random old date since the document has not been saved to the cache yet. |
CacheStatus | DocumentCacheStatus.NotSynced since the document has not been saved to the cache yet. |
AutoDeleteFromCache | true. Can be changed to false if the application will re-load this document from the cache at a later time using DocumentFactory.LoadFromCache. |
AutoSaveToCache | false. |
InternalObject | The internal LEADTOOLS object that is being used to parse the document data. |
UserData | null |
IsEncrypted | false unless the document is encrypted. In this case most of the document properties cannot be used before the document is decrypted. Refer to Loading Encrypted Files Using the Documents Library for more information. |
IsDecrypted | false |
IsStructureSupported | true or false based on the MIME type of the document. |
Metadata | Ready to be used. |
Structure | Ready to be used. |
Images | Ready to be used. |
Text | Ready to be used. |
Pages | Ready to be used. |
Documents | Empty collection since this is not a virtual document. |
HasDocuments | false. |
AutoDisposeDocuments | false. |
Annotations | Ready to be used. |
The LoadDocumentAsyncOptions.Completed event is fired with the Document object in LoadAsyncCompletedEventArgs.Document. This Document object is now ready to be used.
Document will parse data from the downloaded data, therefore the original URL passed to LoadFromUriAsync is never used again and the data it points to can be deleted right away.
When the Document is disposed, the temporary files will be deleted unless it is saved to the cache first.
For an example, refer to DocumentFactory.LoadFromUriAsync.
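Applications that prefer to await the result can adapt a Completed-style event to a Task. The sketch below shows that general .NET pattern with TaskCompletionSource. It does not reference LEADTOOLS: CompletedArgsSketch is a hypothetical stand-in that models only the Error and Document members named above, and CompletedToTaskSketch is an illustrative helper, not part of the library.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for LoadAsyncCompletedEventArgs; NOT the LEADTOOLS type.
public sealed class CompletedArgsSketch : EventArgs
{
    public CompletedArgsSketch(object document, Exception error)
    {
        Document = document;
        Error = error;
    }

    public object Document { get; }
    public Exception Error { get; }
}

public static class CompletedToTaskSketch
{
    // Returns a handler to attach to a Completed-style event and the task that the handler completes.
    public static (EventHandler<CompletedArgsSketch> Handler, Task<object> Task) Create()
    {
        var tcs = new TaskCompletionSource<object>(TaskCreationOptions.RunContinuationsAsynchronously);

        EventHandler<CompletedArgsSketch> handler = (sender, e) =>
        {
            if (e.Error != null)
                tcs.TrySetException(e.Error); // the load failed, so surface the error
            else
                tcs.TrySetResult(e.Document); // the document is ready to be used
        };

        return (handler, tcs.Task);
    }
}
```

The handler checks Error before touching Document, which mirrors the order in which the two outcomes are described in the steps above.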
To create a Document object from a document stored in a stream, create an instance of LoadDocumentOptions then pass it along with the stream object to DocumentFactory.LoadFromStream:
var loadDocumentOptions = new LoadDocumentOptions();
// Initialize loadDocumentOptions as needed
Document document = DocumentFactory.LoadFromStream(stream, loadDocumentOptions);
The following steps explain how this method works:
The LoadDocumentOptions.UseCache is checked. If the value is true, then the application must have set a valid cache object in LoadDocumentOptions.Cache or DocumentFactory.Cache. Otherwise, an exception is thrown. If the value is false, then the new document will not use caching.
Caching is optional in this mode and not required. It can be used to speed up obtaining document image data or text if the pages are revisited by the application or to save the document to the cache before it is disposed. Otherwise, all the data will be parsed from the original stream as needed.
If the value of LoadDocumentOptions.AnnotationsUri is not null, then it must contain the URL to a disk file as well. You can create a new Uri object from the physical path to the annotation file on disk and set it in this property. This will create a Uri object with the file:/// scheme. Any other scheme (such as http) will fail when using LoadFromStream.
The factory will obtain information on the file format in stream using RasterCodecs.GetInformation. If this fails (invalid file format or the required LEADTOOLS file format assembly is not found) then an exception is thrown.
A Document object is created and the following members are initialized:
Member | Value |
---|---|
DocumentId | A unique identifier created for this document that can be used if the document is saved to the cache. |
Uri | null. |
Stream | The original stream passed to LoadFromStream. |
HasStream | true. |
IsDownloaded | false since the document was not downloaded. |
GetDocumentFileName | null. |
GetDocumentStream | null. |
GetAnnotationsFileName | Will return the same file name passed to LoadDocumentOptions.AnnotationsUri. |
GetAnnotationsStream | null. |
HasAnnotationsStream | false. |
DocumentType | The document type. |
MimeType | The MIME type of the document file format set during load. |
HasCache | The same value as LoadDocumentOptions.UseCache. |
LastCacheSyncTime | Random old date since the document has not been saved to the cache yet. |
CacheStatus | DocumentCacheStatus.NotSynced since the document has not been saved to the cache yet. |
AutoDeleteFromCache | true. Can be changed to false if the application will re-load this document from the cache at a later time using DocumentFactory.LoadFromCache. |
AutoSaveToCache | false. |
InternalObject | The internal LEADTOOLS object that is being used to parse the document data. |
UserData | null |
IsEncrypted | false unless the document is encrypted. In this case most of the document properties cannot be used before the document is decrypted. Refer to Loading Encrypted Files Using the Documents Library for more information. |
IsDecrypted | false |
IsStructureSupported | true or false based on the MIME type of the document. |
Metadata | Ready to be used. |
Structure | Ready to be used. |
Images | Ready to be used. |
Text | Ready to be used. |
Pages | Ready to be used. |
Documents | Empty collection since this is not a virtual document. |
HasDocuments | false. |
AutoDisposeDocuments | false. |
Annotations | Ready to be used. |
LoadFromStream returns with this Document object ready to be used.
Document will parse data from the original stream on demand, therefore the original stream passed to LoadFromStream must be kept alive by the user while Document is alive. Otherwise, errors will occur when accessing the document data.
If the document is saved into the cache using SaveToCache, then the entire content of the stream is saved into the cache, after which the stream is no longer used and can be safely disposed by the user. When the document is later re-loaded from the cache using DocumentFactory.LoadFromCache, it is treated as if it was downloaded from an external resource and the stream functionality is not used (the value of Stream will be null).
For an example, refer to DocumentFactory.LoadFromStream.
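The reason the stream must outlive the Document can be seen with plain .NET streams: once a stream is disposed, any later read throws. The sketch below demonstrates only that standard behavior. StreamLifetimeSketch is a hypothetical helper and makes no claim about how the library reads from the stream internally.

```csharp
using System;
using System.IO;

public static class StreamLifetimeSketch
{
    // Disposes the stream, then reports whether a subsequent read fails with ObjectDisposedException.
    public static bool ReadFailsAfterDispose(Stream stream)
    {
        stream.Dispose();
        try
        {
            stream.ReadByte();
            return false;
        }
        catch (ObjectDisposedException)
        {
            return true; // the on-demand parsing described above would hit this kind of error
        }
    }
}
```

A common way to avoid this is to dispose the Document first and the stream afterwards, or to save the document to the cache before releasing the stream, as described above.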