The LEADTOOLS Document library loads a document by creating a LEADDocument object. The source data can reside in a disk file, at a remote URL, in a stream, or in data that was previously uploaded to the cache system.
To load a LEADDocument object from a disk file, create an instance of LoadDocumentOptions, and then pass it along with the file name to DocumentFactory.LoadFromFile:
var loadDocumentOptions = new LoadDocumentOptions();
// Initialize loadDocumentOptions as needed
LEADDocument document = DocumentFactory.LoadFromFile(fileName, loadDocumentOptions);
The following steps explain how this method works:
LoadDocumentOptions.UseCache is checked. If the value is true, then the application must have set a valid cache object in either LoadDocumentOptions.Cache or DocumentFactory.Cache. Otherwise, an exception is thrown. If the value is false, then the new document will not use caching.
Caching is optional in this mode. It can be used to speed up obtaining document image data or text when the application revisits pages, or to save the document to the cache before it is disposed of. Without a cache, all the data is parsed from the original file as needed.
If the value of LoadDocumentOptions.AnnotationsUri is not null, then it must also point to a disk file. You can create a new Uri object from the physical path to the annotation file on disk and set it in this property. This creates a Uri object that uses the file:/// scheme. Any other scheme (such as http) will fail when using LoadFromFile.
The factory will obtain information about the file format in fileName using RasterCodecs.GetInformation. If this fails (if it is an invalid file format or the required LEADTOOLS file format assembly is not found), then an exception is thrown.
A LEADDocument object is created and the following members are initialized:
Member | Value |
---|---|
DocumentId | A unique identifier created for this document that can be used if the document is saved to the cache. |
Uri | new Uri(fileName). |
IsReadOnly | true. |
CacheUri | null since the document has direct access to the physical file. |
Stream | null. |
HasStream | false. |
IsDownloaded | false since the document was not downloaded. |
GetDocumentFileName | Will return the same fileName passed to LoadFromFile. |
GetDocumentStream | null. |
GetAnnotationsFileName | Will return the same file name passed to LoadDocumentOptions.AnnotationsUri. |
GetAnnotationsStream | null. |
HasAnnotationsStream | false. |
DocumentType | The document type. |
MimeType | The MIME type of the document file format set during load. |
HasCache | The same value as LoadDocumentOptions.UseCache. If the value is true, then GetDocumentFileName or GetDocumentStream can be used to obtain the original data. Otherwise, it will return the path to the temporary file. |
LastCacheSyncTime | An arbitrary date in the past, since the document has not yet been saved to the cache. |
CacheStatus | DocumentCacheStatus.NotSynced since the document has not yet been saved to the cache. |
AutoDeleteFromCache | true. Can be changed to false if the application will re-load this document from the cache at a later time using DocumentFactory.LoadFromCache. |
AutoSaveToCache | false. |
InternalObject | The internal LEADTOOLS object that is being used to parse the document data. |
UserData | null. |
IsEncrypted | false unless the document is encrypted. In this case, most of the document properties cannot be used before the document is decrypted. Refer to Loading Encrypted Files Using the Document Library for more information. |
IsDecrypted | false. |
IsStructureSupported | true or false based on the MIME type of the document. |
Metadata | Ready to be used. |
Structure | Ready to be used. |
Images | Ready to be used. |
Text | Ready to be used. |
Pages | Ready to be used. |
Documents | Empty collection since this is not a virtual document. |
HasDocuments | false. |
AutoDisposeDocuments | false. |
Annotations | Ready to be used. |
LoadFromFile returns with this LEADDocument object, ready to be used.
The LEADDocument object parses data from the original file on disk on demand. Therefore, the file in fileName passed to LoadFromFile must not be deleted while the LEADDocument is alive. Otherwise, errors will occur when accessing the document data.
For an example, refer to DocumentFactory.LoadFromFile.
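The LoadFromFile steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: it assumes the Leadtools.Document namespace, a LEADTOOLS license that has already been set, and that LEADDocument is disposable. LoadFromFileSketch, ToFileUri, and CountPages are hypothetical names introduced only for this example.

```csharp
using System;
using System.IO;
using Leadtools.Document; // assumed namespace of the Document library

public static class LoadFromFileSketch
{
    // Hypothetical helper: turns a path on disk into a file-scheme Uri,
    // the only scheme LoadFromFile accepts for AnnotationsUri.
    public static Uri ToFileUri(string path)
    {
        return new Uri(Path.GetFullPath(path));
    }

    // Hypothetical helper: loads a document without caching and returns its
    // page count, or -1 if the factory returned null.
    public static int CountPages(string fileName, string annotationsPath)
    {
        var options = new LoadDocumentOptions();
        options.UseCache = false; // no cache object is required in this mode
        if (annotationsPath != null)
            options.AnnotationsUri = ToFileUri(annotationsPath);

        // The file must remain on disk while the document is alive, so the
        // document is used and disposed of before this method returns.
        using (LEADDocument document = DocumentFactory.LoadFromFile(fileName, options))
        {
            return document != null ? document.Pages.Count : -1;
        }
    }
}
```

Keeping all access to the document inside the using block is one simple way to guarantee that the source file outlives the LEADDocument object.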
To create a LEADDocument object from a remote URL, create an instance of LoadDocumentOptions, and then pass it along with a Uri object pointing to the remote location of the document file to DocumentFactory.LoadFromUri:
var loadDocumentOptions = new LoadDocumentOptions();
// Initialize loadDocumentOptions as needed
LEADDocument document = DocumentFactory.LoadFromUri(uri, loadDocumentOptions);
The following steps explain how this method works:
LoadDocumentOptions.UseCache is checked. If the value is true, then the application must have set a valid cache object in either LoadDocumentOptions.Cache or DocumentFactory.Cache. Otherwise, an exception is thrown. If the value is false, then the new document will not use caching.
The cache is also optional in this mode. In addition to speeding up obtaining document image data or text from pages that were previously visited, the cache can be used to store the document file downloaded from uri, as explained below.
If the uri passed to LoadFromUri has the special LEAD cache scheme (detected using IsUploadDocumentUri), then the factory assumes this is the URI to a document previously uploaded to the cache using DocumentFactory.BeginUpload. The data is already in the cache, so no data is downloaded, the intermediate steps below are skipped, and the factory goes directly to the final step below.
If the value of LoadDocumentOptions.AnnotationsUri is not null, then it will be treated as a remote URL and the data is downloaded by the factory in the same manner used for the document file as explained below.
The factory will check the value of LoadDocumentOptions.UseCache:
If the value is true, then the document data is downloaded from uri into the cache system.
If the value is false, then the document data is downloaded from uri to a temporary file created on the machine.
Similarly, if LoadDocumentOptions.AnnotationsUri is not null, it will be downloaded either to the cache system or to a temporary file based on cache usage.
When downloading the data, the factory will use the WebClient object in LoadDocumentOptions if not null. Otherwise, it will create a new instance and dispose of it after it has been used. This allows the application to pass a custom WebClient with specific proxy or credential settings or to monitor the download progress.
The factory will obtain information about the file format using RasterCodecs.GetInformation on the downloaded or temporary file or cache data. If this fails (if it is an invalid file format or the required LEADTOOLS file format assembly is not found), then the cache or downloaded data is deleted and an exception is thrown.
A LEADDocument object is created and the following members are initialized:
Member | Value |
---|---|
DocumentId | A unique identifier created for this document that can be used if the document is saved to the cache. |
Uri | Same uri passed to LoadFromUri. |
IsReadOnly | true. |
CacheUri | If the document was downloaded to the cache and if the cache system has virtual directory capabilities, then this property will contain a URI to the original document data (PDF, TIFF, DOCX, etc.). Otherwise, it is null. |
Stream | null. |
HasStream | false. |
IsDownloaded | true since the document was downloaded. |
GetDocumentFileName | Will return the path to the cache item or temporary file containing the downloaded data of the original document. If the cache does not have direct access to the file system then this will be null. |
GetDocumentStream | If the cache does not have direct access to the file system then this will return a stream containing the original document. Otherwise, null. |
GetAnnotationsFileName | Will return the path to the cache item or temporary file containing the downloaded data of the annotations. If the cache does not have direct access to the file system then this will be null. |
GetAnnotationsStream | If the cache does not have direct access to the file system then this will return a stream containing the annotations. Otherwise, null. |
HasAnnotationsStream | true or false depending on the above. |
DocumentType | The document type. |
MimeType | The MIME type of the document file format set during load. |
HasCache | The same value as LoadDocumentOptions.UseCache. If the value is true, then GetDocumentFileName or GetDocumentStream can be used to obtain the original data. Otherwise, it will return the path to the temporary file. |
LastCacheSyncTime | An arbitrary date in the past, since the document has not yet been saved to the cache. |
CacheStatus | DocumentCacheStatus.NotSynced since the document has not yet been saved to the cache. |
AutoDeleteFromCache | true. Can be changed to false if the application will re-load this document from the cache at a later time using DocumentFactory.LoadFromCache. |
AutoSaveToCache | false. |
InternalObject | The internal LEADTOOLS object that is being used to parse the document data. |
UserData | null. |
IsEncrypted | false unless the document is encrypted. In this case, most of the document properties cannot be used before the document is decrypted. Refer to Loading Encrypted Files Using the Document Library for more information. |
IsDecrypted | false. |
IsStructureSupported | true or false based on the MIME type of the document. |
Metadata | Ready to be used. |
Structure | Ready to be used. |
Images | Ready to be used. |
Text | Ready to be used. |
Pages | Ready to be used. |
Documents | Empty collection since this is not a virtual document. |
HasDocuments | false. |
AutoDisposeDocuments | false. |
Annotations | Ready to be used. |
LoadFromUri returns with this LEADDocument object ready to be used.
The document parses the downloaded data. Therefore, the original URL passed to LoadFromUri is never used again, and the data it points to can be deleted right away.
When the document is disposed of, the temporary files and cache items will be deleted unless the document is saved to the cache first.
For an example, refer to DocumentFactory.LoadFromUri.
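As an illustration of passing a custom WebClient to LoadFromUri, the sketch below configures credentials before loading. It is a sketch under stated assumptions, not the library's prescribed usage: it assumes the Leadtools.Document namespace and that the WebClient object mentioned above is exposed as a property named WebClient on LoadDocumentOptions. LoadFromUriSketch, CreateClient, and Load are hypothetical names, and the credentials are placeholders.

```csharp
using System;
using System.Net;
using Leadtools.Document; // assumed namespace of the Document library

public static class LoadFromUriSketch
{
    // Hypothetical helper: a WebClient carrying the credentials to use
    // for the download.
    public static WebClient CreateClient(string userName, string password)
    {
        var client = new WebClient();
        client.Credentials = new NetworkCredential(userName, password);
        return client;
    }

    // Hypothetical helper: downloads and loads the document synchronously.
    public static LEADDocument Load(Uri uri, WebClient client)
    {
        var options = new LoadDocumentOptions();
        options.UseCache = false;   // data is downloaded to a temporary file
        options.WebClient = client; // assumed property name, see the lead-in
        // Does not return until the document is downloaded and parsed.
        return DocumentFactory.LoadFromUri(uri, options);
    }
}
```

Because the factory only disposes of a WebClient that it created itself, the caller in this sketch remains responsible for disposing of both the client and the returned document.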
LoadFromUri does not return control to the application until the document is downloaded and parsed. To create a LEADDocument object from a remote URL asynchronously, create an instance of LoadDocumentAsyncOptions, and then pass it along with a Uri object pointing to the remote location of the document file to DocumentFactory.LoadFromUriAsync:
var loadDocumentAsyncOptions = new LoadDocumentAsyncOptions();
// Initialize loadDocumentAsyncOptions as needed. A Completed event handler is required:
loadDocumentAsyncOptions.Completed += (sender, e) => {
    // Completed: check e.Error, then use e.Document
};
DocumentFactory.LoadFromUriAsync(uri, loadDocumentAsyncOptions);
The following steps explain how this method works:
LoadDocumentOptions.UseCache is checked. If the value is true, then the application must have set a valid cache object in LoadDocumentOptions.Cache or DocumentFactory.Cache. Otherwise, an exception is thrown. If the value is false, then the new document will not use caching.
The cache is also optional in this mode. In addition to speeding up obtaining document image data or text from pages that were previously visited, the cache can be used to store the document file downloaded from uri, as explained below.
If the uri value passed to LoadFromUriAsync uses the LEAD cache scheme, then the factory assumes this is the URI to a document previously uploaded to the cache using DocumentFactory.BeginUpload. The data is already in the cache, so no data is downloaded, the intermediate steps below are skipped, and the factory goes directly to the final step below.
A thread is created to handle loading the document, control is returned to the application, and the rest of these steps are performed in the thread procedure.
If the value of LoadDocumentOptions.AnnotationsUri is not null, then it will be treated as a remote URL and the data is downloaded by the factory in the same manner used for the document file as explained below.
The factory will check the value of LoadDocumentOptions.UseCache:
If the value is true, then the document data is downloaded from uri into the cache system.
If the value is false, then the document data is downloaded from uri to a temporary file created on the machine.
Similarly, if LoadDocumentOptions.AnnotationsUri is not null, it will be downloaded either to the cache system or to a temporary file based on cache usage.
When downloading the data, the factory will use the WebClient object in LoadDocumentOptions if not null. Otherwise, it will create a new instance and dispose of it after it has been used. This allows the application to pass a custom WebClient with specific proxy or credential settings.
If LoadDocumentAsyncOptions.Progress is not null, the WebClient.DownloadProgressChanged event is mapped to it so that the user can monitor the progress of the download.
When WebClient.DownloadFileCompleted occurs, the factory will obtain information about the file format using RasterCodecs.GetInformation on the downloaded or temporary file. If this fails (if it is an invalid file format or the required LEADTOOLS file format assembly is not found), then the cache or downloaded data is deleted and LoadDocumentAsyncOptions.Completed is fired with the error object in LoadAsyncCompletedEventArgs.Error.
Otherwise, a LEADDocument object is created and the following members are initialized:
Member | Value |
---|---|
DocumentId | A unique identifier created for this document that can be used if the document is saved to the cache. |
Uri | Same uri passed to LoadFromUriAsync. |
Stream | null. |
HasStream | false. |
IsDownloaded | true since the document was downloaded. |
GetDocumentFileName | Will return the path to the cache item or temporary file containing the downloaded data of the original document. If the cache does not have direct access to the file system then this value will be null. |
GetDocumentStream | If the cache does not have direct access to the file system then this will return a stream containing the original document. Otherwise, null. |
GetAnnotationsFileName | Will return the path to the cache item or temporary file containing the downloaded data of the annotations. If the cache does not have direct access to the file system then this will be null. |
GetAnnotationsStream | If the cache does not have direct access to the file system then this will return a stream containing the annotations. Otherwise, null. |
HasAnnotationsStream | true or false depending on the above. |
DocumentType | The document type. |
MimeType | The MIME type of the document file format set during load. |
HasCache | The same value as LoadDocumentOptions.UseCache. If the value is true, then GetDocumentFileName or GetDocumentStream can be used to obtain the original data. Otherwise, it will return the path to the temporary file. |
LastCacheSyncTime | An arbitrary date in the past, since the document has not yet been saved to the cache. |
CacheStatus | DocumentCacheStatus.NotSynced since the document has not yet been saved to the cache. |
AutoDeleteFromCache | true. Can be changed to false if the application will re-load this document from the cache at a later time using DocumentFactory.LoadFromCache. |
AutoSaveToCache | false. |
InternalObject | The internal LEADTOOLS object that is being used to parse the document data. |
UserData | null. |
IsEncrypted | false unless the document is encrypted. In this case, most of the document properties cannot be used before the document is decrypted. Refer to Loading Encrypted Files Using the Document Library for more information. |
IsDecrypted | false. |
IsStructureSupported | true or false based on the MIME type of the document. |
Metadata | Ready to be used. |
Structure | Ready to be used. |
Images | Ready to be used. |
Text | Ready to be used. |
Pages | Ready to be used. |
Documents | Empty collection since this is not a virtual document. |
HasDocuments | false. |
AutoDisposeDocuments | false. |
Annotations | Ready to be used. |
The LoadDocumentAsyncOptions.Completed event is fired with the LEADDocument object in LoadAsyncCompletedEventArgs.Document. This LEADDocument object is now ready to be used.
The LEADDocument object parses the downloaded data. Therefore, the original URL passed to LoadFromUriAsync is never used again, and the data it points to can be deleted right away.
When the LEADDocument is disposed of, the temporary files will be deleted unless the document is saved to the cache first.
For an example, refer to DocumentFactory.LoadFromUriAsync.
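One way to consume the Completed event is to bridge it to a Task, as in the hedged sketch below. It assumes the Leadtools.Document namespace and that LoadAsyncCompletedEventArgs.Error is a System.Exception. LoadFromUriAsyncSketch, Complete, and LoadAsync are hypothetical names introduced for illustration and are not part of the library.

```csharp
using System;
using System.Threading.Tasks;
using Leadtools.Document; // assumed namespace of the Document library

public static class LoadFromUriAsyncSketch
{
    // Hypothetical helper: forwards an error or a result to a
    // TaskCompletionSource, mirroring the two outcomes described above.
    public static void Complete<T>(TaskCompletionSource<T> source, Exception error, T result)
    {
        if (error != null)
            source.TrySetException(error);
        else
            source.TrySetResult(result);
    }

    // Hypothetical helper: starts the asynchronous load and returns a Task
    // that finishes when the Completed event is fired.
    public static Task<LEADDocument> LoadAsync(Uri uri)
    {
        var source = new TaskCompletionSource<LEADDocument>();
        var options = new LoadDocumentAsyncOptions();
        // Error is assumed to be a System.Exception (see the lead-in).
        options.Completed += (sender, e) => Complete(source, e.Error, e.Document);
        // Control returns to the caller while the load continues in a thread.
        DocumentFactory.LoadFromUriAsync(uri, options);
        return source.Task;
    }
}
```

A caller could then write `LEADDocument document = await LoadFromUriAsyncSketch.LoadAsync(uri);` and handle failures with an ordinary try/catch block.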
To create a LEADDocument object from a document stored in a stream, create an instance of LoadDocumentOptions, and then pass it along with the stream object to DocumentFactory.LoadFromStream:
var loadDocumentOptions = new LoadDocumentOptions();
// Initialize loadDocumentOptions as needed
LEADDocument document = DocumentFactory.LoadFromStream(stream, loadDocumentOptions);
The following steps explain how this method works:
The LoadDocumentOptions.UseCache is checked. If the value is true, then the application must have already set a valid cache object in LoadDocumentOptions.Cache or DocumentFactory.Cache. Otherwise, an exception is thrown. If the value is false, then the new document will not use caching.
Caching is optional in this mode. It can be used to speed up obtaining document image data or text when the application revisits pages, or to save the document to the cache before it is disposed of. Without a cache, all the data is parsed from the original stream as needed.
If the value of LoadDocumentOptions.AnnotationsUri is not null, then it must contain the URL to a disk file. You can create a new Uri object from the physical path to the annotation file on disk and set it in this property. This creates a Uri object that uses the file:/// scheme. Any other scheme (such as http) will fail when using LoadFromStream.
The factory will obtain information on the file format in stream using RasterCodecs.GetInformation. If this fails (if it is an invalid file format or the required LEADTOOLS file format assembly is not found) then an exception is thrown.
A LEADDocument object is created and the following members are initialized:
Member | Value |
---|---|
DocumentId | A unique identifier created for this document that can be used if the document is saved to the cache. |
Uri | null. |
Stream | The original stream passed to LoadFromStream. |
HasStream | true. |
IsDownloaded | false since the document was not downloaded. |
GetDocumentFileName | null. |
GetDocumentStream | null. |
GetAnnotationsFileName | Will return the same file name passed to LoadDocumentOptions.AnnotationsUri. |
GetAnnotationsStream | null. |
HasAnnotationsStream | false. |
DocumentType | The document type. |
MimeType | The MIME type of the document file format. The value is set during load. |
HasCache | The same value as LoadDocumentOptions.UseCache. |
LastCacheSyncTime | An arbitrary date in the past, since the document has not yet been saved to the cache. |
CacheStatus | DocumentCacheStatus.NotSynced since the document has not yet been saved to the cache. |
AutoDeleteFromCache | true. Can be changed to false if the application will re-load this document from the cache at a later time using DocumentFactory.LoadFromCache. |
AutoSaveToCache | false. |
InternalObject | The internal LEADTOOLS object being used to parse the document data. |
UserData | null. |
IsEncrypted | false unless the document is encrypted. If the document is encrypted, most of the document properties cannot be used before the document is decrypted. Refer to Loading Encrypted Files Using the Document Library for more information. |
IsDecrypted | false. |
IsStructureSupported | true or false based on the MIME type of the document. |
Metadata | Ready to be used. |
Structure | Ready to be used. |
Images | Ready to be used. |
Text | Ready to be used. |
Pages | Ready to be used. |
Documents | Empty collection since this is not a virtual document. |
HasDocuments | false. |
AutoDisposeDocuments | false. |
Annotations | Ready to be used. |
LoadFromStream returns with this LEADDocument object ready to be used.
The LEADDocument object parses data from the original stream on demand. Therefore, the stream passed to LoadFromStream must be kept alive by the user while the LEADDocument is alive. Otherwise, errors will occur when accessing the document data.
If the document is saved into the cache using SaveToCache, then the entire content of the stream is saved into the cache, and the stream is no longer used and can be safely disposed of by the user. When the document is later re-loaded from the cache using DocumentFactory.LoadFromCache, it is treated as if it was downloaded from an external resource and the stream functionality is not used (the value of Stream will be null).
For an example, refer to DocumentFactory.LoadFromStream.
Complex document file formats such as DOCX and XLSX can require significantly more time to parse the file structure than simpler document file formats. The amount of time depends on the source file itself. Very complex document files (such as a very large XLSX spreadsheet with thousands or millions of rows) can take many seconds or even minutes. DocumentFactory.LoadFromFile or DocumentFactory.LoadFromUri will not return until all the file data is parsed.
For such documents, using TimeoutMilliseconds allows long-loading operations to be aborted if required. After the allotted timeout has passed, DocumentFactory will abort the load operation, and LoadFromFile or LoadFromUri will return null instead of a valid LEADDocument.
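The timeout behavior could be used along these lines. This is a minimal sketch that assumes TimeoutMilliseconds is an integer property of LoadDocumentOptions and that the Document library lives in the Leadtools.Document namespace. LoadTimeoutSketch, ToMilliseconds, and TryLoad are hypothetical names.

```csharp
using System;
using Leadtools.Document; // assumed namespace of the Document library

public static class LoadTimeoutSketch
{
    // Hypothetical helper: converts a TimeSpan to whole milliseconds and
    // throws OverflowException if the value does not fit in an int.
    public static int ToMilliseconds(TimeSpan timeout)
    {
        return checked((int)timeout.TotalMilliseconds);
    }

    // Hypothetical helper: returns the loaded document, or null if the
    // factory did not produce one (for example, after the timeout passed).
    public static LEADDocument TryLoad(string fileName, TimeSpan timeout)
    {
        var options = new LoadDocumentOptions();
        // Assumed to be a property of LoadDocumentOptions (see the lead-in).
        options.TimeoutMilliseconds = ToMilliseconds(timeout);

        LEADDocument document = DocumentFactory.LoadFromFile(fileName, options);
        if (document == null)
            Console.WriteLine("No document was returned for " + fileName);
        return document;
    }
}
```

Since a null result is the only signal described here, callers should check for null before touching any member of the returned object.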
Certain multipage file formats can be slow to load using the default implementation of DocumentFactory. This is especially true for document file formats such as DOCX/DOC, XLSX/XLS, RTF, PDF, and TXT.
Refer to DocumentMemoryCache for more information on how to speed up loading these types of files, especially in a client-server application.
If MIME type whitelisting is used, it is possible for the DocumentFactory load methods to return null as the resulting document if its MIME type was denied. Refer to DocumentMimeTypes for more information.
The following methods allow the user to create a clone (an exact copy) of a document stored in the cache:
The following methods can be used to quickly obtain information about a document without loading it. Information obtained includes the document name, MIME type, and number of pages:
Documents are automatically deleted when they expire, as set up using the cache policies. The following method can be used to manually delete a document from the cache at any time:
Each LEADDocument can optionally be associated with a user token to restrict usage. For instance, when a document is first loaded from a URI into the cache using DocumentFactory.LoadFromUri, the value of LoadDocumentOptions.UserToken is checked and, if it is not null, is used as the user token associated with this document. Subsequent calls to DocumentFactory.LoadFromCache will fail if the value of LoadFromCacheOptions.UserToken does not match.
Similarly, a user token can be associated when a document is created from scratch using DocumentFactory.Create through CreateDocumentOptions.UserToken, and when a document is uploaded to the cache using DocumentFactory.BeginUpload through UploadDocumentOptions.UserToken. Attempts to then load these documents with DocumentFactory.LoadFromUri or DocumentFactory.LoadFromCache will fail if the same user token is not passed accordingly. The same behavior also occurs during DocumentFactory.DeleteFromCache, DocumentFactory.DownloadDocument, and DocumentFactory.DownloadAnnotations.
When using DocumentFactory.GetDocumentCacheInfo to obtain information about a document in the cache, the value of DocumentCacheInfo.HasUserToken indicates whether the document in the cache has a user token, in which case it cannot be loaded or deleted unless the correct user token is used.
Refer to DocumentFactory.InvalidUserTokenException for more information on how to control the way a document fails to load when an invalid user token is used.
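The user token flow described above might look like the following sketch. It rests on several assumptions that should be checked against the API reference: the Leadtools.Document and Leadtools.Caching namespaces, an ObjectCache instance supplied by the application, and Cache and DocumentId properties on LoadFromCacheOptions. UserTokenSketch, NewUserToken, LoadAndSave, and Reload are hypothetical names.

```csharp
using System;
using Leadtools.Caching;  // assumed namespace of ObjectCache
using Leadtools.Document; // assumed namespace of the Document library

public static class UserTokenSketch
{
    // Hypothetical helper: creates an opaque token for one user or session.
    public static string NewUserToken()
    {
        return Guid.NewGuid().ToString("N");
    }

    // Hypothetical helper: loads a document into the cache under a user
    // token and returns its DocumentId, or null if no document was returned.
    public static string LoadAndSave(Uri uri, ObjectCache cache, string userToken)
    {
        var options = new LoadDocumentOptions();
        options.UseCache = true;
        options.Cache = cache;
        options.UserToken = userToken; // associated with the cached document

        using (LEADDocument document = DocumentFactory.LoadFromUri(uri, options))
        {
            if (document == null)
                return null;
            document.AutoDeleteFromCache = false; // keep it for a later reload
            document.SaveToCache();
            return document.DocumentId;
        }
    }

    // Hypothetical helper: reloads the document. The load fails if the
    // token does not match the one used when the document was cached.
    public static LEADDocument Reload(ObjectCache cache, string documentId, string userToken)
    {
        var options = new LoadFromCacheOptions();
        options.Cache = cache;           // assumed property
        options.DocumentId = documentId; // assumed property
        options.UserToken = userToken;
        return DocumentFactory.LoadFromCache(options);
    }
}
```

How a mismatched token surfaces (a thrown exception or some other failure) depends on DocumentFactory.InvalidUserTokenException, so this sketch deliberately leaves error handling to the caller.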
Uploading Using the Document Library
Document Library Coordinate System
Loading Encrypted Files Using the Document Library
Parsing Text with the Document Library
Barcode Processing with the Document Library
Document Toolkit History Tracking
Using LEADTOOLS Document Viewer