Loading Documents Using LEADTOOLS Document Library

Summary

The Document library supports loading by creating a LEADDocument object from data that resides in a disk file, at a remote URL, or in the cache system after a previous upload.

This topic discusses loading the information needed to represent a document as a LEADDocument object. For a discussion of loading individual page images or thumbnails for a DocumentPage, see Image Loading.

Loading From a Remote URL

To load a LEADDocument object from a remote URL, call LoadFromUri, passing the URL of the document file along with an instance (which can be null) of LoadDocumentOptions. Refer to LoadFromUri for an example.

The method works as follows:

  1. If the uri passed to LoadFromUri has the special LEAD cache scheme (detected using IsUploadDocumentUri), then the factory assumes this is the URI of a document previously uploaded to the cache using LoadFromUri. The data is already in the cache, so nothing is downloaded and the factory skips to step 4 below.

  2. If the value of AnnotationsUri is not null, then it will be treated as a remote URL and the data is downloaded by the factory in the same manner used for the document file as explained below.

  3. The factory will download the document data from the uri into the cache system.

  4. If AnnotationsUri is not null, it will be downloaded to the cache system.

  5. The factory will obtain information about the file format of the downloaded or temporary file. If this fails (an invalid file format or the required LEADTOOLS file format assembly is not found), then the cache data is deleted and an exception is thrown.

  6. A LEADDocument object is created and the following members are initialized:

    Member                Value
    DocumentId            A unique identifier created for this document that can be used if the document is saved to the cache.
    Uri                   Same uri passed to loadFromUri.
    IsReadOnly            true.
    CacheUri              If the document was downloaded to the cache and if the cache system has virtual directory capabilities, then this property will contain a URI to the original document data (PDF, TIFF, DOCX, etc.). Otherwise, it is null.
    MimeType              The MIME type of the document file format set during load.
    LastCacheSyncTime     Random old date since the document has not been saved to the cache yet.
    CacheStatus           DocumentCacheStatus.NotSynced since the document has not been saved to the cache yet.
    IsEncrypted           false unless the document is encrypted. In this case most of the document properties cannot be used before the document is decrypted. Refer to Loading Encrypted Files Using the Document Library for more information.
    IsDecrypted           false.
    IsStructureSupported  true or false, based on the MIME type of the document.
    Metadata              Ready to be used.
    Structure             Ready to be used.
    Images                Ready to be used.
    Text                  Ready to be used.
    Pages                 Ready to be used.
    Documents             Empty collection since this is not a virtual document.
    HasDocuments          false.
    AutoDisposeDocuments  false.
    Annotations           Ready to be used.

  7. LoadFromUri returns with this LEADDocument object ready to be used.

  8. LEADDocument parses its content from the downloaded copy of the data. Therefore, the original URL is never used again and the data it points to can be deleted right away if necessary.
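
The call described above can be sketched as follows. This is a minimal illustration, not the toolkit's own sample: the helper names loadRemoteDocument and summarizeDocument are hypothetical, the factory is passed in as a parameter (in a page it would presumably be lt.Document.DocumentFactory), and the camelCase property names are assumed JavaScript spellings of the members listed in the table.

```javascript
// Sketch only. `factory` stands in for the document factory and `options`
// for a LoadDocumentOptions instance (or null). Both are passed in so the
// helpers do not depend on toolkit globals.

// Starts the load and returns whatever promise-like object the factory returns.
function loadRemoteDocument(factory, url, options) {
  if (typeof url !== "string" || url.length === 0) {
    throw new Error("A document URL is required");
  }
  // The options instance can be null, as stated in the text above.
  return factory.loadFromUri(url, options || null);
}

// Copies a few of the initialized members into a plain object.
// The property names are assumptions (camelCase forms of the table entries).
function summarizeDocument(doc) {
  return {
    documentId: doc.documentId,
    uri: doc.uri,
    mimeType: doc.mimeType,
    isReadOnly: doc.isReadOnly,
    isEncrypted: doc.isEncrypted,
  };
}
```

In a page that has loaded the toolkit scripts, and assuming the returned object is a jQuery-style promise, the result could be consumed with something like loadRemoteDocument(factory, url, null).done(function (doc) { console.log(summarizeDocument(doc)); }).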

LoadFromFile can be used to load a document stored in a JavaScript File object. The factory will first upload the document data to the service before calling LoadFromUri on the resulting URI.
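
A minimal sketch of loading from a File object picked by the user is shown below. The helper name loadPickedFile is hypothetical, the factory and the file input element are passed in as parameters, and it is an assumption that loadFromFile accepts a null options value in the same way LoadFromUri does.

```javascript
// Sketch only. `factory` stands in for the document factory and `fileInput`
// for an <input type="file"> element. Both are injected by the caller.
function loadPickedFile(factory, fileInput, options) {
  const files = fileInput.files;
  if (!files || files.length === 0) {
    return null; // nothing was selected, so there is nothing to upload
  }
  // Per the text above, the factory uploads the File first and then
  // loads the document from the resulting URI.
  return factory.loadFromFile(files[0], options || null);
}
```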

Aborting Long Loading Operations

Complex document file formats such as DOCX and XLSX can require significantly more time to parse than simpler file formats. The time depends on the source file itself; for a very complex document, such as a very large XLSX spreadsheet with thousands or millions of rows, parsing can take seconds or even minutes. LoadFromUri will not return until all the file data has been parsed.

For complex documents, use the TimeoutMilliseconds property to abort long loading operations, if required. After the allocated timeout has passed, the service will abort the load operation and return null instead of a valid LEADDocument from LoadFromUri.
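
The following sketch shows one way to set a timeout and tell an aborted load apart from a successful one. The helper name loadWithTimeout and the handlers object are hypothetical, timeoutMilliseconds is the assumed JavaScript spelling of TimeoutMilliseconds, and the returned object is assumed to be a jQuery-style promise with a done method.

```javascript
// Sketch only. `factory` stands in for the document factory and `options`
// for a LoadDocumentOptions instance. Both are injected by the caller.
function loadWithTimeout(factory, options, url, timeoutMs, handlers) {
  // Ask the service to give up after the allotted time.
  options.timeoutMilliseconds = timeoutMs;

  return factory.loadFromUri(url, options).done(function (doc) {
    if (doc === null || doc === undefined) {
      // Per the text above, an aborted load yields null instead of a document.
      handlers.onTimeout(url);
    } else {
      handlers.onLoaded(doc);
    }
  });
}
```

Treating a null result as a timeout rather than as a failure keeps the two cases separate: request errors would still arrive through the promise's failure path.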

Loading From the Cache

As stated above, LoadFromUri can load a LEADDocument from a valid URI that uses the cache scheme. However, LoadFromCache is also available as a subset of LoadFromUri's functionality for cases in which a document has already been loaded once and must be loaded again. LoadFromCache does not take a LoadDocumentOptions object.

LoadFromCache uses the DocumentId property to download the Document data, so that value may be stored instead of the Uri when holding on to cache items for later.
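
As a sketch of holding on to the DocumentId for later, the helpers below store the identifier and use it to reload the document. The function names and the storage key are hypothetical, the storage object is anything with setItem and getItem (for example window.sessionStorage), and the example only applies to documents that are actually present in the cache.

```javascript
const DOCUMENT_ID_KEY = "lastDocumentId"; // hypothetical storage key

// Remembers the identifier of a loaded document.
// `doc.documentId` is the assumed JavaScript spelling of DocumentId.
function rememberDocument(storage, doc) {
  storage.setItem(DOCUMENT_ID_KEY, doc.documentId);
}

// Reloads the remembered document from the cache, or returns null
// when no identifier has been stored yet.
function reloadRememberedDocument(factory, storage) {
  const documentId = storage.getItem(DOCUMENT_ID_KEY);
  if (!documentId) {
    return null;
  }
  // No LoadDocumentOptions argument is involved here.
  return factory.loadFromCache(documentId);
}
```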

Refer to LoadFromUri for an example.

Document User Tokens

Each LEADDocument can optionally be associated with a user token to restrict usage. For instance, when a document is first loaded from a URI into the cache using DocumentFactory.loadFromUri, a user-token header can be added to the request with its value set to the desired token. If this value is set and is not null, it is used as the user token associated with the document. Subsequent calls to DocumentFactory.loadFromCache will fail if the value of the user-token header does not match.
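
One possible way to attach the header is sketched below. It rests on assumptions that should be checked against the Document Viewer Demo: that the factory exposes a hook that runs before each request (assumed here to be a prepareAjax event) and that the event arguments carry jQuery-style request settings with a headers map. The helper name createUserTokenHandler is hypothetical.

```javascript
// Sketch only. Returns a handler that stamps the current token on an
// outgoing request. `getToken` is a caller-supplied function so the token
// can change between requests.
function createUserTokenHandler(getToken) {
  return function (sender, e) {
    const token = getToken();
    if (token === null || token === undefined) {
      return; // no token: leave the request untouched
    }
    // Assumed shape: e.settings.headers is a plain name/value map.
    e.settings = e.settings || {};
    e.settings.headers = e.settings.headers || {};
    e.settings.headers["user-token"] = token;
  };
}

// Assumed registration, to be verified against the toolkit reference:
// lt.Document.DocumentFactory.prepareAjax.add(createUserTokenHandler(function () { return currentToken; }));
```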

Similarly, a user token can be associated when a document is created from scratch using DocumentFactory.create and when a document is uploaded to the cache using DocumentFactory.beginUpload or DocumentFactory.beginUploadDocument. Attempts to then load these documents with DocumentFactory.loadFromUri or DocumentFactory.loadFromCache will fail if the same user token is not passed accordingly. The same behavior also occurs during DocumentFactory.deleteFromCache and DocumentFactory.downloadDocumentData.

When using DocumentFactory.checkCacheInfo to obtain information about a document in the cache, the value of CacheInfo.hasUserToken will indicate if the document in the cache contains a user token, in which case it cannot be loaded or deleted unless the correct user token is used.
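
The check can be sketched as below. The helper names classifyCacheInfo and checkBeforeLoading are hypothetical, the factory is passed in as a parameter, the result is assumed to arrive through a jQuery-style done callback, and it is an assumption that a missing cache entry is reported as a null value.

```javascript
// Sketch only. Turns a CacheInfo-like value into a simple label.
function classifyCacheInfo(cacheInfo) {
  if (cacheInfo === null || cacheInfo === undefined) {
    return "not-in-cache"; // assumed meaning of an empty result
  }
  // hasUserToken indicates the document cannot be loaded or deleted
  // unless the correct user token is supplied.
  return cacheInfo.hasUserToken ? "needs-user-token" : "open";
}

// Asks the factory about a cached document and reports the label.
function checkBeforeLoading(factory, uri, onResult) {
  return factory.checkCacheInfo(uri).done(function (cacheInfo) {
    onResult(classifyCacheInfo(cacheInfo));
  });
}
```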

Refer to the Document Viewer Demo for an example of how to use a user token.

See Also

Document Library Features

Document Viewer Application

Creating Documents with LEADTOOLS Document Library

Document Toolkit and Caching

Uploading Using the Document Library

Document Library Coordinate System

Loading Encrypted Files Using the Document Library

Parsing Text with the Document Library

Barcode processing with the Document Library

Using jQuery Promises in the Document Library

Loading Images in the Document Library

Document Page Transformation

Using LEADTOOLS Document Viewer

Status Document Job Converter

Document View and Convert Redaction

Help Version 22.0.2023.1.18
Products | Support | Contact Us | Intellectual Property Notices
© 1991-2023 LEAD Technologies, Inc. All Rights Reserved.

LEADTOOLS HTML5 JavaScript