Creating Documents with LEADTOOLS Document Library

The Document Library can create new empty documents. Empty documents can be set in the DocumentViewer, sent to the DocumentConverter, and saved to the cache as usual. Although this is of very little use in and of itself, it is quite powerful when used as the base for a virtual document.

To illustrate, imagine a situation where you have two scanned PDF documents. One contains the odd pages and the other contains the even pages of the original scanned document. You can create a new virtual document, add all the pages from the existing documents in the correct order, and then view this new document in the DocumentViewer, or send it to the DocumentConverter to finalize it.
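The page ordering in the odd/even scenario can be sketched in a few lines of Python. This is only an illustration of the ordering logic, not LEADTOOLS code: the function name interleave_pages and the string page labels are hypothetical, and it assumes both scans list their pages in ascending order.

```python
from itertools import zip_longest

# Sentinel used to pad the shorter scan; hypothetical helper, not a LEADTOOLS type.
_MISSING = object()


def interleave_pages(odd_pages, even_pages):
    """Merge two page lists (pages 1, 3, 5... and pages 2, 4, 6...) into original order."""
    merged = []
    for odd, even in zip_longest(odd_pages, even_pages, fillvalue=_MISSING):
        if odd is not _MISSING:
            merged.append(odd)
        if even is not _MISSING:
            merged.append(even)
    return merged


odd_scan = ["p1", "p3", "p5"]   # stand-ins for pages of the first PDF
even_scan = ["p2", "p4"]        # stand-ins for pages of the second PDF
print(interleave_pages(odd_scan, even_scan))  # ['p1', 'p2', 'p3', 'p4', 'p5']
```

In a real application, each element of the merged list would be a page object from one of the two source documents, added to the virtual document in that order.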

Imagine another situation in which you want to quickly create a new legal document containing a header and table of contents pages from a PDF document, two fax images (from TIFF files), four disclaimer and content pages (from Word DOCX files), and an AutoCAD drawing (in a DWG file). You want to be able to view this document in the LEADTOOLS Document Viewer. Before virtual documents, a physical file containing all those source pages had to be generated (using the DocumentWriter) and cached. This operation takes time, and the server must keep track of the file in order to delete it when it is no longer needed.

With virtual documents, all that is needed is to create the virtual document, load the source documents, and add the pages needed. When finished, simply send this Document object to the viewer. No further action is needed. The new document does not have a physical representation on disk. It simply redirects the calls to obtain the page images, SVG, or text to the underlying original document.

Creating new Documents

To create a new document, create an instance of CreateDocumentOptions and call DocumentFactory.Create. DocumentFactory.Create returns a new, empty, non read-only LEADDocument object. The LEADDocument.Pages collection is empty and the value of LEADDocument.IsReadOnly is set to false.

If a document already exists in the cache, call DocumentFactory.CloneDocument to create a clone of it.

Adding and Removing Pages

Assume that virtualDocument is the LEADDocument object created in the previous section and sourceDocument1 is a document obtained by calling DocumentFactory.LoadFromUri on a multipage PDF file. To add the first and second pages (page indices 0 and 1), simply call virtualDocument.Pages.AddPage(sourceDocument1.Pages[0]) and virtualDocument.Pages.AddPage(sourceDocument1.Pages[1]). The virtualDocument.Pages collection now contains two items.

Internally, the DocumentPage reference is shared between the two documents and no data is copied. The value of DocumentPage.Document will still point to its original owner document (sourceDocument1). This means that sourceDocument1 must stay alive as long as virtualDocument is alive. If you examine the virtualDocument.Documents collection, you will find that it now has one item: sourceDocument1. Any changes made to the source page in the original document are reflected in the virtual document right away.

The DocumentPages collection derives from LeadCollection and allows you not only to add pages, but also to remove, replace, and re-order them. If you call virtualDocument.Pages.Clear, the collection becomes empty, and if you examine the virtualDocument.Documents collection it will be empty as well. There is no longer any link between the two documents, and sourceDocument1 can be disposed of, if needed.

LEADDocument.Documents is a read-only collection, meaning you cannot add or remove items from it directly. The items (of type LEADDocument) are added and removed depending on which pages are added or removed. For instance, in the case where we added two pages from the same source document, the Documents collection contains only one item since both pages are from the same document. Now if you load a multipage TIFF file into sourceDocument2 and add a page from it into virtualDocument (while it still contains the two pages from sourceDocument1), the Documents collection will contain two items: sourceDocument1 and sourceDocument2.
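The relationship between the Pages and Documents collections can be modeled with a small Python sketch. ToyPage and ToyDocument are hypothetical classes written for this illustration; they are not the LEADTOOLS types and only mimic the behavior described above: page references are shared, each page keeps its original owner, and the list of child documents is derived from the pages. Excluding the document itself from that derived list is an assumption of this model.

```python
class ToyPage:
    """Hypothetical stand-in for a page; it always remembers its original owner."""

    def __init__(self, document, label):
        self.document = document
        self.label = label


class ToyDocument:
    """Hypothetical stand-in for a document holding a list of page references."""

    def __init__(self, name, page_labels=()):
        self.name = name
        self.pages = [ToyPage(self, label) for label in page_labels]

    @property
    def documents(self):
        """Distinct owners of the pages, in first-seen order (derived, never set directly)."""
        owners = []
        for page in self.pages:
            # Assumption of this model: a document is not listed as its own child.
            if page.document is not self and page.document not in owners:
                owners.append(page.document)
        return owners


source1 = ToyDocument("source1.pdf", ["a", "b", "c"])
source2 = ToyDocument("source2.tif", ["x", "y"])
virtual = ToyDocument("virtual")

# Adding pages shares the reference; nothing is copied.
virtual.pages.append(source1.pages[0])
virtual.pages.append(source1.pages[1])
print([d.name for d in virtual.documents])  # ['source1.pdf']

virtual.pages.append(source2.pages[0])
print([d.name for d in virtual.documents])  # ['source1.pdf', 'source2.tif']

# Clearing the pages also removes every link to the source documents.
virtual.pages.clear()
print([d.name for d in virtual.documents])  # []
```

Deriving the child list from the pages, rather than maintaining it separately, is what keeps the two collections consistent in this model.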

You can also add empty pages to a virtual document by calling LEADDocument.Pages.CreatePage with the desired size and adding this page to the virtualDocument.Pages collection. The value of DocumentPage.Document will be virtualDocument in this case, since it is the original owner document.

Virtual Documents in the Viewer

DocumentViewer fully supports virtual documents. When a document is set, it will subscribe to the CollectionChanged event of the Pages collection and will update the view, thumbnail, bookmark, and annotation parts accordingly if pages are added or removed while the document is being viewed. Although the view automatically tracks all changes, it is best to call DocumentViewer.BeginUpdate/DocumentViewer.EndUpdate when adding or removing more than a handful of pages at one time in order to minimize flickering and optimize performance.
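The benefit of wrapping bulk changes in an update block can be shown with a generic batching pattern. The ToyViewer class below is hypothetical and is not how DocumentViewer is implemented; it only assumes that each page change outside an update block triggers one refresh, while changes inside a block are coalesced into a single refresh at the end.

```python
class ToyViewer:
    """Hypothetical viewer that counts how often it redraws itself."""

    def __init__(self):
        self.refresh_count = 0
        self._update_depth = 0
        self._dirty = False

    def begin_update(self):
        self._update_depth += 1

    def end_update(self):
        self._update_depth -= 1
        # Refresh once when the outermost block closes, if anything changed.
        if self._update_depth == 0 and self._dirty:
            self._refresh()

    def on_pages_changed(self):
        """Called for every page added or removed."""
        if self._update_depth > 0:
            self._dirty = True  # defer the refresh until end_update
        else:
            self._refresh()

    def _refresh(self):
        self._dirty = False
        self.refresh_count += 1


unbatched = ToyViewer()
for _ in range(50):
    unbatched.on_pages_changed()
print(unbatched.refresh_count)  # 50

batched = ToyViewer()
batched.begin_update()
for _ in range(50):
    batched.on_pages_changed()
batched.end_update()
print(batched.refresh_count)  # 1
```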

The viewer will automatically merge the bookmarks of all child documents. Bookmark items that point to non-existing pages (pages in the source document that have not been added to the virtual document) will be non-functional. Similarly, inter-links between pages are automatically checked and any that point to non-existing pages will not be functional.
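As a rough illustration of the bookmark check, the following Python sketch flags each bookmark as functional or not, depending on whether its target page was added to the virtual document. The function check_bookmarks and the (document name, page index) tuples are hypothetical conventions chosen for this example; the viewer's actual data structures are not shown here.

```python
def check_bookmarks(bookmarks, pages_in_virtual):
    """Return (title, is_functional) for each bookmark.

    bookmarks: list of (title, target) where target identifies a source page.
    pages_in_virtual: the source pages that were added to the virtual document.
    """
    present = set(pages_in_virtual)
    return [(title, target in present) for title, target in bookmarks]


# Only pages 0 and 1 of the source were added to the virtual document.
pages_in_virtual = [("source.pdf", 0), ("source.pdf", 1)]
bookmarks = [
    ("Cover", ("source.pdf", 0)),
    ("Appendix", ("source.pdf", 7)),  # page 7 was never added
]
print(check_bookmarks(bookmarks, pages_in_virtual))
# [('Cover', True), ('Appendix', False)]
```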

Functionality that only works with certain types of documents will check the original source document type. For instance, if View as SVG is requested, then pages that belong to compatible documents (such as PDF or DOCX) will be viewed as SVG, while pages that belong to incompatible documents (such as TIFF or JPEG) will still be viewed as raster images. Similarly, when using client-side PDF rendering, only pages originally belonging to PDF documents are rendered from the original data directly using JavaScript and all others are rendered using SVG or raster images.
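The per-page decision for View as SVG can be sketched as follows. The function render_mode and the format strings are hypothetical; the set of SVG-capable formats lists only the two examples named above and is not a complete list of what the toolkit supports.

```python
# Only the examples named in the text; the real list of compatible formats is longer.
SVG_CAPABLE_FORMATS = {"pdf", "docx"}


def render_mode(source_format, view_as_svg):
    """Pick how a page is displayed, based on the format of its original owner document."""
    if view_as_svg and source_format.lower() in SVG_CAPABLE_FORMATS:
        return "svg"
    return "raster"


# Source formats of the pages in a mixed virtual document, in page order.
page_formats = ["pdf", "tiff", "docx", "jpeg"]
print([render_mode(fmt, view_as_svg=True) for fmt in page_formats])
# ['svg', 'raster', 'svg', 'raster']
```

The point of the sketch is that the decision is made page by page, so a single virtual document can mix both rendering modes.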

Virtual Documents and the Cache

To save a virtual document into the cache, call LEADDocument.SaveToCache as usual. Information about the pages (from LEADDocument.Pages) and the IDs of their owner documents (from LEADDocument.Documents) is stored in the cache.

When DocumentFactory.LoadFromCache is called with the ID of a virtual document, the toolkit will try to automatically load all the child documents required to reconstruct the virtual document by calling DocumentFactory.LoadFromCache with the ID of each one. If this fails for any document, then the pages that belong to it are not loaded and are removed from the virtual document.
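The default reconstruction behavior can be modeled with a plain dictionary standing in for the cache. Everything in this sketch is hypothetical (the function load_virtual, the cache layout, and the document IDs); it only demonstrates the rule that pages whose owner document cannot be loaded are dropped from the reconstructed virtual document.

```python
def load_virtual(cache, virtual_id):
    """Rebuild the page list of a virtual document, skipping pages of missing children."""
    entry = cache[virtual_id]
    loaded = {}  # child ID -> child entry, or None when it is not in the cache
    pages = []
    for owner_id, page_index in entry["pages"]:
        if owner_id not in loaded:
            loaded[owner_id] = cache.get(owner_id)  # each child is looked up once
        if loaded[owner_id] is None:
            continue  # owner could not be loaded, so its pages are removed
        pages.append((owner_id, page_index))
    return pages


# Toy cache: "src-b" was deleted or has expired, so it is absent.
cache = {
    "virtual-1": {"pages": [("src-a", 0), ("src-b", 0), ("src-a", 1)]},
    "src-a": {"page_count": 2},
}
print(load_virtual(cache, "virtual-1"))  # [('src-a', 0), ('src-a', 1)]
```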

To modify this behavior, subscribe to the DocumentFactory.LoadDocumentFromCache event.

Virtual Documents and Disposing

LEADDocument objects are disposable and the Dispose method must be called when the object is no longer needed.

There are two common scenarios of using source and virtual documents, as follows:

  1. The virtual document is the only owner of all the source documents. This is the default case when loading a virtual document from the cache: All the child documents will be loaded automatically into brand new LEADDocument objects. These objects only exist (by default) in the Documents collection of the virtual document and nowhere else in the system. In this scenario, it is best to set the value of virtualDocument.AutoDisposeDocuments to true. This way, when virtualDocument.Dispose is called, it will automatically loop through all the child documents (if any) and call Dispose as well.

  2. The source documents are used to create virtual documents on demand (maybe more than one from the same source objects), and are sent to another system. Such behavior is similar to the LEADTOOLS Virtual Document Demo which creates virtual documents from source documents on-the-fly and saves them to the cache. The same source document can be part of multiple virtual documents at the same time.

In this scenario, it is best to set the value of virtualDocument.AutoDisposeDocuments to false. When virtualDocument.Dispose is called, it will only remove the child documents from the Documents collection without calling Dispose on them. Since these child (source) documents could be part of another virtual document in the system, the user should call Dispose on the source documents later, when they are no longer needed.
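The two ownership scenarios can be contrasted with a small Python model. ToySource and ToyVirtual are hypothetical classes, not the LEADTOOLS types; the auto_dispose_documents flag mirrors the behavior described for AutoDisposeDocuments, under the assumption that disposing a virtual document always empties its list of children.

```python
class ToySource:
    """Hypothetical source document that records whether it was disposed."""

    def __init__(self, name):
        self.name = name
        self.disposed = False

    def dispose(self):
        self.disposed = True


class ToyVirtual:
    """Hypothetical virtual document holding references to its child documents."""

    def __init__(self, sources, auto_dispose_documents):
        self.documents = list(sources)
        self.auto_dispose_documents = auto_dispose_documents
        self.disposed = False

    def dispose(self):
        if self.auto_dispose_documents:
            # Scenario 1: the virtual document is the sole owner of its children.
            for child in self.documents:
                child.dispose()
        # In both scenarios the children are detached from the virtual document.
        self.documents.clear()
        self.disposed = True


# Scenario 1: children exist only inside the virtual document.
owned = ToySource("owned.pdf")
ToyVirtual([owned], auto_dispose_documents=True).dispose()
print(owned.disposed)  # True

# Scenario 2: one source shared by two virtual documents.
shared = ToySource("shared.pdf")
first = ToyVirtual([shared], auto_dispose_documents=False)
second = ToyVirtual([shared], auto_dispose_documents=False)
first.dispose()
print(shared.disposed)  # False, still usable by the second virtual document
second.dispose()
shared.dispose()        # the caller releases the source when finished
print(shared.disposed)  # True
```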

See Also

Document Library Features

Loading Using LEADTOOLS Document Library

Document Toolkit and Caching

Uploading Using the Document Library

Document Library Coordinate system

Loading Encrypted Files Using the Document Library

Parsing Text with the Document Library

Barcode processing with the Document Library

Document Toolkit History Tracking

Document Page Transformation

Using LEADTOOLS Document Viewer

Using LEADTOOLS Document Converter

Document View and Convert Redaction

Help Version 22.0.2023.7.17
Products | Support | Contact Us | Intellectual Property Notices
© 1991-2023 LEAD Technologies, Inc. All Rights Reserved.
