Creating Documents with LEADTOOLS Documents Library

The Documents Library can create new empty documents. Empty documents can be set in the DocumentViewer, sent to the DocumentConverter and saved to the cache as usual. Although this is of very little use in and of itself, it is quite powerful when used as the base for a virtual document.

To illustrate, imagine a situation where you have two scanned PDF documents. One contains the odd pages and the other contains the even pages of the original scanned document. You can now create a new virtual document, add all the pages from the existing documents in the correct order and then view this new PDF in the DocumentViewer or send it to the DocumentConverter to finalize it.
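
The odd/even scenario comes down to interleaving two page sequences. The following Python sketch is toolkit-independent: the lists stand in for the page collections of the two scanned documents, and interleave_pages is a hypothetical helper, not a LEADTOOLS call. It only illustrates the order in which the pages would be added to the virtual document.

```python
from itertools import zip_longest


def interleave_pages(odd_pages, even_pages):
    """Return pages in reading order: odd[0], even[0], odd[1], even[1], ..."""
    missing = object()  # sentinel marking "no page" when one list is shorter
    ordered = []
    for odd, even in zip_longest(odd_pages, even_pages, fillvalue=missing):
        if odd is not missing:
            ordered.append(odd)
        if even is not missing:
            ordered.append(even)
    return ordered


# Placeholder page labels; a scan with an odd page count has one more odd page.
odd_pages = ["p1", "p3", "p5"]
even_pages = ["p2", "p4"]
merged = interleave_pages(odd_pages, even_pages)
```

In a real application, each element of the result would be a page object taken from one of the two loaded source documents and added to the virtual document's Pages collection in that order.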

Imagine another situation, in which you want to quickly create a new legal document containing a header and table of contents pages from a PDF document, two fax images (from TIFF files), four disclaimer and content pages (from Word DOCX files) and an AutoCAD drawing (in a DWG file). You want to be able to view this document in the LEADTOOLS Document Viewer. Before virtual documents, a physical file containing all those source pages had to be generated (using the DocumentWriter) and cached. This operation takes time, and the server must keep track of when to delete the file once it is no longer needed.

With virtual documents, all that is needed is to create the virtual document, load the source documents and add the pages needed. When finished, simply send this Document object to the viewer. No further action is needed. The new document has no physical representation on disk. It simply redirects the calls to obtain the page images, SVG or text to the underlying source documents.

Creating New Documents

To create a new document, create an instance of CreateDocumentOptions and call DocumentFactory.Create. This method returns a new, empty Document object that is not read-only: the Document.Pages collection is empty and the value of Document.IsReadOnly is false.

Adding and Removing Pages

Assume that virtualDocument is the Document object created above and sourceDocument1 is a Document obtained through DocumentFactory.LoadFromUri on a multi-page PDF file. To add the first and second pages (page indexes 0 and 1), simply call virtualDocument.Pages.AddPage(sourceDocument1.Pages[0]) and virtualDocument.Pages.AddPage(sourceDocument1.Pages[1]). The virtualDocument.Pages collection now contains two items.

Internally, the DocumentPage reference is shared between the two documents and no data is copied. The value of DocumentPage.Document still points to its original owner document (sourceDocument1). This means that sourceDocument1 must stay alive as long as virtualDocument is alive. If you examine the virtualDocument.Documents collection, you will find that it now has one item: sourceDocument1. Any changes made to the source page in the original document are reflected in the virtual document right away.

The DocumentPages collection derives from LeadCollection and allows you not only to add pages, but also to remove, replace and re-order them. If you call virtualDocument.Pages.Clear, the collection becomes empty, and the virtualDocument.Documents collection becomes empty as well. There is no longer any link between the two documents, and sourceDocument1 can be disposed if needed.

Document.Documents is a read-only collection, meaning you cannot add or remove items from it directly. The items (of type Document) are added and removed depending on which pages are added or removed. For instance, in the example above, when we added two pages from the same source document, the Documents collection contained only one item since both pages are from the same document. If you now load a multi-page TIFF file into sourceDocument2 and add a page from this document to virtualDocument (while it still contains the two pages from sourceDocument1), the Documents collection will contain two items: sourceDocument1 and sourceDocument2.
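
The relationship between the Pages and Documents collections can be pictured with a small toy model. The Python sketch below does not use or reproduce the LEADTOOLS API; the Page and Document classes and their members are hypothetical stand-ins that mimic the behavior described above: page objects are shared by reference, each page keeps pointing at its original owner, and the list of source documents is derived from the pages rather than edited directly. Excluding the virtual document itself from that list is an assumption of this model.

```python
class Page:
    def __init__(self, owner, index):
        self.document = owner  # always the original owner document
        self.index = index


class Document:
    def __init__(self, name, page_count=0):
        self.name = name
        self.pages = [Page(self, i) for i in range(page_count)]

    @property
    def documents(self):
        """Distinct source documents, computed from the pages (read-only)."""
        owners = []
        for page in self.pages:
            # Model assumption: pages owned by this document are not counted.
            if page.document is not self and page.document not in owners:
                owners.append(page.document)
        return tuple(owners)


source1 = Document("sourceDocument1", page_count=3)
source2 = Document("sourceDocument2", page_count=2)
virtual = Document("virtualDocument")

# Two pages from the same source: one entry in the derived collection.
virtual.pages.append(source1.pages[0])
virtual.pages.append(source1.pages[1])
after_first_source = [d.name for d in virtual.documents]

# A page from a second source: two entries.
virtual.pages.append(source2.pages[0])
after_second_source = [d.name for d in virtual.documents]

# The page object is shared, not copied, and still names its original owner.
is_shared = virtual.pages[0] is source1.pages[0]
owner_name = virtual.pages[0].document.name

# Clearing the pages removes every link to the source documents.
virtual.pages.clear()
after_clear = virtual.documents
```

Because the source list is computed from the pages, it can never get out of sync with them, which is one plausible reason for exposing such a collection as read-only.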

You can also add empty pages to a virtual document by calling Document.Pages.CreatePage with the desired size and adding the resulting page to the virtualDocument.Pages collection. The value of DocumentPage.Document will be virtualDocument in this case, since it is the original owner document.

Virtual Documents in the Viewer

DocumentViewer has full support for virtual documents. When a document is set, the viewer subscribes to the CollectionChanged event of the Pages collection and updates the view, thumbnail, bookmark and annotation parts accordingly if pages are added or removed while the document is being viewed. Although the viewer automatically tracks all changes, it is recommended to call DocumentViewer.BeginUpdate and DocumentViewer.EndUpdate around the change when adding or removing more than a handful of pages at once, to minimize flickering and optimize performance.
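
The benefit of bracketing bulk changes can be seen in a generic batching sketch. The Python code below is not the DocumentViewer implementation and makes no claim about its internals; ObservablePages, ToyViewer, begin_update and end_update are hypothetical names used to show the general pattern of deferring refreshes while a batch is open and performing a single refresh when it closes.

```python
class ObservablePages:
    """A page list that notifies subscribers on every change."""

    def __init__(self):
        self._items = []
        self._listeners = []

    def subscribe(self, callback):
        self._listeners.append(callback)

    def add(self, page):
        self._items.append(page)
        for callback in self._listeners:
            callback()


class ToyViewer:
    def __init__(self, pages):
        self.refresh_count = 0
        self._depth = 0      # number of open begin_update calls
        self._dirty = False  # a change arrived while a batch was open
        pages.subscribe(self._on_pages_changed)

    def _on_pages_changed(self):
        if self._depth > 0:
            self._dirty = True       # defer the refresh until end_update
        else:
            self.refresh_count += 1  # refresh immediately

    def begin_update(self):
        self._depth += 1

    def end_update(self):
        self._depth -= 1
        if self._depth == 0 and self._dirty:
            self._dirty = False
            self.refresh_count += 1  # one refresh for the whole batch


pages = ObservablePages()
viewer = ToyViewer(pages)

# Without batching: one refresh per added page.
for i in range(3):
    pages.add(f"page-{i}")
unbatched_refreshes = viewer.refresh_count

# With batching: fifty additions, a single refresh.
viewer.begin_update()
try:
    for i in range(50):
        pages.add(f"bulk-{i}")
finally:
    viewer.end_update()
batched_refreshes = viewer.refresh_count - unbatched_refreshes
```

Placing the closing call in a finally block keeps the toy viewer from being left in a suspended state if adding a page raises an error.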

The viewer automatically merges the bookmarks of all child documents. Bookmark items that point to non-existing pages (pages in the source document that have not been added to the virtual document) will be non-functional. Similarly, links between pages are automatically checked, and any that point to non-existing pages will not be functional.

Functionality that only works with certain types of documents will check the original source document type. For instance, if View as SVG is requested, then pages that belong to compatible documents (such as PDF or DOCX) will be viewed as SVG, while pages that belong to incompatible documents (such as TIFF or JPEG) will still be viewed as raster images. Similarly, when using client-side PDF rendering, only pages originally belonging to PDF documents are rendered directly from the original data using JavaScript, and all others are rendered using SVG or raster images.

Virtual Documents and the Cache

To save a virtual document into the cache, use Document.SaveToCache as usual. Information about the pages (from Document.Pages) and the IDs of their owner documents (from Document.Documents) is stored in the cache.

When DocumentFactory.LoadFromCache is called with the ID of a virtual document, the toolkit will try to automatically load all the child documents required to reconstruct the virtual document by calling DocumentFactory.LoadFromCache with the ID of each. If this fails for any document, then the pages that belong to it are not loaded and are removed from the virtual document.
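
The reconstruction rule described above can be sketched with a plain dictionary standing in for the cache. This Python example is a toy model only: the cache layout, the keys "document_ids" and "pages", and the function load_virtual_from_cache are all hypothetical and do not reflect how LEADTOOLS stores documents. It shows only the stated outcome, namely that pages whose owner document cannot be loaded are dropped.

```python
def load_virtual_from_cache(cache, virtual_id):
    """Rebuild a virtual document's page list from a toy cache."""
    entry = cache[virtual_id]

    # Try to load each child document by its ID; a missing entry means failure.
    loaded_children = {}
    for child_id in entry["document_ids"]:
        child = cache.get(child_id)
        if child is not None:
            loaded_children[child_id] = child

    # Keep only the pages whose owner document was loaded successfully.
    pages = [
        (owner_id, page_index)
        for owner_id, page_index in entry["pages"]
        if owner_id in loaded_children
    ]
    return pages, sorted(loaded_children)


# Hypothetical cache contents: "src-tiff" has expired and is no longer present.
cache = {
    "src-pdf": {"page_count": 3},
    "virtual-1": {
        "document_ids": ["src-pdf", "src-tiff"],
        "pages": [("src-pdf", 0), ("src-tiff", 0), ("src-pdf", 1)],
    },
}

pages, loaded_ids = load_virtual_from_cache(cache, "virtual-1")
```

A practical consequence is that a virtual document loaded from the cache may contain fewer pages than when it was saved, so code that depends on a specific page count should check it after loading.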

Virtual Documents and Disposing

Document objects are disposable and the Dispose method must be called when the object is no longer needed. There are two common scenarios of using source and virtual documents:

The virtual document is the only owner of all the source documents. This is the default case when loading a virtual document from the cache: all the child documents are loaded automatically into brand new Document objects. These objects only exist (by default) in the Documents collection of the virtual document and nowhere else in the system.
The source documents are used to create virtual documents on demand, possibly more than one from the same source objects, which are then sent to another system. This is similar to the LEADTOOLS Virtual Document Demo, which creates virtual documents from source documents on the fly and saves them to the cache. The same source document can be part of multiple virtual documents at the same time.

In the first scenario, it is recommended to set the value of virtualDocument.AutoDisposeDocuments to true. This way, when virtualDocument.Dispose is called, it will automatically loop through all the child documents (if any) and call Dispose on each of them as well.

In the second scenario, it is recommended to set the value of virtualDocument.AutoDisposeDocuments to false. When virtualDocument.Dispose is called, it will only remove the child documents from the Documents collection without calling Dispose on them, since these child (source) documents could be part of another virtual document in the system. The user should call Dispose on the source documents later, when they are no longer needed.
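
The two ownership scenarios can be contrasted in a short toy model. The Python classes below are hypothetical stand-ins, not the LEADTOOLS Document class; the auto_dispose_documents flag mirrors the behavior described for AutoDisposeDocuments, under the assumption that disposing a virtual document always empties its list of child documents.

```python
class SourceDocument:
    def __init__(self, name):
        self.name = name
        self.disposed = False

    def dispose(self):
        self.disposed = True


class VirtualDocument:
    def __init__(self, sources, auto_dispose_documents):
        self.documents = list(sources)
        self.auto_dispose_documents = auto_dispose_documents
        self.disposed = False

    def dispose(self):
        if self.auto_dispose_documents:
            # Scenario 1: the virtual document owns its children.
            for child in self.documents:
                child.dispose()
        # In both scenarios the children are detached from this document.
        self.documents.clear()
        self.disposed = True


# Scenario 1: sole owner, children are disposed together with the parent.
owned = SourceDocument("owned")
sole_owner = VirtualDocument([owned], auto_dispose_documents=True)
sole_owner.dispose()

# Scenario 2: one source shared by two virtual documents.
shared = SourceDocument("shared")
first = VirtualDocument([shared], auto_dispose_documents=False)
second = VirtualDocument([shared], auto_dispose_documents=False)

first.dispose()
shared_alive_after_first = not shared.disposed  # still usable by `second`

second.dispose()
shared.dispose()  # the caller disposes the source once nothing uses it
```

With the flag set to true in the second scenario, disposing the first virtual document would also dispose the shared source and leave the second virtual document pointing at an unusable object, which is why the flag is turned off there.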

Help Version 19.0.2017.10.27