Creating Documents with the LEADTOOLS Document Library

The Document Library can create new empty documents. Empty documents can be set in the DocumentViewer, sent to the Document Converter, and saved to the cache just like other documents. Although this is of very little use in and of itself, it is quite powerful when used as the basis for a virtual document.

To illustrate, imagine a situation where you have two scanned PDF documents. One contains the odd pages and the other contains the even pages of the original scanned document. You can create a new virtual document, add all the pages from the existing documents in the correct order, and then view this new document in the DocumentViewer, or send it to the Document Converter to finalize it.
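The page ordering in this scenario can be sketched in a few lines of Python. This is a toy illustration only: plain strings stand in for pages, and interleave_pages is a hypothetical helper, not part of the LEADTOOLS API.

```python
from itertools import zip_longest

def interleave_pages(odd_pages, even_pages):
    """Merge two page lists into reading order: odd[0], even[0], odd[1], ..."""
    missing = object()  # sentinel for the shorter list running out
    merged = []
    for odd, even in zip_longest(odd_pages, even_pages, fillvalue=missing):
        if odd is not missing:
            merged.append(odd)
        if even is not missing:
            merged.append(even)
    return merged

# The odd-page scan may hold one more page than the even-page scan.
odd = ["page 1", "page 3", "page 5"]
even = ["page 2", "page 4"]
ordered = interleave_pages(odd, even)
```

In a real application, each element would be a page taken from one of the two loaded source documents and added to the virtual document in this order.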

Imagine another situation, in which you want to quickly create a new legal document containing header and table-of-contents pages from a PDF document, two fax images (from TIFF files), four disclaimer and content pages (from Word DOCX files), and an AutoCAD drawing (in a DWG file). You want to be able to view this document in the LEADTOOLS Document Viewer. Before virtual documents existed, all of these source pages would have to be physically joined together in one file (using the LEADTOOLS Document Writer) and cached. Such an operation would take time, and the server would have to keep track of when to delete this file once it is no longer needed.

With virtual documents, all that is needed is to create the virtual document, load the source documents and add the pages needed. When finished, simply send this Document object to the viewer. No further action is needed. The new document does not have any kind of physical representation on disk. It simply redirects the calls to obtain the page images, SVG, or text to the underlying original document.

Creating New Documents

To create a new document, create an instance of CreateDocumentOptions and call DocumentFactory.Create. DocumentFactory.Create will return a new, empty, non-read-only LEADDocument object. The Document.Pages collection is empty and the value of Document.IsReadOnly will be false.

Adding and Removing Pages

Assume that virtualDocument is the LEADDocument object created above, and sourceDocument1 is a Document obtained by calling DocumentFactory.LoadFromUri on a multipage PDF file. To add the first and second pages (page indices 0 and 1), simply call virtualDocument.Pages.AddPage(sourceDocument1.Pages[0]) and virtualDocument.Pages.AddPage(sourceDocument1.Pages[1]). The virtualDocument.Pages collection now contains two items.

Internally, the DocumentPage reference is shared between the two documents and no data is copied. The value of DocumentPage.Document will still point to its original owner document (sourceDocument1). This means that sourceDocument1 must stay alive as long as virtualDocument is alive. If you examine the virtualDocument.Documents collection, you will find it now has one item: sourceDocument1. Any changes made to the source page in the original document are reflected in the virtual document right away.

The DocumentPages collection derives from LeadCollection and allows you to not only add, but also remove, replace, and re-order pages. If you call virtualDocument.Pages.Clear, the collection is now empty, and if you examine the virtualDocument.Documents collection it will be empty as well. There is no link anymore between the two documents and sourceDocument1 can be disposed of if needed.

LEADDocument.Documents is a read-only collection, meaning you cannot add or remove items from it directly. The items (of type LEADDocument) are added and removed depending on which pages are added or removed. For instance, if two pages are added from the same source document, the Documents collection contains only one item since both pages are from the same document. Now suppose you load a multipage TIFF file into sourceDocument2 and add a page from this document into virtualDocument (while it still contains the two pages from sourceDocument1). The Documents collection will now contain two items: sourceDocument1 and sourceDocument2.
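The relationship between the Pages and Documents collections described above can be modeled with a small Python sketch. It is a simplified toy model under stated assumptions, not the LEADTOOLS implementation: the Page and Doc classes and their members are hypothetical stand-ins, and the read-only Documents collection is modeled as a value derived from the current pages.

```python
class Page:
    """A page that remembers the document that originally owns it."""
    def __init__(self, owner, number):
        self.owner = owner
        self.number = number

class Doc:
    """Toy document: holds page references, never copies of pages."""
    def __init__(self, name, page_count=0):
        self.name = name
        self.pages = [Page(self, n) for n in range(1, page_count + 1)]

    @property
    def documents(self):
        """Distinct source documents of the current pages, in first-use order."""
        seen = []
        for page in self.pages:
            if page.owner is not self and page.owner not in seen:
                seen.append(page.owner)
        return tuple(seen)  # derived from the pages, cannot be edited directly

source1 = Doc("source1.pdf", page_count=3)
source2 = Doc("source2.tif", page_count=2)
virtual = Doc("virtual")

# Two pages from the same source produce a single Documents entry.
virtual.pages.append(source1.pages[0])
virtual.pages.append(source1.pages[1])
names_after_two = [d.name for d in virtual.documents]

# A page from a second source adds a second entry.
virtual.pages.append(source2.pages[0])
names_after_three = [d.name for d in virtual.documents]
shared_owner = virtual.pages[0].owner  # still the original owner

# Clearing the pages removes every link to the sources.
virtual.pages.clear()
names_after_clear = [d.name for d in virtual.documents]
```

The point of the sketch is that the list of child documents is never edited by hand; it follows from whichever pages are currently in the virtual document.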

You can also add empty pages to a virtual document by using Document.Pages.CreatePage with the desired size and then adding these pages to the virtualDocument.Pages collection. The value of DocumentPage.Document will be virtualDocument in this case, since it is the original owner document.

Virtual Documents in the Viewer

The DocumentViewer fully supports virtual documents. When a document is set, it will subscribe to the CollectionChanged event of the Pages collection, and will update the view, thumbnail, bookmark, and annotation parts accordingly if pages are added or removed while the document is being viewed. Although the view automatically tracks all changes, it is best to call DocumentViewer.BeginUpdate/DocumentViewer.EndUpdate when adding or removing more than a handful of pages at one time in order to minimize flickering and optimize performance.
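The benefit of bracketing bulk changes can be shown with a toy Python model. It only illustrates how this kind of update batching commonly works; ToyViewer and all of its members are hypothetical, and the real DocumentViewer internals may differ.

```python
class ToyViewer:
    """Counts how often the view is rebuilt; not the real DocumentViewer."""
    def __init__(self):
        self.pages = []
        self.refresh_count = 0
        self._update_depth = 0
        self._dirty = False

    def begin_update(self):
        self._update_depth += 1

    def end_update(self):
        self._update_depth -= 1
        if self._update_depth == 0 and self._dirty:
            self._refresh()

    def add_page(self, page):
        self.pages.append(page)
        if self._update_depth > 0:
            self._dirty = True  # defer the rebuild until end_update
        else:
            self._refresh()

    def _refresh(self):
        self._dirty = False
        self.refresh_count += 1

# Without batching, every added page triggers a rebuild.
unbatched = ToyViewer()
for n in range(50):
    unbatched.add_page(n)

# With batching, the view is rebuilt once at the end.
batched = ToyViewer()
batched.begin_update()
try:
    for n in range(50):
        batched.add_page(n)
finally:
    batched.end_update()
```

Wrapping the loop in try/finally ensures the closing call always runs, so the viewer is not left in a suspended state if adding a page fails.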

The viewer will automatically merge the bookmarks of all child documents. Bookmark items that point to non-existing pages (pages in the source document that have not been added to the virtual document) will be non-functional. Similarly, inter-links between pages are automatically checked and any that point to non-existing pages will not be functional.

Functionality that only works with certain types of documents will check the original source document type. For instance, if View as SVG is requested, then pages that belong to compatible documents (such as PDF or DOCX) will be viewed as SVG, while pages that belong to incompatible documents (such as TIFF or JPEG) will still be viewed as raster images. Similarly, when using client-side PDF rendering, only pages originally belonging to PDF documents are rendered from the original data directly using JavaScript, and all others are rendered using SVG or raster images.
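The per-page decision described above can be sketched as follows. This is an illustrative model only: render_mode is a hypothetical function, and the set of SVG-capable formats contains just the examples named in the text rather than the toolkit's actual list.

```python
# Formats the text names as SVG-compatible; assumed here for illustration.
SVG_CAPABLE = {"pdf", "docx"}

def render_mode(source_format, view_as_svg):
    """Pick a rendering mode from the format of the page's source document."""
    fmt = source_format.lower()
    if view_as_svg and fmt in SVG_CAPABLE:
        return "svg"
    return "raster"

# A virtual document whose pages come from four different source formats.
page_sources = ["PDF", "TIFF", "DOCX", "JPEG"]
modes_svg_requested = [render_mode(f, view_as_svg=True) for f in page_sources]
modes_raster_only = [render_mode(f, view_as_svg=False) for f in page_sources]
```

The decision is made page by page, so a single virtual document can mix SVG and raster pages in the same view.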

Virtual Documents and the Cache

To save a virtual document into a cache, use DocumentFactory.SaveToCache as usual and provide information about the pages (from Document.Pages) and their owner document IDs (from LEADDocument.Documents) that are stored in the cache.

When DocumentFactory.LoadFromCache is called with the ID of a virtual document, the toolkit will try to automatically load all child documents required to reconstruct the virtual document by calling DocumentFactory.LoadFromCache with the IDs of each child document. If this fails for any document, then the pages that belong to it are not loaded and instead are removed from the virtual document.
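The reconstruction behavior described above can be modeled with a short Python sketch. It assumes a plain dictionary as the cache and a hypothetical load_virtual function; the entry layout is invented for illustration and does not reflect how LEADTOOLS stores documents.

```python
def load_virtual(cache, virtual_id):
    """Rebuild a virtual document's page list.

    Pages whose source document cannot be loaded are dropped.
    """
    entry = cache[virtual_id]
    loaded = {}
    pages = []
    for source_id, page_number in entry["pages"]:
        if source_id not in loaded:
            # None when the child document is missing from the cache
            loaded[source_id] = cache.get(source_id)
        if loaded[source_id] is not None:
            pages.append((source_id, page_number))
    return pages

cache = {
    "src-a": {"name": "a.pdf"},
    # "src-b" is no longer in the cache
    "virt-1": {"pages": [("src-a", 1), ("src-b", 1), ("src-a", 2)]},
}
pages = load_virtual(cache, "virt-1")
```

Here the page from the missing child document is silently left out, so a caller that needs every page should compare the loaded page count with the expected one.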

Virtual Documents and Disposing

LEADDocument objects are disposable and the Dispose method must be called when the object is no longer needed. There are two common scenarios of using source and virtual documents:

The virtual document is the only owner of all the source documents. This is the default case when loading a virtual document from the cache: All the child documents will be loaded automatically into brand new LEADDocument objects. These objects only exist (by default) in the Documents collection of the virtual document, and nowhere else in the system.
The source documents are used to create virtual documents on demand, maybe more than one from the same source objects, and are sent to another system. This is similar to the LEADTOOLS Virtual Document Demo which creates virtual documents from source documents on-the-fly and saves them to the cache. The same source document can be part of multiple virtual documents at the same time.

In the first scenario, it is best to set the value of virtualDocument.AutoDisposeDocuments to true. This way, when virtualDocument.Dispose is called, it will automatically loop through all the child documents (if any) and call Dispose on each of them as well.

In the second scenario, it is best to set the value of virtualDocument.AutoDisposeDocuments to false. When virtualDocument.Dispose is called, it will only remove the child documents from the Documents collection without calling Dispose on them, since these child (source) documents could be part of another virtual document in the system. Call Dispose on the source documents later when they are no longer needed.
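The two disposal scenarios can be contrasted with a toy Python model. ToyDocument and its members are hypothetical stand-ins that mirror only the behavior described in this section; they are not the LEADTOOLS classes.

```python
class ToyDocument:
    """Minimal model of a disposable document with child documents."""
    def __init__(self, name, auto_dispose_documents=False):
        self.name = name
        self.auto_dispose_documents = auto_dispose_documents
        self.documents = []  # child (source) documents
        self.is_disposed = False

    def dispose(self):
        if self.is_disposed:
            return
        if self.auto_dispose_documents:
            for child in self.documents:
                child.dispose()
        self.documents.clear()  # children are detached in both cases
        self.is_disposed = True

# Scenario 1: the virtual document is the only owner of its child.
owned_child = ToyDocument("child")
sole_owner = ToyDocument("virtual A", auto_dispose_documents=True)
sole_owner.documents.append(owned_child)
sole_owner.dispose()

# Scenario 2: one source is shared by two virtual documents.
shared = ToyDocument("shared source")
first = ToyDocument("virtual B")
second = ToyDocument("virtual C")
first.documents.append(shared)
second.documents.append(shared)
first.dispose()  # shared stays usable for "virtual C"
```

In the second scenario the shared source survives the disposal of the first virtual document, and the application remains responsible for disposing of it once no virtual document needs it.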