The Documents Library can create new empty documents. Empty documents can be set in the DocumentViewer, sent to the Document Converter and saved to the cache just like other documents. Although this is of very little use in and of itself, it is quite powerful when used as the base for a virtual document.
To illustrate, imagine a situation where you have two scanned PDF documents. One contains the odd pages and the other contains the even pages of the original scanned document. You can now create a new virtual document, add all the pages from the existing documents in the correct order and then view this new PDF in the DocumentViewer or send it to the Document Converter to finalize it.
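The page ordering in the odd/even scenario can be sketched as a plain interleave. This is a minimal illustration only: `PageRef` and `interleavePages` are hypothetical names, not part of the Documents Library, and the sketch assumes the odd-page scan holds either the same number of pages as the even-page scan or one more.

```typescript
// Hypothetical stand-in for a page reference; not a LEADTOOLS type.
interface PageRef {
  source: string;      // which scanned file the page came from
  sourceIndex: number; // 0-based index inside that file
}

// Interleave the two scans so the result follows the original page order:
// odd[0], even[0], odd[1], even[1], ... (leftover pages are appended in order).
function interleavePages(odd: PageRef[], even: PageRef[]): PageRef[] {
  const merged: PageRef[] = [];
  const count = Math.max(odd.length, even.length);
  for (let i = 0; i < count; i++) {
    const oddPage = odd[i];
    if (oddPage !== undefined) merged.push(oddPage);
    const evenPage = even[i];
    if (evenPage !== undefined) merged.push(evenPage);
  }
  return merged;
}

// Three odd pages (1, 3, 5) and two even pages (2, 4) of a five-page original.
const oddScan: PageRef[] = [0, 1, 2].map((i) => ({ source: "odd.pdf", sourceIndex: i }));
const evenScan: PageRef[] = [0, 1].map((i) => ({ source: "even.pdf", sourceIndex: i }));
const ordered: PageRef[] = interleavePages(oddScan, evenScan);
```

In a real application, each entry of the resulting sequence would correspond to one page of a source document being added to the virtual document in that order.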
Imagine another situation, in which you want to quickly create a new legal document containing header and table of contents pages from a PDF document, two fax images (from TIFF files), four disclaimer and content pages (from Word DOCX files) and an AutoCAD drawing (in a DWG file). You want to be able to view this document in the LEADTOOLS Document Viewer. Before virtual documents, all of these source pages would have to be physically joined together in one file (using the LEADTOOLS Document Writer) and cached. This operation would take time, and the server would have to keep track of when to delete this file once it is no longer needed.
With virtual documents, all that is needed is to create the virtual document, load the source documents and add the pages needed. When finished, simply send this Document object to the viewer. No further action is needed. The new document has no physical representation on disk. It simply redirects the calls to obtain the page images, SVG or text to the underlying original documents.
To create a new document, create an instance of CreateDocumentOptions and call DocumentFactory.Create. This method will return a new empty non-read-only Document object. The Document.Pages collection is empty and the value of Document.IsReadOnly is false.
Assume that virtualDocument is the Document object created above and sourceDocument1 is a Document obtained through DocumentFactory.LoadFromUri on a multi-page PDF file.
To add the first and second pages (page index 0 and 1), simply call virtualDocument.Pages.AddPage(sourceDocument1.Pages[0]) and virtualDocument.Pages.AddPage(sourceDocument1.Pages[1]).
virtualDocument.Pages now contains two items.
Internally, the DocumentPage reference is shared between the two documents and no data is copied. The value of DocumentPage.Document will still point to its original owner document (sourceDocument1). This means that sourceDocument1 must stay alive as long as virtualDocument is alive. If you examine the virtualDocument.Documents collection, you will find that it now has one item: sourceDocument1. Any changes made to the source page in the original document are reflected in the virtual document right away.
The DocumentPages collection derives from LeadCollection and lets you add, remove, replace and re-order pages. If you call virtualDocument.Pages.Clear, the collection becomes empty, and if you then examine the virtualDocument.Documents collection, it will be empty as well. There is no longer any link between the two documents, and sourceDocument1 can be disposed of if needed.
Document.Documents is a read-only collection, meaning you cannot add or remove items from it directly. The items (of type Document) are added and removed depending on which pages are added or removed. For instance, if two pages are added from the same source document, the Documents collection contains only one item since both pages are from the same document. Now suppose you load a multi-page TIFF file into sourceDocument2 and add a page from this document into virtualDocument (while it still contains the two pages from sourceDocument1). The Documents collection will now contain two items: sourceDocument1 and sourceDocument2.
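The relationship between shared page references and the derived Documents collection can be modelled in a few lines. The sketch below is a simplified model and not the LEADTOOLS API: `ModelDocument`, `ModelPage`, `createOwnPage` and the `documents` getter are hypothetical names. It also assumes that a document never lists itself among its source documents, which is a modelling choice made here.

```typescript
// Simplified model, NOT the LEADTOOLS API: all class and member names are hypothetical.
class ModelPage {
  // A page always remembers the document that originally owns it.
  constructor(readonly owner: ModelDocument, readonly label: string) {}
}

class ModelDocument {
  readonly pages: ModelPage[] = [];
  constructor(readonly name: string) {}

  // Create a page owned by this document and append it to its own page list.
  createOwnPage(label: string): ModelPage {
    const page = new ModelPage(this, label);
    this.pages.push(page);
    return page;
  }

  // Derived view: the distinct owners of the pages currently in this document,
  // excluding the document itself (an assumption of this model).
  get documents(): ModelDocument[] {
    const owners = new Set<ModelDocument>();
    for (const page of this.pages) {
      if (page.owner !== this) owners.add(page.owner);
    }
    return Array.from(owners);
  }
}

const source1 = new ModelDocument("sourceDocument1");
const pdfPage1 = source1.createOwnPage("pdf-1");
const pdfPage2 = source1.createOwnPage("pdf-2");
const source2 = new ModelDocument("sourceDocument2");
const tiffPage1 = source2.createOwnPage("tiff-1");

const virtualDoc = new ModelDocument("virtualDocument");

// Two pages from the same source: the same objects are shared, nothing is copied.
virtualDoc.pages.push(pdfPage1, pdfPage2);
const afterPdfPages: string[] = virtualDoc.documents.map((d) => d.name);

// A page from a second source adds a second owner.
virtualDoc.pages.push(tiffPage1);
const afterTiffPage: string[] = virtualDoc.documents.map((d) => d.name);

// Clearing the pages removes every link to the source documents.
virtualDoc.pages.length = 0;
const ownersAfterClear: number = virtualDoc.documents.length;
```

Because the owner list is computed from the pages, there is nothing to add or remove by hand, which mirrors why the collection is read-only.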
You can also add empty pages to a virtual document by calling Document.Pages.CreatePage with the desired size and adding the resulting page to the virtualDocument.Pages collection. The value of DocumentPage.Document will be virtualDocument in this case, since it is the original owner document.
The DocumentViewer has complete support for virtual documents. When a document is set, it will subscribe to the CollectionChanged event of the Pages collection and will update the view, thumbnail, bookmark and annotation parts accordingly if pages are added or removed while the document is being viewed. Although the view automatically tracks all changes, it is best to call DocumentViewer.BeginUpdate/DocumentViewer.EndUpdate when adding or removing more than a handful of pages at one time in order to minimize flickering and optimize performance.
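The benefit of bracketing bulk changes with a begin/end pair can be illustrated with a generic model. This is a sketch of the general batching pattern only, assuming that the viewer defers its refresh until the outermost update ends. `ModelViewer` and its members are hypothetical and do not describe how the DocumentViewer is implemented.

```typescript
// Hypothetical model of a viewer that refreshes on page-collection changes; not the LEADTOOLS API.
class ModelViewer {
  refreshCount = 0;
  private updateDepth = 0;
  private pendingChange = false;

  beginUpdate(): void {
    this.updateDepth++;
  }

  endUpdate(): void {
    if (this.updateDepth === 0) throw new Error("endUpdate called without beginUpdate");
    this.updateDepth--;
    // Refresh once when the outermost batch ends, and only if something changed.
    if (this.updateDepth === 0 && this.pendingChange) {
      this.pendingChange = false;
      this.refresh();
    }
  }

  // Called once for every page added or removed.
  onPagesChanged(): void {
    if (this.updateDepth > 0) {
      this.pendingChange = true; // remember the change, refresh later
    } else {
      this.refresh();
    }
  }

  private refresh(): void {
    this.refreshCount++;
  }
}

// Without batching: one refresh per change.
const unbatchedViewer = new ModelViewer();
for (let i = 0; i < 50; i++) unbatchedViewer.onPagesChanged();

// With batching: a single refresh for all 50 changes.
const batchedViewer = new ModelViewer();
batchedViewer.beginUpdate();
try {
  for (let i = 0; i < 50; i++) batchedViewer.onPagesChanged();
} finally {
  batchedViewer.endUpdate(); // always close the batch, even if adding a page throws
}
```

The try/finally wrapper is a defensive choice in this sketch so that an exception in the middle of a batch cannot leave the model stuck in its deferred state.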
The viewer will automatically merge the bookmarks of all child documents. Bookmark items that point to non-existing pages (pages in the source document that have not been added to the virtual document) will be non-functional. Similarly, inter-links between pages are automatically checked and any that point to non-existing pages will not be functional.
Functionality that only works with certain types of documents will check the original source document type. For instance, if View as SVG is requested, then pages that belong to compatible documents (such as PDF or DOCX) will be viewed as SVG, while pages that belong to incompatible documents (such as TIFF or JPEG) will still be viewed as raster images. Similarly, when using client-side PDF rendering, only pages originally belonging to PDF documents are rendered from the original data directly using JavaScript and all others are rendered using SVG or raster images.
To save a virtual document into a cache, use DocumentFactory.SaveToCache as usual and provide information about the pages (from Document.Pages) and the IDs of their owner documents (from Document.Documents) that are stored in the cache.
When DocumentFactory.LoadFromCache is called with the ID of a virtual document, the toolkit will try to automatically load all child documents required to reconstruct the virtual document by calling DocumentFactory.LoadFromCache with the ID of each. If this fails for any document, then the pages that belong to it are not loaded and are removed from the virtual document.
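The reconstruction rule described above (pages whose owner document cannot be loaded are dropped) can be modelled as a filter over the saved page entries. This is a sketch under stated assumptions: `CachedPageEntry`, `resolveVirtualPages` and the `loadChild` callback are hypothetical, and the real cache format and loading calls are not shown.

```typescript
// Hypothetical cache model; not the LEADTOOLS API or its cache format.
interface CachedPageEntry {
  ownerId: string;   // ID of the source document that owns the page
  pageIndex: number; // index of the page inside that source document
}

// Keep only the pages whose owner document could be loaded.
// Each owner is attempted once, however many of its pages are referenced.
function resolveVirtualPages(
  entries: CachedPageEntry[],
  loadChild: (ownerId: string) => boolean,
): CachedPageEntry[] {
  const loadResults = new Map<string, boolean>();
  return entries.filter((entry) => {
    let loaded = loadResults.get(entry.ownerId);
    if (loaded === undefined) {
      loaded = loadChild(entry.ownerId);
      loadResults.set(entry.ownerId, loaded);
    }
    return loaded;
  });
}

// In this example "tiff-doc" is no longer available, so its page is dropped.
const availableIds = new Set<string>(["pdf-doc"]);
const loadAttempts: string[] = [];
const savedEntries: CachedPageEntry[] = [
  { ownerId: "pdf-doc", pageIndex: 0 },
  { ownerId: "tiff-doc", pageIndex: 0 },
  { ownerId: "pdf-doc", pageIndex: 1 },
];
const restoredEntries: CachedPageEntry[] = resolveVirtualPages(savedEntries, (id) => {
  loadAttempts.push(id);
  return availableIds.has(id);
});
```

A practical consequence is that a reloaded virtual document may have fewer pages than when it was saved, so code that depends on a fixed page count should check it after loading.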
Document objects are disposable and the Dispose method must be called when the object is no longer needed. There are two common scenarios for using source and virtual documents:
The virtual document is the only owner of all the source documents. This is the default case when loading a virtual document from the cache: All the child documents will be loaded automatically into brand new Document objects. These objects only exist (by default) in the Documents collection of the virtual document and nowhere else in the system.
The source documents are used to create virtual documents on demand, possibly more than one from the same source objects, and these virtual documents are sent to another system. This is similar to the LEADTOOLS Virtual Document Demo, which creates virtual documents from source documents on-the-fly and saves them to the cache. The same source document can be part of multiple virtual documents at the same time.
In the first scenario, it is best to set the value of virtualDocument.AutoDisposeDocuments to true. This way, when virtualDocument.Dispose is called it will automatically loop through all the child documents (if any) and call Dispose as well.
In the second scenario, it is best to set the value of virtualDocument.AutoDisposeDocuments to false. When virtualDocument.Dispose is called, it will only remove the child documents from the Documents collection without calling Dispose on them, since these child (source) documents could be part of another virtual document in the system. Call Dispose on the source documents later when they are no longer needed.
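The two ownership scenarios can be contrasted with a small model of the dispose behavior described above. This is a simplified sketch: `ModelSource` and `ModelVirtual` are hypothetical classes, and the `autoDisposeDocuments` flag only imitates the described effect of AutoDisposeDocuments.

```typescript
// Simplified ownership model, NOT the LEADTOOLS API: all names are hypothetical.
class ModelSource {
  disposed = false;
  constructor(readonly name: string) {}
  dispose(): void {
    this.disposed = true;
  }
}

class ModelVirtual {
  autoDisposeDocuments = false;
  readonly documents: ModelSource[] = [];

  constructor(children: ModelSource[]) {
    this.documents.push(...children);
  }

  dispose(): void {
    // Only dispose the children when this virtual document is their sole owner.
    if (this.autoDisposeDocuments) {
      for (const child of this.documents) child.dispose();
    }
    // In both cases the links to the children are dropped.
    this.documents.length = 0;
  }
}

// Scenario 1: the virtual document is the only owner of its source document.
const ownedSource = new ModelSource("owned");
const soleOwner = new ModelVirtual([ownedSource]);
soleOwner.autoDisposeDocuments = true;
soleOwner.dispose();

// Scenario 2: one source document is shared by two virtual documents.
const sharedSource = new ModelSource("shared");
const firstVirtual = new ModelVirtual([sharedSource]);
const secondVirtual = new ModelVirtual([sharedSource]);
firstVirtual.dispose();
const sharedAliveAfterFirst: boolean = !sharedSource.disposed;
const secondStillLinked: boolean = secondVirtual.documents.length === 1;

// The caller disposes the shared source once no virtual document needs it.
secondVirtual.dispose();
sharedSource.dispose();
```

Leaving the flag off for shared sources keeps one virtual document from invalidating pages that another virtual document is still showing.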
Loading Documents Using the LEADTOOLS Documents Library
Uploading Using the Documents Library
Documents Library Coordinate System
Loading Encrypted Files Using the Documents Library
Parsing Text with the Documents Library
Barcode processing with the Documents Library
Using jQuery Promises in the Documents Library