AutoGetText Property (DocumentViewerText)

Summary

Indicates whether this DocumentViewerText should parse the text of the pages automatically when needed.

Syntax

C#

public bool AutoGetText { get; set; }

VB

Public Property AutoGetText() As Boolean 
   Get 
   Set

C++

public:  
   property bool AutoGetText 
   { 
      bool get() 
      void set(bool value) 
   }

Property Value

true if this DocumentViewerText should parse the text of the pages automatically when needed; otherwise, false. The default value is false.

Remarks

DocumentViewerText contains many operations that require the text of one or more pages to be parsed. This is done by calling the DocumentPage.GetText method of each page.

GetText parses the text from the page using SVG or OCR, which can be a slow operation. Hence, when it is called by DocumentViewerText, the resulting DocumentPageText object is stored internally and reused when needed.

When the user sets a new Document in DocumentViewer using SetDocument, the saved DocumentPageText objects are discarded, but the viewer does not start parsing the text of any page until it is needed.

AutoGetText controls what happens when an operation requires the text for a page that has not been obtained yet.

For example, the user calls SetDocument with a brand new document and then calls SelectAll to select all the text in the pages. This requires looping through all the pages and parsing the text from the corresponding DocumentPageText objects.

At this point, the viewer does not have any DocumentPageText objects saved. If AutoGetText is true, the viewer calls GetDocumentPageText from inside the loop to obtain the DocumentPageText object for each page and store it internally. When all the pages have been parsed, the selected text can be updated and the result highlighted in the view.

The next time SelectAll is called, the viewer performs the same action, but this time it uses the DocumentPageText objects obtained and saved by the previous operation and does not need to parse the pages again.

When an operation such as Find is called, it performs a similar action to SelectAll. However, this method only requires the text of the page it is currently working on. Hence, it checks whether a DocumentPageText exists for that page only and, if not, calls GetDocumentPageText to get the text for the next page only when needed.

Most of the operations of DocumentViewerText work in a similar way. First, the engine tries to use the DocumentPageText object obtained from a previous operation. If it does not exist, the engine calls GetDocumentPageText to parse the text from the original document and saves it internally for later use.
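The parse-once-then-reuse behavior described above can be modeled with a small sketch. This is not LEADTOOLS code: TextCacheSketch and its parsePage delegate are hypothetical stand-ins for DocumentViewerText and DocumentPage.GetText, and skipping unparsed pages when AutoGetText is false is an assumption made only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the viewer's internal text cache; not a LEADTOOLS type.
public class TextCacheSketch
{
    private readonly Dictionary<int, string> _pageText = new Dictionary<int, string>();
    private readonly Func<int, string> _parsePage; // stands in for DocumentPage.GetText

    public TextCacheSketch(Func<int, string> parsePage)
    {
        _parsePage = parsePage;
    }

    // false by default, mirroring the property documented here.
    public bool AutoGetText { get; set; }

    public bool HasDocumentPageText(int pageNumber)
    {
        return _pageText.ContainsKey(pageNumber);
    }

    // Returns the saved text if present; otherwise parses the page once and saves it.
    public string GetDocumentPageText(int pageNumber)
    {
        if (!_pageText.TryGetValue(pageNumber, out var text))
        {
            text = _parsePage(pageNumber);
            _pageText[pageNumber] = text;
        }
        return text;
    }

    // Models an operation such as SelectAll (1-based page numbers).
    // Assumption: pages without saved text are skipped when AutoGetText is false.
    public List<string> SelectAll(int pageCount)
    {
        var selected = new List<string>();
        for (int page = 1; page <= pageCount; page++)
        {
            if (HasDocumentPageText(page) || AutoGetText)
                selected.Add(GetDocumentPageText(page));
        }
        return selected;
    }
}
```

In this model, calling SelectAll twice with AutoGetText set to true invokes the parse delegate once per page; the second call is served entirely from the saved text.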

GetDocumentPageText obtains the text using DocumentPage.GetText. This can be a slow operation, especially if OCR is used. Therefore, the value of AutoGetText is false by default and text is not parsed automatically by the viewer unless instructed to.

Depending on the application, AutoGetText can be used together with HasAnyDocumentPageText, HasDocumentPageText, and the Operation event to support any of the scenarios described below.

Scenarios

Pre-parse the text

If the text of the whole document is required all the time and the application cannot function without it, then call GetAllDocumentPageText after the document is set. This loops through all the pages and calls GetDocumentPageText for each. The method does not return until all the DocumentPageText objects are obtained and stored. After that, any of the DocumentViewerText methods will run without delay, and the original document is no longer used for this purpose.

An alternative option is to call GetDocumentPageText in a loop (discarding the result) to force DocumentViewerText to obtain and store the DocumentPageText objects. This allows the application to run the loop from a dedicated thread, for example, and lets the user abort the operation between loop iterations.
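A minimal sketch of the abortable loop, assuming a standard .NET CancellationToken is used to signal the abort. PreParseSketch is a hypothetical helper, and the getDocumentPageText delegate stands in for a call to GetDocumentPageText on the viewer's text object; neither is part of LEADTOOLS.

```csharp
using System;
using System.Threading;

// Hypothetical helper; not a LEADTOOLS type.
public static class PreParseSketch
{
    // getDocumentPageText stands in for DocumentViewerText.GetDocumentPageText.
    // Returns the number of pages processed before the loop finished or was aborted.
    public static int PreParse(int pageCount, Action<int> getDocumentPageText, CancellationToken token)
    {
        int parsed = 0;
        for (int page = 1; page <= pageCount; page++)
        {
            // The abort request is only checked between iterations,
            // so a page that has started parsing always finishes.
            if (token.IsCancellationRequested)
                break;

            // The result is discarded; the viewer keeps its own copy.
            getDocumentPageText(page);
            parsed++;
        }
        return parsed;
    }
}
```

An application could run PreParse on a worker thread (for example through Task.Run) and cancel the token from a busy dialog's Cancel button.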

Automatically get the text as needed

In this mode, set AutoGetText to true. Now when calling DocumentViewerText methods that require DocumentPageText objects, GetDocumentPageText will be called automatically to obtain and parse the text of the pages as needed.

The drawback of this method is that an operation like SelectAll might take a considerable amount of time the first time it is called, especially when the document has a large number of pages or OCR is used to parse the text. Therefore, the Operation event should be used to show a busy dialog and allow the user to abort the operation.

GetDocumentPageText works by checking whether the DocumentPageText object for the page has been previously obtained. If so, it returns this object immediately. Otherwise, it calls DocumentPage.GetText. Before that method is invoked, the Operation event is fired with DocumentViewerOperation.GetText. The application can show a busy dialog when this event occurs to indicate to the user that the operation will take some time to finish.
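The order of events in that paragraph can be sketched as follows. All types here (OperationKind, OperationSketchEventArgs, ViewerTextSketch) are hypothetical stand-ins that only model the described sequence; the real event arguments class is not reproduced, and its members are not assumed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for DocumentViewerOperation.GetText.
public enum OperationKind { GetText }

// Hypothetical event arguments; the real LEADTOOLS class is not modeled here.
public class OperationSketchEventArgs : EventArgs
{
    public OperationSketchEventArgs(OperationKind operation, int pageNumber)
    {
        Operation = operation;
        PageNumber = pageNumber;
    }

    public OperationKind Operation { get; }
    public int PageNumber { get; }
}

// Hypothetical model of the lookup-then-parse sequence; not a LEADTOOLS type.
public class ViewerTextSketch
{
    private readonly Dictionary<int, string> _pageText = new Dictionary<int, string>();
    private readonly Func<int, string> _parsePage; // stands in for DocumentPage.GetText

    public ViewerTextSketch(Func<int, string> parsePage)
    {
        _parsePage = parsePage;
    }

    public event EventHandler<OperationSketchEventArgs> Operation;

    public string GetDocumentPageText(int pageNumber)
    {
        // Saved text is returned immediately and no event is raised.
        if (_pageText.TryGetValue(pageNumber, out var saved))
            return saved;

        // The event fires before the slow parse starts, so a handler
        // has a chance to show a busy indicator first.
        Operation?.Invoke(this, new OperationSketchEventArgs(OperationKind.GetText, pageNumber));

        var text = _parsePage(pageNumber);
        _pageText[pageNumber] = text;
        return text;
    }
}
```

A handler attached to Operation in this model is called only the first time a page's text is requested, which is the moment a busy dialog would be useful.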

Alternatively, the application can use HasAnyDocumentPageText and HasDocumentPageText as needed prior to calling the method to check whether the operation might take time to finish (is slow). If it is determined that the operation is slow, then the application can show the busy dialog before calling the method and then invoke the operation asynchronously.

Manually get the text when needed

In this mode, the application uses HasAnyDocumentPageText and HasDocumentPageText to determine whether the required text objects have already been obtained. If so, the operation will be instant and the application continues. Otherwise, the application warns the user that the operation might be slow and prompts to continue or cancel. If the user chooses to continue, then GetDocumentPageText is called to obtain and parse the text manually before calling the actual operation.
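The manual flow amounts to a check, an optional prompt, and then the operation. The sketch below models it for a single page, assuming the application supplies each step as a delegate; ManualModeSketch and all of its parameter names are hypothetical and do not exist in LEADTOOLS.

```csharp
using System;

// Hypothetical helper; not a LEADTOOLS type.
public static class ManualModeSketch
{
    // hasDocumentPageText  stands in for DocumentViewerText.HasDocumentPageText
    // confirmSlowOperation stands in for the application's own prompt
    // getDocumentPageText  stands in for DocumentViewerText.GetDocumentPageText
    // operation            stands in for the text operation to run
    // Returns false if the user declined and the operation was not run.
    public static bool RunTextOperation(
        int pageNumber,
        Func<int, bool> hasDocumentPageText,
        Func<bool> confirmSlowOperation,
        Action<int> getDocumentPageText,
        Action operation)
    {
        if (!hasDocumentPageText(pageNumber))
        {
            // Text is missing, so the operation may be slow: ask first.
            if (!confirmSlowOperation())
                return false;

            // The user agreed: obtain the text before running the operation.
            getDocumentPageText(pageNumber);
        }

        operation();
        return true;
    }
}
```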

The LEADTOOLS Document Viewer Demo can get the text using both the Automatic and Manual methods. The demo contains a menu item that can flip between the two modes and changes the way the application calls the text operations. Refer to the demo source code for a full example.
