Speeding Up Forms Recognition Using the Full Text Search Feature

The LEADTOOLS Forms Recognition toolkit uses object managers to recognize a master form or page. Object managers include the OCR, Barcode, and default managers. The OCR Object Manager is the optimal choice because it can recognize forms that were scanned under conditions different from those of the master form.

Using the OCR Object Manager can introduce a performance hit in your application, and with a large number of master forms this cost can become significant. The Full Text Search feature of the LEADTOOLS Forms Recognition toolkit can be used to improve the performance of the OCR Object Manager.

Designed to work as a complement to the OCR object manager, the Full Text Search feature can work with existing master form sets as well as any new master forms you add to your repository.

The Full Text Search is implemented by the IFullTextSearchManager interface.

To use the Full Text Search feature, first create an instance of a class that implements this interface. Next, initialize its properties, and then pass it to either the low level FormRecognitionEngine (through the FullTextSearchManager property) or the high level AutoFormsEngine (through the SetFullTextSearchManager method). From then on, the engine will use the manager's methods to perform full text searching during the recognition process.

LEADTOOLS currently supports two implementations: a Disk Full Text Search Manager, and a SQL Server Full Text Search Manager.

Disk Full Text Search Manager

The Disk Full Text Search Manager implementation is in the Leadtools.Forms.Recognition.Search assembly. It is implemented by the DiskFullTextSearchManager class.

The Disk Full Text Search Manager uses file-based locks to ensure multiple threads and processes can access the index at the same time without data loss.
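
As a general illustration of file-based locking, the following sketch serializes access to a shared folder by opening a lock file exclusively. It is not the toolkit's internal implementation, which this topic does not describe. The WithFileLock helper, the lock file path, and the retry settings are all hypothetical.

```csharp
using System;
using System.IO;
using System.Threading;

static class FileLockSketch
{
   // Runs 'action' while holding an exclusive lock file.
   // Retries while another thread or process has the file open.
   public static void WithFileLock(string lockFilePath, Action action, int maxAttempts = 50, int retryDelayMs = 100)
   {
      for (int attempt = 1; ; attempt++)
      {
         FileStream lockStream;
         try
         {
            // FileShare.None makes this open fail with an IOException
            // while any other handle to the same file is still open
            lockStream = new FileStream(lockFilePath, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None);
         }
         catch (IOException) when (attempt < maxAttempts)
         {
            // Someone else holds the lock; wait and try again
            Thread.Sleep(retryDelayMs);
            continue;
         }

         // Disposing the stream releases the lock
         using (lockStream)
         {
            action();
         }
         return;
      }
   }
}
```

DiskFullTextSearchManager performs its own locking, so application code does not need a helper like this to use the index. The sketch only shows the idea behind the technique.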

The DiskFullTextSearchManager class uses a folder on disk (either on the same machine or a network-mapped folder) to store the metadata of the forms participating in the search.

Note: The folder must be set prior to calling any other methods.

Prerequisites

There are no external dependencies or prerequisites to use this class.

Access the folder with the DiskFullTextSearchManager.IndexDirectory property. If the directory does not exist or does not contain any prior indexed items, the manager will create an empty index ready to be populated. Items added to the manager will have their metadata extracted and saved into disk files in the index directory.
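
Because an empty or missing folder yields an empty index, an application may want to know whether the index still needs its first population. The sketch below is one possible check. The LooksUnpopulated helper is hypothetical, and it assumes that an empty folder means nothing has been indexed yet; the layout of the index files is not described in this topic.

```csharp
using System;
using System.IO;
using System.Linq;

static class IndexDirectorySketch
{
   // Returns true when the folder is missing or empty.
   // Assumption: an empty folder means no master forms have been indexed yet.
   public static bool LooksUnpopulated(string indexDirectory)
   {
      if (string.IsNullOrWhiteSpace(indexDirectory))
         throw new ArgumentException("An index directory is required.", nameof(indexDirectory));

      return !Directory.Exists(indexDirectory) ||
             !Directory.EnumerateFileSystemEntries(indexDirectory).Any();
   }
}
```

If the check returns true, the master forms can be added as described under Adding Master Forms.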

SQL Server Full Text Search Manager

The SQL Server Full Text Search Manager implementation is in the Leadtools.Forms.Recognition assembly. It is implemented by the SqlServerFullTextSearchManager class.

The SQL Server Full Text Search Manager uses SQL Server's own locking to ensure multiple threads and processes can access the index at the same time without data loss.

The SqlServerFullTextSearchManager class uses Microsoft SQL Server's full text search feature to store the metadata of the forms participating in the search.

Note: The ConnectionString must be set prior to calling any other methods.

Access it through the SqlServerFullTextSearchManager.ConnectionString property.
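
One way to compose the connection string from its parts, instead of hand-editing a literal, is the standard .NET DbConnectionStringBuilder class. This is a minimal sketch: the ConnectionStringSketch.Build helper is hypothetical, and the keywords shown are the same ones used in the sample connection string later in this topic. Adjust them to match your server.

```csharp
using System;
using System.Data.Common;

static class ConnectionStringSketch
{
   // Composes a connection string from its parts.
   // The builder quotes values that contain special characters such as ';'.
   public static string Build(string server, string database)
   {
      var builder = new DbConnectionStringBuilder();
      builder["Data Source"] = server;
      builder["Initial Catalog"] = database;
      builder["Integrated Security"] = "True";
      return builder.ConnectionString;
   }
}
```

The returned string can then be assigned to the ConnectionString property before any other method of the manager is called.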

Prerequisites

The SQL Server Full Text Search Manager has the following prerequisites:

Using the Full Text Search Feature

Use the following code to implement full text searching in your application.

Creating the IFullTextSearchManager

The following code snippet creates a Disk-based full text search manager:

C#
IFullTextSearchManager CreateFullTextSearchManager() 
{ 
   // The index directory. Replace with the one you want to use.  
   // This can either be a folder on the current machine or a network-mapped folder 
   string indexDirectory = "//network-path/myforms-index-folder"; 
   // Create the disk full text search manager 
   DiskFullTextSearchManager fullTextSearchManager = new DiskFullTextSearchManager(); 
   // Set the index directory 
   fullTextSearchManager.IndexDirectory = indexDirectory; 
   return fullTextSearchManager; 
} 

And the following code snippet creates a SQL Server-based full text search manager:

C#
IFullTextSearchManager CreateFullTextSearchManager() 
{ 
   // The database connection string. Replace with your string 
   string connectionString = "Data Source=MY_DATABASE_SERVER;Initial Catalog=FormsDb;Integrated Security=True;Connect Timeout=15;Encrypt=False;TrustServerCertificate=False"; 
 
   // Create the full text search manager to use. Replace the connection string with yours 
   SqlServerFullTextSearchManager fullTextSearchManager = new SqlServerFullTextSearchManager(); 
   // Set the connection string 
   fullTextSearchManager.ConnectionString = connectionString; 
   return fullTextSearchManager; 
} 

Using the AutoFormsEngine
C#
// The repository name, replace with yours 
string repositoryName = "My Repository"; 
 
// Create the full text search manager to use. Use any of the methods above 
IFullTextSearchManager fullTextSearchManager = CreateFullTextSearchManager(); 
 
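// Load the repository, in this example, disk-based. 
// codecs and root (the repository's root folder) are assumed to be initialized elsewhere 
DiskMasterFormsRepository repository = new DiskMasterFormsRepository(codecs, root); 
 
// Create AutoFormsEngine instance. 
// recognitionOcrEngine and processingOcrEngine are assumed to be OCR engines that were started elsewhere 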
AutoFormsEngine autoFormsEngine = new AutoFormsEngine(new AutoFormsEngineCreateOptions 
{ 
   Repository = repository, 
   RecognitionOcrEngine = recognitionOcrEngine, 
   ProcessingOcrEngine = processingOcrEngine, 
   MinimumConfidenceKnownForm = 30, 
   MinimumConfidenceRecognized = 80, 
   RecognizeFirstPageOnly = true 
}); 
 
// Set which full text search manager to use 
autoFormsEngine.SetFullTextSearchManager(fullTextSearchManager, repositoryName, null); 

Using the Low Level Forms Engine
C#
// The repository name, replace with yours 
string repositoryName = "My Repository"; 
 
// Create the full text search manager to use. Use any of the methods above 
IFullTextSearchManager fullTextSearchManager = CreateFullTextSearchManager(); 
 
// Create the forms recognition engine  
FormRecognitionEngine formsRecognitionEngine = MyCreateFormsRecognitionEngine(); 
// Set the full text search manager to use 
formsRecognitionEngine.FullTextSearchManager = fullTextSearchManager; 

Adding Master Forms

At this point, the database (or index) has been created, but it does not contain information on any of your master forms. You must add the master forms from your repository to the database before they can be used in a search. This step is also required whenever you create new master forms; the recognition engine does not perform it automatically.

Adding Existing Master Forms

Use the following code to add all of the master forms in an existing repository:

Using the AutoFormsEngine

C#
// Use the code from Using the Full Text Search Feature and add the following: 
 
// Add the master forms in the repository into the full-text search. 
// This method will add (if they do not exist) or update the master forms properties in the database 
autoFormsEngine.UpsertMasterFormsToFullTextSearch(); 

Using the Low Level Forms Engine
C#
// Use the code from Using the Full Text Search Feature and add the following: 
 
// Add the master forms in the repository into the full-text search. 
       
// Loop through your existing master forms until none are left 
byte[] masterFormData; 
while ((masterFormData = MyGetNextMasterFormData()) != null) 
{ 
   // Create a master forms attributes from this data 
   var masterFormAttributes = new FormRecognitionAttributes(); 
   masterFormAttributes.SetData(masterFormData); 
   // This method will add (if they do not exist) or update the master form properties in the database 
   formsRecognitionEngine.UpsertMasterFormToFullTextSearch(masterFormAttributes, repositoryName, null, null, null, null); 
} 

Adding New Master Forms

Use the following code to add a new master form to an existing repository:

C#
// Use the code from Using the Full Text Search Feature and add the following: 
 
// Generate the master form as usual 
FormRecognitionAttributes masterFormsAttributes = MyGenerateMasterFormAttributes(autoFormsEngine); 
// Save it as usual 
MySaveMasterForm(autoFormsEngine, masterFormsAttributes); 
// Now add the master form properties to the database 
formsRecognitionEngine.UpsertMasterFormToFullTextSearch(masterFormsAttributes, repositoryName, null, null, null, null); 

Using Full-Text Search During Recognition

Once the full text search manager is set in the forms engine, it is ready to be used during recognition. Both the AutoFormsEngine and the FormRecognitionEngine contain the following properties that control the use of full text search:

The workflow is as follows:

  1. Whenever recognition is performed on a form or a page, the engine searches the database for candidates.
  2. It then sorts the candidates by rank from highest to lowest, dropping any item that ranks lower than the specified minimum.
  3. Finally, it returns the top candidates, up to the specified maximum number.
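
The filter, sort, and trim steps above can be sketched with LINQ. This is an illustration of the described workflow only, not the engine's code: the Candidate class and the SelectTopCandidates method are hypothetical stand-ins, and minimumRank and maximumCandidates are assumed parameter names for the minimum and maximum values mentioned in the steps.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a search hit; not a LEADTOOLS type
sealed class Candidate
{
   public Candidate(string formName, int rank)
   {
      FormName = formName;
      Rank = rank;
   }

   public string FormName { get; }
   public int Rank { get; }
}

static class CandidateSelectionSketch
{
   public static List<Candidate> SelectTopCandidates(IEnumerable<Candidate> hits, int minimumRank, int maximumCandidates)
   {
      return hits
         .Where(c => c.Rank >= minimumRank)   // drop items ranked lower than the minimum
         .OrderByDescending(c => c.Rank)      // highest rank first
         .Take(maximumCandidates)             // keep only the top candidates
         .ToList();
   }
}
```

For example, with hits ranked 200, 40, 120, and 90, a minimum of 50 and a maximum of 2 leaves the hits ranked 200 and 120, in that order.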

Using the AutoFormsEngine

When the OCR Object Manager is in use, the high-level AutoFormsEngine automatically uses the full text search setup whenever Run, RecognizePage, or RecognizeForm is called. The internal workflow is as follows:

  1. The engine gets a list of master form candidates from the database, sorted by rank.
  2. It drops any candidate ranking lower than the minimum specified.
  3. It trims the list to use only the top maximum number of candidates specified.
  4. In most cases, the result is a single master form candidate with a very high ranking. If there is only one result, the normal OCR recognition process continues to the next step, aligning the form.
  5. If more than one master form candidate has a high ranking, the OCR recognition process is performed on each of them. The candidate with the highest rank is used.
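
The branch in steps 4 and 5 is where the time is saved: the expensive OCR comparison runs only when the search leaves more than one candidate. The sketch below illustrates that decision under stated assumptions. ResolveMasterForm is a hypothetical helper, compareWithOcr is a hypothetical delegate standing in for the OCR comparison against one master form (higher score is better), and returning null for an empty list is an assumption, since this topic does not describe that case.

```csharp
using System;
using System.Collections.Generic;

static class CandidateResolutionSketch
{
   public static string ResolveMasterForm(IReadOnlyList<string> candidateNames, Func<string, int> compareWithOcr)
   {
      // Assumption: no candidate means no match
      if (candidateNames.Count == 0)
         return null;

      // A single candidate needs no comparison against other candidates
      if (candidateNames.Count == 1)
         return candidateNames[0];

      // Several candidates: compare each one and keep the best score
      string best = null;
      int bestScore = int.MinValue;
      foreach (string name in candidateNames)
      {
         int score = compareWithOcr(name);
         if (best == null || score > bestScore)
         {
            best = name;
            bestScore = score;
         }
      }
      return best;
   }
}
```

With a single candidate the delegate is never invoked, so the cost of recognition no longer grows with the total number of master forms in the repository.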

Using the Low Level Forms Engine

The FormRecognitionEngine class contains the following new methods for recognizing a page or form using the full text search feature:

These methods return a list of the candidate pages or forms from the database. The specific candidates returned depend on the contents of the form's attributes as well as the full text search options selected. Each candidate, or FullTextSearchItem, contains the following members:

Member                   Description
string RepositoryName    The name of the repository. This is the same value passed to the get candidates method.
string FormName          The name of the form.
int PageNumber           The page number in the form (if the page candidate method was called).
int ScoreRank            The rank value. Values range from 0 to 255, with 255 being the highest.
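
When page candidates are requested, several hits may belong to the same form. The sketch below shows one way such a list could be reduced to the best hit per form. The SearchHit class is a hypothetical stand-in that mirrors the members listed above rather than the LEADTOOLS class itself, and BestHitPerForm is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in that mirrors the members listed above
sealed class SearchHit
{
   public string RepositoryName { get; set; }
   public string FormName { get; set; }
   public int PageNumber { get; set; }
   public int ScoreRank { get; set; } // 0 to 255, with 255 being the highest
}

static class SearchHitSketch
{
   // Keeps the highest-ranked hit for each form, ordered from best to worst
   public static List<SearchHit> BestHitPerForm(IEnumerable<SearchHit> hits)
   {
      return hits
         .GroupBy(h => h.FormName)
         .Select(g => g.OrderByDescending(h => h.ScoreRank).First())
         .OrderByDescending(h => h.ScoreRank)
         .ToList();
   }
}
```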

You should incorporate these methods in your forms recognition workflow. See the example for FormRecognitionEngine for a demonstration.

Help Version 23.0.2024.8.27
Products | Support | Contact Us | Intellectual Property Notices
© 1991-2024 LEAD Technologies, Inc. All Rights Reserved.
