The LEADTOOLS Forms Recognition toolkit uses object managers to recognize a master form or page. Object managers include the OCR, Barcode, and default managers. The OCR Object Manager is usually the best choice because it can recognize forms that were scanned under conditions different from those of the master form.
Using the OCR Object Manager can slow down your application, and with a large number of master forms the cost becomes significant. The Full Text Search feature of the LEADTOOLS Forms Recognition toolkit can be used to improve the performance of the OCR Object Manager.
Designed to work as a complement to the OCR object manager, the Full Text Search feature can work with existing master form sets as well as any new master forms you add to your repository.
Full Text Search is defined by the IFullTextSearchManager interface.
To use the Full Text Search feature, first create an instance of a class that implements this interface and initialize its properties. Then pass it to either the low-level FormRecognitionEngine (through the FullTextSearchManager property) or the high-level AutoFormsEngine (through the SetFullTextSearchManager method). From then on, the engine uses the manager's methods to perform full text searching during the recognition process.
LEADTOOLS currently supports two implementations: a Disk Full Text Search Manager, and a SQL Server Full Text Search Manager.
The Disk Full Text Search Manager implementation is in the Leadtools.Forms.Recognition.Search assembly. It is implemented by the DiskFullTextSearchManager class.
The Disk Full Text Search Manager uses file-based locks to ensure multiple threads and processes can access the index at the same time without data loss.
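To illustrate what a file-based lock generally looks like, here is a minimal sketch of the technique in plain C#. It is not the LEADTOOLS implementation, which is not shown in this topic; the class name `IndexLockSketch`, the method `TryAcquire`, and the 50 ms retry interval are all hypothetical. The idea is that opening a lock file with `FileShare.None` succeeds for only one holder at a time, so other threads or processes wait until the file is closed.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical helper for illustration only; not part of LEADTOOLS.
public static class IndexLockSketch
{
    // Tries to take an exclusive lock by opening a lock file with FileShare.None.
    // Returns the open stream (dispose it to release the lock), or null if the
    // lock could not be taken before the timeout expired.
    public static FileStream TryAcquire(string lockFilePath, TimeSpan timeout)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        while (true)
        {
            try
            {
                return new FileStream(lockFilePath, FileMode.OpenOrCreate,
                                      FileAccess.ReadWrite, FileShare.None);
            }
            catch (IOException)
            {
                // Another thread or process currently holds the lock file open
                if (DateTime.UtcNow >= deadline)
                    return null;
                Thread.Sleep(50); // assumed retry interval
            }
        }
    }
}
```

A caller would wrap index updates in `using (var indexLock = IndexLockSketch.TryAcquire(path, timeout)) { ... }` and treat a null result as "index busy".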
The DiskFullTextSearchManager class uses a folder on disk (either on the same machine or a network-mapped folder) to store the metadata of the forms participating in the search.
Note: The folder must be set prior to calling any other methods.
Set the folder through the DiskFullTextSearchManager.IndexDirectory property. If the directory does not exist or does not contain any previously indexed items, the manager creates an empty index ready to be populated. Items added to the manager have their metadata extracted and saved to files in the index directory.
The SQL Server Full Text Search Manager implementation is in the Leadtools.Forms.Recognition assembly. It is implemented by the SqlServerFullTextSearchManager class.
The SQL Server Full Text Search Manager uses SQL Server's own locking to ensure multiple threads and processes can access the index at the same time without data loss.
The SqlServerFullTextSearchManager class uses Microsoft SQL Server's full text search feature to store the metadata of the forms participating in the search.
Note: The ConnectionString must be set prior to calling any other methods.
Access it through the SqlServerFullTextSearchManager.ConnectionString property.
The SQL Server Full Text Search Manager has the following prerequisites:
SELECT SERVERPROPERTY('IsFullTextInstalled')
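The query above returns 1 when the Full-Text Search component is installed on the SQL Server instance, and 0 or NULL otherwise. A minimal sketch of running that check from C# before creating the manager might look like the following. The class `FullTextPrerequisiteCheck` and its methods are hypothetical helpers, not LEADTOOLS APIs, and the sketch assumes the caller supplies an already-open ADO.NET connection (for example a SqlConnection).

```csharp
using System;
using System.Data;

// Hypothetical helper for illustration only; not part of LEADTOOLS.
public static class FullTextPrerequisiteCheck
{
    // Interprets the scalar returned by SERVERPROPERTY('IsFullTextInstalled'):
    // 1 means installed; 0 or NULL means not installed.
    public static bool InterpretFlag(object scalar)
    {
        if (scalar == null || scalar is DBNull)
            return false;
        return Convert.ToInt32(scalar) == 1;
    }

    // Runs the prerequisite query on an already-open connection.
    public static bool IsFullTextInstalled(IDbConnection openConnection)
    {
        using (IDbCommand command = openConnection.CreateCommand())
        {
            command.CommandText = "SELECT SERVERPROPERTY('IsFullTextInstalled')";
            return InterpretFlag(command.ExecuteScalar());
        }
    }
}
```

The connection would typically be opened with the same connection string that is later assigned to SqlServerFullTextSearchManager.ConnectionString.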
Use the following code to implement full text searching in your application.
The following code snippet creates a disk-based full text search manager:

    IFullTextSearchManager CreateFullTextSearchManager()
    {
        // The index directory. Replace with the one you want to use.
        // This can either be a folder on the current machine or a network-mapped folder
        string indexDirectory = "//network-path/myforms-index-folder";
        // Create the disk full text search manager
        DiskFullTextSearchManager fullTextSearchManager = new DiskFullTextSearchManager();
        // Set the index directory
        fullTextSearchManager.IndexDirectory = indexDirectory;
        return fullTextSearchManager;
    }
And the following code snippet creates a SQL Server-based full text search manager:

    IFullTextSearchManager CreateFullTextSearchManager()
    {
        // The database connection string. Replace with your string
        string connectionString = "Data Source=MY_DATABASE_SERVER;Initial Catalog=FormsDb;Integrated Security=True;Connect Timeout=15;Encrypt=False;TrustServerCertificate=False";
        // Create the full text search manager to use
        SqlServerFullTextSearchManager fullTextSearchManager = new SqlServerFullTextSearchManager();
        // Set the connection string
        fullTextSearchManager.ConnectionString = connectionString;
        return fullTextSearchManager;
    }
With the high-level AutoFormsEngine, pass the manager through the SetFullTextSearchManager method:

    // The repository name, replace with yours
    string repositoryName = "My Repository";
    // Create the full text search manager to use. Use any of the methods above
    IFullTextSearchManager fullTextSearchManager = CreateFullTextSearchManager();
    // Load the repository, in this example, disk-based.
    // codecs, root, recognitionOcrEngine and processingOcrEngine are assumed to be created elsewhere
    DiskMasterFormsRepository repository = new DiskMasterFormsRepository(codecs, root);
    // Create AutoFormsEngine instance
    AutoFormsEngine autoFormsEngine = new AutoFormsEngine(new AutoFormsEngineCreateOptions
    {
        Repository = repository,
        RecognitionOcrEngine = recognitionOcrEngine,
        ProcessingOcrEngine = processingOcrEngine,
        MinimumConfidenceKnownForm = 30,
        MinimumConfidenceRecognized = 80,
        RecognizeFirstPageOnly = true
    });
    // Set which full text search manager to use
    autoFormsEngine.SetFullTextSearchManager(fullTextSearchManager, repositoryName, null);
With the low-level FormRecognitionEngine, set the FullTextSearchManager property:

    // The repository name, replace with yours
    string repositoryName = "My Repository";
    // Create the full text search manager to use. Use any of the methods above
    IFullTextSearchManager fullTextSearchManager = CreateFullTextSearchManager();
    // Create the forms recognition engine
    FormRecognitionEngine formsRecognitionEngine = MyCreateFormsRecognitionEngine();
    // Set the full text search manager to use
    formsRecognitionEngine.FullTextSearchManager = fullTextSearchManager;
At this point, the database (or index) has been created, but it does not yet contain information about any of your master forms. You must add the master forms from your repository to the database before they can be used. This step is also required whenever you create new master forms; the recognition engine does not perform it automatically.
Use the following code to add all of the master forms in an existing repository. With the high-level AutoFormsEngine:

    // Use the code from "Using the Full Text Search Feature" and add the following:
    // Add the master forms in the repository into the full-text search.
    // This method will add (if they do not exist) or update the master forms properties in the database
    autoFormsEngine.UpsertMasterFormsToFullTextSearch();
With the low-level FormRecognitionEngine, add each master form individually:

    // Use the code from "Using the Full Text Search Feature" and add the following:
    // Add the master forms in the repository into the full-text search.
    // Loop through your existing master forms until none remain
    byte[] masterFormData;
    while ((masterFormData = MyGetNextMasterFormData()) != null)
    {
        // Create a master form attributes object from this data
        var masterFormAttributes = new FormRecognitionAttributes();
        masterFormAttributes.SetData(masterFormData);
        // This method will add (if it does not exist) or update the master form properties in the database
        formsRecognitionEngine.UpsertMasterFormToFullTextSearch(masterFormAttributes, repositoryName, null, null, null, null);
    }
Use the following code to add a new master form to an existing repository:

    // Use the code from "Using the Full Text Search Feature" and add the following:
    // Generate the master form as usual
    FormRecognitionAttributes masterFormsAttributes = MyGenerateMasterFormAttributes(autoFormsEngine);
    // Save it as usual
    MySaveMasterForm(autoFormsEngine, masterFormsAttributes);
    // Now add the master form properties to the database
    formsRecognitionEngine.UpsertMasterFormToFullTextSearch(masterFormsAttributes, repositoryName, null, null, null, null);
Once the full text search manager is set in the forms engine, it is ready to be used during recognition. Both the AutoFormsEngine and the FormRecognitionEngine contain the following properties that control the use of full text search:
int FullTextSearchMinimumRank { get; set; }
- The minimum rank value to be considered a match (or candidate). The default value is -1, which returns all candidates.
int FullTextSearchMaximumCandidates { get; set; }
- The maximum number of matches (candidates) to return. The default value is 3.
The workflow is as follows:
When the OCR Object Manager is in use, the high-level AutoFormsEngine automatically applies the full text search setup whenever Run, RecognizePage, or RecognizeForm is called. The internal workflow is as follows:
The FormRecognitionEngine class contains the following new methods for recognizing a page or form using the full text search feature:
These methods return a list of the candidate pages or forms from the database. The specific candidates returned depend on the contents of the form's attributes as well as the full text search options selected. Each candidate, or FullTextSearchItem, contains the following members:
| Member | Description |
|---|---|
| string RepositoryName | The name of the repository. This is the same value passed to the get candidates method. |
| string FormName | The name of the form. |
| int PageNumber | The page number in the form (if the page candidate method was called). |
| int ScoreRank | The rank value. Values range from 0 to 255, with 255 being the highest. |
You should incorporate these methods in your forms recognition workflow. See the example for FormRecognitionEngine for a demonstration.
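To make the effect of FullTextSearchMinimumRank and FullTextSearchMaximumCandidates on a candidate list concrete, here is a minimal, self-contained sketch. It uses a stand-in `Candidate` class that only mirrors the four members in the table; it is not the LEADTOOLS FullTextSearchItem type, and `CandidateFilter.Select` is a hypothetical helper. Ordering by rank before truncating is an assumption made for illustration, not a description of the engine's internal behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in type for illustration only; mirrors the members listed in the table.
public sealed class Candidate
{
    public string RepositoryName { get; set; }
    public string FormName { get; set; }
    public int PageNumber { get; set; }
    public int ScoreRank { get; set; } // 0 to 255, with 255 being the highest
}

// Hypothetical helper; not part of LEADTOOLS.
public static class CandidateFilter
{
    // Keeps candidates whose rank is at least minimumRank, orders them from
    // highest to lowest rank, and returns at most maximumCandidates of them.
    // With a minimumRank of -1 every candidate passes, since ranks start at 0.
    public static List<Candidate> Select(
        IEnumerable<Candidate> candidates, int minimumRank, int maximumCandidates)
    {
        return candidates
            .Where(c => c.ScoreRank >= minimumRank)
            .OrderByDescending(c => c.ScoreRank)
            .Take(maximumCandidates)
            .ToList();
    }
}
```

Calling `CandidateFilter.Select(items, -1, 3)` models the default property values: no rank threshold and at most three candidates.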