Help Version 19.0.5.7
The LEADTOOLS Forms Recognition and Processing engine provides developers with a comprehensive set of advanced tools to create form recognition and processing applications with minimal coding. It is fast, accurate and reliable.
Overview. In general, forms recognition and processing systems perform the following steps:
The LEADTOOLS AutoFormsEngine is a fast, optimized implementation of the recognition and processing system. It automatically creates form attributes, compares them with the Master Forms in the repository, and processes the form fields. LEADTOOLS AutoForms can also run in multi-threaded mode, which takes advantage of multi-core processors to speed up recognition and processing. The Leadtools.Forms.Auto namespace provides a rich set of classes, interfaces, and methods that reduce the implementation time for actions such as:
While most ECM (Enterprise Content Management) systems may take advantage of both recognition and processing, each process, recognition or processing, has a very specific task in a typical workflow.
Forms Recognition. Forms Recognition is the process of automatically identifying the name, type, and ID of any unknown form without human intervention. As long as a master form exists for the form being recognized, the LEADTOOLS Recognition Engine can quickly and accurately match it against any number of predefined master forms. The engine uses an extremely accurate algorithm to extract the unique features (attributes) of each master form (single or multipage) and stores them in an XML file. This file is portable and compact, so you no longer need to store the original images for your master forms, which frees up disk space. Once you have created master forms for all of the forms you expect to process, you can fully automate recognition regardless of the source (archive, scanner, etc.), the resolution, or whether the form is deformed or computer-generated.
The recognition engine can be fine-tuned for the types of forms you expect to process. Many factors can be considered when creating each master form's attributes, such as text, barcodes, and unique objects in the form. LEADTOOLS handles these factors with dedicated sub-engines, which the SDK calls "Object Managers". They let you choose which factors are considered when creating master form attributes. You can use a single Object Manager or a group of them. Each manager has a specific purpose, so choosing the appropriate one improves the performance and accuracy of forms recognition. For example, if all of the forms you expect to recognize have unique barcodes, the Barcode Manager alone is most likely sufficient. You could add other managers as well, but the processing time they consume would be wasted. In addition to automatically creating form attributes through the different Object Managers, the engine has an optional feature that lets you highlight important information in the form, such as the company or form name. No matter which Object Manager is used, the engine provides comprehensive recognition results, including a confidence level for each form. The Forms Recognition Engine provides the following "Object Managers":
OCR Manager (requires a LEADTOOLS OCR Engine) - The OCR Manager uses OCR to extract the text features from a form to create the form's attributes. The OCR Manager can be used with any OCR Engine LEADTOOLS provides, such as the Professional Engines. The OCR Manager is the optimal manager and is capable of recognizing forms that were scanned under conditions different from those of the master form (resolution, alignment, etc.). It uses an internal algorithm that calculates the amount of scale and shift in the unidentified form to provide a complete automatic alignment solution.
Barcode Manager (requires a LEADTOOLS Barcode Engine) - The Barcode Manager uses barcode recognition technology to extract the barcode features from a form to create the form's attributes. This manager is capable of accurately recognizing forms in fractions of a second, even for large images. The Barcode Manager can be used with any Barcode Engine LEADTOOLS provides, such as the 1D and 2D (DataMatrix, PDF417, QR) add-on modules. The Barcode Manager uses the image resolution to calculate the alignment, so it is ideal for recognizing forms that can have different resolutions but similar scales and shifts. Since most forms already contain some type of unique barcode, the Barcode Manager is a good fit for most scenarios.
Default Manager (no add-on required) - The Default Manager extracts special object features such as lines and inverted text from a form to create the form's attributes. This manager is useful for simple forms that have unique lines and other objects. While the Default Manager is accurate, the Barcode and OCR Managers should be used for optimal performance and accuracy. The Default Manager uses the image resolution to calculate the alignment. Consequently, it is ideal for recognizing forms that are generated at different resolutions but with similar scales and shifts.
Forms Recognition works by creating a FormRecognitionAttributes object for each Master Form and for each form you would like to recognize, and then comparing the attributes to see which Master Form matches each form with the highest confidence. The following is an outline of the general steps involved in performing Form Recognition on one or more pages.
SetLicense();
// Set the name of the folder that contains the Master Forms
string root = @"C:\Users\Public\Documents\LEADTOOLS Images\Forms\MasterForm Sets\OCR\";
RasterCodecs codecs = new RasterCodecs();
DiskMasterFormsRepository repository = new DiskMasterFormsRepository(codecs, root);
// Create the OCR engine instance, and use LEADTOOLS Advantage OCR engine
IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);
// Startup the OCR engine
ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 19\Bin\Common\OcrAdvantageRuntime");
// Create the Main BarcodeEngine instance
BarcodeEngine barcodeEngine = new BarcodeEngine();
// Create the Main AutoFormsEngine instance
AutoFormsEngine autoEngine = new AutoFormsEngine(repository, ocrEngine, barcodeEngine, 30, 80, true);
// Setup the recognition options
autoEngine.RecognizeFirstPageOnly = true;
autoEngine.MinimumConfidenceKnownForm = 40;
// Get a list of the files to process
string[] files = Directory.GetFiles(@"C:\Users\Public\Documents\LEADTOOLS Images\Forms\Images\", "*.tif");
// This is the function that contains the main recognition process
ProcessFiles(autoEngine, files);
private static void ProcessFiles(AutoFormsEngine autoEngine, string[] files)
{
    Console.WriteLine("Started Processing Files ...");
    // Get the number of files to process
    int fileCount = files.Length;
    // Nothing to queue: return now, otherwise the wait below would never be signaled
    if (fileCount == 0)
    {
        Console.WriteLine("No files to process");
        return;
    }
    // Event to notify us when all work is finished
    using (AutoResetEvent finishedEvent = new AutoResetEvent(false))
    {
        // Loop through all Files in the given Folder
        foreach (string file in files)
        {
            // Capture the file name here, since we are using an anonymous function
            string fileToProcess = file;
            // Process it in a thread
            ThreadPool.QueueUserWorkItem((state) =>
            {
                try
                {
                    // Show the name
                    string name = Path.GetFileName(fileToProcess);
                    Console.WriteLine("Processing {0}", name);
                    // Process it
                    AutoFormsRunResult result = autoEngine.Run(fileToProcess, null);
                    // Check results
                    if (result.FormFields != null && result.RecognitionResult.MasterForm != null)
                        Console.WriteLine(string.Format(" Master Form Found \"{0}\" for {1}", result.RecognitionResult.MasterForm.Name, name));
                    else
                        Console.WriteLine(string.Format(" No Master Form Found for {0}", name));
                }
                catch (Exception ex)
                {
                    Console.WriteLine("Error {0}", ex.Message);
                }
                finally
                {
                    // Count this file as done, whether it succeeded or failed
                    if (Interlocked.Decrement(ref fileCount) == 0)
                    {
                        // We are done, inform the main thread
                        finishedEvent.Set();
                    }
                }
            });
        }
        // Wait till all operations are finished
        finishedEvent.WaitOne();
        Console.WriteLine("Finished Processing Files");
    }
}
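The wait-for-all-workers pattern in the sample can also be written with the .NET CountdownEvent class, which combines the counter and the event. The following is a minimal sketch of that variant, not LEADTOOLS code: CountdownSketch, RunAll, and the work delegate are hypothetical names introduced for illustration, and the delegate stands in for whatever is done per file (such as the call to autoEngine.Run above).

```csharp
using System;
using System.Threading;

// Hypothetical helper for illustration; not part of the LEADTOOLS SDK.
public static class CountdownSketch
{
    // Runs 'work' once per item on the thread pool, blocks until every
    // item has finished, and returns the number of items that completed.
    public static int RunAll(string[] items, Action<string> work)
    {
        // Nothing to queue, so there is nothing to wait for
        if (items.Length == 0)
            return 0;

        int completed = 0;
        using (CountdownEvent done = new CountdownEvent(items.Length))
        {
            foreach (string item in items)
            {
                // Capture a per-iteration copy for the anonymous function
                string current = item;
                ThreadPool.QueueUserWorkItem(state =>
                {
                    try
                    {
                        work(current);
                    }
                    catch (Exception ex)
                    {
                        Console.WriteLine("Error {0}", ex.Message);
                    }
                    finally
                    {
                        // Signal even on failure so the waiting thread is always released
                        Interlocked.Increment(ref completed);
                        done.Signal();
                    }
                });
            }
            // Blocks until Signal has been called once per item
            done.Wait();
        }
        return completed;
    }
}
```

As in the sample, the signal is sent from a finally block, so a file that throws still counts toward completion and the main thread is not left waiting.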
For a detailed outline of how to recognize a form, see Steps To Recognize and Process a Form.
For a detailed outline of how to generate a Master Form, see Steps To Generate Master Form and save it to master's repository.
The Leadtools.Forms.Auto namespace provides a set of classes and interfaces for automated forms recognition and processing with multi-threaded processing. Those who want to implement their own multi-threaded process can disable multi-threading in Auto Forms or use the Low Level Forms design. The framework handles Form Categories using Repositories. LEADTOOLS provides sample implementations for disk-based form repositories. Users can also implement the framework's interfaces (IMasterForm, IMasterFormsCategory, IMasterFormsRepository) to create their own custom repository.
Forms Processing. Forms Processing is the process of extracting the filled-in data from predefined fields in a form. Fields are defined per page, so fields for a multipage form can easily be created and data extracted from the desired field and page. Each field has the following attributes associated with it:
Field information can be processed regardless of image resolution, scale, and other form generation characteristics. No matter which field type is being used, the engine provides you with comprehensive results of the processing, including a confidence value for each result. The Forms Processing Engine provides the following field types:
In addition to the above predefined Field Types, the Processing Engine allows you to create your own custom fields for any unique needs you may have.
Low Level Forms Recognition. Low Level Forms Recognition makes it possible to design custom algorithms for recognition and forms comparisons. The following is an outline of the general steps involved in performing Form Recognition on one or more pages.
Master Form attributes can be loaded and saved to disk using the GetData and SetData methods. In most cases, save all master form attributes to disk. Then, when recognizing filled forms, load each master form's attributes file and compare it with the attributes of the form being recognized to see which returns the highest confidence value. For a simple tutorial using Forms Recognition, see Recognizing Forms.
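The "compare against every master and keep the highest confidence" step can be sketched independently of the SDK. The example below is a minimal illustration under stated assumptions, not the LEADTOOLS API: MatchResult, BestMatchSketch, and FindBest are hypothetical names, the compare delegate stands in for the engine's attribute comparison, and the minimumConfidence cutoff is an assumed policy for rejecting weak matches.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in types for illustration; these are not LEADTOOLS classes.
public sealed class MatchResult
{
    public string MasterName;
    public int Confidence;   // assumed scale: higher means a closer match
}

public static class BestMatchSketch
{
    // Compares the filled form's attributes against every master form's
    // attributes and returns the highest-confidence match, or null when
    // no master reaches minimumConfidence.
    public static MatchResult FindBest<TAttributes>(
        IDictionary<string, TAttributes> masters,
        TAttributes formAttributes,
        Func<TAttributes, TAttributes, int> compare,
        int minimumConfidence)
    {
        MatchResult best = null;
        foreach (KeyValuePair<string, TAttributes> master in masters)
        {
            // 'compare' is a placeholder for the real attribute comparison
            int confidence = compare(master.Value, formAttributes);
            if (confidence >= minimumConfidence &&
                (best == null || confidence > best.Confidence))
            {
                best = new MatchResult { MasterName = master.Key, Confidence = confidence };
            }
        }
        return best;
    }
}
```

In a real application, the dictionary values would be the master form attributes loaded from disk, and a null result would mean the filled form is treated as unknown.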
Low Level Forms Processing. Low Level Forms Processing makes it possible to customize alignment and processing. The following is an outline of the general steps involved in performing Form Processing on one or more pages.
Fields can be loaded and saved to disk using the LoadFields and SaveFields methods. In most cases, save all of the fields for each master form to disk. Then, when processing filled forms, load the appropriate form fields from file for use in the FormProcessingEngine. For a simple tutorial using Forms Processing, see Processing Forms.