The LEADTOOLS Forms Recognition and Processing engine provides developers with a comprehensive set of advanced tools to create form recognition and processing applications with minimal coding. It is fast, accurate and reliable.
- Overview
- Forms Recognition
- Forms Processing
- Speed Processing
- Low Level Forms Recognition
- Low Level Forms Processing
Overview. In general, forms recognition and processing systems perform the following steps:
- Creating form attributes
- Comparing a form to a master form
- Aligning the form
- Processing the form
The LEADTOOLS AutoFormsEngine is an optimized, fast implementation of the recognition and processing system. It automatically creates form attributes, compares them with the Master Forms in the repository, and processes the form fields. LEADTOOLS AutoForms can also run in multi-threaded mode to speed up recognition and processing on multi-core machines. The Leadtools.Forms.Auto namespace provides a rich set of classes, interfaces, and methods that reduce implementation time for actions such as:
- Run: Run methods recognize and process a form in a single call and return both the recognition and the processing results. Because they combine the two stages, Run methods (especially in multi-threaded mode) are faster than calling recognition and processing separately.
- Recognize Forms: Form recognition compares a form with the Master Forms in a repository and decides based on the returned confidence value. As soon as a confidence value above the specified minimum is found, the search stops, avoiding unnecessary form comparisons.
- Recognize Pages: Identifies which form a page belongs to. Recognition creates the page's attributes and compares them with the Master Form pages in the repository to find one whose confidence value is above the specified minimum.
- Process Forms: Automatically aligns the recognized form and processes its fields. The fields will be updated with the process results.
- Process Pages: Automatically aligns the recognized page and processes its fields. The fields will be updated with the process results.
- Calculate the minimum confidence value: The minimum confidence value is used to determine whether the recognized form or page matches one of the Master Forms in the repository. Setting a value speeds up the search (recognition process), because no further comparisons are needed once a confidence above that value is found.
- Generate Master Form attributes: Master Form attributes are generated by the Object Managers being used.
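The minimum-confidence early stop described in the list above can be modeled in a few lines of plain C#. This is a sketch, not the LEADTOOLS implementation: the Master Form names, the confidence values, and the `Compare` function (a stand-in for the engine's attribute comparison) are all hypothetical.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical confidences (0 to 100) that comparing one filled form with each Master Form would return.
var simulated = new Dictionary<string, int>
{
    ["Invoice"] = 12,
    ["W-4"] = 91,
    ["PurchaseOrder"] = 97,
};
var masters = new List<string> { "Invoice", "W-4", "PurchaseOrder" };
int comparisons = 0;

// Stand-in for the engine's attribute comparison; counts how often it is called.
int Compare(string master)
{
    comparisons++;
    return simulated[master];
}

// Returns the first Master Form whose confidence is above the minimum, or (null, 0)
// if none is. Master Forms after the first match are never compared.
(string? Name, int Confidence) FindFirstMatch(IEnumerable<string> candidates, Func<string, int> compare, int minimumConfidence)
{
    foreach (var master in candidates)
    {
        int confidence = compare(master);
        if (confidence > minimumConfidence)
            return (master, confidence); // early stop
    }
    return (null, 0);
}

var match = FindFirstMatch(masters, Compare, minimumConfidence: 80);
Console.WriteLine($"{match.Name} ({match.Confidence}) after {comparisons} comparisons");
```

With a minimum of 80, the search stops at W-4 after two comparisons even though PurchaseOrder would have scored higher: the minimum confidence trades a possibly better match for fewer comparisons.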
While most ECM (Enterprise Content Management) systems use both recognition and processing, each of the two has a very specific task in a typical workflow.
Forms Recognition. Forms Recognition is the process of automatically identifying the name, type, and ID of an unknown form without human intervention. As long as a master form exists for the form being recognized, the LEADTOOLS Recognition Engine can quickly and accurately distinguish it from an unlimited number of predefined master forms. The engine uses a highly accurate algorithm to extract the unique features (attributes) of each master form (single or multipage) and stores them in an XML file. Because this file is portable and compact, you no longer need to keep the original images of your master forms, which saves disk space. Once you have created master forms for all of the forms you expect to process, you can fully automate recognition regardless of the source (archive, scanner, etc.) or resolution, and whether the form is deformed or computer-generated.
Our industry-leading recognition engine lets you fine-tune it for the types of forms you expect to process. Many factors can be considered when creating each master form's attributes, such as text, barcodes, and unique objects in the form. LEADTOOLS provides unique sub-engines (called "Object Managers" in the SDK) to handle each of these factors. Object Managers let you choose which factors are considered when creating master form attributes. You can use a single Object Manager or a group of them. Each manager has a specific purpose, so choosing the appropriate managers increases the performance and accuracy of forms recognition. For example, if every form you expect to recognize has a unique barcode, the Barcode Manager alone is usually all you need; you could add other managers, but they would only add processing time. In addition to automatically creating form attributes through the Object Managers, the engine has an optional feature that lets you highlight important information in the form, such as the company or form name. Whichever Object Manager is used, the engine returns comprehensive recognition results, including a confidence value for each form. The Forms Recognition Engine provides the following Object Managers:
OCR Manager (requires a LEADTOOLS OCR Engine) - The OCR Manager uses OCR to extract the text features of a form to create the form's attributes. It can be used with any OCR engine LEADTOOLS provides, such as the Plus and Professional engines. The OCR Manager is the optimal manager: it can recognize forms that were scanned under conditions different from those of the master form (resolution, alignment, etc.). It uses an internal algorithm that calculates the amount of scale and shift in the unidentified form to provide a complete automatic alignment solution.
Barcode Manager (requires a LEADTOOLS Barcode Module) - The Barcode Manager creates the form's attributes from the barcodes in the form. It is well suited to forms in which each master form carries a unique barcode.
Default Manager (no add-on required) - The Default Manager extracts special object features, such as lines and inverted text, from a form to create the form's attributes. This manager is useful for simple forms that have unique lines and other objects. Although the Default Manager is accurate, the Barcode and OCR Managers give better performance and accuracy. The Default Manager uses image resolution to calculate the alignment; consequently, it is ideal for recognizing forms that were generated at different resolutions but with similar scales and shifts.
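Both the OCR Manager (which calculates scale and shift) and the Default Manager (which derives alignment from image resolution) produce an alignment that relates the filled form to its Master Form. The sketch below assumes the simplest such model, a per-axis scale followed by a shift with no rotation, and maps one Master Form rectangle onto the filled form. The resolutions, offsets, and the `MapToForm` helper are hypothetical; this is not the engines' internal algorithm or a LEADTOOLS alignment type.

```csharp
using System;

// Hypothetical alignment: the Master Form was scanned at 300 DPI and the filled form at 200 DPI,
// and after scaling the filled form is shifted 15 px right and 10 px down.
double scaleX = 200.0 / 300.0, scaleY = 200.0 / 300.0;
int offsetX = 15, offsetY = 10;

// Map a rectangle (in pixels) from Master Form coordinates to filled-form coordinates:
// scale first, then shift.
(int X, int Y, int Width, int Height) MapToForm((int X, int Y, int Width, int Height) r) =>
    ((int)Math.Round(r.X * scaleX) + offsetX,
     (int)Math.Round(r.Y * scaleY) + offsetY,
     (int)Math.Round(r.Width * scaleX),
     (int)Math.Round(r.Height * scaleY));

var masterField = (X: 600, Y: 900, Width: 450, Height: 120);
var formField = MapToForm(masterField);
Console.WriteLine($"master {masterField} -> form {formField}");
```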
Forms Recognition works by creating a FormRecognitionAttributes object for each Master Form and for each form you would like to recognize, and then comparing attributes to see which Master Form matches each form with the highest confidence. The following is an outline of the general steps involved in performing Forms Recognition on one or more pages.
- Create the Master Forms Repository that points to the storage location of the Master Forms.
Code
using Leadtools.Codecs;
using Leadtools.Forms.Auto;

RasterCodecs.Startup();
// Folder that contains the Master Forms
string root = @"C:\Forms\FormsDemo\OCR_Test";
RasterCodecs codecs = new RasterCodecs();
DiskMasterFormsRepository repository = new DiskMasterFormsRepository(codecs, root);
- Create the OCR and Barcode engines to be used in the Auto-Forms Engine.
Code
using System.Collections.Generic;
using Leadtools.Forms.Ocr;
using Leadtools.Barcode;

// Create one OCR engine per thread (four threads here)
List<IOcrEngine> ocrEngines = new List<IOcrEngine>();
IOcrEngine ocrEngine;
int numberOfThreads = 4;
for (int i = 0; i < numberOfThreads; i++)
{
    ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Professional, true);
    ocrEngine.Startup(null, null, null, null);
    ocrEngines.Add(ocrEngine);
}

// Start the barcode engine for 1D, 2D, DataMatrix, PDF, and QR barcode types
BarcodeEngine.Startup(BarcodeMajorTypeFlags.Barcodes1d |
                      BarcodeMajorTypeFlags.Barcodes2dRead |
                      BarcodeMajorTypeFlags.BarcodesDatamatrixRead |
                      BarcodeMajorTypeFlags.BarcodesPdfRead |
                      BarcodeMajorTypeFlags.BarcodesQrRead);
BarcodeEngine barcodeEngine = new BarcodeEngine();
- Create the Auto-Forms Engine using the AutoFormsEngine Class.
Code
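// 30 and 80 are minimum confidence values (0 to 100) and true is a recognition option;
// see the AutoFormsEngine constructor reference for the exact meaning of each argument.
AutoFormsEngine autoEngine = new AutoFormsEngine(repository, ocrEngines, barcodeEngine, 30, 80, true);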
- Call AutoEngine.Run to recognize and process the form at once, or call AutoEngine.RecognizeForm to recognize only the form.
Code
// 'image' is the filled form image, for example: RasterImage image = codecs.Load(fileName);
AutoFormsRunResult result = autoEngine.Run(image, null, null, null);
For a detailed outline to only recognize a form, see Steps To Recognize and Process a Form.
For a detailed outline to generate a Master Form, see Steps To Generate Master Form and save it to master’s repository.
The Leadtools.Forms.Auto namespace provides a set of classes and interfaces for automated, multi-threaded forms recognition and processing. If you want to implement your own multi-threading, you can disable multi-threading in AutoForms or use the Low Level Forms design. The framework organizes Master Forms into Form Categories stored in Repositories. LEADTOOLS provides sample implementations of both disk-based and database-based form repositories. You can also implement the framework's interfaces (IMasterForm, IMasterFormsCategory, IMasterFormsRepository) to create your own custom repository.
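Because a Form Category can contain both Master Forms and sub-categories, a repository is a tree, and searching it means walking every category. The sketch models that tree with two plain dictionaries so that it stays self-contained; it does not reproduce the members of IMasterFormsRepository or IMasterFormsCategory, and the category and form names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical repository layout: each category lists its sub-categories and its Master Forms.
var subCategories = new Dictionary<string, string[]>
{
    ["Root"] = new[] { "Tax", "Medical" },
    ["Tax"] = new[] { "State" },
    ["State"] = Array.Empty<string>(),
    ["Medical"] = Array.Empty<string>(),
};
var masterForms = new Dictionary<string, string[]>
{
    ["Root"] = Array.Empty<string>(),
    ["Tax"] = new[] { "W-4", "W-9" },
    ["State"] = new[] { "NC-4" },
    ["Medical"] = new[] { "Claim" },
};

// Depth-first walk that yields a "Category/.../MasterForm" path for every Master Form.
IEnumerable<string> AllMasterForms(string category, string path)
{
    foreach (var form in masterForms[category])
        yield return $"{path}/{form}";
    foreach (var child in subCategories[category])
        foreach (var formPath in AllMasterForms(child, $"{path}/{child}"))
            yield return formPath;
}

var all = new List<string>(AllMasterForms("Root", "Root"));
all.ForEach(Console.WriteLine);
```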
Forms Processing. Forms Processing is the process of extracting filled-in data from predefined fields in a form. Fields are defined per page, so fields for a multipage form can easily be created and data extracted from the desired field and page. Each field has the following attributes associated with it:
- Name or ID (usually the field name on the form)
- Location on the actual form. The location can be specified in several types of units to accommodate different applications.
- Type of field (text, checkbox, image)
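Since a field's location can be stored in units other than pixels, it has to be converted to pixels at the resolution of the image being processed (pixels = inches * DPI). The sketch assumes a hypothetical text field measured in inches; the tuple layout and the `ToPixels` helper are illustrative only and are not the LEADTOOLS field or unit types.

```csharp
using System;

// Hypothetical text field: a name plus a location in inches on the Master Form.
var field = (Name: "LastName", Left: 1.5, Top: 2.25, Width: 3.0, Height: 0.4);

// Convert an inch-based location to pixel coordinates for an image of the given DPI.
(int X, int Y, int Width, int Height) ToPixels(double left, double top, double width, double height, int dpi) =>
    ((int)Math.Round(left * dpi), (int)Math.Round(top * dpi),
     (int)Math.Round(width * dpi), (int)Math.Round(height * dpi));

var at300 = ToPixels(field.Left, field.Top, field.Width, field.Height, 300);
var at200 = ToPixels(field.Left, field.Top, field.Width, field.Height, 200);
Console.WriteLine($"{field.Name}: {at300} at 300 DPI, {at200} at 200 DPI");
```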
Field information can be processed regardless of image resolution, scale, and other form generation characteristics. No matter which field type is being used, the engine provides you with comprehensive results of the processing, including a confidence value for each result. The Forms Processing Engine provides the following field types:
- Text Field - Text Fields are used to read text characters from the form. The characters can be letters, numbers, punctuation, or symbols. Both handwritten and machine-printed characters are supported. The exact level of support for text fields and languages depends on the OCR Module you have added. For example, handwritten fields require the ICR Module, while machine-printed fields require an OCR Module.
- Barcode Field - Barcode Fields are used to read barcode information from the form. The exact level of support for barcode fields depends on the Barcode Module you have added. For example, 1D barcode fields require the 1D Barcode Module while 2D (DataMatrix, PDF417, QR) barcode fields require the 2D Barcode Module.
- Image Field - Image Fields are used to extract specific images from the form. These images can be logos, stamps, fingerprints, etc.
- OMR Field - OMR Fields are used to read check mark information (check box, radio button, etc) from the form. The results indicate whether an area was checked or selected. OMR Fields require an OMR Module.
In addition to the above predefined Field Types, the Processing Engine allows you to create your own custom fields for any unique needs you may have.
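To make the OMR result concrete, the sketch below decides whether a check box is marked by measuring the fraction of dark pixels inside it. This fill-ratio test, its 128 gray-level cutoff, and its 0.3 threshold are assumptions chosen for illustration; they are not the method used by the LEADTOOLS OMR Module.

```csharp
using System;

// Decide whether a box is checked: count pixels darker than darkLevel (0 = black, 255 = white)
// and compare the dark fraction with a threshold. Both parameters are hypothetical.
bool IsChecked(byte[,] box, byte darkLevel = 128, double threshold = 0.3)
{
    int dark = 0;
    foreach (byte p in box)
        if (p < darkLevel) dark++;
    return (double)dark / box.Length >= threshold;
}

// 4x4 grayscale samples: an empty box and a box marked with an X.
byte[,] empty =
{
    { 255, 255, 255, 255 },
    { 255, 250, 255, 255 },
    { 255, 255, 255, 240 },
    { 255, 255, 255, 255 },
};
byte[,] marked =
{
    {  10, 255, 255,  20 },
    { 255,  15,  30, 255 },
    { 255,  25,  12, 255 },
    {  18, 255, 255,   9 },
};
Console.WriteLine($"empty: {IsChecked(empty)}, marked: {IsChecked(marked)}");
```

A custom field type could wrap a similar decision rule around whatever measurement a given form needs.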
The following is an outline of the general steps involved in performing Forms Processing on one or more pages.
- Create the Master Forms Repository that points to the storage location of the Master Forms.
Code
using Leadtools.Codecs;
using Leadtools.Forms.Auto;

RasterCodecs.Startup();
// Folder that contains the Master Forms
string root = @"C:\Forms\FormsDemo\OCR_Test";
RasterCodecs codecs = new RasterCodecs();
DiskMasterFormsRepository repository = new DiskMasterFormsRepository(codecs, root);
- Create the OCR and Barcode engines to be used in the Auto-Forms Engine.
Code
using System.Collections.Generic;
using Leadtools.Forms.Ocr;
using Leadtools.Barcode;

// Create one OCR engine per thread (four threads here)
List<IOcrEngine> ocrEngines = new List<IOcrEngine>();
IOcrEngine ocrEngine;
int numberOfThreads = 4;
for (int i = 0; i < numberOfThreads; i++)
{
    ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Professional, true);
    ocrEngine.Startup(null, null, null, null);
    ocrEngines.Add(ocrEngine);
}

// Start the barcode engine for 1D, 2D, DataMatrix, PDF, and QR barcode types
BarcodeEngine.Startup(BarcodeMajorTypeFlags.Barcodes1d |
                      BarcodeMajorTypeFlags.Barcodes2dRead |
                      BarcodeMajorTypeFlags.BarcodesDatamatrixRead |
                      BarcodeMajorTypeFlags.BarcodesPdfRead |
                      BarcodeMajorTypeFlags.BarcodesQrRead);
BarcodeEngine barcodeEngine = new BarcodeEngine();
- Create the AutoEngine using the AutoFormsEngine Class.
Code
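// 30 and 80 are minimum confidence values (0 to 100) and true is a recognition option;
// see the AutoFormsEngine constructor reference for the exact meaning of each argument.
AutoFormsEngine autoEngine = new AutoFormsEngine(repository, ocrEngines, barcodeEngine, 30, 80, true);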
- Call AutoEngine.Run to recognize and process the form at once, or call AutoEngine.ProcessForm to process only the form.
Code
// 'image' is the filled form image, for example: RasterImage image = codecs.Load(fileName);
AutoFormsRunResult result = autoEngine.Run(image, null, null, null);
For a detailed outline to recognize only a form, see Steps To Recognize and Process a Form.
For a detailed outline to generate a Master Form, see Steps To Generate Master Form and save it to master’s repository.
Speed Processing. To get the best speed from recognition and processing:
- Use the multithreaded mode of the AutoFormsEngine.
- If you are performing both recognition and processing, initialize the AutoFormsEngine with the OCR engines only, and use the OCR Professional engine.
- If you are performing recognition without processing and each Master Form has a unique barcode, use only the Barcode engine to generate the Master Form attributes and to initialize the AutoFormsEngine.
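Multithreaded processing depends on each worker having its own engine, which is why the setup steps create numberOfThreads OCR engines. The sketch below shows the general engine-pool pattern with standard .NET concurrency types: a worker borrows an engine, processes one form, and returns the engine. Engines are represented by integer IDs and the per-form work is a placeholder, so this illustrates the pattern rather than how AutoFormsEngine schedules work internally.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

const int numberOfThreads = 4;

// Pool of per-thread engines, represented by integer IDs (hypothetical stand-ins for IOcrEngine).
var pool = new BlockingCollection<int>(new ConcurrentBag<int>());
for (int i = 0; i < numberOfThreads; i++) pool.Add(i);

var busy = new int[numberOfThreads];              // 1 while an engine is in use
int conflicts = 0;                                // times an engine was found already in use
var results = new ConcurrentDictionary<string, int>();
var forms = Enumerable.Range(1, 20).Select(n => $"form{n:D2}").ToList();

Parallel.ForEach(forms, new ParallelOptions { MaxDegreeOfParallelism = numberOfThreads }, form =>
{
    int engine = pool.Take();                     // borrow an engine; blocks if all are busy
    try
    {
        if (Interlocked.Exchange(ref busy[engine], 1) != 0)
            Interlocked.Increment(ref conflicts); // would mean two threads share one engine
        Thread.Sleep(5);                          // placeholder for recognizing and processing 'form'
        results[form] = engine;                   // record which engine handled the form
        Volatile.Write(ref busy[engine], 0);
    }
    finally
    {
        pool.Add(engine);                         // return the engine to the pool
    }
});

Console.WriteLine($"{results.Count} forms processed, conflicts: {conflicts}");
```

Because an engine leaves the pool while it is in use, no engine is ever shared by two threads, and the conflicts counter stays at zero.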
Low Level Forms Recognition. Low Level Forms Recognition makes it possible to design custom algorithms for recognition and forms comparisons. The following is an outline of the general steps involved in performing Form Recognition on one or more pages.
- Create and initialize the Forms Recognition Engine using the FormRecognitionEngine Class.
- Create and add the desired Object Managers using the RecognitionObjectsManager Class.
- Create the attributes for one or more Master Forms using the CreateMasterForm Method.
- Add pages to the Master Form using the AddMasterFormPage Method.
- Close the Master Form using the CloseMasterForm Method.
- Create form attributes for the forms you would like to recognize using the CreateForm Method.
- Add Pages to the form to be recognized using the AddFormPage Method.
- Close the form to be recognized using the CloseForm Method.
- Compare the attributes for the form to be recognized to the attributes of each master form using the CompareForm Method.
Master Form attributes can be saved to and loaded from disk using the GetData and SetData Methods. In most cases, you save all Master Form attributes to disk; then, when recognizing filled forms, you load each Master Form attributes file, compare it with the attributes of the form being recognized, and keep the Master Form that returns the highest confidence value. For a simple tutorial using Forms Recognition, see Recognizing Forms.
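The low-level selection described above, comparing the form with every saved Master Form and keeping the highest confidence, reduces to picking a maximum. Unlike the early-stop search that uses a minimum confidence value, every Master Form is compared here. The confidences are hypothetical stand-ins for CompareForm results, and the cutoff for treating a form as unknown is an added assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical confidences (0 to 100) from comparing one filled form's attributes
// with the attributes loaded for each Master Form.
var confidences = new Dictionary<string, int>
{
    ["Invoice"] = 12,
    ["W-4"] = 91,
    ["PurchaseOrder"] = 97,
};

// Every Master Form has been compared; keep the one with the highest confidence.
var best = confidences.OrderByDescending(kv => kv.Value).First();

// Assumption: below this value the form is treated as unknown rather than matched.
const int minimumConfidence = 30;
string result = best.Value >= minimumConfidence ? best.Key : "(unknown form)";
Console.WriteLine($"{result} ({best.Value})");
```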
Low Level Forms Processing. Low Level Forms Processing makes it possible to customize alignment and processing. The following is an outline of the general steps involved in performing Form Processing on one or more pages.
- Create and initialize the Forms Processing Engine using the FormProcessingEngine Class.
- Add the desired fields for each Master Form using the TextFormField, OMRFormField, BarcodeFormField, ImageFormField, or a custom user-defined field.
- Create a form page for each field collection using the FormPage.AddRange Method.
- Add each field page to the processing engine using the FormProcessingEngine.Pages.Add Method.
- Process the fields using the Process Method. This method requires the alignment for the given image. If Forms Recognition has not been performed but you know which Master Form the image corresponds to, use the GetFormAlignment or GetPageAlignment Method. If recognition has been performed, use the Alignment property.
Fields can be saved to and loaded from disk using the SaveFields and LoadFields Methods. In most cases, you save all of the fields for each master form to disk. Then, when processing filled forms, you load the appropriate form fields from file for use in the FormProcessingEngine. For a simple tutorial using Forms Processing, see Processing Forms.
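Saving fields once per Master Form and reloading them at processing time is a serialization round trip. The sketch shows that pattern with LINQ to XML; the element and attribute names, the field values, and the file name are made up, and the layout is not the format written by SaveFields.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;

// Hypothetical fields for one Master Form page: name, type, and bounds in pixels.
var fields = new[]
{
    (Name: "LastName", Type: "Text", X: 450, Y: 675, Width: 900, Height: 120),
    (Name: "Married", Type: "Omr", X: 1200, Y: 980, Width: 40, Height: 40),
};

// Save: one <Field> element per field (a made-up layout, not the SaveFields file format).
var xml = new XElement("Fields",
    fields.Select(f => new XElement("Field",
        new XAttribute("name", f.Name), new XAttribute("type", f.Type),
        new XAttribute("x", f.X), new XAttribute("y", f.Y),
        new XAttribute("width", f.Width), new XAttribute("height", f.Height))));
string path = Path.Combine(Path.GetTempPath(), "W4_fields.xml");
xml.Save(path);

// Load: read the elements back when a filled form of this type is processed.
var loaded = XElement.Load(path).Elements("Field")
    .Select(e => (Name: (string)e.Attribute("name"), Type: (string)e.Attribute("type"),
                  X: (int)e.Attribute("x"), Y: (int)e.Attribute("y"),
                  Width: (int)e.Attribute("width"), Height: (int)e.Attribute("height")))
    .ToArray();
Console.WriteLine($"{loaded.Length} fields loaded from {path}");
```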
SDK Definitions
- Attributes
- Unique features of a Master Form used to identify filled forms in the forms recognition process.
- Barcode Manager
- An Object Manager that creates attributes based on barcode fields in the master form.
- Confidence
- Value from 0 to 100 representing how confident the results are. A value of "100" means full confidence while a value of "0" means no confidence.
- Default Manager
- An Object Manager that creates attributes based on unique objects such as lines and inverted text in the master form.
- Exclude region
- An area which has no features or attributes necessary for form recognition.
- Field
- A predefined area on a recognized form from which you need to extract text, barcode, checkbox, image, or custom data.
- Filled Data
- Any data a user created on a form in a predefined field. Using the LEADTOOLS Forms Processing Engine, this data can be extracted from a recognized form.
- Filled Form
- A Master Form containing filled data. The Forms Recognition and Processing Engine is used to uniquely identify the form and extract the data from its fields.
- Form
- A filled form which needs to be recognized and/or processed.
- Form Alignment
- Information necessary in aligning a complete recognized form with the corresponding master form.
- Form Category
- A collection or logical grouping of similar Master Forms in a Form Repository. A Form Category can contain Master Forms and/or sub categories.
- Forms Processing
- The process of extracting user filled data from predefined fields in a recognized form.
- Forms Recognition
- The process of matching a filled form to its Master Form.
- Forms Repository
- A storage system for Form Categories. This is the top-level of the collection.
- Include region
- An area whose features or attributes are necessary for form recognition.
- Master Form / Template
- An unfilled or blank form containing attributes unique to that form. Master Forms can be single or multipage. Master Form attributes are generated by the different Object Managers.
- Object Manager
- One of the sub-engines that generate Master Form attributes from a specific type of feature, such as text, barcodes, or objects like lines.
- Ocr Manager
- An Object Manager that creates attributes based on text fields in the master form.
- Page Alignment
- Information necessary in aligning a recognized form page with the corresponding page from the master form.
- Region of interest
- An area which has very important attributes necessary for form recognition. These regions are used to highlight important features such as the company or form name.