This tutorial shows how to create, recognize, and process forms that contain structured and unstructured fields by combining Master Forms with the LEADTOOLS Document Analyzer in a C# .NET 6 application.
Overview | |
---|---|
Summary | This tutorial covers how to recognize a form with the AutoFormsEngine and process both its fields and its rulesets with the Document Analyzer in a C# .NET 6 Console application. |
Completion Time | 20 minutes |
Visual Studio Project | Download tutorial project (1 MB) |
Platform | C# .NET 6 Console Application |
IDE | Visual Studio 2022 |
Runtime Target | .NET 6 or higher |
Development License | Download LEADTOOLS |
Get familiar with the basic steps of creating a project by reviewing the Add References and Set a License tutorial, before working on the Combine Forms Recognition with the Document Analyzer - C# .NET 6 tutorial.
Forms come in a variety of shapes and sizes, each with varying amounts of fields filled with information. The types of information stored in these fields can be quite similar or differ drastically depending on the type of form that is used. The LEADTOOLS SDK provides a variety of classes and interfaces for automated processing of forms, allowing for quick and efficient detection of both structured and unstructured fields.
Structured Fields
The fields in the form have static locations, similar data types, and can be defined in a ruleset. Every structured form of the same type should contain the same fields, with matching types and structure. Some examples are tax forms for US citizens, such as 1040EZ or W-2 forms, where fields are expected to appear at predefined locations and contain similar data types.
Unstructured Fields
The fields in the form do not have predefined characteristics, such as type or location. A ruleset defined for another structured or unstructured form will likely not be able to perfectly detect all of the fields on an unstructured form, so one must be created for that specific form to define its fields. An example of this could be a newly created form that is specialized for an activity or some unique reporting process.
Start with a copy of the project created in the Add References and Set a License tutorial. If the project is not available, follow the steps in that tutorial to create it.
The references needed depend upon the purpose of the project. References can be added via NuGet packages.
This tutorial requires the following NuGet packages:
Leadtools.Document.Sdk
Newtonsoft.Json
Alternatively, if NuGet packages are not used, the following DLLs are required:
Leadtools.dll
Leadtools.Barcode.dll
Leadtools.Codecs.dll
Leadtools.Core.dll
Leadtools.Document.dll
Leadtools.Document.Analytics.dll
Leadtools.Document.Pdf.dll
Leadtools.Document.Raster.dll
Leadtools.Document.Unstructured.dll
Leadtools.Document.Writer.dll
Leadtools.Forms.Auto.dll
Leadtools.Forms.Common.dll
Leadtools.Forms.Processing.dll
Leadtools.Forms.Recognition.dll
Leadtools.Ocr.dll
Newtonsoft.Json.dll
For a complete list of which DLL files are required for your application, refer to Files to be Included With Your Application.
The License unlocks the features needed for the project. It must be set before any toolkit function is called. For details, including tutorials for different platforms, refer to Setting a Runtime License.
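There are two types of runtime licenses: an evaluation license, which is obtained when the evaluation toolkit is downloaded, and a deployment license.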
With the project created, the references added, and the license set, coding can begin.
In the Solution Explorer, open Program.cs. Add the following statements to the using block at the top of Program.cs.
using Leadtools;
using Leadtools.Codecs;
using Leadtools.Document;
using Leadtools.Document.Analytics;
using Leadtools.Document.Data;
using Leadtools.Document.Unstructured;
using Leadtools.Forms.Auto;
using Leadtools.Forms.Processing;
using Leadtools.Forms.Recognition;
using Leadtools.Ocr;
using Newtonsoft.Json;
Add the global variables below to the Program class.
private static AutoFormsEngine autoEngine;
private static RasterCodecs codecs;
private static IOcrEngine ocrEngine;
private static DiskMasterFormsRepository masterFormsRepository;
private static string masterFormSetDirectory;
private static string filledFormDirectory;
Set the values of masterFormSetDirectory and filledFormDirectory to point to the directories that contain the Master Form sets and the Filled Forms to be recognized, as seen below. A set of Master Forms and Filled Forms is available for download for use with this tutorial.
Then, add a new method to the Program class named InitFormsEngines() and call it inside Main(), below the set license call.
static void Main(string[] args)
{
    try
    {
        string projectRoot = Directory.GetParent(Environment.CurrentDirectory).Parent.Parent.FullName;
        masterFormSetDirectory = Path.Combine(projectRoot, "MasterForm Sets");
        filledFormDirectory = Path.Combine(projectRoot, "FilledForms");
        // Startup
        InitLEAD();
        InitFormsEngines();
        // Cleanup
        autoEngine.Dispose();
        if (ocrEngine != null && ocrEngine.IsStarted)
            ocrEngine.Shutdown();
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }
}
Note
The above code assumes that the Master Form and Filled Form resources provided above are located at the base of the project directory. If they are stored elsewhere, change these paths to the directories that contain the Master Forms and Filled Forms to be used.
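Because a wrong path only surfaces later as an exception from the engines, it can help to check both directories up front. The following is a minimal sketch using only System.IO; the FormDirectories class and its FindMissing method are hypothetical helpers introduced for illustration and are not part of LEADTOOLS.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of LEADTOOLS): reports which required directories are absent.
public static class FormDirectories
{
    // Returns every path in 'directories' that does not exist on disk.
    public static string[] FindMissing(params string[] directories)
    {
        return Array.FindAll(directories, directory => !Directory.Exists(directory));
    }
}
```

In Main(), a call such as FormDirectories.FindMissing(masterFormSetDirectory, filledFormDirectory) could run before InitFormsEngines(), printing each missing path and returning early when the result is not empty.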
Add the code below to the InitFormsEngines() method to initialize the RasterCodecs, IOcrEngine, DiskMasterFormsRepository, and AutoFormsEngine objects.
private static void InitFormsEngines()
{
    codecs = new RasterCodecs();
    // Start the LEAD OCR engine; adjust the runtime path to match the local LEADTOOLS installation
    ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD);
    ocrEngine.Startup(codecs, null, null, @"C:\LEADTOOLS23\Bin\Common\OcrLEADRuntime");
    // Load the Master Forms from disk and hand them to the auto forms engine
    masterFormsRepository = new DiskMasterFormsRepository(codecs, masterFormSetDirectory);
    autoEngine = new AutoFormsEngine(masterFormsRepository, ocrEngine, null, AutoFormsRecognitionManager.Default | AutoFormsRecognitionManager.Ocr, 30, 80, true);
}
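The OCR runtime path above is hard-coded to a default installation folder. One way to make it configurable is sketched below, assuming the path may be supplied through an environment variable. The variable name LEADTOOLS_OCR_RUNTIME and the OcrRuntimeLocator class are hypothetical and are not defined by LEADTOOLS; the fallback is the path used in this tutorial.

```csharp
using System;

// Hypothetical helper (not part of LEADTOOLS): picks the OCR runtime folder.
public static class OcrRuntimeLocator
{
    // Assumed variable name, chosen for this sketch only.
    public const string EnvironmentVariableName = "LEADTOOLS_OCR_RUNTIME";

    // Path used in this tutorial; serves as the fallback.
    public const string DefaultPath = @"C:\LEADTOOLS23\Bin\Common\OcrLEADRuntime";

    // Returns the configured path when the variable is set and non-blank, else the default.
    public static string Resolve()
    {
        var configured = Environment.GetEnvironmentVariable(EnvironmentVariableName);
        return string.IsNullOrWhiteSpace(configured) ? DefaultPath : configured;
    }
}
```

With this in place, the last argument passed to ocrEngine.Startup() could be OcrRuntimeLocator.Resolve() instead of the literal path.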
In the Program class, add two new methods called RecognizeForm(string formToRecognize) and ShowProcessedResults(AutoFormsRunResult runResult). RecognizeForm() will be called inside the Main() method, below the call to InitFormsEngines(), and it calls ShowProcessedResults() itself.
Add the code below to the RecognizeForm() method to recognize a form located within the directory represented by the filledFormDirectory variable.
private static AutoFormsRunResult RecognizeForm(string formToRecognize)
{
    string resultMessage = "Form not recognized";
    Console.WriteLine("Attempting to classify {0}...", Path.GetFileName(formToRecognize));
    AutoFormsRunResult runResult = autoEngine.Run(formToRecognize, null);
    if (runResult != null)
    {
        FormRecognitionResult recognitionResult = runResult.RecognitionResult.Result;
        resultMessage = $"This form has been recognized as a {runResult.RecognitionResult.MasterForm.Name} with {recognitionResult.Confidence}% confidence.\n";
    }
    Console.WriteLine(resultMessage);
    ShowProcessedResults(runResult);
    return runResult;
}
Add the code below to the ShowProcessedResults(AutoFormsRunResult runResult) method to show the recognition results:
private static void ShowProcessedResults(AutoFormsRunResult runResult)
{
    if (runResult == null)
        return;
    string resultsMessage = "";
    try
    {
        foreach (FormPage formPage in runResult.FormFields)
            foreach (FormField field in formPage)
                // Only text fields expose a Text value; skip other field types instead of throwing
                if (field?.Result is TextFormFieldResult textResult)
                    resultsMessage = $"{resultsMessage}{field.Name} = {textResult.Text}\n";
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }
    Console.WriteLine("Field Processing Results:");
    if (string.IsNullOrEmpty(resultsMessage))
        Console.WriteLine("No fields were processed");
    else
        Console.WriteLine(resultsMessage);
}
Note
The method ShowProcessedResults(AutoFormsRunResult runResult) is only necessary if the user wishes to see the structured fields within a form. If only the results of a form's rulesets are desired, as covered below, this method and its respective calls can be removed.
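The results message above is built by repeated string concatenation, which copies the whole string for every field. For forms with many fields, a StringBuilder is a common alternative. The sketch below shows the idea on plain name/value pairs so that it runs without the SDK; the FieldReport class is a hypothetical helper, and mapping each FormField to a pair is left to the caller.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not part of LEADTOOLS): formats field name/value pairs as "name = value" lines.
public static class FieldReport
{
    public static string Format(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var builder = new StringBuilder();
        foreach (KeyValuePair<string, string> field in fields)
        {
            // Append in place instead of allocating a new string for every field
            builder.Append(field.Key).Append(" = ").Append(field.Value).Append('\n');
        }
        return builder.ToString();
    }
}
```

Inside ShowProcessedResults(), each text field could be added to a list of pairs (field name and recognized text) and the list passed to FieldReport.Format() once the loops finish.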
Add a call to the RecognizeForm() method to the Main() method, below the call to InitFormsEngines(), to recognize each of the desired forms. RecognizeForm() displays the results by calling ShowProcessedResults(). Main() should now look like this:
static void Main(string[] args)
{
    try
    {
        string projectRoot = Directory.GetParent(Environment.CurrentDirectory).Parent.Parent.FullName;
        masterFormSetDirectory = Path.Combine(projectRoot, "MasterForm Sets");
        filledFormDirectory = Path.Combine(projectRoot, "FilledForms");
        // Startup
        InitLEAD();
        InitFormsEngines();
        // Recognize forms
        DirectoryInfo filledFormDir = new DirectoryInfo(filledFormDirectory);
        FileInfo[] forms = filledFormDir.GetFiles();
        Console.WriteLine("# of Forms Detected: {0}\n", forms.Length);
        foreach (FileInfo form in forms)
        {
            string currFormName = form.FullName;
            AutoFormsRunResult runResult = RecognizeForm(currFormName);
            Console.WriteLine("=========================================================================");
        }
        // Cleanup
        autoEngine.Dispose();
        if (ocrEngine != null && ocrEngine.IsStarted)
            ocrEngine.Shutdown();
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }
}
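The loop above passes every file in the Filled Forms directory to the engine, including any stray non-image files that happen to be there. A minimal sketch of filtering by extension first is shown below. The FilledFormFilter class is hypothetical, and the extension list is an assumption that should be adjusted to the formats actually used for the filled forms.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper (not part of LEADTOOLS): keeps only paths that look like form images.
public static class FilledFormFilter
{
    // Assumed set of extensions; the comparison ignores case.
    private static readonly HashSet<string> FormExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".tif", ".tiff", ".png", ".jpg", ".jpeg", ".pdf" };

    // Returns the paths whose extension is in the assumed set, in their original order.
    public static List<string> Select(IEnumerable<string> paths)
    {
        return paths.Where(path => FormExtensions.Contains(Path.GetExtension(path))).ToList();
    }
}
```

For example, the result of Directory.GetFiles(filledFormDirectory) could be passed through FilledFormFilter.Select() and each remaining path handed to RecognizeForm().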
In the Program class, create two new methods called GetRulesetDirectory(string masterFormName) and RunRuleset(string formToRecognize, string ruleset). These new methods will be called from Main() and will process any rulesets for the corresponding Master Form that are contained within the DiskMasterFormsRepository directory (including its subfolders).
Add the following code to the new GetRulesetDirectory(string masterFormName) method to build the path of the folder containing a form's rulesets:
private static string GetRulesetDirectory(string masterFormName)
{
    return Path.Combine(masterFormSetDirectory, masterFormName, "Rulesets");
}
Note
This method assumes that your Master Form directory organizational structure follows the same format as the Master Form sets and Filled Form resources provided above. Adjust this method as necessary to match the structure of your given Master Form repository.
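Not every Master Form necessarily has a Rulesets folder, and enumerating a folder that does not exist throws a DirectoryNotFoundException. The sketch below assumes the same folder layout as GetRulesetDirectory() and returns an empty array when the folder is missing. The RulesetFinder class is a hypothetical helper and is not part of LEADTOOLS.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of LEADTOOLS): lists the ruleset files for one Master Form.
public static class RulesetFinder
{
    // Assumes rulesets live in <masterFormSetDirectory>/<masterFormName>/Rulesets as *.json files.
    public static string[] FindRulesets(string masterFormSetDirectory, string masterFormName)
    {
        string rulesetDirectory = Path.Combine(masterFormSetDirectory, masterFormName, "Rulesets");
        if (!Directory.Exists(rulesetDirectory))
            return Array.Empty<string>(); // no rulesets defined for this Master Form

        string[] rulesets = Directory.GetFiles(rulesetDirectory, "*.json");
        Array.Sort(rulesets, StringComparer.OrdinalIgnoreCase); // stable, predictable run order
        return rulesets;
    }
}
```

Each returned path can be passed as the ruleset argument of RunRuleset(), and a Master Form without rulesets simply produces no ruleset output.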
Add the following code to the new RunRuleset(string formToRecognize, string ruleset) method to use the Document Analyzer to run each ruleset for the corresponding form type and display the results:
private static void RunRuleset(string formToRecognize, string ruleset)
{
    // Dispose the document when the method exits
    using LEADDocument document = DocumentFactory.LoadFromFile(formToRecognize, new LoadDocumentOptions());
    document.Text.OcrEngine = ocrEngine;
    // Create Analyzer
    DocumentAnalyzer analyzer = new DocumentAnalyzer()
    {
        Reader = new UnstructuredDataReader(),
        QueryContext = new FileRepositoryContext(ruleset)
    };
    DocumentAnalyzerRunOptions options = new DocumentAnalyzerRunOptions { ElementQuery = new RepositoryQuery() };
    List<ElementSetResult> results = analyzer.Run(document, options);
    Console.WriteLine("Ruleset Results:");
    foreach (ElementSetResult result in results)
        foreach (ElementResult item in result.Items)
            Console.WriteLine($"{item.GetFriendlyName()} = {item.Value}");
}
Add calls to GetRulesetDirectory() and RunRuleset() to the Main() method in order to process all of the existing rulesets for each recognized form type. Main() should look like this:
static void Main(string[] args)
{
    try
    {
        string projectRoot = Directory.GetParent(Environment.CurrentDirectory).Parent.Parent.FullName;
        masterFormSetDirectory = Path.Combine(projectRoot, "MasterForm Sets");
        filledFormDirectory = Path.Combine(projectRoot, "FilledForms");
        // Startup
        InitLEAD();
        InitFormsEngines();
        // Recognize forms
        DirectoryInfo filledFormDir = new DirectoryInfo(filledFormDirectory);
        FileInfo[] forms = filledFormDir.GetFiles();
        Console.WriteLine("# of Forms Detected: {0}\n", forms.Length);
        foreach (FileInfo form in forms)
        {
            string currFormName = form.FullName;
            AutoFormsRunResult runResult = RecognizeForm(currFormName);
            // Process rulesets only when the form was recognized
            if (runResult != null)
            {
                DirectoryInfo rulesetDir = new DirectoryInfo(GetRulesetDirectory(runResult.RecognitionResult.MasterForm.Name));
                // A Master Form without a Rulesets folder simply has no rulesets to run
                FileInfo[] rulesets = rulesetDir.Exists ? rulesetDir.GetFiles("*.json") : Array.Empty<FileInfo>();
                foreach (FileInfo ruleset in rulesets)
                {
                    Console.WriteLine("Running Ruleset {0}...", ruleset.Name);
                    RunRuleset(currFormName, ruleset.FullName);
                }
            }
            Console.WriteLine("=========================================================================");
        }
        // Cleanup
        autoEngine.Dispose();
        if (ocrEngine != null && ocrEngine.IsStarted)
            ocrEngine.Shutdown();
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }
}
Run the project by pressing F5, or by selecting Debug -> Start Debugging.
If the steps were followed correctly, the console appears and the application displays the recognized form along with the processed structured and unstructured fields. For this example, a 1040EZ, W4, and W9 have been included. For each form, the structured and unstructured results are displayed in the output console.
This tutorial showed how to recognize a form using the AutoFormsEngine class, process the form's structured and unstructured fields, and display the results to the console.