This tutorial shows how to perform basic operations using the LEADTOOLS Document Analyzer SDK in a C# .NET 6 application.
Overview | |
---|---|
Summary | This tutorial shows how to use and perform basic DocumentAnalyzer operations. |
Completion Time | 20 minutes |
Visual Studio Project | Download tutorial project (5 KB) |
Platform | C# .NET 6 Console Application |
IDE | Visual Studio 2022 |
Runtime Target | .NET 6 or Higher |
Development License | Download LEADTOOLS |
Get familiar with the basic steps of creating a project by reviewing the Add References and Set a License tutorial before working on this tutorial.
Start with a copy of the project created in the Add References and Set a License tutorial. If you do not have that project, follow the steps in that tutorial to create it.
The references needed depend upon the purpose of the project. References can be added using either of the following two methods, but not both.
If using NuGet references, this tutorial requires the following NuGet package:
Leadtools.Document.Sdk
If using local DLL references, the following DLLs are needed. The DLLs are located at <INSTALL_DIR>\LEADTOOLS23\Bin\net:
Leadtools.dll
Leadtools.Core.dll
Leadtools.Codecs.dll
Leadtools.Document.dll
Leadtools.Document.Analytics.dll
Leadtools.Document.Unstructured.dll
Leadtools.Ocr.dll
Leadtools.Ocr.LEADEngine.dll
For a complete list of which DLL files are required for your application, refer to Files to be Included With Your Application.
The License unlocks the features needed for the project. It must be set before any toolkit function is called. For details, including tutorials for different platforms, refer to Setting a Runtime License.
There are two types of runtime licenses:
With the project created, the references added, and the license set, coding can begin.
In Solution Explorer, open Program.cs. Add the following statements to the using block at the top of Program.cs:
using <PROJECT_NAME>.Tutorials;
using Leadtools;
using System;
using System.IO;
Right-click on <PROJECT_NAME>.csproj and select Add -> New Folder. Name the folder Tutorials. This folder will contain six classes showcasing various features of the high-level Document Analyzer API. To add a new class to the Tutorials folder, right-click the folder and select Add -> New Item. Select Class and name the class. Add the six classes in the table below.
Class Name | Description |
---|---|
SaveLoad.cs | Create sample features, save to JSON, and load JSON. |
StandardFeatures.cs | Create a standard date feature. |
CustomFeatures.cs | Create a custom sample feature. |
ExcludedFeatures.cs | Find emails with one exclusion. |
ExecuteFeatures.cs | Create an engine to execute features. |
LabeledFeatures.cs | Find emails and add a feature label. |
Add the code below to the Main() method to run the various features highlighted in the newly created classes.
static void Main(string[] args)
{
InitLEAD();
// Run the tutorial samples
SaveLoad.Run();
StandardFeatures.Run();
CustomFeatures.Run();
ExcludedFeatures.Run();
LabeledFeatures.Run();
ExecuteFeatures.Run();
}
In Solution Explorer, open SaveLoad.cs. Add the following statements to the using block at the top:
using Leadtools.Document.Unstructured.Highlevel;
using System.Collections.Generic;
Add a new Run() method to the SaveLoad class. Add the code below to the Run() method to execute the features in this class.
public static void Run()
{
// Create sample features
var feature = SampleFeature();
// Save to json
var json = feature.ToJson();
// Load from json
var loaded = FeatureResourceBuilder.Build(json);
}
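A save-and-load round trip like this is easy to sanity-check by serializing the loaded object again and comparing the result with the first pass. The sketch below illustrates the idea with System.Text.Json and a hypothetical FeatureStub record standing in for a feature. It does not call ToJson() or FeatureResourceBuilder.Build(), and whether the LEADTOOLS serializer produces identical JSON on a second pass is an assumption to verify against your own features.

```csharp
using System;
using System.Text.Json;

// Hypothetical stand-in for a feature; this is not a LEADTOOLS type.
public record FeatureStub(string Name, string Pattern);

public static class RoundTripDemo
{
    // Serialize, deserialize, and serialize again. A lossless round trip
    // yields the same JSON both times and an equal object.
    public static bool RoundTrips(FeatureStub original)
    {
        string first = JsonSerializer.Serialize(original);
        var loaded = JsonSerializer.Deserialize<FeatureStub>(first);
        if (loaded is null) return false;
        string second = JsonSerializer.Serialize(loaded);
        return first == second && loaded == original;
    }

    public static void Main()
    {
        var stub = new FeatureStub("Sample", @"\d");
        Console.WriteLine(RoundTrips(stub) ? "Round trip OK" : "Round trip lost data");
    }
}
```

The same comparison could be applied to the json string and a second ToJson() call on the loaded feature, provided the loaded object exposes that method.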
Add a new method named SampleFeature(), which returns the IFeature object used by the Run() method. IFeature is the base type for features created to extract form information using automated unstructured forms processing.
Add the code below to the SampleFeature() method to create a custom sample feature.
private static IFeature SampleFeature()
{
// Create a sample custom feature
var sample = new CustomFeature()
{
Name = "Sample",
Value = new List<InfoValue>() { new InfoValue() { Tweaks = new RegexTweaks(), TweaksForResults = new RegexResultsTweaks(), Pattern = @"\d" } }
};
return sample;
}
In Solution Explorer, open StandardFeatures.cs. Add the following statements to the using block at the top:
using Leadtools.Document.Unstructured.Highlevel;
using System.Collections.Generic;
Add a new Run() method to the StandardFeatures class. Add the code below to the Run() method to execute the features in this class.
public static void Run()
{
// Date
var std_feature = StandardDate();
// All features
var std_all = AllStandardFeatures();
}
Add two new methods named StandardDate() and AllStandardFeatures(). Both are called inside the Run() method and return the IFeature objects used for data extraction.
Add the code below to the StandardDate() method to create a standard date feature.
private static IFeature StandardDate()
{
// Standard date feature
var std_date = new StandardFeature() { ValueName = "Date", Name = "Tutorial_Date" };
return std_date;
}
Add the code below to the AllStandardFeatures() method to create a list of features from all the regular expressions in the built-in database.
private static IEnumerable<IFeature> AllStandardFeatures()
{
foreach (var value in RegexExpressionDb.List("value"))
{
var std = new StandardFeature() { ValueName = value, Name = value };
yield return std;
}
}
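Because AllStandardFeatures() uses yield return, it is a C# iterator: calling it only creates the sequence, and the loop body runs when the result is enumerated, for example with foreach or ToList(). The self-contained sketch below demonstrates that behavior. IteratorDemo, Names(), and the placeholder strings are hypothetical and stand in for the LEADTOOLS calls.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IteratorDemo
{
    // Counts how many items the iterator has actually produced.
    public static int Produced;

    // Hypothetical iterator standing in for AllStandardFeatures().
    public static IEnumerable<string> Names()
    {
        foreach (var name in new[] { "Date", "Email", "Phone" })
        {
            Produced++;
            yield return name;
        }
    }

    public static void Main()
    {
        var lazy = Names();             // nothing has run yet
        Console.WriteLine(Produced);    // 0
        var list = lazy.ToList();       // enumeration runs the loop body
        Console.WriteLine(Produced);    // 3
        Console.WriteLine(list.Count);  // 3
    }
}
```

In the tutorial's Run() method, std_all is assigned but never enumerated, so materializing it (for instance with ToList()) is what would actually create the features.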
In Solution Explorer, open CustomFeatures.cs. Add the following statements to the using block at the top:
using Leadtools.Document.Unstructured.Highlevel;
using System.Collections.Generic;
Add a new Run() method to the CustomFeatures class. Add the code below to the Run() method to execute the features in this class.
public static void Run()
{
// Custom feature to find (demo) banking account number
var custom_feature = Account();
}
Add a new method named Account(). Add the code below to the Account() method to return the feature created to find the bank account number.
public static IFeature Account()
{
var acc = new CustomFeature() { Name = "Account" };
acc.Label = new List<InfoLabel>()
{
new InfoLabel()
{
Value = new InfoValue()
{
Pattern="account(\\s)?number",
Tweaks=new RegexTweaks()
{
IgnoreCase=true,
IgnoreWhiteSpace=false,
FuzzyMatching=FuzzyMatching.Auto,
IgnoreIfShorterThan=8,
LettersToNumbers=false,
MatchWholeWord=false,
},
TweaksForResults=new RegexResultsTweaks()
{
IncludeWholeWord=false,
IncludeWholeLine=false,
}
},
Where = ECLocation.Right,
LocationProximity=5,
},
new InfoLabel()
{
Value = new InfoValue()
{
Pattern="loan(\\s)?number",
Tweaks=new RegexTweaks()
{
IgnoreCase=true,
IgnoreWhiteSpace=false,
FuzzyMatching=FuzzyMatching.Auto,
IgnoreIfShorterThan=8,
LettersToNumbers=false,
MatchWholeWord=false,
},
TweaksForResults=new RegexResultsTweaks()
{
IncludeWholeWord=false,
IncludeWholeLine=false,
}
},
Where = ECLocation.Right,
LocationProximity =5,
},
new InfoLabel()
{
Value = new InfoValue()
{
Pattern="brokerage(\\s)?cash(\\s)?number",
Tweaks=new RegexTweaks()
{
IgnoreCase=true,
IgnoreWhiteSpace=false,
FuzzyMatching=FuzzyMatching.Auto,
IgnoreIfShorterThan=10,
LettersToNumbers=false,
MatchWholeWord=false,
},
TweaksForResults=new RegexResultsTweaks()
{
IncludeWholeWord=false,
IncludeWholeLine=false,
}
},
Where = ECLocation.Right,
LocationProximity =5,
},
new InfoLabel()
{
Value = new InfoValue()
{
Pattern="Account(\\s)No.",
Tweaks=new RegexTweaks()
{
IgnoreCase=false,
IgnoreWhiteSpace=false,
FuzzyMatching=FuzzyMatching.Auto,
IgnoreIfShorterThan=8,
LettersToNumbers=false,
MatchWholeWord=false,
},
TweaksForResults=new RegexResultsTweaks()
{
IncludeWholeWord=false,
IncludeWholeLine=false,
}
},
Where = ECLocation.Right,
LocationProximity =5,
},
};
acc.Value = new List<InfoValue>()
{
new InfoValue()
{
Pattern = "\\d{3,4}(-)?\\d{3,14}",
Tweaks = new RegexTweaks()
{
FuzzyMatching=FuzzyMatching.Auto,
IgnoreCase=true,
IgnoreWhiteSpace=true,
IgnoreIfShorterThan=5,
LettersToNumbers=true,
MatchWholeWord=false,
},
TweaksForResults = new RegexResultsTweaks()
{
IncludeWholeWord=true,
IncludeWholeLine=false
}
},
new InfoValue()
{
Pattern = "\\d{3,4}(-|\\s)?\\d{3,6}(-|\\s)?\\d{3,6}",
Tweaks = new RegexTweaks()
{
FuzzyMatching=FuzzyMatching.Auto,
IgnoreCase=true,
IgnoreWhiteSpace=true,
IgnoreIfShorterThan=5,
LettersToNumbers=true,
MatchWholeWord=false,
},
TweaksForResults = new RegexResultsTweaks()
{
IncludeWholeWord=true,
IncludeWholeLine=false
}
}
};
return acc;
}
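Before running patterns like these through the engine, it can help to check the raw expressions against a sample line. The sketch below does that with plain System.Text.RegularExpressions, using the first label pattern and the first value pattern from Account(). It is only an approximation: PatternCheck and FindAccount are hypothetical names, the LEADTOOLS tweaks (FuzzyMatching, LettersToNumbers, LocationProximity, and so on) are not modeled, and it assumes the engine's pattern syntax is compatible with .NET regular expressions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternCheck
{
    // Patterns copied from Account(); these verbatim strings are equivalent
    // to the escaped strings used in the tutorial code.
    public const string LabelPattern = @"account(\s)?number";
    public const string ValuePattern = @"\d{3,4}(-)?\d{3,14}";

    // Returns the first value match to the right of the label,
    // or an empty string when the label or value is not found.
    public static string FindAccount(string line)
    {
        Match label = Regex.Match(line, LabelPattern, RegexOptions.IgnoreCase);
        if (!label.Success) return "";
        // Search only the text that follows the label.
        string rest = line.Substring(label.Index + label.Length);
        Match value = Regex.Match(rest, ValuePattern);
        return value.Success ? value.Value : "";
    }

    public static void Main()
    {
        Console.WriteLine(FindAccount("Account Number: 1234-567890"));       // 1234-567890
        Console.WriteLine(FindAccount("Invoice total: 1234-567890").Length); // 0
    }
}
```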
In Solution Explorer, open ExcludedFeatures.cs. Add the following statements to the using block at the top:
using Leadtools.Document.Unstructured.Highlevel;
using System.Collections.Generic;
Add a new Run() method to the ExcludedFeatures class. Add the code below to the Run() method to execute the features in this class.
public static void Run()
{
// Feature to find emails excluding "info@leadtools.com"
var features = new List<IFeature>()
{
// Emails matching
new StandardFeature(){ValueName="Email"},
// Excluding the exact email below
ExcludeExact("info@leadtools.com")
};
// Now we have a list of features, if executed, it will match all emails except for info@leadtools.com
}
Add a new method named ExcludeExact() to the ExcludedFeatures class. This method is called in the Run() method above. Add the code below to the new method. It builds a feature marked as excluded that matches the exact text passed in from the Run() method, so that address is left out of the email results.
private static IFeature ExcludeExact(string text)
{
var ex = new CustomFeature() { Name = "Excluded" };
ex.Value = new List<InfoValue>()
{
new InfoValue()
{
Pattern = text,
PatternIsRegex = false,
Tweaks = new RegexTweaks(),
TweaksForResults = new RegexResultsTweaks()
}
};
ex.Excluded = true;
return ex;
}
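ExcludeExact() sets PatternIsRegex = false, which, going by the property name, asks for the text to be treated as a literal string rather than a regular expression. The sketch below uses plain .NET Regex, not the LEADTOOLS engine, to show why that distinction matters for an address that contains a dot. LiteralMatchDemo and its methods are hypothetical helpers written for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralMatchDemo
{
    // Treats the text as a regular expression: '.' matches any character.
    public static bool MatchesAsRegex(string text, string input) =>
        Regex.IsMatch(input, "^" + text + "$");

    // Treats the text literally by escaping regex metacharacters first.
    public static bool MatchesLiterally(string text, string input) =>
        Regex.IsMatch(input, "^" + Regex.Escape(text) + "$");

    public static void Main()
    {
        const string excluded = "info@leadtools.com";
        Console.WriteLine(MatchesAsRegex(excluded, "info@leadtoolsXcom"));   // True
        Console.WriteLine(MatchesLiterally(excluded, "info@leadtoolsXcom")); // False
        Console.WriteLine(MatchesLiterally(excluded, "info@leadtools.com")); // True
    }
}
```

Interpreted as a regular expression, the exclusion would also remove look-alike addresses; a literal comparison excludes only the exact address.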
In Solution Explorer, open ExecuteFeatures.cs. Add the following statements to the using block at the top:
using System.Collections.Generic;
using System.Threading;
using Leadtools.Document;
using Leadtools.Document.Unstructured.Highlevel;
Add a new Run() method to the ExecuteFeatures class. This class shows how to run the FeaturesProcessingEngine to extract data from a loaded document based on the created features. Add the code below to the Run() method to execute the features in this class.
public static void Run()
{
// Custom feature to find (demo) banking account number
var custom_feature = Account();
// Load a target document and dispose of it when the method exits
var doc_file_name = @"INSERT FILE PATH TO TARGET DOCUMENT";
using var document = DocumentFactory.LoadFromFile(doc_file_name, new LoadDocumentOptions());
// Create engine to run and execute features
var engine = new FeaturesProcessingEngine(true);
// Block until the engine finishes so that Main() does not return before the results are ready
var results = engine.Run(new List<IFeature>() { custom_feature }, document, CancellationToken.None).GetAwaiter().GetResult();
}
The Account() method called in the Run() method is the same Account() method used in the CustomFeatures class, so copy that code into the ExecuteFeatures class.
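However the engine call is waited on, Main() must not return before it completes, otherwise the process can exit with the extraction still running. One common pattern is to return a Task from the method and await it from an async Main. The sketch below shows that pattern in isolation: AsyncDemo and RunAsync are hypothetical names, and Task.Delay stands in for the engine call, so no LEADTOOLS API is used.

```csharp
using System;
using System.Threading.Tasks;

public static class AsyncDemo
{
    // Set once the simulated work has completed.
    public static bool Finished;

    // Stand-in for an asynchronous engine call; Task.Delay simulates work.
    // Returning Task (rather than void) lets the caller await completion.
    public static async Task RunAsync()
    {
        await Task.Delay(50);
        Finished = true;
    }

    // An async Main keeps the process alive until the awaited work is done.
    public static async Task Main()
    {
        await RunAsync();
        Console.WriteLine(Finished); // True
    }
}
```

Applying this to the tutorial would mean changing both the Run() signature and the call in Main(), so treat it as an alternative to adapt, not a drop-in replacement.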
In Solution Explorer, open LabeledFeatures.cs. Add the following statements to the using block at the top:
using Leadtools.Document.Unstructured.Highlevel;
using System.Collections.Generic;
Add a new Run() method to the LabeledFeatures class. Add the code below to the Run() method to execute the features in this class.
public static void Run()
{
// Feature for Emails matching
var feature = new StandardFeature() { ValueName = "Email" };
// Add label
AddLabel(feature, "email:");
}
Add a new method to the LabeledFeatures class named AddLabel(StandardFeature feature, string labelText). Add the code below to the new method to create a custom label for the feature.
private static void AddLabel(StandardFeature feature, string labelText)
{
feature.CustomLabel = true;
feature.CustomLabels = new List<InfoLabel>()
{
new InfoLabel()
{
Value = new InfoValue()
{
Tweaks = new RegexTweaks(),
TweaksForResults = new RegexResultsTweaks(),
// Exact matching label text
Pattern = labelText,
PatternIsRegex = false,
},
// Location
Where = ECLocation.Right,
// Proximity
LocationProximity = 5,
},
};
}
Run the project by pressing F5, or by selecting Debug -> Start Debugging.
If the steps were followed correctly, the console appears and the application executes the code for each sample feature class. To test the ExecuteFeatures class, set the doc_file_name string to the file path of your test document.
This tutorial showed how to use the LEADTOOLS Document Analyzer to perform high-level API operations.