Take the following steps to create and run a program that shows how to scan a document and convert it to a searchable PDF file.
Start Visual Studio 2005 or 2008
Choose File->New->Project from the menu
In the New Project dialog box, choose either "Visual C# Projects" or "Visual Basic Projects" in the Projects Type List, and choose "Windows Application" in Visual Studio 2005 or "Windows Forms Application" in Visual Studio 2008 from the Templates List
Type "OcrTutorial3" in the Project Name field. If desired, type a new location for your project or select a directory using the Browse button, and then choose OK.
In the "Solution Explorer" window, right-click the "References" folder, and select "Add Reference..." from the context menu. In the "Add Reference" dialog box, select the ".NET" tab, browse to the LEADTOOLS for .NET "\LEAD Technologies\LEADTOOLS 16.5\Bin\DotNet\Win32" folder, and select the following DLLs:
Leadtools.dll
Leadtools.Codecs.dll
Leadtools.Forms.dll
Leadtools.Forms.DocumentWriters.dll
Leadtools.Forms.Ocr.dll
Leadtools.Forms.Ocr.Plus.dll
Leadtools.Twain.dll
Leadtools.ImageProcessing.Core.dll
Leadtools.Codecs.Bmp.dll
Leadtools.Codecs.Cmp.dll
Leadtools.Codecs.Tif.dll
Leadtools.Codecs.Fax.dll
Note: The Leadtools.Codecs.*.dll references added are for the BMP, JPG, CMP, TIF and FAX image file formats. Add any additional file format codec DLLs that your application requires.
Drag and drop three buttons onto Form1. Leave the button names at their defaults ("button1, button2 ..."), then change the Text property of each button to the following:
Button     Text
button1    Change output directory
button2    Select the Scanner
button3    Scan and OCR
Switch to Form1 code view (right-click Form1 in the Solution Explorer, then select View Code) and add the following lines at the beginning of the file, after any Imports or using section if there are any:
[Visual Basic]
Imports Leadtools
Imports Leadtools.Twain
Imports Leadtools.Forms
Imports Leadtools.Forms.DocumentWriters
Imports Leadtools.Forms.Ocr
Imports Leadtools.ImageProcessing.Core
[C#]
using Leadtools;
using Leadtools.Twain;
using Leadtools.Forms;
using Leadtools.Forms.DocumentWriters;
using Leadtools.Forms.Ocr;
using Leadtools.ImageProcessing.Core;
Add the following private variables to the Form1 class:
[Visual Basic]
' The OCR engine instance
Private _ocrEngine As IOcrEngine
' The OCR document
Private _ocrDocument As IOcrDocument
' The Twain session
Private _twainSession As TwainSession
' The output directory for saving PDF files
Private _outputDirectory As String = "C:\MyImages"
' The image processing commands we are going to use to clean the scanned image
Private deskewCmd As DeskewCommand
Private despeckleCmd As DespeckleCommand
Private dotRemoveCmd As DotRemoveCommand
Private holePunchRemoveCmd As HolePunchRemoveCommand
Private lineRemoveCmd As LineRemoveCommand
[C#]
// The OCR engine instance
private IOcrEngine _ocrEngine;
// The OCR document
private IOcrDocument _ocrDocument;
// The Twain session
private TwainSession _twainSession;
// The output directory for saving PDF files
private string _outputDirectory = @"C:\MyImages";
// The image processing commands we are going to use to clean the scanned image
private DeskewCommand deskewCmd;
private DespeckleCommand despeckleCmd;
private DotRemoveCommand dotRemoveCmd;
private HolePunchRemoveCommand holePunchRemoveCmd;
private LineRemoveCommand lineRemoveCmd;
Add the following code to the Form1 constructor (in Visual Basic, you can copy/paste the whole Sub New code from here):
[Visual Basic]
Sub New()
    ' This call is required by the Windows Form Designer.
    InitializeComponent()

    ' Add any initialization after the InitializeComponent() call.

    ' Unlock the OCR support
    RasterSupport.Unlock(RasterSupportType.OcrPlus, "testkey")
    ' Unlock the PDF save support
    RasterSupport.Unlock(RasterSupportType.OcrPlusPdfLeadOutput, "testkey")
    ' Unlock Document support
    RasterSupport.Unlock(RasterSupportType.Document, "testkey")

    ' Initialize the OCR engine
    _ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Plus, False)
    ' Start up the engine
    _ocrEngine.Startup(Nothing, Nothing, Nothing, Nothing)
    ' Create the OCR document
    _ocrDocument = _ocrEngine.DocumentManager.CreateDocument()

    ' Initialize the Twain scanning session
    _twainSession = New TwainSession()
    _twainSession.Startup(Me, "My Company", "My Product", "My Version", "My Application", TwainStartupFlags.None)
    ' Subscribe to the TwainSession.AcquirePage event to get the image
    AddHandler _twainSession.AcquirePage, AddressOf _twainSession_AcquirePage

    ' Initialize the image processing commands we are going to use

    ' Initialize Deskew
    deskewCmd = New DeskewCommand()

    ' Initialize Despeckle
    despeckleCmd = New DespeckleCommand()

    ' Initialize DotRemove
    dotRemoveCmd = New DotRemoveCommand()
    dotRemoveCmd.Flags = _
        DotRemoveCommandFlags.UseDiagonals Or _
        DotRemoveCommandFlags.UseSize
    dotRemoveCmd.MaximumDotHeight = 8
    dotRemoveCmd.MaximumDotWidth = 8
    dotRemoveCmd.MinimumDotHeight = 2
    dotRemoveCmd.MinimumDotWidth = 2

    ' Initialize HolePunchRemove
    holePunchRemoveCmd = New HolePunchRemoveCommand()
    holePunchRemoveCmd.Flags = _
        HolePunchRemoveCommandFlags.UseDpi Or _
        HolePunchRemoveCommandFlags.UseCount Or _
        HolePunchRemoveCommandFlags.UseLocation
    holePunchRemoveCmd.Location = HolePunchRemoveCommandLocation.Left

    ' Initialize LineRemove
    lineRemoveCmd = New LineRemoveCommand()
    lineRemoveCmd.MaximumLineWidth = 9
    lineRemoveCmd.MinimumLineLength = 400
    lineRemoveCmd.Wall = 15
    lineRemoveCmd.MaximumWallPercent = 10
    lineRemoveCmd.Variance = 3
    lineRemoveCmd.GapLength = 3
End Sub
[C#]
public Form1()
{
    InitializeComponent();

    // Unlock the OCR support
    RasterSupport.Unlock(RasterSupportType.OcrPlus, "Replace with your own key here");
    // Unlock the PDF save support
    RasterSupport.Unlock(RasterSupportType.OcrPlusPdfLeadOutput, "Replace with your own key here");
    // Unlock Document support
    RasterSupport.Unlock(RasterSupportType.Document, "Replace with your own key here");

    // Initialize the OCR engine
    _ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Plus, false);
    // Start up the engine
    _ocrEngine.Startup(null, null, null, null);
    // Create the OCR document
    _ocrDocument = _ocrEngine.DocumentManager.CreateDocument();

    // Initialize the Twain scanning session
    _twainSession = new TwainSession();
    _twainSession.Startup(this, "My Company", "My Product", "My Version", "My Application", TwainStartupFlags.None);
    // Subscribe to the TwainSession.AcquirePage event to get the image
    _twainSession.AcquirePage += new EventHandler<TwainAcquirePageEventArgs>(_twainSession_AcquirePage);

    // Initialize the image processing commands we are going to use

    // Initialize Deskew
    deskewCmd = new DeskewCommand();

    // Initialize Despeckle
    despeckleCmd = new DespeckleCommand();

    // Initialize DotRemove
    dotRemoveCmd = new DotRemoveCommand();
    dotRemoveCmd.Flags =
        DotRemoveCommandFlags.UseDiagonals |
        DotRemoveCommandFlags.UseSize;
    dotRemoveCmd.MaximumDotHeight = 8;
    dotRemoveCmd.MaximumDotWidth = 8;
    dotRemoveCmd.MinimumDotHeight = 2;
    dotRemoveCmd.MinimumDotWidth = 2;

    // Initialize HolePunchRemove
    holePunchRemoveCmd = new HolePunchRemoveCommand();
    holePunchRemoveCmd.Flags =
        HolePunchRemoveCommandFlags.UseDpi |
        HolePunchRemoveCommandFlags.UseCount |
        HolePunchRemoveCommandFlags.UseLocation;
    holePunchRemoveCmd.Location = HolePunchRemoveCommandLocation.Left;

    // Initialize LineRemove
    lineRemoveCmd = new LineRemoveCommand();
    lineRemoveCmd.MaximumLineWidth = 9;
    lineRemoveCmd.MinimumLineLength = 400;
    lineRemoveCmd.Wall = 15;
    lineRemoveCmd.MaximumWallPercent = 10;
    lineRemoveCmd.Variance = 3;
    lineRemoveCmd.GapLength = 3;
}
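The Flags properties set in the constructor above combine several options with a bitwise OR. The following stand-alone sketch illustrates that pattern only. DemoFlags and FlagsDemo are hypothetical names, and the numeric values are assumptions chosen for illustration, not the real values of DotRemoveCommandFlags or HolePunchRemoveCommandFlags.

```csharp
using System;

// Hypothetical flags enum for illustration only; the real LEADTOOLS
// enumerations and their values are not reproduced here.
[Flags]
enum DemoFlags
{
    None = 0,
    UseDiagonals = 1,
    UseSize = 2,
    UseDpi = 4
}

class FlagsDemo
{
    static void Main()
    {
        // Combine two options with bitwise OR, as the constructor does
        DemoFlags flags = DemoFlags.UseDiagonals | DemoFlags.UseSize;

        // Test individual options with bitwise AND
        bool usesSize = (flags & DemoFlags.UseSize) != 0;
        bool usesDpi = (flags & DemoFlags.UseDpi) != 0;

        Console.WriteLine(flags);
        Console.WriteLine(usesSize);
        Console.WriteLine(usesDpi);
    }
}
```

Each option occupies its own bit, so adding or removing one option does not disturb the others.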
Override the Form1 OnFormClosed method to add the code necessary to shut down the OCR engine when the application terminates:
[Visual Basic]
Protected Overrides Sub OnFormClosed(ByVal e As FormClosedEventArgs)
    ' Destroy the OCR document
    _ocrDocument.Dispose()

    ' Shutdown and dispose the OCR engine
    _ocrEngine.Dispose()

    ' Close the Twain session
    _twainSession.Shutdown()

    MyBase.OnFormClosed(e)
End Sub
[C#]
protected override void OnFormClosed(FormClosedEventArgs e)
{
    // Destroy the OCR document
    _ocrDocument.Dispose();

    // Shutdown and dispose the OCR engine
    _ocrEngine.Dispose();

    // Close the Twain session
    _twainSession.Shutdown();

    base.OnFormClosed(e);
}
Add the following code for the button1 (Change output directory) control’s Click handler:
[Visual Basic]
Private Sub button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles button1.Click
    ' Change the output directory
    Dim dlg As New FolderBrowserDialog()
    dlg.SelectedPath = _outputDirectory
    dlg.ShowNewFolderButton = True
    If (dlg.ShowDialog(Me) = DialogResult.OK) Then
        _outputDirectory = System.IO.Path.GetFullPath(dlg.SelectedPath)
    End If
End Sub
[C#]
private void button1_Click(object sender, EventArgs e)
{
    // Change the output directory
    FolderBrowserDialog dlg = new FolderBrowserDialog();
    dlg.SelectedPath = _outputDirectory;
    dlg.ShowNewFolderButton = true;
    if(dlg.ShowDialog(this) == DialogResult.OK)
        _outputDirectory = System.IO.Path.GetFullPath(dlg.SelectedPath);
}
Add the following code for the button2 (Select the Scanner) control’s Click handler:
[Visual Basic]
Private Sub button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles button2.Click
    ' Select the scanner to use
    _twainSession.SelectSource(Nothing)
End Sub
[C#]
private void button2_Click(object sender, EventArgs e)
{
    // Select the scanner to use
    _twainSession.SelectSource(null);
}
Add the following code for the button3 (Scan and OCR) control’s Click handler:
[Visual Basic]
Private Sub button3_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles button3.Click
    ' Create the output directory if it does not exist
    If (Not System.IO.Directory.Exists(_outputDirectory)) Then
        System.IO.Directory.CreateDirectory(_outputDirectory)
    End If

    ' Build the output PDF file name
    Dim pdfFileName As String = System.IO.Path.Combine(_outputDirectory, "Scanned.pdf")

    ' First remove all the pages added to the OCR document
    _ocrDocument.Pages.Clear()

    ' Scan the new page(s)
    _twainSession.Acquire(TwainUserInterfaceFlags.Show)

    ' The pages should be added to the OCR document now.
    ' Recognize and save as PDF
    _ocrDocument.Pages.Recognize(Nothing)
    _ocrDocument.Save(pdfFileName, DocumentFormat.Pdf, Nothing)

    ' Show the result PDF file
    System.Diagnostics.Process.Start(pdfFileName)
End Sub
[C#]
private void button3_Click(object sender, EventArgs e)
{
    // Create the output directory if it does not exist
    if(!System.IO.Directory.Exists(_outputDirectory))
        System.IO.Directory.CreateDirectory(_outputDirectory);

    // Build the output PDF file name
    string pdfFileName = System.IO.Path.Combine(_outputDirectory, "Scanned.pdf");

    // First remove all the pages added to the OCR document
    _ocrDocument.Pages.Clear();

    // Scan the new page(s)
    _twainSession.Acquire(TwainUserInterfaceFlags.Show);

    // The pages should be added to the OCR document now.
    // Recognize and save as PDF
    _ocrDocument.Pages.Recognize(null);
    _ocrDocument.Save(pdfFileName, DocumentFormat.Pdf, null);

    // Show the result PDF file
    System.Diagnostics.Process.Start(pdfFileName);
}
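The directory check and file-name construction at the start of the handler above can be tried on their own, without a scanner or the OCR engine. The following is a minimal stand-alone sketch rather than part of the tutorial project. OutputPathDemo and BuildPdfPath are hypothetical names, and the folder under the temp directory is an assumption made so the sample does not depend on C:\MyImages.

```csharp
using System;
using System.IO;

// Hypothetical stand-alone console sample; not part of the OcrTutorial3 project.
class OutputPathDemo
{
    // Creates the directory if it does not exist and returns the full PDF path,
    // mirroring the first two steps of button3_Click.
    static string BuildPdfPath(string outputDirectory, string fileName)
    {
        if (!Directory.Exists(outputDirectory))
            Directory.CreateDirectory(outputDirectory);

        return Path.Combine(outputDirectory, fileName);
    }

    static void Main()
    {
        // Assumption: a folder under the temp directory instead of C:\MyImages
        string outputDirectory = Path.Combine(Path.GetTempPath(), "MyImages");
        string pdfFileName = BuildPdfPath(outputDirectory, "Scanned.pdf");

        Console.WriteLine(pdfFileName);
        Console.WriteLine(Directory.Exists(outputDirectory));
    }
}
```

Because the file name is always "Scanned.pdf", every scan in the tutorial targets the same path within the chosen output directory.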
Add the private method to handle the AcquirePage event of the TwainSession object:
[Visual Basic]
Private Sub _twainSession_AcquirePage(ByVal sender As Object, ByVal e As TwainAcquirePageEventArgs)
    ' We have a page
    Dim image As RasterImage = e.Image

    ' First, run the image processing commands on it

    ' Deskew
    deskewCmd.Run(image)

    ' Despeckle
    despeckleCmd.Run(image)

    ' The rest of the commands only work on 1 BPP image
    If (image.BitsPerPixel = 1) Then
        ' Dot Remove
        dotRemoveCmd.Run(image)

        ' Hole Punch Remove
        holePunchRemoveCmd.Run(image)

        ' Vertical Line Remove
        lineRemoveCmd.Type = LineRemoveCommandType.Vertical
        lineRemoveCmd.Run(image)

        ' Horizontal Line Remove
        lineRemoveCmd.Type = LineRemoveCommandType.Horizontal
        lineRemoveCmd.Run(image)
    End If

    ' Add the image as a new page to the OCR document
    _ocrDocument.Pages.AddPage(image, Nothing)
End Sub
[C#]
private void _twainSession_AcquirePage(object sender, TwainAcquirePageEventArgs e)
{
    // We have a page
    RasterImage image = e.Image;

    // First, run the image processing commands on it

    // Deskew
    deskewCmd.Run(image);

    // Despeckle
    despeckleCmd.Run(image);

    // The rest of the commands only work on 1 BPP image
    if(image.BitsPerPixel == 1)
    {
        // Dot Remove
        dotRemoveCmd.Run(image);

        // Hole Punch Remove
        holePunchRemoveCmd.Run(image);

        // Vertical Line Remove
        lineRemoveCmd.Type = LineRemoveCommandType.Vertical;
        lineRemoveCmd.Run(image);

        // Horizontal Line Remove
        lineRemoveCmd.Type = LineRemoveCommandType.Horizontal;
        lineRemoveCmd.Run(image);
    }

    // Add the image as a new page to the OCR document
    _ocrDocument.Pages.AddPage(image, null);
}
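The handler above is called once per scanned page, and the button3 code relies on all pages being in the OCR document by the time Acquire returns. The following stand-alone sketch illustrates that event flow with stand-in types. FakeScanner, PageEventArgs and AcquireDemo are hypothetical and are not LEADTOOLS classes, and the sketch assumes the event is raised synchronously before Acquire returns, as the comment in button3_Click suggests.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for illustration; these are not LEADTOOLS types.
class PageEventArgs : EventArgs
{
    private readonly string _page;
    public PageEventArgs(string page) { _page = page; }
    public string Page { get { return _page; } }
}

class FakeScanner
{
    public event EventHandler<PageEventArgs> AcquirePage;

    // Raises AcquirePage once per page and returns only after all pages are delivered
    public void Acquire(string[] pages)
    {
        foreach (string page in pages)
        {
            EventHandler<PageEventArgs> handler = AcquirePage;
            if (handler != null)
                handler(this, new PageEventArgs(page));
        }
    }
}

class AcquireDemo
{
    // Plays the role of the OCR document's page collection
    static readonly List<string> _document = new List<string>();

    static void OnAcquirePage(object sender, PageEventArgs e)
    {
        // Corresponds to _ocrDocument.Pages.AddPage in the tutorial handler
        _document.Add(e.Page);
    }

    static void Main()
    {
        FakeScanner scanner = new FakeScanner();
        scanner.AcquirePage += OnAcquirePage;

        // Clear previous pages, then acquire, as button3_Click does
        _document.Clear();
        scanner.Acquire(new string[] { "page 1", "page 2" });

        // Both pages were collected before Acquire returned
        Console.WriteLine(_document.Count); // 2
    }
}
```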