The LEADTOOLS OCR features provide methods for incorporating optical character recognition (OCR) technology into an application. OCR is used to convert bitmap document images into text.
Once the LEADTOOLS .NET OCR toolkit is installed on the system, you are ready to begin programming with LEADTOOLS OCR. Note that the OCR features must be unlocked before you can use the OCR properties, methods, and events. For more information on unlocking LEAD features, refer to Unlocking Special LEAD Features.
To start using LEADTOOLS for .NET OCR in your application, add references to the Leadtools.Forms.Ocr.dll and Leadtools.Forms.DocumentWriters.dll assemblies to your .NET project. These assemblies contain the interfaces, classes, structures, and delegates used to program with LEADTOOLS OCR.
Because the toolkit supports multiple engines, the code that interfaces with a specific engine is stored in a separate assembly that is loaded dynamically when an instance of the IOcrEngine interface is created. You must therefore make sure that the engine assembly you plan to use resides in the same folder as the Leadtools.Forms.Ocr.dll assembly. Optionally, you can also add the engine assembly as a reference to your project so that its dependencies are detected automatically; LEADTOOLS does not require this.
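Because the engine assembly is only loaded when the IOcrEngine instance is created, a missing file surfaces at that point. An application may prefer to check its deployment up front. The following C# sketch is a hypothetical helper, not a LEADTOOLS API; it uses only System.IO and assumes you know the file name of the engine assembly (for the Advantage engine this is assumed to be Leadtools.Forms.Ocr.Advantage.dll; confirm the name against the files in your installation).

```csharp
using System;
using System.IO;

// Hypothetical deployment check (not part of LEADTOOLS): verifies that an OCR
// engine assembly sits in the same folder as Leadtools.Forms.Ocr.dll.
public static class OcrDeploymentCheck
{
    // ocrAssemblyPath: full path of Leadtools.Forms.Ocr.dll.
    // engineAssemblyFileName: file name of the engine assembly; the exact name
    // for a given engine is an assumption to confirm against your installation.
    public static bool IsEngineAssemblyDeployed(string ocrAssemblyPath, string engineAssemblyFileName)
    {
        if (string.IsNullOrEmpty(ocrAssemblyPath))
            throw new ArgumentException("Path of Leadtools.Forms.Ocr.dll is required.", nameof(ocrAssemblyPath));
        if (string.IsNullOrEmpty(engineAssemblyFileName))
            throw new ArgumentException("Engine assembly file name is required.", nameof(engineAssemblyFileName));

        // Folder that contains Leadtools.Forms.Ocr.dll
        string folder = Path.GetDirectoryName(Path.GetFullPath(ocrAssemblyPath)) ?? string.Empty;

        // The engine assembly must be in that same folder
        return File.Exists(Path.Combine(folder, engineAssemblyFileName));
    }
}
```

In an application, the path of Leadtools.Forms.Ocr.dll could be obtained with typeof(IOcrEngine).Assembly.Location, since the IOcrEngine interface is defined in that assembly.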
LEADTOOLS provides methods to:
- Recognize and export text to a variety of text, word processing, database, or spreadsheet file formats.
- Perform OCR in single-threaded or multithreaded environments, with optimizations for server-based operations.
- Use multiple OCR engines through a common .NET class library that abstracts the engine from the application. Switching between engines requires virtually no changes to the application code.
- Select the language of the documents to be recognized: English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
- Segment complex pages, manually or automatically, into text zones, image zones, table zones, lines, headers, and footers.
- Set accuracy thresholds before recognition to control recognition accuracy.
- Recognize text from 5 to 72 points in size in virtually any typeface.
- Increase recognition accuracy with built-in and user dictionaries.
- Automatically detect fax, dot-matrix, and other degraded documents and compensate accordingly.
- Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can serve as the basis of a compound document processing system.
- Save the document in any of 40 formats, including Adobe PDF and PDF/A, Microsoft Word, Microsoft Excel, and various flavors of ASCII and Unicode text.
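The 5 to 72 point range is expressed in typographic points (1 point = 1/72 inch), so the matching text height in pixels depends on the scan resolution: pixels = points * dpi / 72. At 300 DPI, 5 points is about 20.8 pixels and 72 points is 300 pixels. The C# sketch below is a hypothetical helper, not a LEADTOOLS API, and treats a measured character height only as an approximation of the font size (a font's nominal size is usually larger than the height of its capital letters).

```csharp
using System;

// Hypothetical helper (not part of LEADTOOLS): relates a text height in pixels to
// its size in points (1 point = 1/72 inch) at a given scan resolution.
public static class OcrTextSize
{
    // Range quoted in this topic
    public const double MinPoints = 5.0;
    public const double MaxPoints = 72.0;

    public static double PixelsToPoints(double pixels, double dpi)
    {
        if (dpi <= 0) throw new ArgumentOutOfRangeException(nameof(dpi), "Resolution must be positive.");
        return pixels * 72.0 / dpi;
    }

    public static double PointsToPixels(double points, double dpi)
    {
        if (dpi <= 0) throw new ArgumentOutOfRangeException(nameof(dpi), "Resolution must be positive.");
        return points * dpi / 72.0;
    }

    // True when the approximate size falls inside the 5 to 72 point range
    public static bool IsInRecognizableRange(double pixels, double dpi)
    {
        double points = PixelsToPoints(pixels, dpi);
        return points >= MinPoints && points <= MaxPoints;
    }
}
```

For example, 20-pixel text scanned at 300 DPI is about 4.8 points, just below the quoted range, while 21-pixel text is about 5.04 points.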
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
1. Select the engine type you want to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
2. Start up the OCR engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
3. Create an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
4. Establish zones on the page(s), either manually or automatically. This step is optional: a page can be recognized with or without zones. For more information, refer to Working with OCR Zones.
5. Optional. Set the active languages to be used by the OCR engine. (The default is English.) For more information, refer to Working with OCR Languages.
6. Optional. Set the spell checking language. (The default is English.) For more information, refer to OCR Spell Language Dictionaries.
7. Optional. Set any special recognition module options. This is needed only if the page contains zones, whether created automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules, and Using OMR in LEADTOOLS .NET OCR.
8. Recognize the pages. For more information, refer to Recognizing OCR Pages.
9. Save the recognition results, if desired, either to a file or to memory. For more information, refer to Recognizing OCR Pages.
10. Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Steps 4, 5, 6, and 7 can be performed in almost any order, as long as they are carried out after the OCR engine is started (step 2) and before the pages are recognized (step 8). Step 4 needs the pages added in step 3, and step 7 changes options on existing zones, so it should follow step 4.
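Step 5 passes language identifiers such as "en" and "de" to LanguageManager.EnableLanguages. The C# sketch below is a hypothetical helper, not a LEADTOOLS API, that maps the language names listed in this topic to such identifiers. It assumes the identifiers follow ISO 639-1 two-letter codes, as "en" and "de" do; in particular, "no" for Norwegian is an assumption, and the engine you use may not support every language.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of LEADTOOLS): converts language names into the
// identifiers passed to LanguageManager.EnableLanguages. The codes are ASSUMED
// to be ISO 639-1, matching "en" and "de" in the example code.
public static class OcrLanguageCodes
{
    private static readonly Dictionary<string, string> Codes =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "English", "en" }, { "Danish", "da" }, { "Dutch", "nl" },
            { "Finnish", "fi" }, { "French", "fr" }, { "German", "de" },
            { "Italian", "it" }, { "Norwegian", "no" }, { "Portuguese", "pt" },
            { "Russian", "ru" }, { "Spanish", "es" }, { "Swedish", "sv" },
        };

    // Returns the codes in the order given, without duplicates
    public static string[] FromNames(params string[] names)
    {
        var result = new List<string>();
        foreach (string name in names)
        {
            string code;
            if (!Codes.TryGetValue(name, out code))
                throw new ArgumentException("Unknown OCR language: " + name, nameof(names));
            if (!result.Contains(code))
                result.Add(code);
        }
        return result.ToArray();
    }
}
```

For example, ocrEngine.LanguageManager.EnableLanguages(OcrLanguageCodes.FromNames("English", "German")) would enable the same languages as step 5 of the example code and, like that step, can run anywhere between Startup and Recognize.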
For more information on the engine assemblies, refer to OcrEngineType and Files To Be Included With Your Application.
The following example shows how to perform the above steps in code:
Visual Basic

```vb
' Assumes "Imports Leadtools.Forms.Ocr" and "Imports Leadtools.Forms.DocumentWriters" at the top of this file

' *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
' Use the LEADTOOLS OCR Advantage engine in the same process
Dim ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)

' *** Step 2: Start up the engine.
' Use the default parameters and point the engine to its runtime files
ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

' *** Step 3: Create an OCR document with one or more pages.
Dim ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
' Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)

' *** Step 4: Establish zones on the page(s), either manually or automatically
' Automatic zoning
ocrDocument.Pages.AutoZone(Nothing)

' *** Step 5: (Optional) Set the active languages to be used by the OCR engine
' Enable English and German
ocrEngine.LanguageManager.EnableLanguages(New String() {"en", "de"})

' *** Step 6: (Optional) Set the spell checking language
' Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native
ocrEngine.SpellCheckManager.SpellLanguage = "en"

' *** Step 7: (Optional) Set any special recognition module options
' Read the first zone of the first page, change its fill method, and write it back.
' Use OcrZoneFillMethod.Omr instead for a zone that contains OMR marks (check boxes).
If ocrDocument.Pages(0).Zones.Count > 0 Then
   Dim ocrZone As OcrZone = ocrDocument.Pages(0).Zones(0)
   ocrZone.FillMethod = OcrZoneFillMethod.Default
   ocrDocument.Pages(0).Zones(0) = ocrZone
End If

' *** Step 8: Recognize
ocrDocument.Pages.Recognize(Nothing)

' *** Step 9: Save recognition results
' Save the results to a PDF file
ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
ocrDocument.Dispose()

' *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown()
ocrEngine.Dispose()
```
C#

```csharp
// Assumes "using Leadtools.Codecs;", "using Leadtools.Forms.Ocr;" and "using Leadtools.Forms.DocumentWriters;" at the top of this file

// *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
// Use the LEADTOOLS OCR Advantage engine in the same process
IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);

// *** Step 2: Start up the engine.
// Use the default parameters and point the engine to its runtime files
ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

// *** Step 3: Create an OCR document with one or more pages.
IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument();
// Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);

// *** Step 4: Establish zones on the page(s), either manually or automatically
// Automatic zoning
ocrDocument.Pages.AutoZone(null);

// *** Step 5: (Optional) Set the active languages to be used by the OCR engine
// Enable English and German
ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" });

// *** Step 6: (Optional) Set the spell checking language
// Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native;
ocrEngine.SpellCheckManager.SpellLanguage = "en";

// *** Step 7: (Optional) Set any special recognition module options
// Read the first zone of the first page, change its fill method, and write it back.
// Use OcrZoneFillMethod.Omr instead for a zone that contains OMR marks (check boxes).
if (ocrDocument.Pages[0].Zones.Count > 0)
{
   OcrZone ocrZone = ocrDocument.Pages[0].Zones[0];
   ocrZone.FillMethod = OcrZoneFillMethod.Default;
   ocrDocument.Pages[0].Zones[0] = ocrZone;
}

// *** Step 8: Recognize
ocrDocument.Pages.Recognize(null);

// *** Step 9: Save recognition results
// Save the results to a PDF file
ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
ocrDocument.Dispose();

// *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown();
ocrEngine.Dispose();
```
The following example shows the minimum steps required to perform the same task, this time using the C# "using" statement and the VB.NET "Using" statement so that the document and the engine are disposed of automatically:
Visual Basic

```vb
' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
   ' Start up the engine
   ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

   ' Create the OCR document
   Using ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
      ' Add all the pages of a multi-page TIF image to the document
      ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)

      ' Recognize the pages
      ocrDocument.Pages.Recognize(Nothing)

      ' Save recognition results
      ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
   End Using
   ' The engine shuts down automatically when Dispose is called
End Using
```
C#

```csharp
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
   // Start up the engine
   ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

   // Create the OCR document
   using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
   {
      // Add all the pages of a multi-page TIF image to the document
      ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);

      // Recognize the pages
      ocrDocument.Pages.Recognize(null);

      // Save recognition results
      ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
   }
   // The engine shuts down automatically when Dispose is called
}
```
Finally, the following example performs the same task with the one-shot ("fire and forget") IOcrAutoRecognizeManager interface, accessed through the IOcrEngine.AutoRecognizeManager property:
Visual Basic

```vb
' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
   ' Start up the engine
   ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

   ' Convert the multi-page TIF image to a PDF document
   ocrEngine.AutoRecognizeManager.Run( _
      "C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", _
      "C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", _
      DocumentFormat.Pdf, _
      Nothing, _
      Nothing)
End Using
```
C#

```csharp
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
   // Start up the engine
   ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

   // Convert the multi-page TIF image to a PDF document
   ocrEngine.AutoRecognizeManager.Run(
      @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif",
      @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf",
      DocumentFormat.Pdf,
      null,
      null);
}
```
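The AutoRecognizeManager.Run call above converts one image per call. To convert a whole folder, an application can first build the list of input and output paths and then call Run once per pair. The following C# sketch is a hypothetical helper (not a LEADTOOLS API) that only plans the jobs using System.IO; it filters extensions itself so that both .tif and .tiff files are picked up, in any letter case, on any platform.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical batch planner (not part of LEADTOOLS): pairs every TIF image in a
// folder with the PDF file it should be converted to.
public static class OcrBatchPlanner
{
    public static List<KeyValuePair<string, string>> PlanTifToPdf(string inputFolder, string outputFolder)
    {
        var jobs = new List<KeyValuePair<string, string>>();

        // Sort the file names so the jobs run in a predictable order
        string[] files = Directory.GetFiles(inputFolder);
        Array.Sort(files, StringComparer.OrdinalIgnoreCase);

        foreach (string imagePath in files)
        {
            // Accept .tif and .tiff in any letter case; skip everything else
            string extension = Path.GetExtension(imagePath);
            bool isTif = extension.Equals(".tif", StringComparison.OrdinalIgnoreCase)
                      || extension.Equals(".tiff", StringComparison.OrdinalIgnoreCase);
            if (!isTif)
                continue;

            // Same base name with a .pdf extension, placed in the output folder
            string pdfPath = Path.Combine(outputFolder, Path.GetFileNameWithoutExtension(imagePath) + ".pdf");
            jobs.Add(new KeyValuePair<string, string>(imagePath, pdfPath));
        }
        return jobs;
    }
}
```

Inside the "using" block of the C# example above, each pair could then be converted with ocrEngine.AutoRecognizeManager.Run(job.Key, job.Value, DocumentFormat.Pdf, null, null). Keeping the planning separate from the OCR calls lets the file handling be tested without starting an engine.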
C#
The LEADTOOLS OCR features provide methods for incorporating optical character recognition (OCR) technology into an application. OCR is used to process bitmap document images into text.
Once the LEADTOOLS .NET OCR toolkit is installed to the system, the user is ready to begin programming with LEADTOOLS OCR. Please note that the OCR features must be unlocked before the user can actually use the OCR properties, methods, and events. For more information on unlocking LEAD features, refer to Unlocking Special LEAD Features.
You can start using LEADTOOLS for .NET OCR in your application by adding a references to the Leadtools.Forms.Ocr.dll and Leadtools.Forms.DocumentWriter.dll assemblies in your .NET application. These assemblies contain the various interfaces, classes, structures and delegates used to program with LEADTOOLS OCR.
Since the toolkit supports multiple engines, the actual code that interfaces with the engine is stored in a separate assembly that will be loaded dynamically once an instance of the IOcrEngine interface is created. Hence, you must make sure the engine assembly you are planning to use resides next to the Leadtools.Forms.Ocr.dll assembly. You can add the engine assembly as a reference to your project if desired to automatically detect dependencies, even though this is not required by LEADTOOLS.
LEADTOOLS provides methods to:
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in a single or multi-threaded environment with optimization for server-based operations.
Multiple OCR engines are supported and abstracted from the user through the use of a common .NET class library. Switching between the various engines requires virtually no changes in the application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds prior to recognition to control the accuracy of recognition.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, MS Word, MS Excel as well as various flavors of ASCII and UNICODE text.
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
Startup the OCR Engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
Establish an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
Establish zones on the page(s), either manually or automatically. (This is optional. A page can be recognized with or without zones.) For more information, refer to Working with OCR Zones.
Optional. Set the active languages to be used by the OCR engine. (The default is English). For more information, refer to Working with OCR Languages.
Optional. Set the spell checking language. (The default is English). For more information, refer to OCR Spell Language Dictionaries.
Optional. Set any special recognition module options. This is required only if the page contains zones, created either automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules and Using OMR in LEADTOOLS .NET OCR.
Recognize. For more information, refer to Recognizing OCR Pages.
Save recognition results, if desired. The results can be saved to either a file or to memory. For more information, refer to Recognizing OCR Pages.
Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Where steps 4, 5, 6 and 7 can pretty much be done in any order, as long as they are carried out after starting up the OCR engine and before recognizing a page.
For more information on the engine assemblies refer to OcrEngineType" and Files To Be Included With Your Application.
The following example shows how to perform the above steps in code:
Visual Basic
The LEADTOOLS OCR features provide methods for incorporating optical character recognition (OCR) technology into an application. OCR is used to process bitmap document images into text.
Once the LEADTOOLS .NET OCR toolkit is installed to the system, the user is ready to begin programming with LEADTOOLS OCR. Please note that the OCR features must be unlocked before the user can actually use the OCR properties, methods, and events. For more information on unlocking LEAD features, refer to Unlocking Special LEAD Features.
You can start using LEADTOOLS for .NET OCR in your application by adding a references to the Leadtools.Forms.Ocr.dll and Leadtools.Forms.DocumentWriter.dll assemblies in your .NET application. These assemblies contain the various interfaces, classes, structures and delegates used to program with LEADTOOLS OCR.
Since the toolkit supports multiple engines, the actual code that interfaces with the engine is stored in a separate assembly that will be loaded dynamically once an instance of the IOcrEngine interface is created. Hence, you must make sure the engine assembly you are planning to use resides next to the Leadtools.Forms.Ocr.dll assembly. You can add the engine assembly as a reference to your project if desired to automatically detect dependencies, even though this is not required by LEADTOOLS.
LEADTOOLS provides methods to:
C#// Assuming you added "using Leadtools.Codecs;", "using Leadtools.Forms.Ocr;" and "using Leadtools.Forms.DocumentWriters;" at the beginning of this class // *** Step 1: Select the engine type and create an instance of the IOcrEngine interface. // We will use the LEADTOOLS OCR Advantage engine and use it in the same process IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false); // *** Step 2: Startup the engine. // Use the default parameters ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // *** Step 3: Create an OCR document with one or more pages. IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument(); // Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null); // *** Step 4: Establish zones on the page(s), either manually or automatically // Automatic zoning ocrDocument.Pages.AutoZone(null); // *** Step 5: (Optional) Set the active languages to be used by the OCR engine // Enable English and German languages ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" }); // *** Step 6: (Optional) Set the spell checking language // Enable the spell checking system and set English as the spell language ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native; ocrEngine.SpellCheckManager.SpellLanguage = "en"; // *** Step 7: (Optional) Set any special recognition module options // Change the fill method for the first zone in the first page to be default OcrZone ocrZone = ocrDocument.Pages[0].Zones[0]; ocrZone.FillMethod = OcrZoneFillMethod.Default; ocrDocument.Pages[0].Zones[0] = ocrZone; // *** Step 8: Recognize ocrDocument.Pages.Recognize(null); // *** Step 9: Save recognition results // Save the results to a PDF file ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null); ocrDocument.Dispose(); // *** Step 
10: Shut down the OCR engine when finished ocrEngine.Shutdown(); ocrEngine.Dispose();
The following example shows the minimum necessary steps required to perform the same task, this time using the proper C# and VB.NET "using" keywords:
Visual Basic' Create the engine instance Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' Startup the engine ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' Create the document Using ocrDocument as IOcrDocument = ocrEngine.DocumentManager.CreateDocument() ' Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing) ' Recognize the pages ocrDocument.Pages.Recognize(Nothing) 'Save recognition results ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing) End Using ' The engine will automatically shuts down when Dispose is called End Using
C#// Create the engine instance using(IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false)) { // Startup the engine ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // Create the OCR document using(IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument()) { // Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null); // Recognize the pages ocrDocument.Pages.Recognize(null); // Save recognition results ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null); } // The engine will automatically shuts down when Dispose is called }
Finally, the following example shows how to perform the same task above using the one shot "fire and forget" IOcrAutoRecognizeManager interface:
Visual Basic' Create the engine instance Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' Startup the engine ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' Convert the multi-page TIF image to a PDF document ocrEngine.AutoRecognizeManager.Run( _ "C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", _ "C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", _ DocumentFormat.Pdf, _ Nothing, _ Nothing) End Using
C#// Create the engine instance using(IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false)) { // Startup the engine ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // Convert the multi-page TIF image to a PDF document ocrEngine.AutoRecognizeManager.Run( @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null, null); }
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in a single or multi-threaded environment with optimization for server-based operations.
Multiple OCR engines are supported and abstracted from the user through the use of a common .NET class library. Switching between the various engines requires virtually no changes in the application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds prior to recognition to control the accuracy of recognition.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, MS Word, MS Excel as well as various flavors of ASCII and UNICODE text.
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
Startup the OCR Engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
Establish an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
Establish zones on the page(s), either manually or automatically. (This is optional. A page can be recognized with or without zones.) For more information, refer to Working with OCR Zones.
Optional. Set the active languages to be used by the OCR engine. (The default is English). For more information, refer to Working with OCR Languages.
Optional. Set the spell checking language. (The default is English). For more information, refer to OCR Spell Language Dictionaries.
Optional. Set any special recognition module options. This is required only if the page contains zones, created either automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules and Using OMR in LEADTOOLS .NET OCR.
Recognize. For more information, refer to Recognizing OCR Pages.
Save recognition results, if desired. The results can be saved to either a file or to memory. For more information, refer to Recognizing OCR Pages.
Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Where steps 4, 5, 6 and 7 can pretty much be done in any order, as long as they are carried out after starting up the OCR engine and before recognizing a page.
For more information on the engine assemblies refer to OcrEngineType" and Files To Be Included With Your Application.
The following example shows how to perform the above steps in code:
Visual Basic
```vb
' Assuming you added "Imports Leadtools.Forms.Ocr" and "Imports Leadtools.Forms.DocumentWriters" at the beginning of this file

' *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
' We will use the LEADTOOLS OCR Advantage engine and use it in the same process
Dim ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)

' *** Step 2: Start up the engine.
' Use the default parameters
ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

' *** Step 3: Create an OCR document with one or more pages.
Dim ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
' Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)

' *** Step 4: Establish zones on the page(s), either manually or automatically
' Automatic zoning
ocrDocument.Pages.AutoZone(Nothing)

' *** Step 5: (Optional) Set the active languages to be used by the OCR engine
' Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(New String() {"en", "de"})

' *** Step 6: (Optional) Set the spell checking language
' Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native
ocrEngine.SpellCheckManager.SpellLanguage = "en"

' *** Step 7: (Optional) Set any special recognition module options
' Change the fill method for the first zone in the first page to OMR
Dim ocrZone As OcrZone = ocrDocument.Pages(0).Zones(0)
ocrZone.FillMethod = OcrZoneFillMethod.Omr
' Write the modified zone back to the page
ocrDocument.Pages(0).Zones(0) = ocrZone

' *** Step 8: Recognize
ocrDocument.Pages.Recognize(Nothing)

' *** Step 9: Save recognition results
' Save the results to a PDF file
ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
ocrDocument.Dispose()

' *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown()
ocrEngine.Dispose()
```
C#
```csharp
// Assuming you added "using Leadtools.Codecs;", "using Leadtools.Forms.Ocr;" and "using Leadtools.Forms.DocumentWriters;" at the beginning of this file

// *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
// We will use the LEADTOOLS OCR Advantage engine and use it in the same process
IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);

// *** Step 2: Start up the engine.
// Use the default parameters
ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

// *** Step 3: Create an OCR document with one or more pages.
IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument();
// Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);

// *** Step 4: Establish zones on the page(s), either manually or automatically
// Automatic zoning
ocrDocument.Pages.AutoZone(null);

// *** Step 5: (Optional) Set the active languages to be used by the OCR engine
// Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" });

// *** Step 6: (Optional) Set the spell checking language
// Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native;
ocrEngine.SpellCheckManager.SpellLanguage = "en";

// *** Step 7: (Optional) Set any special recognition module options
// Change the fill method for the first zone in the first page to Default
OcrZone ocrZone = ocrDocument.Pages[0].Zones[0];
ocrZone.FillMethod = OcrZoneFillMethod.Default;
// Write the modified zone back to the page
ocrDocument.Pages[0].Zones[0] = ocrZone;

// *** Step 8: Recognize
ocrDocument.Pages.Recognize(null);

// *** Step 9: Save recognition results
// Save the results to a PDF file
ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
ocrDocument.Dispose();

// *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown();
ocrEngine.Dispose();
```
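The step-by-step example runs steps 4 to 7 in numeric order, but those steps can be done in any order as long as they happen after the engine is started and before a page is recognized. The sketch below models that rule with a hypothetical OcrSessionGuard class. It is not a LEADTOOLS type and makes no calls into the toolkit; the step names passed to Configure are just labels. It also assumes a single recognition pass, so any configuration attempted after Recognize is rejected.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the OCR lifecycle ordering rule (not a LEADTOOLS API).
public enum OcrSessionState { Created, Started, Recognized, ShutDown }

public sealed class OcrSessionGuard
{
    public OcrSessionState State { get; private set; } = OcrSessionState.Created;

    // Records every operation that was accepted, in order.
    public List<string> Log { get; } = new List<string>();

    public void Startup()
    {
        Require(OcrSessionState.Created, "Startup");
        State = OcrSessionState.Started;
        Log.Add("Startup");
    }

    // Steps 4 to 7 (zoning, languages, spell checking, module options):
    // accepted in any order, but only between Startup and Recognize.
    public void Configure(string step)
    {
        Require(OcrSessionState.Started, step);
        Log.Add(step);
    }

    public void Recognize()
    {
        Require(OcrSessionState.Started, "Recognize");
        State = OcrSessionState.Recognized;
        Log.Add("Recognize");
    }

    // Shutting down twice is harmless; only the first call is recorded.
    public void Shutdown()
    {
        if (State == OcrSessionState.ShutDown) return;
        State = OcrSessionState.ShutDown;
        Log.Add("Shutdown");
    }

    private void Require(OcrSessionState expected, string operation)
    {
        if (State != expected)
            throw new InvalidOperationException(
                $"{operation} requires state {expected}, but the session is {State}.");
    }
}
```

With this guard, calling Configure("AutoZone") before Startup throws InvalidOperationException, while Startup followed by Configure("SpellLanguage"), Configure("AutoZone") and Configure("EnableLanguages") in that order is accepted.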
The following example shows the minimum steps required to convert the same multi-page TIF image to PDF, this time with the C# "using" and VB.NET "Using" statements, which dispose of the document and the engine automatically:
Visual Basic
```vb
' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
   ' Start up the engine
   ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

   ' Create the document
   Using ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
      ' Add all the pages of a multi-page TIF image to the document
      ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)

      ' Recognize the pages
      ocrDocument.Pages.Recognize(Nothing)

      ' Save recognition results
      ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
   End Using

   ' The engine automatically shuts down when Dispose is called
End Using
```
C#
```csharp
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
   // Start up the engine
   ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

   // Create the OCR document
   using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
   {
      // Add all the pages of a multi-page TIF image to the document
      ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);

      // Recognize the pages
      ocrDocument.Pages.Recognize(null);

      // Save recognition results
      ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
   }

   // The engine automatically shuts down when Dispose is called
}
```
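Nested using blocks dispose in reverse order of creation, so the OCR document is always disposed before the engine, and both are disposed even if recognition throws. In the step-by-step example, by contrast, an exception thrown by Recognize would skip the explicit Dispose and Shutdown calls. The sketch below demonstrates the ordering with hypothetical FakeEngine and FakeDocument classes that only record calls; they are not LEADTOOLS types, and the "shut down on Dispose" behavior of FakeEngine merely mirrors the comment in the example above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an OCR engine (not a LEADTOOLS type).
public sealed class FakeEngine : IDisposable
{
    private readonly List<string> _log;
    public bool IsStarted { get; private set; }

    public FakeEngine(List<string> log) { _log = log; }

    public void Startup() { IsStarted = true; _log.Add("engine.Startup"); }

    public FakeDocument CreateDocument() => new FakeDocument(_log);

    // Assumed behavior: disposing a started engine shuts it down first.
    public void Dispose()
    {
        if (IsStarted) { IsStarted = false; _log.Add("engine.Shutdown"); }
        _log.Add("engine.Dispose");
    }
}

// Hypothetical stand-in for an OCR document (not a LEADTOOLS type).
public sealed class FakeDocument : IDisposable
{
    private readonly List<string> _log;
    public FakeDocument(List<string> log) { _log = log; }
    public void Recognize() => _log.Add("document.Recognize");
    public void Dispose() => _log.Add("document.Dispose");
}

public static class UsingOrderDemo
{
    // Runs the nested-using pattern and returns the recorded call order.
    public static List<string> Run(bool failDuringRecognition)
    {
        var log = new List<string>();
        try
        {
            using (var engine = new FakeEngine(log))
            {
                engine.Startup();
                using (var document = engine.CreateDocument())
                {
                    document.Recognize();
                    if (failDuringRecognition)
                        throw new InvalidOperationException("simulated recognition failure");
                }
            }
        }
        catch (InvalidOperationException)
        {
            // Both Dispose calls have already run by the time we get here.
            log.Add("caught");
        }
        return log;
    }
}
```

UsingOrderDemo.Run(true) records document.Dispose, engine.Shutdown and engine.Dispose before "caught", which shows that cleanup happens before the exception reaches the caller.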
Finally, the following example performs the same TIF-to-PDF conversion with the one-shot, "fire and forget" IOcrAutoRecognizeManager interface, reached through the engine's AutoRecognizeManager property:
Visual Basic
```vb
' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
   ' Start up the engine
   ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

   ' Convert the multi-page TIF image to a PDF document
   ocrEngine.AutoRecognizeManager.Run( _
      "C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", _
      "C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", _
      DocumentFormat.Pdf, _
      Nothing, _
      Nothing)
End Using
```
C#
```csharp
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
   // Start up the engine
   ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

   // Convert the multi-page TIF image to a PDF document
   ocrEngine.AutoRecognizeManager.Run(
      @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif",
      @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf",
      DocumentFormat.Pdf,
      null,
      null);
}
```
Visual Basic
The LEADTOOLS OCR features provide methods for incorporating optical character recognition (OCR) technology into an application. OCR is used to process bitmap document images into text.
Once the LEADTOOLS .NET OCR toolkit is installed to the system, the user is ready to begin programming with LEADTOOLS OCR. Please note that the OCR features must be unlocked before the user can actually use the OCR properties, methods, and events. For more information on unlocking LEAD features, refer to Unlocking Special LEAD Features.
You can start using LEADTOOLS for .NET OCR in your application by adding a references to the Leadtools.Forms.Ocr.dll and Leadtools.Forms.DocumentWriter.dll assemblies in your .NET application. These assemblies contain the various interfaces, classes, structures and delegates used to program with LEADTOOLS OCR.
Since the toolkit supports multiple engines, the actual code that interfaces with the engine is stored in a separate assembly that will be loaded dynamically once an instance of the IOcrEngine interface is created. Hence, you must make sure the engine assembly you are planning to use resides next to the Leadtools.Forms.Ocr.dll assembly. You can add the engine assembly as a reference to your project if desired to automatically detect dependencies, even though this is not required by LEADTOOLS.
LEADTOOLS provides methods to:
C#
The LEADTOOLS OCR features provide methods for incorporating optical character recognition (OCR) technology into an application. OCR is used to process bitmap document images into text.
Once the LEADTOOLS .NET OCR toolkit is installed to the system, the user is ready to begin programming with LEADTOOLS OCR. Please note that the OCR features must be unlocked before the user can actually use the OCR properties, methods, and events. For more information on unlocking LEAD features, refer to Unlocking Special LEAD Features.
You can start using LEADTOOLS for .NET OCR in your application by adding a references to the Leadtools.Forms.Ocr.dll and Leadtools.Forms.DocumentWriter.dll assemblies in your .NET application. These assemblies contain the various interfaces, classes, structures and delegates used to program with LEADTOOLS OCR.
Since the toolkit supports multiple engines, the actual code that interfaces with the engine is stored in a separate assembly that will be loaded dynamically once an instance of the IOcrEngine interface is created. Hence, you must make sure the engine assembly you are planning to use resides next to the Leadtools.Forms.Ocr.dll assembly. You can add the engine assembly as a reference to your project if desired to automatically detect dependencies, even though this is not required by LEADTOOLS.
LEADTOOLS provides methods to:
The following example shows the minimum necessary steps required to perform the same task, this time using the proper C# and VB.NET "using" keywords:
Visual Basic' Create the engine instance Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' Startup the engine ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' Create the document Using ocrDocument as IOcrDocument = ocrEngine.DocumentManager.CreateDocument() ' Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing) ' Recognize the pages ocrDocument.Pages.Recognize(Nothing) 'Save recognition results ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing) End Using ' The engine will automatically shuts down when Dispose is called End Using
C#// Create the engine instance using(IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false)) { // Startup the engine ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // Create the OCR document using(IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument()) { // Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null); // Recognize the pages ocrDocument.Pages.Recognize(null); // Save recognition results ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null); } // The engine will automatically shuts down when Dispose is called }
Finally, the following example shows how to perform the same task above using the one shot "fire and forget" IOcrAutoRecognizeManager interface:
Visual Basic' Create the engine instance Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' Startup the engine ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' Convert the multi-page TIF image to a PDF document ocrEngine.AutoRecognizeManager.Run( _ "C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", _ "C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", _ DocumentFormat.Pdf, _ Nothing, _ Nothing) End Using
C#// Create the engine instance using(IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false)) { // Startup the engine ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // Convert the multi-page TIF image to a PDF document ocrEngine.AutoRecognizeManager.Run( @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null, null); }
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in a single or multi-threaded environment with optimization for server-based operations.
Multiple OCR engines are supported and abstracted from the user through the use of a common .NET class library. Switching between the various engines requires virtually no changes in the application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds prior to recognition to control the accuracy of recognition.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, MS Word, MS Excel as well as various flavors of ASCII and UNICODE text.
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
Startup the OCR Engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
Establish an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
Establish zones on the page(s), either manually or automatically. (This is optional. A page can be recognized with or without zones.) For more information, refer to Working with OCR Zones.
Optional. Set the active languages to be used by the OCR engine. (The default is English). For more information, refer to Working with OCR Languages.
Optional. Set the spell checking language. (The default is English). For more information, refer to OCR Spell Language Dictionaries.
Optional. Set any special recognition module options. This is required only if the page contains zones, created either automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules and Using OMR in LEADTOOLS .NET OCR.
Recognize. For more information, refer to Recognizing OCR Pages.
Save recognition results, if desired. The results can be saved to either a file or to memory. For more information, refer to Recognizing OCR Pages.
Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Where steps 4, 5, 6 and 7 can pretty much be done in any order, as long as they are carried out after starting up the OCR engine and before recognizing a page.
For more information on the engine assemblies refer to OcrEngineType" and Files To Be Included With Your Application.
The following example shows how to perform the above steps in code:
Visual Basic
The LEADTOOLS OCR features provide methods for incorporating optical character recognition (OCR) technology into an application. OCR is used to process bitmap document images into text.
Once the LEADTOOLS .NET OCR toolkit is installed to the system, the user is ready to begin programming with LEADTOOLS OCR. Please note that the OCR features must be unlocked before the user can actually use the OCR properties, methods, and events. For more information on unlocking LEAD features, refer to Unlocking Special LEAD Features.
You can start using LEADTOOLS for .NET OCR in your application by adding a references to the Leadtools.Forms.Ocr.dll and Leadtools.Forms.DocumentWriter.dll assemblies in your .NET application. These assemblies contain the various interfaces, classes, structures and delegates used to program with LEADTOOLS OCR.
Since the toolkit supports multiple engines, the actual code that interfaces with the engine is stored in a separate assembly that will be loaded dynamically once an instance of the IOcrEngine interface is created. Hence, you must make sure the engine assembly you are planning to use resides next to the Leadtools.Forms.Ocr.dll assembly. You can add the engine assembly as a reference to your project if desired to automatically detect dependencies, even though this is not required by LEADTOOLS.
LEADTOOLS provides methods to:
C#// Assuming you added "using Leadtools.Codecs;", "using Leadtools.Forms.Ocr;" and "using Leadtools.Forms.DocumentWriters;" at the beginning of this class // *** Step 1: Select the engine type and create an instance of the IOcrEngine interface. // We will use the LEADTOOLS OCR Advantage engine and use it in the same process IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false); // *** Step 2: Startup the engine. // Use the default parameters ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // *** Step 3: Create an OCR document with one or more pages. IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument(); // Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null); // *** Step 4: Establish zones on the page(s), either manually or automatically // Automatic zoning ocrDocument.Pages.AutoZone(null); // *** Step 5: (Optional) Set the active languages to be used by the OCR engine // Enable English and German languages ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" }); // *** Step 6: (Optional) Set the spell checking language // Enable the spell checking system and set English as the spell language ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native; ocrEngine.SpellCheckManager.SpellLanguage = "en"; // *** Step 7: (Optional) Set any special recognition module options // Change the fill method for the first zone in the first page to be default OcrZone ocrZone = ocrDocument.Pages[0].Zones[0]; ocrZone.FillMethod = OcrZoneFillMethod.Default; ocrDocument.Pages[0].Zones[0] = ocrZone; // *** Step 8: Recognize ocrDocument.Pages.Recognize(null); // *** Step 9: Save recognition results // Save the results to a PDF file ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null); ocrDocument.Dispose(); // *** Step 
10: Shut down the OCR engine when finished ocrEngine.Shutdown(); ocrEngine.Dispose();
The following example shows the minimum necessary steps required to perform the same task, this time using the proper C# and VB.NET "using" keywords:
Visual Basic' Create the engine instance Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' Startup the engine ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' Create the document Using ocrDocument as IOcrDocument = ocrEngine.DocumentManager.CreateDocument() ' Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing) ' Recognize the pages ocrDocument.Pages.Recognize(Nothing) 'Save recognition results ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing) End Using ' The engine will automatically shuts down when Dispose is called End Using
C#// Create the engine instance using(IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false)) { // Startup the engine ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // Create the OCR document using(IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument()) { // Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null); // Recognize the pages ocrDocument.Pages.Recognize(null); // Save recognition results ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null); } // The engine will automatically shuts down when Dispose is called }
Finally, the following example shows how to perform the same task above using the one shot "fire and forget" IOcrAutoRecognizeManager interface:
Visual Basic' Create the engine instance Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' Startup the engine ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' Convert the multi-page TIF image to a PDF document ocrEngine.AutoRecognizeManager.Run( _ "C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", _ "C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", _ DocumentFormat.Pdf, _ Nothing, _ Nothing) End Using
C#// Create the engine instance using(IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false)) { // Startup the engine ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // Convert the multi-page TIF image to a PDF document ocrEngine.AutoRecognizeManager.Run( @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null, null); }
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in a single or multi-threaded environment with optimization for server-based operations.
Multiple OCR engines are supported and abstracted from the user through the use of a common .NET class library. Switching between the various engines requires virtually no changes in the application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds prior to recognition to control the accuracy of recognition.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, MS Word, MS Excel as well as various flavors of ASCII and UNICODE text.
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
Startup the OCR Engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
Establish an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
Establish zones on the page(s), either manually or automatically. (This is optional. A page can be recognized with or without zones.) For more information, refer to Working with OCR Zones.
Optional. Set the active languages to be used by the OCR engine. (The default is English). For more information, refer to Working with OCR Languages.
Optional. Set the spell checking language. (The default is English). For more information, refer to OCR Spell Language Dictionaries.
Optional. Set any special recognition module options. This is required only if the page contains zones, created either automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules and Using OMR in LEADTOOLS .NET OCR.
Recognize. For more information, refer to Recognizing OCR Pages.
Save recognition results, if desired. The results can be saved to either a file or to memory. For more information, refer to Recognizing OCR Pages.
Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Where steps 4, 5, 6 and 7 can pretty much be done in any order, as long as they are carried out after starting up the OCR engine and before recognizing a page.
For more information on the engine assemblies refer to OcrEngineType" and Files To Be Included With Your Application.
The following example shows how to perform the above steps in code:
Visual Basic' Assuming you added "Imports Leadtools.Forms.Ocr" and "Imports Leadtools.Forms.DocumentWriter" at the beginning of this class ' *** Step 1: Select the engine type and create an instance of the IOcrEngine interface. ' We will use the LEADTOOLS OCR Advantage engine and use it in the same process Dim ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' *** Step 2: Startup the engine. ' Use the default parameters ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' *** Step 3: Create an OCR document with one or more pages. Dim ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument() ' Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing) ' *** Step 4: Establish zones on the page(s), either manually or automatically ' Automatic zoning ocrDocument.Pages.AutoZone(Nothing) ' *** Step 5: (Optional) Set the active languages to be used by the OCR engine ' Enable English and German languages ocrEngine.LanguageManager.EnableLanguages(New String() {"en", "de"}) ' *** Step 6: (Optional) Set the spell checking language ' Enable the spell checking system and set English as the spell language ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native ocrEngine.SpellCheckManager.SpellLanguage = "en" ' *** Step 7: (Optional) Set any special recognition module options ' Change the fill method for the first zone in the first page to be Omr Dim ocrZone As OcrZone = ocrDocument.Pages(0).Zones(0) ocrZone.FillMethod = OcrZoneFillMethod.Omr ocrDocument.Pages(0).Zones(0) = ocrZone ' *** Step 8: Recognize ocrDocument.Pages.Recognize(Nothing) ' *** Step 9: Save recognition results ' Save the results to a PDF file ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing) ocrDocument.Dispose() ' *** Step 10: Shut down the OCR 
engine when finished ocrEngine.Shutdown() ocrEngine.Dispose()
C#
// Assuming you added "using Leadtools.Codecs;", "using Leadtools.Forms.Ocr;" and "using Leadtools.Forms.DocumentWriters;" at the beginning of this class

// *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
// We will use the LEADTOOLS OCR Advantage engine and run it in the same process
IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);

// *** Step 2: Start up the engine.
// Use the default parameters and pass the folder that holds the engine runtime files
ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

// *** Step 3: Create an OCR document with one or more pages.
IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument();
// Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);

// *** Step 4: Establish zones on the page(s), either manually or automatically
// Automatic zoning
ocrDocument.Pages.AutoZone(null);

// *** Step 5: (Optional) Set the active languages to be used by the OCR engine
// Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" });

// *** Step 6: (Optional) Set the spell checking language
// Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native;
ocrEngine.SpellCheckManager.SpellLanguage = "en";

// *** Step 7: (Optional) Set any special recognition module options
// Change the fill method for the first zone in the first page to be default.
// The zone is read as a copy, modified, and then assigned back to the collection.
// Skip this step if auto-zoning found no zones on the page.
if (ocrDocument.Pages[0].Zones.Count > 0)
{
    OcrZone ocrZone = ocrDocument.Pages[0].Zones[0];
    ocrZone.FillMethod = OcrZoneFillMethod.Default;
    ocrDocument.Pages[0].Zones[0] = ocrZone;
}

// *** Step 8: Recognize
ocrDocument.Pages.Recognize(null);

// *** Step 9: Save recognition results
// Save the results to a PDF file
ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
ocrDocument.Dispose();

// *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown();
ocrEngine.Dispose();
The following example shows the minimum steps required to perform the same task. This time it uses the C# using statement and the VB.NET Using statement, so the document and the engine are disposed of automatically:
Visual Basic
' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
   ' Start up the engine
   ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")
   ' Create the OCR document
   Using ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
      ' Add all the pages of a multi-page TIF image to the document
      ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)
      ' Recognize the pages
      ocrDocument.Pages.Recognize(Nothing)
      ' Save recognition results
      ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
   End Using
   ' The engine automatically shuts down when Dispose is called at the end of this block
End Using
C#
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
    // Start up the engine
    ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");
    // Create the OCR document
    using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
    {
        // Add all the pages of a multi-page TIF image to the document
        ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);
        // Recognize the pages
        ocrDocument.Pages.Recognize(null);
        // Save recognition results
        ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
    }
    // The engine automatically shuts down when Dispose is called at the end of this block
}
Finally, the following example performs the same task with the one-shot, "fire and forget" IOcrAutoRecognizeManager interface:
Visual Basic
' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
   ' Start up the engine
   ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")
   ' Convert the multi-page TIF image to a PDF document
   ocrEngine.AutoRecognizeManager.Run( _
      "C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", _
      "C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", _
      DocumentFormat.Pdf, _
      Nothing, _
      Nothing)
End Using
C#
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
    // Start up the engine
    ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");
    // Convert the multi-page TIF image to a PDF document
    ocrEngine.AutoRecognizeManager.Run(
        @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif",
        @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf",
        DocumentFormat.Pdf,
        null,
        null);
}
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, MS Word, MS Excel as well as various flavors of ASCII and UNICODE text.
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
Startup the OCR Engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
Establish an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
Establish zones on the page(s), either manually or automatically. (This is optional. A page can be recognized with or without zones.) For more information, refer to Working with OCR Zones.
Optional. Set the active languages to be used by the OCR engine. (The default is English). For more information, refer to Working with OCR Languages.
Optional. Set the spell checking language. (The default is English). For more information, refer to OCR Spell Language Dictionaries.
Optional. Set any special recognition module options. This is required only if the page contains zones, created either automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules and Using OMR in LEADTOOLS .NET OCR.
Recognize. For more information, refer to Recognizing OCR Pages.
Save recognition results, if desired. The results can be saved to either a file or to memory. For more information, refer to Recognizing OCR Pages.
Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Where steps 4, 5, 6 and 7 can pretty much be done in any order, as long as they are carried out after starting up the OCR engine and before recognizing a page.
For more information on the engine assemblies refer to OcrEngineType" and Files To Be Included With Your Application.
The following example shows how to perform the above steps in code:
Visual Basic
The LEADTOOLS OCR features provide methods for incorporating optical character recognition (OCR) technology into an application. OCR converts bitmap images of documents into text.
Once the LEADTOOLS .NET OCR toolkit is installed on the system, you are ready to begin programming with LEADTOOLS OCR. Note that the OCR features must be unlocked before you can use the OCR properties, methods, and events. For more information on unlocking LEAD features, refer to Unlocking Special LEAD Features.
To start using LEADTOOLS for .NET OCR, add references to the Leadtools.Forms.Ocr.dll and Leadtools.Forms.DocumentWriters.dll assemblies to your .NET application. These assemblies contain the interfaces, classes, structures, and delegates used to program with LEADTOOLS OCR.
Because the toolkit supports multiple engines, the code that interfaces with each engine is stored in a separate assembly, which is loaded dynamically when an instance of the IOcrEngine interface is created. Therefore, make sure that the engine assembly you plan to use resides in the same folder as the Leadtools.Forms.Ocr.dll assembly. You can optionally add the engine assembly as a reference to your project so that its dependencies are detected automatically, although LEADTOOLS does not require this.
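As a minimal sketch of the deployment requirement above, the following C# helper checks that an engine assembly is present in the same folder as Leadtools.Forms.Ocr.dll before you create the engine. The class and method names are hypothetical and not part of LEADTOOLS; the helper uses only System.IO. The engine file name you pass in (for example "Leadtools.Forms.Ocr.Advantage.dll") is an assumption; check Files To Be Included With Your Application for the exact name for your engine and version.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of LEADTOOLS: checks that an OCR engine
// assembly is deployed in the same folder as Leadtools.Forms.Ocr.dll.
public static class OcrDeploymentCheck
{
    public static bool EngineAssemblyIsBesideOcrAssembly(string ocrAssemblyPath, string engineAssemblyFileName)
    {
        if (string.IsNullOrEmpty(ocrAssemblyPath))
            throw new ArgumentException("The path to Leadtools.Forms.Ocr.dll is required.", nameof(ocrAssemblyPath));
        if (string.IsNullOrEmpty(engineAssemblyFileName))
            throw new ArgumentException("The engine assembly file name is required.", nameof(engineAssemblyFileName));

        // Folder that contains Leadtools.Forms.Ocr.dll
        string folder = Path.GetDirectoryName(Path.GetFullPath(ocrAssemblyPath));

        // The engine assembly must sit in that same folder
        return File.Exists(Path.Combine(folder, engineAssemblyFileName));
    }
}
```

In an application, you could pass typeof(IOcrEngine).Assembly.Location as the first argument, since IOcrEngine is defined in Leadtools.Forms.Ocr.dll, and report a clear error if the check fails instead of waiting for engine creation to fail.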
LEADTOOLS provides methods to:
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR in single-threaded or multi-threaded environments, with optimizations for server-based operations.
Use multiple OCR engines through a common .NET class library that abstracts the engine details from you. Switching between engines requires virtually no changes to the application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds before recognition to control how accurate the recognition must be.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, Microsoft Word, Microsoft Excel, and various flavors of ASCII and Unicode text.
LEADTOOLS uses an OCR handle to interact with the OCR engine and with the OCR document that contains the list of pages. The OCR handle represents a communication session between LEADTOOLS OCR and an OCR engine installed on the system; it is an internal structure that holds all the information needed for recognition, for getting and setting information, and for text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
1. Select the engine type you want to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
2. Start up the OCR engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
3. Create an OCR document containing one or more pages. For more information, refer to Working with OCR Pages.
4. Optional. Establish zones on the page(s), either manually or automatically. A page can be recognized with or without zones. For more information, refer to Working with OCR Zones.
5. Optional. Set the active languages to be used by the OCR engine. The default is English. For more information, refer to Working with OCR Languages.
6. Optional. Set the spell checking language. The default is English. For more information, refer to OCR Spell Language Dictionaries.
7. Optional. Set any special recognition module options. This is needed only if the page contains zones, whether they were created automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules, and Using OMR in LEADTOOLS .NET OCR.
8. Recognize the pages. For more information, refer to Recognizing OCR Pages.
9. Optional. Save the recognition results, either to a file or to memory. For more information, refer to Recognizing OCR Pages.
10. Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Steps 4, 5, 6 and 7 can generally be performed in any order, as long as they are carried out after the OCR engine has been started and before a page is recognized.
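To make the ordering rule concrete, here is a small hypothetical C# check that does not call LEADTOOLS. It numbers the steps 1 through 10 in the order listed in the outline and verifies that each of the configuration steps 4 through 7 that is present occurs after step 2 (start up the engine) and before step 8 (recognize). The class and method names are illustrative only.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of the ordering rule; not part of LEADTOOLS.
public static class OcrStepOrder
{
    // Step numbers follow the outline: 2 = start up the engine, 8 = recognize.
    const int Startup = 2;
    const int Recognize = 8;

    // Returns true if the engine is started before recognition and every
    // configuration step (4-7) lies between them. Steps 4-7 may appear
    // in any order relative to each other.
    public static bool IsValidOrder(IList<int> steps)
    {
        int startupIndex = steps.IndexOf(Startup);
        int recognizeIndex = steps.IndexOf(Recognize);
        if (startupIndex < 0 || recognizeIndex < 0 || startupIndex > recognizeIndex)
            return false;

        for (int i = 0; i < steps.Count; i++)
        {
            bool isConfigStep = steps[i] >= 4 && steps[i] <= 7;
            if (isConfigStep && (i < startupIndex || i > recognizeIndex))
                return false;
        }
        return true;
    }
}
```

This sketch checks only the rule stated above. It does not enforce other dependencies; for example, zoning pages (step 4) also requires that the pages were added in step 3.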
For more information on the engine assemblies, refer to OcrEngineType and Files To Be Included With Your Application.
The following example shows how to perform the above steps in code:
Visual Basic
```vb
' Assuming you added "Imports Leadtools.Forms.Ocr" and "Imports Leadtools.Forms.DocumentWriters" at the beginning of this class

' *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
' We will use the LEADTOOLS OCR Advantage engine and use it in the same process
Dim ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)

' *** Step 2: Startup the engine.
' Use the default parameters
ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

' *** Step 3: Create an OCR document with one or more pages.
Dim ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
' Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)

' *** Step 4: Establish zones on the page(s), either manually or automatically
' Automatic zoning
ocrDocument.Pages.AutoZone(Nothing)

' *** Step 5: (Optional) Set the active languages to be used by the OCR engine
' Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(New String() {"en", "de"})

' *** Step 6: (Optional) Set the spell checking language
' Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native
ocrEngine.SpellCheckManager.SpellLanguage = "en"

' *** Step 7: (Optional) Set any special recognition module options
' Change the fill method for the first zone in the first page to Omr
Dim ocrZone As OcrZone = ocrDocument.Pages(0).Zones(0)
ocrZone.FillMethod = OcrZoneFillMethod.Omr
ocrDocument.Pages(0).Zones(0) = ocrZone

' *** Step 8: Recognize
ocrDocument.Pages.Recognize(Nothing)

' *** Step 9: Save recognition results
' Save the results to a PDF file
ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
ocrDocument.Dispose()

' *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown()
ocrEngine.Dispose()
```
C#
```csharp
// Assuming you added "using Leadtools.Codecs;", "using Leadtools.Forms.Ocr;" and "using Leadtools.Forms.DocumentWriters;" at the beginning of this class

// *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
// We will use the LEADTOOLS OCR Advantage engine and use it in the same process
IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);

// *** Step 2: Startup the engine.
// Use the default parameters
ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

// *** Step 3: Create an OCR document with one or more pages.
IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument();
// Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);

// *** Step 4: Establish zones on the page(s), either manually or automatically
// Automatic zoning
ocrDocument.Pages.AutoZone(null);

// *** Step 5: (Optional) Set the active languages to be used by the OCR engine
// Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" });

// *** Step 6: (Optional) Set the spell checking language
// Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native;
ocrEngine.SpellCheckManager.SpellLanguage = "en";

// *** Step 7: (Optional) Set any special recognition module options
// Change the fill method for the first zone in the first page to Default
OcrZone ocrZone = ocrDocument.Pages[0].Zones[0];
ocrZone.FillMethod = OcrZoneFillMethod.Default;
ocrDocument.Pages[0].Zones[0] = ocrZone;

// *** Step 8: Recognize
ocrDocument.Pages.Recognize(null);

// *** Step 9: Save recognition results
// Save the results to a PDF file
ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
ocrDocument.Dispose();

// *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown();
ocrEngine.Dispose();
```
The following example shows the minimum steps required to perform the same task, this time using the C# "using" and VB.NET "Using" statements so that the engine and the document are disposed of automatically:
Visual Basic
```vb
' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
   ' Startup the engine
   ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

   ' Create the document
   Using ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
      ' Add all the pages of a multi-page TIF image to the document
      ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)
      ' Recognize the pages
      ocrDocument.Pages.Recognize(Nothing)
      ' Save recognition results
      ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
   End Using

   ' The engine shuts down automatically when Dispose is called
End Using
```
C#
```csharp
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
   // Startup the engine
   ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

   // Create the OCR document
   using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
   {
      // Add all the pages of a multi-page TIF image to the document
      ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);
      // Recognize the pages
      ocrDocument.Pages.Recognize(null);
      // Save recognition results
      ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
   }

   // The engine shuts down automatically when Dispose is called
}
```
Finally, the following example performs the same task with the one-shot, "fire and forget" IOcrAutoRecognizeManager interface:
Visual Basic
```vb
' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
   ' Startup the engine
   ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

   ' Convert the multi-page TIF image to a PDF document
   ocrEngine.AutoRecognizeManager.Run( _
      "C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", _
      "C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", _
      DocumentFormat.Pdf, _
      Nothing, _
      Nothing)
End Using
```
C#
```csharp
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
   // Startup the engine
   ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

   // Convert the multi-page TIF image to a PDF document
   ocrEngine.AutoRecognizeManager.Run(
      @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif",
      @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf",
      DocumentFormat.Pdf,
      null,
      null);
}
```
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in a single or multi-threaded environment with optimization for server-based operations.
Multiple OCR engines are supported and abstracted from the user through the use of a common .NET class library. Switching between the various engines requires virtually no changes in the application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds prior to recognition to control the accuracy of recognition.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, MS Word, MS Excel as well as various flavors of ASCII and UNICODE text.
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
Startup the OCR Engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
Establish an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
Establish zones on the page(s), either manually or automatically. (This is optional. A page can be recognized with or without zones.) For more information, refer to Working with OCR Zones.
Optional. Set the active languages to be used by the OCR engine. (The default is English). For more information, refer to Working with OCR Languages.
Optional. Set the spell checking language. (The default is English). For more information, refer to OCR Spell Language Dictionaries.
Optional. Set any special recognition module options. This is required only if the page contains zones, created either automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules and Using OMR in LEADTOOLS .NET OCR.
Recognize. For more information, refer to Recognizing OCR Pages.
Save recognition results, if desired. The results can be saved to either a file or to memory. For more information, refer to Recognizing OCR Pages.
Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Where steps 4, 5, 6 and 7 can pretty much be done in any order, as long as they are carried out after starting up the OCR engine and before recognizing a page.
For more information on the engine assemblies refer to OcrEngineType" and Files To Be Included With Your Application.
The following example shows how to perform the above steps in code:
Visual Basic' Assuming you added "Imports Leadtools.Forms.Ocr" and "Imports Leadtools.Forms.DocumentWriter" at the beginning of this class ' *** Step 1: Select the engine type and create an instance of the IOcrEngine interface. ' We will use the LEADTOOLS OCR Advantage engine and use it in the same process Dim ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' *** Step 2: Startup the engine. ' Use the default parameters ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' *** Step 3: Create an OCR document with one or more pages. Dim ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument() ' Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing) ' *** Step 4: Establish zones on the page(s), either manually or automatically ' Automatic zoning ocrDocument.Pages.AutoZone(Nothing) ' *** Step 5: (Optional) Set the active languages to be used by the OCR engine ' Enable English and German languages ocrEngine.LanguageManager.EnableLanguages(New String() {"en", "de"}) ' *** Step 6: (Optional) Set the spell checking language ' Enable the spell checking system and set English as the spell language ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native ocrEngine.SpellCheckManager.SpellLanguage = "en" ' *** Step 7: (Optional) Set any special recognition module options ' Change the fill method for the first zone in the first page to be Omr Dim ocrZone As OcrZone = ocrDocument.Pages(0).Zones(0) ocrZone.FillMethod = OcrZoneFillMethod.Omr ocrDocument.Pages(0).Zones(0) = ocrZone ' *** Step 8: Recognize ocrDocument.Pages.Recognize(Nothing) ' *** Step 9: Save recognition results ' Save the results to a PDF file ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing) ocrDocument.Dispose() ' *** Step 10: Shut down the OCR 
engine when finished ocrEngine.Shutdown() ocrEngine.Dispose()
C#// Assuming you added "using Leadtools.Codecs;", "using Leadtools.Forms.Ocr;" and "using Leadtools.Forms.DocumentWriters;" at the beginning of this class // *** Step 1: Select the engine type and create an instance of the IOcrEngine interface. // We will use the LEADTOOLS OCR Advantage engine and use it in the same process IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false); // *** Step 2: Startup the engine. // Use the default parameters ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // *** Step 3: Create an OCR document with one or more pages. IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument(); // Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null); // *** Step 4: Establish zones on the page(s), either manually or automatically // Automatic zoning ocrDocument.Pages.AutoZone(null); // *** Step 5: (Optional) Set the active languages to be used by the OCR engine // Enable English and German languages ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" }); // *** Step 6: (Optional) Set the spell checking language // Enable the spell checking system and set English as the spell language ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native; ocrEngine.SpellCheckManager.SpellLanguage = "en"; // *** Step 7: (Optional) Set any special recognition module options // Change the fill method for the first zone in the first page to be default OcrZone ocrZone = ocrDocument.Pages[0].Zones[0]; ocrZone.FillMethod = OcrZoneFillMethod.Default; ocrDocument.Pages[0].Zones[0] = ocrZone; // *** Step 8: Recognize ocrDocument.Pages.Recognize(null); // *** Step 9: Save recognition results // Save the results to a PDF file ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null); ocrDocument.Dispose(); // *** Step 
10: Shut down the OCR engine when finished ocrEngine.Shutdown(); ocrEngine.Dispose();
The following example shows the minimum necessary steps required to perform the same task, this time using the proper C# and VB.NET "using" keywords:
Visual Basic' Create the engine instance Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' Startup the engine ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' Create the document Using ocrDocument as IOcrDocument = ocrEngine.DocumentManager.CreateDocument() ' Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing) ' Recognize the pages ocrDocument.Pages.Recognize(Nothing) 'Save recognition results ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing) End Using ' The engine will automatically shuts down when Dispose is called End Using
C#// Create the engine instance using(IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false)) { // Startup the engine ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // Create the OCR document using(IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument()) { // Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null); // Recognize the pages ocrDocument.Pages.Recognize(null); // Save recognition results ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null); } // The engine will automatically shuts down when Dispose is called }
Finally, the following example shows how to perform the same task above using the one shot "fire and forget" IOcrAutoRecognizeManager interface:
Visual Basic' Create the engine instance Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' Startup the engine ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' Convert the multi-page TIF image to a PDF document ocrEngine.AutoRecognizeManager.Run( _ "C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", _ "C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", _ DocumentFormat.Pdf, _ Nothing, _ Nothing) End Using
C#// Create the engine instance using(IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false)) { // Startup the engine ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // Convert the multi-page TIF image to a PDF document ocrEngine.AutoRecognizeManager.Run( @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null, null); }
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in a single-threaded or multithreaded environment, with optimizations for server-based operations.
Use multiple OCR engines through a common .NET class library that abstracts the differences between them. Switching between engines requires virtually no changes to the application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds prior to recognition to control the accuracy of recognition.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, MS Word and MS Excel, as well as various flavors of ASCII and Unicode text.
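The 5 to 72 point range refers to type size on the printed page. Because a point is 1/72 inch, a quick way to judge how many pixels a given font size occupies in a scanned image is pixels ≈ points × dpi / 72. The C# sketch below does only that arithmetic; the PointSize class is a hypothetical helper, not part of LEADTOOLS, and it makes no claim about the minimum resolution any particular OCR engine requires.

```csharp
using System;

// Hypothetical helper, not part of LEADTOOLS: converts a nominal font size in
// typographic points (1 pt = 1/72 inch) to the number of pixels it spans in an
// image scanned at the given resolution in dots per inch.
public static class PointSize
{
    public static double ToPixels(double points, double dpi)
    {
        if (points <= 0) throw new ArgumentOutOfRangeException(nameof(points));
        if (dpi <= 0) throw new ArgumentOutOfRangeException(nameof(dpi));
        // points / 72 gives inches; multiplying by dpi gives pixels.
        return points * dpi / 72.0;
    }
}
```

For example, at 300 dpi a 5 point font spans about 20.8 pixels and a 72 point font spans 300 pixels; at 150 dpi the same 5 point font spans only about 10.4 pixels.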
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
1. Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
2. Start up the OCR engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
3. Establish an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
4. Optional. Establish zones on the page(s), either manually or automatically. A page can be recognized with or without zones. For more information, refer to Working with OCR Zones.
5. Optional. Set the active languages to be used by the OCR engine (the default is English). For more information, refer to Working with OCR Languages.
6. Optional. Set the spell checking language (the default is English). For more information, refer to OCR Spell Language Dictionaries.
7. Optional. Set any special recognition module options. This is required only if the page contains zones, whether created automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules and Using OMR in LEADTOOLS .NET OCR.
8. Recognize the pages. For more information, refer to Recognizing OCR Pages.
9. Save the recognition results, if desired. The results can be saved either to a file or to memory. For more information, refer to Recognizing OCR Pages.
10. Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Steps 4, 5, 6 and 7 can be performed in any order, as long as they are carried out after the OCR engine is started and before a page is recognized.
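The ordering rule can also be stated as a check. The C# sketch below is a hypothetical helper, not part of the LEADTOOLS API: it takes a planned sequence of the step numbers from the outline and verifies that steps 1, 2, 3 and 8 occur in that order, that any of the optional steps 4 to 7 fall after startup (step 2) and before recognition (step 8), that saving (step 9) follows recognition, and that shutdown (step 10) comes last. It also assumes, as an inference from the wording "zones on the page(s)", that zoning (step 4) must follow adding pages (step 3).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of LEADTOOLS: validates a planned order of the
// numbered steps in the outline (1 = create engine ... 10 = shut down).
public static class OcrStepOrder
{
    public static bool IsValid(IReadOnlyList<int> steps)
    {
        // Map each step number to its position in the plan.
        var pos = new Dictionary<int, int>();
        for (int i = 0; i < steps.Count; i++)
        {
            int step = steps[i];
            if (step < 1 || step > 10 || pos.ContainsKey(step))
                return false; // unknown or repeated step
            pos[step] = i;
        }

        // Steps 1, 2, 3, 8 and 10 are required; 4-7 and 9 are optional.
        foreach (int required in new[] { 1, 2, 3, 8, 10 })
            if (!pos.ContainsKey(required))
                return false;

        // Create, start up, add pages and recognize must happen in that order.
        if (!(pos[1] < pos[2] && pos[2] < pos[3] && pos[3] < pos[8]))
            return false;

        // Optional steps 4-7: any order, but after startup and before recognition.
        for (int step = 4; step <= 7; step++)
            if (pos.TryGetValue(step, out int p) && (p < pos[2] || p > pos[8]))
                return false;

        // Assumption: zoning (4) needs pages, so it must follow step 3.
        if (pos.TryGetValue(4, out int zoning) && zoning < pos[3])
            return false;

        // Saving (9) only makes sense after recognition.
        if (pos.TryGetValue(9, out int save) && save < pos[8])
            return false;

        // Shutting down (10) must be the last step.
        return pos[10] == steps.Count - 1;
    }
}
```

For instance, the plan { 1, 2, 6, 5, 3, 4, 7, 8, 9, 10 } passes, while { 1, 5, 2, 3, 8, 10 } fails because the languages are set before the engine is started.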
For more information on the engine assemblies, refer to OcrEngineType and Files To Be Included With Your Application.
The following example shows how to perform the above steps in code:
Visual Basic
' Assuming you added "Imports Leadtools.Forms.Ocr" and "Imports Leadtools.Forms.DocumentWriters" at the beginning of this class
' *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
' We will use the LEADTOOLS OCR Advantage engine and use it in the same process
Dim ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
' *** Step 2: Startup the engine.
' Use the default parameters
ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")
' *** Step 3: Create an OCR document with one or more pages.
Dim ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
' Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)
' *** Step 4: (Optional) Establish zones on the page(s), either manually or automatically
' Automatic zoning
ocrDocument.Pages.AutoZone(Nothing)
' *** Step 5: (Optional) Set the active languages to be used by the OCR engine
' Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(New String() {"en", "de"})
' *** Step 6: (Optional) Set the spell checking language
' Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native
ocrEngine.SpellCheckManager.SpellLanguage = "en"
' *** Step 7: (Optional) Set any special recognition module options
' Change the fill method for the first zone in the first page to be Omr
Dim ocrZone As OcrZone = ocrDocument.Pages(0).Zones(0)
ocrZone.FillMethod = OcrZoneFillMethod.Omr
ocrDocument.Pages(0).Zones(0) = ocrZone
' *** Step 8: Recognize
ocrDocument.Pages.Recognize(Nothing)
' *** Step 9: Save recognition results
' Save the results to a PDF file
ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
ocrDocument.Dispose()
' *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown()
ocrEngine.Dispose()
C#
// Assuming you added "using Leadtools.Codecs;", "using Leadtools.Forms.Ocr;" and "using Leadtools.Forms.DocumentWriters;" at the beginning of this class
// *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
// We will use the LEADTOOLS OCR Advantage engine and use it in the same process
IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);
// *** Step 2: Startup the engine.
// Use the default parameters
ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");
// *** Step 3: Create an OCR document with one or more pages.
IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument();
// Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);
// *** Step 4: Establish zones on the page(s), either manually or automatically
// Automatic zoning
ocrDocument.Pages.AutoZone(null);
// *** Step 5: (Optional) Set the active languages to be used by the OCR engine
// Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" });
// *** Step 6: (Optional) Set the spell checking language
// Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native;
ocrEngine.SpellCheckManager.SpellLanguage = "en";
// *** Step 7: (Optional) Set any special recognition module options
// Change the fill method for the first zone in the first page to be default
OcrZone ocrZone = ocrDocument.Pages[0].Zones[0];
ocrZone.FillMethod = OcrZoneFillMethod.Default;
ocrDocument.Pages[0].Zones[0] = ocrZone;
// *** Step 8: Recognize
ocrDocument.Pages.Recognize(null);
// *** Step 9: Save recognition results
// Save the results to a PDF file
ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
ocrDocument.Dispose();
// *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown();
ocrEngine.Dispose();
The following example shows the minimum necessary steps required to perform the same task, this time using the proper C# and VB.NET "using" keywords:
Visual Basic' Create the engine instance Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' Startup the engine ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' Create the document Using ocrDocument as IOcrDocument = ocrEngine.DocumentManager.CreateDocument() ' Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing) ' Recognize the pages ocrDocument.Pages.Recognize(Nothing) 'Save recognition results ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing) End Using ' The engine will automatically shuts down when Dispose is called End Using
C#// Create the engine instance using(IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false)) { // Startup the engine ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // Create the OCR document using(IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument()) { // Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null); // Recognize the pages ocrDocument.Pages.Recognize(null); // Save recognition results ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null); } // The engine will automatically shuts down when Dispose is called }
Finally, the following example shows how to perform the same task above using the one shot "fire and forget" IOcrAutoRecognizeManager interface:
Visual Basic' Create the engine instance Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' Startup the engine ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' Convert the multi-page TIF image to a PDF document ocrEngine.AutoRecognizeManager.Run( _ "C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", _ "C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", _ DocumentFormat.Pdf, _ Nothing, _ Nothing) End Using
C#// Create the engine instance using(IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false)) { // Startup the engine ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // Convert the multi-page TIF image to a PDF document ocrEngine.AutoRecognizeManager.Run( @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null, null); }
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in a single or multi-threaded environment with optimization for server-based operations.
Multiple OCR engines are supported and abstracted from the user through the use of a common .NET class library. Switching between the various engines requires virtually no changes in the application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds prior to recognition to control the accuracy of recognition.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, MS Word, MS Excel as well as various flavors of ASCII and UNICODE text.
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
Startup the OCR Engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
Establish an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
Establish zones on the page(s), either manually or automatically. (This is optional. A page can be recognized with or without zones.) For more information, refer to Working with OCR Zones.
Optional. Set the active languages to be used by the OCR engine. (The default is English). For more information, refer to Working with OCR Languages.
Optional. Set the spell checking language. (The default is English). For more information, refer to OCR Spell Language Dictionaries.
Optional. Set any special recognition module options. This is required only if the page contains zones, created either automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules and Using OMR in LEADTOOLS .NET OCR.
Recognize. For more information, refer to Recognizing OCR Pages.
Save recognition results, if desired. The results can be saved to either a file or to memory. For more information, refer to Recognizing OCR Pages.
Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Where steps 4, 5, 6 and 7 can pretty much be done in any order, as long as they are carried out after starting up the OCR engine and before recognizing a page.
For more information on the engine assemblies refer to OcrEngineType" and Files To Be Included With Your Application.
The following example shows how to perform the above steps in code:
Visual Basic' Assuming you added "Imports Leadtools.Forms.Ocr" and "Imports Leadtools.Forms.DocumentWriter" at the beginning of this class ' *** Step 1: Select the engine type and create an instance of the IOcrEngine interface. ' We will use the LEADTOOLS OCR Advantage engine and use it in the same process Dim ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' *** Step 2: Startup the engine. ' Use the default parameters ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' *** Step 3: Create an OCR document with one or more pages. Dim ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument() ' Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing) ' *** Step 4: Establish zones on the page(s), either manually or automatically ' Automatic zoning ocrDocument.Pages.AutoZone(Nothing) ' *** Step 5: (Optional) Set the active languages to be used by the OCR engine ' Enable English and German languages ocrEngine.LanguageManager.EnableLanguages(New String() {"en", "de"}) ' *** Step 6: (Optional) Set the spell checking language ' Enable the spell checking system and set English as the spell language ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native ocrEngine.SpellCheckManager.SpellLanguage = "en" ' *** Step 7: (Optional) Set any special recognition module options ' Change the fill method for the first zone in the first page to be Omr Dim ocrZone As OcrZone = ocrDocument.Pages(0).Zones(0) ocrZone.FillMethod = OcrZoneFillMethod.Omr ocrDocument.Pages(0).Zones(0) = ocrZone ' *** Step 8: Recognize ocrDocument.Pages.Recognize(Nothing) ' *** Step 9: Save recognition results ' Save the results to a PDF file ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing) ocrDocument.Dispose() ' *** Step 10: Shut down the OCR 
engine when finished ocrEngine.Shutdown() ocrEngine.Dispose()
C#// Assuming you added "using Leadtools.Codecs;", "using Leadtools.Forms.Ocr;" and "using Leadtools.Forms.DocumentWriters;" at the beginning of this class // *** Step 1: Select the engine type and create an instance of the IOcrEngine interface. // We will use the LEADTOOLS OCR Advantage engine and use it in the same process IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false); // *** Step 2: Startup the engine. // Use the default parameters ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // *** Step 3: Create an OCR document with one or more pages. IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument(); // Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null); // *** Step 4: Establish zones on the page(s), either manually or automatically // Automatic zoning ocrDocument.Pages.AutoZone(null); // *** Step 5: (Optional) Set the active languages to be used by the OCR engine // Enable English and German languages ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" }); // *** Step 6: (Optional) Set the spell checking language // Enable the spell checking system and set English as the spell language ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native; ocrEngine.SpellCheckManager.SpellLanguage = "en"; // *** Step 7: (Optional) Set any special recognition module options // Change the fill method for the first zone in the first page to be default OcrZone ocrZone = ocrDocument.Pages[0].Zones[0]; ocrZone.FillMethod = OcrZoneFillMethod.Default; ocrDocument.Pages[0].Zones[0] = ocrZone; // *** Step 8: Recognize ocrDocument.Pages.Recognize(null); // *** Step 9: Save recognition results // Save the results to a PDF file ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null); ocrDocument.Dispose(); // *** Step 
10: Shut down the OCR engine when finished ocrEngine.Shutdown(); ocrEngine.Dispose();
The following example shows the minimum necessary steps required to perform the same task, this time using the proper C# and VB.NET "using" keywords:
Visual Basic' Create the engine instance Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' Startup the engine ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' Create the document Using ocrDocument as IOcrDocument = ocrEngine.DocumentManager.CreateDocument() ' Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing) ' Recognize the pages ocrDocument.Pages.Recognize(Nothing) 'Save recognition results ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing) End Using ' The engine will automatically shuts down when Dispose is called End Using
C#// Create the engine instance using(IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false)) { // Startup the engine ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // Create the OCR document using(IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument()) { // Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null); // Recognize the pages ocrDocument.Pages.Recognize(null); // Save recognition results ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null); } // The engine will automatically shuts down when Dispose is called }
LEADTOOLS provides methods to:
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in a single or multi-threaded environment with optimization for server-based operations.
Use multiple OCR engines through a common .NET class library that abstracts the differences between them, so switching between engines requires virtually no changes to application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds before recognition to control recognition accuracy.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, MS Word, and MS Excel, as well as various flavors of ASCII and Unicode text.
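The engine abstraction described in the list above can be pictured with a small C# sketch. It is a hypothetical model and not the LEADTOOLS API. IRecognizer, RecognizerKind, RecognizerFactory and the two stand-in engines are invented names. Pages are simulated as strings, and "recognition" is plain string cleanup. In the real toolkit, the factory role is played by OcrEngineManager.CreateEngine with an OcrEngineType value.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: these types model a common engine interface and are NOT LEADTOOLS types.
public enum RecognizerKind { Fast, Accurate }

public interface IRecognizer
{
    string Name { get; }
    string Recognize(string pageText);
}

// Stand-in engine 1: trims surrounding whitespace (a real engine would analyze a bitmap).
public sealed class FastRecognizer : IRecognizer
{
    public string Name => "Fast";
    public string Recognize(string pageText) => pageText.Trim();
}

// Stand-in engine 2: also collapses internal runs of whitespace into single spaces.
public sealed class AccurateRecognizer : IRecognizer
{
    public string Name => "Accurate";
    public string Recognize(string pageText) =>
        string.Join(" ", pageText.Split((char[])null, StringSplitOptions.RemoveEmptyEntries));
}

public static class RecognizerFactory
{
    // The only engine-specific decision lives here, much like choosing an engine type.
    public static IRecognizer Create(RecognizerKind kind) => kind switch
    {
        RecognizerKind.Fast => new FastRecognizer(),
        RecognizerKind.Accurate => new AccurateRecognizer(),
        _ => throw new ArgumentOutOfRangeException(nameof(kind))
    };
}

public static class EngineSwitchDemo
{
    // Application code is identical no matter which engine is selected.
    public static List<string> RecognizeAll(RecognizerKind kind, IEnumerable<string> pages)
    {
        IRecognizer recognizer = RecognizerFactory.Create(kind);
        var results = new List<string>();
        foreach (string page in pages)
        {
            results.Add(recognizer.Recognize(page));
        }
        return results;
    }
}
```

In this model, moving to another engine changes only the RecognizerKind argument. RecognizeAll itself stays the same.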
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
1. Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
2. Start up the OCR engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
3. Create an OCR document containing one or more pages. For more information, refer to Working with OCR Pages.
4. Optional. Establish zones on the page(s), either manually or automatically. A page can be recognized with or without zones. For more information, refer to Working with OCR Zones.
5. Optional. Set the active languages used by the OCR engine (the default is English). For more information, refer to Working with OCR Languages.
6. Optional. Set the spell checking language (the default is English). For more information, refer to OCR Spell Language Dictionaries.
7. Optional. Set any special recognition module options. This is required only if the page contains zones, whether they were created automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules, and Using OMR in LEADTOOLS .NET OCR.
8. Recognize the page(s). For more information, refer to Recognizing OCR Pages.
9. Optional. Save the recognition results to either a file or memory. For more information, refer to Recognizing OCR Pages.
10. Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Steps 4 through 7 can be performed in any order, as long as they are carried out after the OCR engine has been started and before a page is recognized.
For more information on the engine assemblies, refer to OcrEngineType and Files To Be Included With Your Application.
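The ordering rule in the outline can be modeled as a small state check. Steps 4 through 7 may come in any order, but only after startup and before recognition. The OcrWorkflow class below is hypothetical. It only records call order and never calls LEADTOOLS. Its single-pass rule, which forbids configuration after recognition, is a simplifying assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the ordering rule in the outline; it records calls and never touches LEADTOOLS.
public sealed class OcrWorkflow
{
    private bool started;
    private bool recognized;
    private int pageCount;

    public List<string> Log { get; } = new List<string>();

    // Step 2: start up the engine.
    public void Startup()
    {
        started = true;
        Log.Add("startup");
    }

    // Step 3: add a page to the document.
    public void AddPage()
    {
        RequireStarted();
        pageCount++;
        Log.Add("page");
    }

    // Steps 4 through 7: any order, as long as startup has happened and recognition has not.
    public void SetZones()
    {
        if (pageCount == 0) throw new InvalidOperationException("Zones need a page.");
        Configure("zones");
    }
    public void SetLanguages() => Configure("languages");
    public void SetSpellLanguage() => Configure("spell");
    public void SetModuleOptions() => Configure("modules");

    // Step 8: recognize.
    public void Recognize()
    {
        RequireStarted();
        if (pageCount == 0) throw new InvalidOperationException("Add at least one page first.");
        recognized = true;
        Log.Add("recognize");
    }

    private void Configure(string step)
    {
        RequireStarted();
        if (recognized) throw new InvalidOperationException(step + " must be set before recognition.");
        Log.Add(step);
    }

    private void RequireStarted()
    {
        if (!started) throw new InvalidOperationException("Start up the engine first.");
    }
}
```

Two workflows that perform the optional steps in different orders both reach recognition. Calling any optional step before Startup, or after Recognize, throws InvalidOperationException.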
The following example shows the minimum steps required to perform the task outlined above, using the C# using and VB.NET Using statements so that the OCR document and the engine are disposed of automatically:
Visual Basic
' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
   ' Startup the engine
   ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")
   ' Create the document
   Using ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
      ' Add all the pages of a multi-page TIF image to the document
      ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)
      ' Recognize the pages
      ocrDocument.Pages.Recognize(Nothing)
      ' Save recognition results
      ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
   End Using
   ' The engine shuts down automatically when Dispose is called
End Using
C#
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
   // Startup the engine
   ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");
   // Create the OCR document
   using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
   {
      // Add all the pages of a multi-page TIF image to the document
      ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);
      // Recognize the pages
      ocrDocument.Pages.Recognize(null);
      // Save recognition results
      ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
   }
   // The engine shuts down automatically when Dispose is called
}
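The comments in the example above rely on disposal order. The inner using block disposes the document first, and disposing the engine shuts it down. The following sketch reproduces that behavior with hypothetical FakeEngine and FakeDocument classes that only record events. The assumption that Dispose calls Shutdown on a started engine comes from the example's own comment, not from inspecting the LEADTOOLS implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins that only record events; they are not LEADTOOLS classes.
public sealed class FakeEngine : IDisposable
{
    private readonly List<string> log;
    public bool IsStarted { get; private set; }

    public FakeEngine(List<string> log) { this.log = log; }

    public void Startup()
    {
        IsStarted = true;
        log.Add("engine startup");
    }

    public void Shutdown()
    {
        if (!IsStarted) return;   // shutting down twice is harmless
        IsStarted = false;
        log.Add("engine shutdown");
    }

    // Assumption taken from the example's comment: disposing a started engine shuts it down.
    public void Dispose()
    {
        Shutdown();
        log.Add("engine disposed");
    }

    public FakeDocument CreateDocument() => new FakeDocument(log);
}

public sealed class FakeDocument : IDisposable
{
    private readonly List<string> log;

    public FakeDocument(List<string> log)
    {
        this.log = log;
        log.Add("document created");
    }

    public void Dispose() => log.Add("document disposed");
}

public static class UsingDemo
{
    // Same nesting as the example: engine outside, document inside.
    public static List<string> Run()
    {
        var log = new List<string>();
        using (var engine = new FakeEngine(log))
        {
            engine.Startup();
            using (FakeDocument document = engine.CreateDocument())
            {
                log.Add("recognize and save");
            } // the document is disposed here, while the engine is still running
        }     // the engine is shut down and disposed here
        return log;
    }

    // An exception inside the block still triggers Dispose before the caller sees it.
    public static List<string> RunWithFailure()
    {
        var log = new List<string>();
        try
        {
            using (var engine = new FakeEngine(log))
            {
                engine.Startup();
                throw new InvalidOperationException("recognition failed");
            }
        }
        catch (InvalidOperationException)
        {
            log.Add("caught");
        }
        return log;
    }
}
```

RunWithFailure shows the practical benefit of the using form. Even when an exception escapes the block, the engine is shut down before the exception reaches the caller.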
Finally, the following example performs the same task using the one-shot, "fire and forget" IOcrAutoRecognizeManager interface:
Visual Basic
' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
   ' Startup the engine
   ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")
   ' Convert the multi-page TIF image to a PDF document
   ocrEngine.AutoRecognizeManager.Run( _
      "C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", _
      "C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", _
      DocumentFormat.Pdf, _
      Nothing, _
      Nothing)
End Using
C#
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
   // Startup the engine
   ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");
   // Convert the multi-page TIF image to a PDF document
   ocrEngine.AutoRecognizeManager.Run(
      @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif",
      @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf",
      DocumentFormat.Pdf,
      null,
      null);
}
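Conceptually, a one-shot call bundles loading, recognition and saving behind a single method. The sketch below illustrates that facade idea with a hypothetical OneShotConverter.Run that takes the three stages as delegates. It is not how IOcrAutoRecognizeManager is implemented, and the method and delegate parameters are invented for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical facade: the three delegates stand in for loading, recognizing and saving.
// None of these names are part of LEADTOOLS.
public static class OneShotConverter
{
    public static int Run(
        string inputFile,
        string outputFile,
        Func<string, IList<string>> loadPages,       // input file -> page images
        Func<string, string> recognizePage,           // page image -> recognized text
        Action<string, IList<string>> saveDocument)   // output file, recognized pages
    {
        if (loadPages == null) throw new ArgumentNullException(nameof(loadPages));
        if (recognizePage == null) throw new ArgumentNullException(nameof(recognizePage));
        if (saveDocument == null) throw new ArgumentNullException(nameof(saveDocument));

        // Load every page, recognize each one, then save the whole document.
        IList<string> pages = loadPages(inputFile);
        var texts = new List<string>(pages.Count);
        foreach (string page in pages)
        {
            texts.Add(recognizePage(page));
        }
        saveDocument(outputFile, texts);
        return texts.Count; // number of pages processed
    }
}
```

Under this model, the one-shot form keeps the calling code short. The step-by-step form gives access to each intermediate stage, such as adjusting zones or languages between loading and recognition.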
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in a single or multi-threaded environment with optimization for server-based operations.
Multiple OCR engines are supported and abstracted from the user through the use of a common .NET class library. Switching between the various engines requires virtually no changes in the application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds prior to recognition to control the accuracy of recognition.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, MS Word, MS Excel as well as various flavors of ASCII and UNICODE text.
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
1. Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
2. Start up the OCR engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
3. Create an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
4. Optional. Establish zones on the page(s), either manually or automatically. A page can be recognized with or without zones. For more information, refer to Working with OCR Zones.
5. Optional. Set the active languages to be used by the OCR engine (the default is English). For more information, refer to Working with OCR Languages.
6. Optional. Set the spell checking language (the default is English). For more information, refer to OCR Spell Language Dictionaries.
7. Optional. Set any special recognition module options. This is needed only if the page contains zones, whether created automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules, and Using OMR in LEADTOOLS .NET OCR.
8. Recognize the pages. For more information, refer to Recognizing OCR Pages.
9. Optionally, save the recognition results, either to a file or to memory. For more information, refer to Recognizing OCR Pages.
10. Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Steps 4, 5, 6, and 7 can be performed in any order, as long as they are carried out after the OCR engine is started and before a page is recognized.
For more information on the engine assemblies, refer to OcrEngineType and Files To Be Included With Your Application.
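The ordering rule for the optional steps can be pictured as a small state machine. The following C# sketch is illustrative only. OcrWorkflow, WorkflowState, and the step names passed to Configure are hypothetical and are not part of the LEADTOOLS API. The sketch rejects configuration calls made before Startup or after Recognize, and it accepts steps 4 through 7 in any order in between.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical states for the outline above; NOT a LEADTOOLS type.
public enum WorkflowState { Created, Started, Recognized, ShutDown }

// Hypothetical model of the ordering rule only: configuration steps (zoning,
// languages, spell language, zone options) may run in any order, but only
// after Startup and before Recognize. No OCR is performed.
public sealed class OcrWorkflow
{
    private readonly List<string> _log = new List<string>();

    public WorkflowState State { get; private set; } = WorkflowState.Created;
    public IReadOnlyList<string> Log => _log;

    public void Startup()
    {
        Require(WorkflowState.Created, "Startup");
        State = WorkflowState.Started;
        _log.Add("Startup");
    }

    // Steps 4-7: any order, any number of times, while the engine is started.
    public void Configure(string step)
    {
        Require(WorkflowState.Started, step);
        _log.Add(step);
    }

    public void Recognize()
    {
        Require(WorkflowState.Started, "Recognize");
        State = WorkflowState.Recognized;
        _log.Add("Recognize");
    }

    public void Shutdown()
    {
        if (State == WorkflowState.Created || State == WorkflowState.ShutDown)
            throw new InvalidOperationException("Shutdown requires a started engine.");
        State = WorkflowState.ShutDown;
        _log.Add("Shutdown");
    }

    // Throws when a step is attempted in the wrong state.
    private void Require(WorkflowState expected, string step)
    {
        if (State != expected)
            throw new InvalidOperationException($"{step} is not allowed in state {State}.");
    }
}
```

For example, Configure("Languages") followed by Configure("AutoZone") succeeds once Startup has run. Calling Configure on a workflow that was never started throws InvalidOperationException, as does calling it after Recognize.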
The following example shows how to perform the above steps in code:
Visual Basic
```vbnet
' Assuming you added "Imports Leadtools.Forms.Ocr" and "Imports Leadtools.Forms.DocumentWriters" at the beginning of this class

' *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
' We will use the LEADTOOLS OCR Advantage engine and use it in the same process
Dim ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)

' *** Step 2: Start up the engine.
' Use the default parameters
ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

' *** Step 3: Create an OCR document with one or more pages.
Dim ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
' Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)

' *** Step 4: Establish zones on the page(s), either manually or automatically
' Automatic zoning
ocrDocument.Pages.AutoZone(Nothing)

' *** Step 5: (Optional) Set the active languages to be used by the OCR engine
' Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(New String() {"en", "de"})

' *** Step 6: (Optional) Set the spell checking language
' Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native
ocrEngine.SpellCheckManager.SpellLanguage = "en"

' *** Step 7: (Optional) Set any special recognition module options
' Change the fill method for the first zone in the first page to be Omr
Dim ocrZone As OcrZone = ocrDocument.Pages(0).Zones(0)
ocrZone.FillMethod = OcrZoneFillMethod.Omr
ocrDocument.Pages(0).Zones(0) = ocrZone

' *** Step 8: Recognize
ocrDocument.Pages.Recognize(Nothing)

' *** Step 9: Save recognition results
' Save the results to a PDF file
ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
ocrDocument.Dispose()

' *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown()
ocrEngine.Dispose()
```
C#
```csharp
// Assuming you added "using Leadtools.Codecs;", "using Leadtools.Forms.Ocr;" and "using Leadtools.Forms.DocumentWriters;" at the beginning of this class

// *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
// We will use the LEADTOOLS OCR Advantage engine and use it in the same process
IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);

// *** Step 2: Start up the engine.
// Use the default parameters
ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

// *** Step 3: Create an OCR document with one or more pages.
IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument();
// Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);

// *** Step 4: Establish zones on the page(s), either manually or automatically
// Automatic zoning
ocrDocument.Pages.AutoZone(null);

// *** Step 5: (Optional) Set the active languages to be used by the OCR engine
// Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" });

// *** Step 6: (Optional) Set the spell checking language
// Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native;
ocrEngine.SpellCheckManager.SpellLanguage = "en";

// *** Step 7: (Optional) Set any special recognition module options
// Change the fill method for the first zone in the first page to be default
OcrZone ocrZone = ocrDocument.Pages[0].Zones[0];
ocrZone.FillMethod = OcrZoneFillMethod.Default;
ocrDocument.Pages[0].Zones[0] = ocrZone;

// *** Step 8: Recognize
ocrDocument.Pages.Recognize(null);

// *** Step 9: Save recognition results
// Save the results to a PDF file
ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
ocrDocument.Dispose();

// *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown();
ocrEngine.Dispose();
```
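Step 7 reads the zone, changes it, and then assigns it back to the collection. This copy-modify-assign pattern is what a value type (struct) requires, which suggests that OcrZone is a structure. Check the OcrZone reference for your LEADTOOLS version to confirm. The self-contained C# sketch below uses a hypothetical ZoneInfo struct and a List<ZoneInfo> in place of OcrZone and the page's zone collection. It shows why the final assignment matters.

```csharp
using System.Collections.Generic;

// Hypothetical value type standing in for a zone structure; LEADTOOLS' OcrZone
// is not used here. FillMethod is a string only to keep the sketch standalone.
public struct ZoneInfo
{
    public string FillMethod;
}

public static class ZoneEditing
{
    // Modifying the copy returned by the indexer leaves the stored element unchanged.
    public static string EditWithoutWriteBack(List<ZoneInfo> zones)
    {
        ZoneInfo zone = zones[0];   // the indexer returns a copy of the struct
        zone.FillMethod = "Omr";    // only the local copy changes
        return zones[0].FillMethod; // the stored element still has its old value
    }

    // Copy, modify, then assign back: the same pattern used in step 7.
    public static string EditWithWriteBack(List<ZoneInfo> zones)
    {
        ZoneInfo zone = zones[0];
        zone.FillMethod = "Omr";
        zones[0] = zone;            // write the modified copy back into the list
        return zones[0].FillMethod;
    }
}
```

With a List<ZoneInfo>, writing zones[0].FillMethod = "Omr" directly does not compile (error CS1612) because the indexer returns a copy. Assigning the modified copy back is the way to update the stored element.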
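The following example shows the minimum steps required to perform the same conversion (recognizing all pages of Ocr.tif and saving them as Document.pdf). This time it uses the C# using statement and the VB.NET Using statement, which dispose of the document and the engine automatically: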
Visual Basic
```vbnet
' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
   ' Start up the engine
   ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

   ' Create the document
   Using ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
      ' Add all the pages of a multi-page TIF image to the document
      ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)

      ' Recognize the pages
      ocrDocument.Pages.Recognize(Nothing)

      ' Save recognition results
      ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
   End Using

   ' The engine automatically shuts down when Dispose is called
End Using
```
C#
```csharp
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
   // Start up the engine
   ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

   // Create the OCR document
   using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
   {
      // Add all the pages of a multi-page TIF image to the document
      ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);

      // Recognize the pages
      ocrDocument.Pages.Recognize(null);

      // Save recognition results
      ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
   }

   // The engine automatically shuts down when Dispose is called
}
```
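Nested using blocks dispose of the inner object (the document) before the outer one (the engine), and they do so even if the body throws. The C# sketch below demonstrates that ordering with a hypothetical FakeResource class that only records when Dispose runs. It stands in for IOcrDocument and IOcrEngine and does not call LEADTOOLS.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an engine or document; it only logs Dispose calls.
// NOT a LEADTOOLS type.
public sealed class FakeResource : IDisposable
{
    private readonly string _name;
    private readonly List<string> _log;

    public FakeResource(string name, List<string> log)
    {
        _name = name;
        _log = log;
    }

    public void Dispose() => _log.Add("Dispose " + _name);
}

public static class UsingDemo
{
    // Mirrors the nested using blocks: the inner resource ("document") is
    // disposed first, then the outer one ("engine"), even on an exception.
    public static List<string> Run(bool fail)
    {
        var log = new List<string>();
        try
        {
            using (var engine = new FakeResource("engine", log))
            using (var document = new FakeResource("document", log))
            {
                log.Add("Recognize");
                if (fail)
                    throw new InvalidOperationException("simulated recognition error");
                log.Add("Save");
            }
        }
        catch (InvalidOperationException)
        {
            log.Add("Caught");
        }
        return log;
    }
}
```

UsingDemo.Run(false) records Recognize, Save, Dispose document, Dispose engine. UsingDemo.Run(true) records Recognize, Dispose document, Dispose engine, Caught, because both objects are disposed before the exception reaches the catch block.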
Finally, the following example shows how to perform the same conversion using the one-shot, "fire and forget" IOcrAutoRecognizeManager interface:
Visual Basic
```vbnet
' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
   ' Start up the engine
   ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

   ' Convert the multi-page TIF image to a PDF document
   ocrEngine.AutoRecognizeManager.Run( _
      "C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", _
      "C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", _
      DocumentFormat.Pdf, _
      Nothing, _
      Nothing)
End Using
```
C#
```csharp
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
   // Start up the engine
   ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

   // Convert the multi-page TIF image to a PDF document
   ocrEngine.AutoRecognizeManager.Run(
      @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif",
      @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf",
      DocumentFormat.Pdf,
      null,
      null);
}
```
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in a single or multi-threaded environment with optimization for server-based operations.
Multiple OCR engines are supported and abstracted from the user through the use of a common .NET class library. Switching between the various engines requires virtually no changes in the application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds prior to recognition to control the accuracy of recognition.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, MS Word, MS Excel as well as various flavors of ASCII and UNICODE text.
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
1. Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
2. Start up the OCR engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
3. Create an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
4. Optional. Establish zones on the page(s), either manually or automatically. A page can be recognized with or without zones. For more information, refer to Working with OCR Zones.
5. Optional. Set the active languages to be used by the OCR engine. (The default is English.) For more information, refer to Working with OCR Languages.
6. Optional. Set the spell checking language. (The default is English.) For more information, refer to OCR Spell Language Dictionaries.
7. Optional. Set any special recognition module options. This is required only if the page contains zones, created either automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules and Using OMR in LEADTOOLS .NET OCR.
8. Recognize. For more information, refer to Recognizing OCR Pages.
9. Save the recognition results, if desired. The results can be saved either to a file or to memory. For more information, refer to Recognizing OCR Pages.
10. Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Steps 4, 5, 6 and 7 can be performed in any order, as long as they are carried out after the OCR engine is started and before a page is recognized.
For more information on the engine assemblies, refer to OcrEngineType and Files To Be Included With Your Application.
The following example shows how to perform the above steps in code:
Visual Basic
```vb
' Assuming you added "Imports Leadtools.Forms.Ocr" and "Imports Leadtools.Forms.DocumentWriters" at the beginning of this class

' *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
' We will use the LEADTOOLS OCR Advantage engine and use it in the same process
Dim ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)

' *** Step 2: Start up the engine.
' Use the default parameters and point the engine to its runtime files
ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

' *** Step 3: Create an OCR document with one or more pages.
Dim ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
' Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)

' *** Step 4: Establish zones on the page(s), either manually or automatically
' Automatic zoning
ocrDocument.Pages.AutoZone(Nothing)

' *** Step 5: (Optional) Set the active languages to be used by the OCR engine
' Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(New String() {"en", "de"})

' *** Step 6: (Optional) Set the spell checking language
' Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native
ocrEngine.SpellCheckManager.SpellLanguage = "en"

' *** Step 7: (Optional) Set any special recognition module options
' Change the fill method for the first zone in the first page to Default.
' OcrZone is a structure, so modify a copy and assign it back.
' Automatic zoning may find no zones, so check the count first.
If ocrDocument.Pages(0).Zones.Count > 0 Then
    Dim ocrZone As OcrZone = ocrDocument.Pages(0).Zones(0)
    ocrZone.FillMethod = OcrZoneFillMethod.Default
    ocrDocument.Pages(0).Zones(0) = ocrZone
End If

' *** Step 8: Recognize
ocrDocument.Pages.Recognize(Nothing)

' *** Step 9: Save recognition results
' Save the results to a PDF file
ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
ocrDocument.Dispose()

' *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown()
ocrEngine.Dispose()
```
C#
```csharp
// Assuming you added "using Leadtools.Codecs;", "using Leadtools.Forms.Ocr;" and "using Leadtools.Forms.DocumentWriters;" at the beginning of this class

// *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
// We will use the LEADTOOLS OCR Advantage engine and use it in the same process
IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);

// *** Step 2: Start up the engine.
// Use the default parameters and point the engine to its runtime files
ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

// *** Step 3: Create an OCR document with one or more pages.
IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument();
// Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);

// *** Step 4: Establish zones on the page(s), either manually or automatically
// Automatic zoning
ocrDocument.Pages.AutoZone(null);

// *** Step 5: (Optional) Set the active languages to be used by the OCR engine
// Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" });

// *** Step 6: (Optional) Set the spell checking language
// Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native;
ocrEngine.SpellCheckManager.SpellLanguage = "en";

// *** Step 7: (Optional) Set any special recognition module options
// Change the fill method for the first zone in the first page to Default.
// OcrZone is a structure, so modify a copy and assign it back.
// Automatic zoning may find no zones, so check the count first.
if (ocrDocument.Pages[0].Zones.Count > 0)
{
    OcrZone ocrZone = ocrDocument.Pages[0].Zones[0];
    ocrZone.FillMethod = OcrZoneFillMethod.Default;
    ocrDocument.Pages[0].Zones[0] = ocrZone;
}

// *** Step 8: Recognize
ocrDocument.Pages.Recognize(null);

// *** Step 9: Save recognition results
// Save the results to a PDF file
ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
ocrDocument.Dispose();

// *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown();
ocrEngine.Dispose();
```
The following example shows the minimum steps required to perform the same task, this time using the C# "using" statement and the VB.NET "Using" statement so that the document and engine are disposed of automatically:
Visual Basic
```vb
' Assuming you added "Imports Leadtools.Forms.Ocr" and "Imports Leadtools.Forms.DocumentWriters" at the beginning of this class

' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
    ' Start up the engine
    ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

    ' Create the OCR document
    Using ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
        ' Add all the pages of a multi-page TIF image to the document
        ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)

        ' Recognize the pages
        ocrDocument.Pages.Recognize(Nothing)

        ' Save recognition results
        ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
    End Using

    ' The engine automatically shuts down when Dispose is called
End Using
```
C#
```csharp
// Assuming you added "using Leadtools.Forms.Ocr;" and "using Leadtools.Forms.DocumentWriters;" at the beginning of this class

// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
    // Start up the engine
    ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

    // Create the OCR document
    using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
    {
        // Add all the pages of a multi-page TIF image to the document
        ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);

        // Recognize the pages
        ocrDocument.Pages.Recognize(null);

        // Save recognition results
        ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
    }

    // The engine automatically shuts down when Dispose is called
}
```
Finally, the following example performs the same task with the one-shot, "fire and forget" IOcrAutoRecognizeManager interface:
Visual Basic
```vb
' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
    ' Start up the engine
    ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

    ' Convert the multi-page TIF image to a PDF document
    ocrEngine.AutoRecognizeManager.Run( _
        "C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", _
        "C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", _
        DocumentFormat.Pdf, _
        Nothing, _
        Nothing)
End Using
```
C#
```csharp
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
    // Start up the engine
    ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

    // Convert the multi-page TIF image to a PDF document
    ocrEngine.AutoRecognizeManager.Run(
        @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif",
        @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf",
        DocumentFormat.Pdf,
        null,
        null);
}
```
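Step 9 notes that recognition results can be saved to memory as well as to a file. The following is a minimal sketch of that variant and is not taken from the LEADTOOLS samples. It follows the "using" example above but passes a MemoryStream to the document's Save method instead of a file name.

The sketch makes the following assumptions:
- An IOcrDocument.Save overload that accepts a Stream, a DocumentFormat and a progress callback is available. Check the IOcrDocument.Save reference for your LEADTOOLS version.
- The OCR features have already been unlocked.
- The class and method names (OcrMemorySaveSketch, RecognizeToPdfBytes) are hypothetical.

```csharp
// Assumes references to the LEADTOOLS OCR and DocumentWriters assemblies (LEADTOOLS 18)
// and that the OCR features have already been unlocked.
using System.IO;
using Leadtools.Forms.Ocr;
using Leadtools.Forms.DocumentWriters;

// Hypothetical helper class, for illustration only
public static class OcrMemorySaveSketch
{
    // Hypothetical helper: recognizes every page of imageFileName and returns
    // the resulting PDF as a byte array instead of writing it to disk.
    public static byte[] RecognizeToPdfBytes(string imageFileName, string engineRuntimeDirectory)
    {
        using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
        {
            // Default codecs, document writer and work directory; runtime path from the caller
            ocrEngine.Startup(null, null, null, engineRuntimeDirectory);

            using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
            {
                // Add all pages (1 to -1 means first to last), then recognize them
                ocrDocument.Pages.AddPages(imageFileName, 1, -1, null);
                ocrDocument.Pages.Recognize(null);

                using (MemoryStream stream = new MemoryStream())
                {
                    // Assumed overload: IOcrDocument.Save(Stream, DocumentFormat, OcrProgressCallback)
                    ocrDocument.Save(stream, DocumentFormat.Pdf, null);
                    return stream.ToArray();
                }
            }
            // Disposing the engine shuts it down, as in the "using" example above
        }
    }
}
```

Returning a byte array keeps the engine and document lifetimes inside the using blocks. The caller can then write the bytes to a database, an HTTP response, or another stream without touching the file system.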
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in a single or multi-threaded environment with optimization for server-based operations.
Multiple OCR engines are supported and abstracted from the user through the use of a common .NET class library. Switching between the various engines requires virtually no changes in the application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds prior to recognition to control the accuracy of recognition.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, MS Word, MS Excel as well as various flavors of ASCII and UNICODE text.
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
Startup the OCR Engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
Establish an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
Establish zones on the page(s), either manually or automatically. (This is optional. A page can be recognized with or without zones.) For more information, refer to Working with OCR Zones.
Optional. Set the active languages to be used by the OCR engine. (The default is English). For more information, refer to Working with OCR Languages.
Optional. Set the spell checking language. (The default is English). For more information, refer to OCR Spell Language Dictionaries.
Optional. Set any special recognition module options. This is required only if the page contains zones, created either automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules and Using OMR in LEADTOOLS .NET OCR.
Recognize. For more information, refer to Recognizing OCR Pages.
Save recognition results, if desired. The results can be saved to either a file or to memory. For more information, refer to Recognizing OCR Pages.
Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Where steps 4, 5, 6 and 7 can pretty much be done in any order, as long as they are carried out after starting up the OCR engine and before recognizing a page.
For more information on the engine assemblies refer to OcrEngineType" and Files To Be Included With Your Application.
The following example shows how to perform the above steps in code:
Visual Basic' Assuming you added "Imports Leadtools.Forms.Ocr" and "Imports Leadtools.Forms.DocumentWriter" at the beginning of this class ' *** Step 1: Select the engine type and create an instance of the IOcrEngine interface. ' We will use the LEADTOOLS OCR Advantage engine and use it in the same process Dim ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' *** Step 2: Startup the engine. ' Use the default parameters ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' *** Step 3: Create an OCR document with one or more pages. Dim ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument() ' Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing) ' *** Step 4: Establish zones on the page(s), either manually or automatically ' Automatic zoning ocrDocument.Pages.AutoZone(Nothing) ' *** Step 5: (Optional) Set the active languages to be used by the OCR engine ' Enable English and German languages ocrEngine.LanguageManager.EnableLanguages(New String() {"en", "de"}) ' *** Step 6: (Optional) Set the spell checking language ' Enable the spell checking system and set English as the spell language ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native ocrEngine.SpellCheckManager.SpellLanguage = "en" ' *** Step 7: (Optional) Set any special recognition module options ' Change the fill method for the first zone in the first page to be Omr Dim ocrZone As OcrZone = ocrDocument.Pages(0).Zones(0) ocrZone.FillMethod = OcrZoneFillMethod.Omr ocrDocument.Pages(0).Zones(0) = ocrZone ' *** Step 8: Recognize ocrDocument.Pages.Recognize(Nothing) ' *** Step 9: Save recognition results ' Save the results to a PDF file ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing) ocrDocument.Dispose() ' *** Step 10: Shut down the OCR 
engine when finished ocrEngine.Shutdown() ocrEngine.Dispose()
C#// Assuming you added "using Leadtools.Codecs;", "using Leadtools.Forms.Ocr;" and "using Leadtools.Forms.DocumentWriters;" at the beginning of this class // *** Step 1: Select the engine type and create an instance of the IOcrEngine interface. // We will use the LEADTOOLS OCR Advantage engine and use it in the same process IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false); // *** Step 2: Startup the engine. // Use the default parameters ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // *** Step 3: Create an OCR document with one or more pages. IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument(); // Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null); // *** Step 4: Establish zones on the page(s), either manually or automatically // Automatic zoning ocrDocument.Pages.AutoZone(null); // *** Step 5: (Optional) Set the active languages to be used by the OCR engine // Enable English and German languages ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" }); // *** Step 6: (Optional) Set the spell checking language // Enable the spell checking system and set English as the spell language ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native; ocrEngine.SpellCheckManager.SpellLanguage = "en"; // *** Step 7: (Optional) Set any special recognition module options // Change the fill method for the first zone in the first page to be default OcrZone ocrZone = ocrDocument.Pages[0].Zones[0]; ocrZone.FillMethod = OcrZoneFillMethod.Default; ocrDocument.Pages[0].Zones[0] = ocrZone; // *** Step 8: Recognize ocrDocument.Pages.Recognize(null); // *** Step 9: Save recognition results // Save the results to a PDF file ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null); ocrDocument.Dispose(); // *** Step 
10: Shut down the OCR engine when finished ocrEngine.Shutdown(); ocrEngine.Dispose();
The following example shows the minimum necessary steps required to perform the same task, this time using the proper C# and VB.NET "using" keywords:
Visual Basic' Create the engine instance Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' Startup the engine ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' Create the document Using ocrDocument as IOcrDocument = ocrEngine.DocumentManager.CreateDocument() ' Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing) ' Recognize the pages ocrDocument.Pages.Recognize(Nothing) 'Save recognition results ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing) End Using ' The engine will automatically shuts down when Dispose is called End Using
C#// Create the engine instance using(IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false)) { // Startup the engine ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // Create the OCR document using(IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument()) { // Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null); // Recognize the pages ocrDocument.Pages.Recognize(null); // Save recognition results ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null); } // The engine will automatically shuts down when Dispose is called }
Finally, the following example shows how to perform the same task above using the one shot "fire and forget" IOcrAutoRecognizeManager interface:
Visual Basic' Create the engine instance Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' Startup the engine ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' Convert the multi-page TIF image to a PDF document ocrEngine.AutoRecognizeManager.Run( _ "C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", _ "C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", _ DocumentFormat.Pdf, _ Nothing, _ Nothing) End Using
C#// Create the engine instance using(IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false)) { // Startup the engine ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // Convert the multi-page TIF image to a PDF document ocrEngine.AutoRecognizeManager.Run( @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null, null); }
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in a single or multi-threaded environment with optimization for server-based operations.
Multiple OCR engines are supported and abstracted from the user through the use of a common .NET class library. Switching between the various engines requires virtually no changes in the application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds prior to recognition to control the accuracy of recognition.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, MS Word, MS Excel as well as various flavors of ASCII and UNICODE text.
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
Startup the OCR Engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
Establish an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
Establish zones on the page(s), either manually or automatically. (This is optional. A page can be recognized with or without zones.) For more information, refer to Working with OCR Zones.
Optional. Set the active languages to be used by the OCR engine. (The default is English). For more information, refer to Working with OCR Languages.
Optional. Set the spell checking language. (The default is English). For more information, refer to OCR Spell Language Dictionaries.
Optional. Set any special recognition module options. This is required only if the page contains zones, created either automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules and Using OMR in LEADTOOLS .NET OCR.
Recognize. For more information, refer to Recognizing OCR Pages.
Save recognition results, if desired. The results can be saved to either a file or to memory. For more information, refer to Recognizing OCR Pages.
Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Where steps 4, 5, 6 and 7 can pretty much be done in any order, as long as they are carried out after starting up the OCR engine and before recognizing a page.
For more information on the engine assemblies refer to OcrEngineType" and Files To Be Included With Your Application.
The following example shows how to perform the above steps in code:
Visual Basic
    ' Assumes "Imports Leadtools.Forms.Ocr" and "Imports Leadtools.Forms.DocumentWriters" at the top of the file
    ' *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
    ' We will use the LEADTOOLS OCR Advantage engine and run it in the same process
    Dim ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
    ' *** Step 2: Start up the engine.
    ' Pass Nothing for the first three arguments to use the defaults; the last argument is the folder containing the engine runtime files
    ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")
    ' *** Step 3: Create an OCR document with one or more pages.
    Dim ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
    ' Add all the pages of a multi-page TIF image to the document
    ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)
    ' *** Step 4: Establish zones on the page(s), either manually or automatically
    ' Automatic zoning
    ocrDocument.Pages.AutoZone(Nothing)
    ' *** Step 5: (Optional) Set the active languages to be used by the OCR engine
    ' Enable English and German languages
    ocrEngine.LanguageManager.EnableLanguages(New String() {"en", "de"})
    ' *** Step 6: (Optional) Set the spell checking language
    ' Enable the spell checking system and set English as the spell language
    ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native
    ocrEngine.SpellCheckManager.SpellLanguage = "en"
    ' *** Step 7: (Optional) Set any special recognition module options
    ' Change the fill method of the first zone on the first page to OMR (only if auto-zoning found a zone).
    ' Modify a copy of the zone and assign it back to the collection
    If ocrDocument.Pages(0).Zones.Count > 0 Then
        Dim ocrZone As OcrZone = ocrDocument.Pages(0).Zones(0)
        ocrZone.FillMethod = OcrZoneFillMethod.Omr
        ocrDocument.Pages(0).Zones(0) = ocrZone
    End If
    ' *** Step 8: Recognize
    ocrDocument.Pages.Recognize(Nothing)
    ' *** Step 9: Save recognition results
    ' Save the results to a PDF file
    ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
    ocrDocument.Dispose()
    ' *** Step 10: Shut down the OCR engine when finished
    ocrEngine.Shutdown()
    ocrEngine.Dispose()
C#
    // Assumes "using Leadtools.Codecs;", "using Leadtools.Forms.Ocr;" and "using Leadtools.Forms.DocumentWriters;" at the top of the file
    // *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
    // We will use the LEADTOOLS OCR Advantage engine and run it in the same process
    IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);
    // *** Step 2: Start up the engine.
    // Pass null for the first three arguments to use the defaults; the last argument is the folder containing the engine runtime files
    ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");
    // *** Step 3: Create an OCR document with one or more pages.
    IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument();
    // Add all the pages of a multi-page TIF image to the document
    ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);
    // *** Step 4: Establish zones on the page(s), either manually or automatically
    // Automatic zoning
    ocrDocument.Pages.AutoZone(null);
    // *** Step 5: (Optional) Set the active languages to be used by the OCR engine
    // Enable English and German languages
    ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" });
    // *** Step 6: (Optional) Set the spell checking language
    // Enable the spell checking system and set English as the spell language
    ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native;
    ocrEngine.SpellCheckManager.SpellLanguage = "en";
    // *** Step 7: (Optional) Set any special recognition module options
    // Change the fill method of the first zone on the first page to Default (only if auto-zoning found a zone).
    // Modify a copy of the zone and assign it back to the collection
    if (ocrDocument.Pages[0].Zones.Count > 0)
    {
        OcrZone ocrZone = ocrDocument.Pages[0].Zones[0];
        ocrZone.FillMethod = OcrZoneFillMethod.Default;
        ocrDocument.Pages[0].Zones[0] = ocrZone;
    }
    // *** Step 8: Recognize
    ocrDocument.Pages.Recognize(null);
    // *** Step 9: Save recognition results
    // Save the results to a PDF file
    ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
    ocrDocument.Dispose();
    // *** Step 10: Shut down the OCR engine when finished
    ocrEngine.Shutdown();
    ocrEngine.Dispose();
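Step 7 in the examples reads the zone into a local variable, changes FillMethod, and then assigns the zone back to the collection. Assuming OcrZone is a structure (value type), this round trip is required because the collection indexer returns a copy.

The sketch below reproduces the pattern with a named tuple standing in for OcrZone, so it runs without LEADTOOLS.

```csharp
using System;
using System.Collections.Generic;

// A named tuple (a value type) stands in for OcrZone so the sketch runs without LEADTOOLS.
var zones = new List<(string Name, string FillMethod)> { ("Zone 1", "Default") };

// zones[0].FillMethod = "Omr";  // does not compile (CS1612): the indexer returns a copy

// Changing a copy leaves the item in the collection untouched.
var copy = zones[0];
copy.FillMethod = "Omr";
Console.WriteLine(zones[0].FillMethod); // Default

// Read, modify, and assign back, as step 7 of the examples does.
var zone = zones[0];
zone.FillMethod = "Omr";
zones[0] = zone;
Console.WriteLine(zones[0].FillMethod); // Omr
```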
The following example shows the minimum steps required to perform the same task. This time it uses the C# "using" and VB.NET "Using" statements, so the document and the engine are disposed of even if an exception is thrown:
Visual Basic
    ' Create the engine instance
    Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
        ' Start up the engine
        ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")
        ' Create the OCR document
        Using ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
            ' Add all the pages of a multi-page TIF image to the document
            ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)
            ' Recognize the pages
            ocrDocument.Pages.Recognize(Nothing)
            ' Save recognition results
            ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
        End Using
        ' The engine shuts down automatically when Dispose is called
    End Using
C#
    // Create the engine instance
    using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
    {
        // Start up the engine
        ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");
        // Create the OCR document
        using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
        {
            // Add all the pages of a multi-page TIF image to the document
            ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);
            // Recognize the pages
            ocrDocument.Pages.Recognize(null);
            // Save recognition results
            ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
        }
        // The engine shuts down automatically when Dispose is called
    }
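The nested using blocks roughly expand to try/finally blocks. This is why the document is disposed before the engine, and why both are disposed even when a call throws.

The sketch below writes that expansion out by hand. The log strings stand in for the engine and document objects, and RunConversion is a hypothetical name used only for illustration.

```csharp
using System;
using System.Collections.Generic;

var log = new List<string>();

// Hypothetical stand-in for the "using" example, written as the try/finally blocks
// the two using statements roughly expand to. Log entries replace the real objects.
void RunConversion(bool failDuringRecognition)
{
    log.Add("engine created");
    try
    {
        log.Add("document created");
        try
        {
            if (failDuringRecognition)
                throw new InvalidOperationException("recognition failed");
            log.Add("document saved");
        }
        finally
        {
            log.Add("document disposed"); // inner using block ends first
        }
    }
    finally
    {
        log.Add("engine disposed"); // outer using block ends last
    }
}

try
{
    RunConversion(failDuringRecognition: true);
}
catch (InvalidOperationException)
{
    log.Add("error handled by caller");
}

Console.WriteLine(string.Join(" -> ", log));
// engine created -> document created -> document disposed -> engine disposed -> error handled by caller
```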
Finally, the following example performs the same task with a single "fire and forget" call through the IOcrAutoRecognizeManager interface:
Visual Basic
    ' Create the engine instance
    Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
        ' Start up the engine
        ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")
        ' Convert the multi-page TIF image to a PDF document
        ocrEngine.AutoRecognizeManager.Run( _
            "C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", _
            "C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", _
            DocumentFormat.Pdf, _
            Nothing, _
            Nothing)
    End Using
C#
    // Create the engine instance
    using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
    {
        // Start up the engine
        ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");
        // Convert the multi-page TIF image to a PDF document
        ocrEngine.AutoRecognizeManager.Run(
            @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif",
            @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf",
            DocumentFormat.Pdf,
            null,
            null);
    }
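The one-shot call takes only an input file, an output file, and a format, so it lends itself to batch conversion.

The following is a minimal sketch, assuming one output PDF per input image in a separate folder. BuildJobs is a hypothetical helper that only builds the input/output path pairs. The Run call is left as a comment so the sketch runs without LEADTOOLS.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper (not a LEADTOOLS API): pairs each input image with a PDF of the
// same base name in outputFolder.
List<(string Input, string Output)> BuildJobs(IEnumerable<string> inputFiles, string outputFolder) =>
    inputFiles
        .Select(file => (Input: file, Output: Path.Combine(outputFolder, Path.ChangeExtension(Path.GetFileName(file), ".pdf"))))
        .ToList();

var jobs = BuildJobs(new[] { "scans/Ocr.tif", "scans/Invoice.tif" }, "converted");
foreach (var (source, target) in jobs)
{
    // In an application this is where the one-shot call would go, e.g.
    // ocrEngine.AutoRecognizeManager.Run(source, target, DocumentFormat.Pdf, null, null);
    Console.WriteLine($"{source} -> {target}");
}
```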
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in a single or multi-threaded environment with optimization for server-based operations.
Multiple OCR engines are supported and abstracted from the user through the use of a common .NET class library. Switching between the various engines requires virtually no changes in the application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds prior to recognition to control the accuracy of recognition.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, MS Word, MS Excel as well as various flavors of ASCII and UNICODE text.
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
Startup the OCR Engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
Establish an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
Establish zones on the page(s), either manually or automatically. (This is optional. A page can be recognized with or without zones.) For more information, refer to Working with OCR Zones.
Optional. Set the active languages to be used by the OCR engine. (The default is English). For more information, refer to Working with OCR Languages.
Optional. Set the spell checking language. (The default is English). For more information, refer to OCR Spell Language Dictionaries.
Optional. Set any special recognition module options. This is required only if the page contains zones, created either automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules and Using OMR in LEADTOOLS .NET OCR.
Recognize. For more information, refer to Recognizing OCR Pages.
Save recognition results, if desired. The results can be saved to either a file or to memory. For more information, refer to Recognizing OCR Pages.
Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Where steps 4, 5, 6 and 7 can pretty much be done in any order, as long as they are carried out after starting up the OCR engine and before recognizing a page.
For more information on the engine assemblies refer to OcrEngineType" and Files To Be Included With Your Application.
The following example shows how to perform the above steps in code:
Visual Basic' Assuming you added "Imports Leadtools.Forms.Ocr" and "Imports Leadtools.Forms.DocumentWriter" at the beginning of this class ' *** Step 1: Select the engine type and create an instance of the IOcrEngine interface. ' We will use the LEADTOOLS OCR Advantage engine and use it in the same process Dim ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' *** Step 2: Startup the engine. ' Use the default parameters ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' *** Step 3: Create an OCR document with one or more pages. Dim ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument() ' Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing) ' *** Step 4: Establish zones on the page(s), either manually or automatically ' Automatic zoning ocrDocument.Pages.AutoZone(Nothing) ' *** Step 5: (Optional) Set the active languages to be used by the OCR engine ' Enable English and German languages ocrEngine.LanguageManager.EnableLanguages(New String() {"en", "de"}) ' *** Step 6: (Optional) Set the spell checking language ' Enable the spell checking system and set English as the spell language ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native ocrEngine.SpellCheckManager.SpellLanguage = "en" ' *** Step 7: (Optional) Set any special recognition module options ' Change the fill method for the first zone in the first page to be Omr Dim ocrZone As OcrZone = ocrDocument.Pages(0).Zones(0) ocrZone.FillMethod = OcrZoneFillMethod.Omr ocrDocument.Pages(0).Zones(0) = ocrZone ' *** Step 8: Recognize ocrDocument.Pages.Recognize(Nothing) ' *** Step 9: Save recognition results ' Save the results to a PDF file ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing) ocrDocument.Dispose() ' *** Step 10: Shut down the OCR 
engine when finished ocrEngine.Shutdown() ocrEngine.Dispose()
The LEADTOOLS OCR features provide methods for incorporating optical character recognition (OCR) technology into an application. OCR is used to process bitmap document images into text.
Once the LEADTOOLS .NET OCR toolkit is installed on the system, the user is ready to begin programming with LEADTOOLS OCR. Note that the OCR features must be unlocked before the user can use the OCR properties, methods, and events. For more information on unlocking LEAD features, refer to Unlocking Special LEAD Features.
To start using LEADTOOLS .NET OCR, add references to the Leadtools.Forms.Ocr.dll and Leadtools.Forms.DocumentWriters.dll assemblies to your .NET application. These assemblies contain the interfaces, classes, structures, and delegates used to program with LEADTOOLS OCR.
Because the toolkit supports multiple engines, the code that interfaces with each engine is stored in a separate assembly. That assembly is loaded dynamically when an instance of the IOcrEngine interface is created. You must therefore make sure that the engine assembly you plan to use resides in the same folder as the Leadtools.Forms.Ocr.dll assembly. You can optionally add the engine assembly as a reference to your project so that its dependencies are detected automatically. LEADTOOLS does not require this.
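Because the engine assembly is loaded only when the IOcrEngine instance is created, a missing file will presumably not be detected until that point. The following C# sketch shows one way an application might check its deployment folder in advance. It is not part of the LEADTOOLS API. The OcrDeploymentCheck class and its method are hypothetical. The engine file name used in the usage note (Leadtools.Forms.Ocr.Advantage.dll for the Advantage engine) is an assumption; see Files To Be Included With Your Application for the actual file list.

```csharp
using System;
using System.IO;

// Hypothetical helper (NOT part of LEADTOOLS): checks whether an OCR engine
// assembly resides in the same folder as Leadtools.Forms.Ocr.dll.
public static class OcrDeploymentCheck
{
    public static bool IsEngineNextToOcrAssembly(string ocrAssemblyPath, string engineAssemblyFileName)
    {
        if (string.IsNullOrEmpty(ocrAssemblyPath))
            throw new ArgumentException("The OCR assembly path is required.", nameof(ocrAssemblyPath));
        if (string.IsNullOrEmpty(engineAssemblyFileName))
            throw new ArgumentException("The engine assembly file name is required.", nameof(engineAssemblyFileName));

        // Folder that contains Leadtools.Forms.Ocr.dll
        string folder = Path.GetDirectoryName(Path.GetFullPath(ocrAssemblyPath)) ?? string.Empty;

        // The engine assembly is expected to sit in that same folder
        return File.Exists(Path.Combine(folder, engineAssemblyFileName));
    }
}
```

In an application, the first argument could come from typeof(IOcrEngine).Assembly.Location. For example: IsEngineNextToOcrAssembly(typeof(IOcrEngine).Assembly.Location, "Leadtools.Forms.Ocr.Advantage.dll"), where the file name is assumed. The check only tests for the file's presence. It does not verify that the assembly loads.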
LEADTOOLS provides methods to:
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in a single or multi-threaded environment with optimization for server-based operations.
Use multiple OCR engines through a common .NET class library that abstracts the engine from the application. Switching between the various engines requires virtually no changes in the application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds before recognition to control recognition accuracy.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, MS Word, and MS Excel, as well as various flavors of ASCII and Unicode text.
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
1. Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
2. Start up the OCR engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
3. Create an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
4. Establish zones on the page(s), either manually or automatically. This step is optional: a page can be recognized with or without zones. For more information, refer to Working with OCR Zones.
5. Optional. Set the active languages to be used by the OCR engine (the default is English). For more information, refer to Working with OCR Languages.
6. Optional. Set the spell checking language (the default is English). For more information, refer to OCR Spell Language Dictionaries.
7. Optional. Set any special recognition module options. This is required only if the page contains zones, whether created automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules, and Using OMR in LEADTOOLS .NET OCR.
8. Recognize the pages. For more information, refer to Recognizing OCR Pages.
9. Save the recognition results, if desired. The results can be saved either to a file or to memory. For more information, refer to Recognizing OCR Pages.
10. Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Steps 4, 5, 6, and 7 can be performed in any order, as long as they are carried out after the OCR engine is started and before a page is recognized.
For more information on the engine assemblies, refer to OcrEngineType and Files To Be Included With Your Application.
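The ordering rule for steps 4 through 7 can be modeled as a small state machine. The following C# sketch is a simplified, hypothetical model. OcrWorkflowSketch and OcrWorkflowStage are not LEADTOOLS types, and the model does not call the OCR engine. Configuration steps are accepted in any order, but only after startup and before recognition. For simplicity, the sketch treats recognition as a single pass, whereas a real application may reconfigure the engine and recognize again.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the workflow order; NOT part of the LEADTOOLS API.
public enum OcrWorkflowStage { Created, Started, Recognized }

public sealed class OcrWorkflowSketch
{
    private readonly List<string> _configuredSteps = new List<string>();

    public OcrWorkflowStage Stage { get; private set; } = OcrWorkflowStage.Created;

    // Configuration steps recorded so far, in the order they were applied
    public IReadOnlyList<string> ConfiguredSteps => _configuredSteps;

    // Step 2: start up the engine (only once in this model)
    public void Startup()
    {
        if (Stage != OcrWorkflowStage.Created)
            throw new InvalidOperationException("The engine has already been started.");
        Stage = OcrWorkflowStage.Started;
    }

    // Steps 4-7 (zones, languages, spell language, module options):
    // any order is accepted, but only between Startup and Recognize.
    public void Configure(string step)
    {
        if (Stage != OcrWorkflowStage.Started)
            throw new InvalidOperationException(
                $"'{step}' must be applied after Startup and before Recognize (current stage: {Stage}).");
        _configuredSteps.Add(step);
    }

    // Step 8: recognize; in this simplified model, configuration is closed afterwards
    public void Recognize()
    {
        if (Stage != OcrWorkflowStage.Started)
            throw new InvalidOperationException("Recognize requires a started engine.");
        Stage = OcrWorkflowStage.Recognized;
    }
}
```

Calling Configure("zones") on a new instance throws an InvalidOperationException. Calling Startup, then any sequence of Configure calls, and then Recognize succeeds.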
The following example shows how to perform the above steps in code:
Visual Basic
```vb
' Assuming you added "Imports Leadtools.Forms.Ocr" and "Imports Leadtools.Forms.DocumentWriters" at the beginning of this class

' *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
' We will use the LEADTOOLS OCR Advantage engine and use it in the same process
Dim ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)

' *** Step 2: Startup the engine.
' Use the default parameters
ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

' *** Step 3: Create an OCR document with one or more pages.
Dim ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
' Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)

' *** Step 4: Establish zones on the page(s), either manually or automatically
' Automatic zoning
ocrDocument.Pages.AutoZone(Nothing)

' *** Step 5: (Optional) Set the active languages to be used by the OCR engine
' Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(New String() {"en", "de"})

' *** Step 6: (Optional) Set the spell checking language
' Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native
ocrEngine.SpellCheckManager.SpellLanguage = "en"

' *** Step 7: (Optional) Set any special recognition module options
' Change the fill method for the first zone in the first page to be Omr.
' OcrZone is a structure, so modify a copy and assign it back to the collection.
Dim ocrZone As OcrZone = ocrDocument.Pages(0).Zones(0)
ocrZone.FillMethod = OcrZoneFillMethod.Omr
ocrDocument.Pages(0).Zones(0) = ocrZone

' *** Step 8: Recognize
ocrDocument.Pages.Recognize(Nothing)

' *** Step 9: Save recognition results
' Save the results to a PDF file
ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
ocrDocument.Dispose()

' *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown()
ocrEngine.Dispose()
```
C#
```csharp
// Assuming you added "using Leadtools.Codecs;", "using Leadtools.Forms.Ocr;" and
// "using Leadtools.Forms.DocumentWriters;" at the beginning of this class

// *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
// We will use the LEADTOOLS OCR Advantage engine and use it in the same process
IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);

// *** Step 2: Startup the engine.
// Use the default parameters
ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

// *** Step 3: Create an OCR document with one or more pages.
IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument();
// Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);

// *** Step 4: Establish zones on the page(s), either manually or automatically
// Automatic zoning
ocrDocument.Pages.AutoZone(null);

// *** Step 5: (Optional) Set the active languages to be used by the OCR engine
// Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" });

// *** Step 6: (Optional) Set the spell checking language
// Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native;
ocrEngine.SpellCheckManager.SpellLanguage = "en";

// *** Step 7: (Optional) Set any special recognition module options
// Change the fill method for the first zone in the first page to be default.
// OcrZone is a structure, so modify a copy and assign it back to the collection.
OcrZone ocrZone = ocrDocument.Pages[0].Zones[0];
ocrZone.FillMethod = OcrZoneFillMethod.Default;
ocrDocument.Pages[0].Zones[0] = ocrZone;

// *** Step 8: Recognize
ocrDocument.Pages.Recognize(null);

// *** Step 9: Save recognition results
// Save the results to a PDF file
ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
ocrDocument.Dispose();

// *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown();
ocrEngine.Dispose();
```
The following example shows the minimum steps required to perform the same task. This time it uses the C# "using" statement and the VB.NET "Using" statement, which dispose of the engine and the document automatically:
Visual Basic
```vb
' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
   ' Startup the engine
   ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

   ' Create the document
   Using ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
      ' Add all the pages of a multi-page TIF image to the document
      ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)

      ' Recognize the pages
      ocrDocument.Pages.Recognize(Nothing)

      ' Save recognition results
      ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
   End Using

   ' The engine automatically shuts down when Dispose is called
End Using
```
C#
```csharp
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
   // Startup the engine
   ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

   // Create the OCR document
   using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
   {
      // Add all the pages of a multi-page TIF image to the document
      ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);

      // Recognize the pages
      ocrDocument.Pages.Recognize(null);

      // Save recognition results
      ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
   }

   // The engine automatically shuts down when Dispose is called
}
```
Finally, the following example performs the same task with the one-shot ("fire and forget") IOcrAutoRecognizeManager interface:
Visual Basic
```vb
' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
   ' Startup the engine
   ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

   ' Convert the multi-page TIF image to a PDF document
   ocrEngine.AutoRecognizeManager.Run( _
      "C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", _
      "C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", _
      DocumentFormat.Pdf, _
      Nothing, _
      Nothing)
End Using
```
C#
```csharp
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
   // Startup the engine
   ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

   // Convert the multi-page TIF image to a PDF document
   ocrEngine.AutoRecognizeManager.Run(
      @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif",
      @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf",
      DocumentFormat.Pdf,
      null,
      null);
}
```
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in a single or multi-threaded environment with optimization for server-based operations.
Multiple OCR engines are supported and abstracted from the user through the use of a common .NET class library. Switching between the various engines requires virtually no changes in the application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds prior to recognition to control the accuracy of recognition.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, MS Word, MS Excel as well as various flavors of ASCII and UNICODE text.
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
Startup the OCR Engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
Establish an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
Establish zones on the page(s), either manually or automatically. (This is optional. A page can be recognized with or without zones.) For more information, refer to Working with OCR Zones.
Optional. Set the active languages to be used by the OCR engine. (The default is English). For more information, refer to Working with OCR Languages.
Optional. Set the spell checking language. (The default is English). For more information, refer to OCR Spell Language Dictionaries.
Optional. Set any special recognition module options. This is required only if the page contains zones, created either automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules and Using OMR in LEADTOOLS .NET OCR.
Recognize. For more information, refer to Recognizing OCR Pages.
Save recognition results, if desired. The results can be saved to either a file or to memory. For more information, refer to Recognizing OCR Pages.
Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Where steps 4, 5, 6 and 7 can pretty much be done in any order, as long as they are carried out after starting up the OCR engine and before recognizing a page.
For more information on the engine assemblies refer to OcrEngineType" and Files To Be Included With Your Application.
The following example shows how to perform the above steps in code:
Visual Basic' Assuming you added "Imports Leadtools.Forms.Ocr" and "Imports Leadtools.Forms.DocumentWriter" at the beginning of this class ' *** Step 1: Select the engine type and create an instance of the IOcrEngine interface. ' We will use the LEADTOOLS OCR Advantage engine and use it in the same process Dim ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' *** Step 2: Startup the engine. ' Use the default parameters ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' *** Step 3: Create an OCR document with one or more pages. Dim ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument() ' Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing) ' *** Step 4: Establish zones on the page(s), either manually or automatically ' Automatic zoning ocrDocument.Pages.AutoZone(Nothing) ' *** Step 5: (Optional) Set the active languages to be used by the OCR engine ' Enable English and German languages ocrEngine.LanguageManager.EnableLanguages(New String() {"en", "de"}) ' *** Step 6: (Optional) Set the spell checking language ' Enable the spell checking system and set English as the spell language ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native ocrEngine.SpellCheckManager.SpellLanguage = "en" ' *** Step 7: (Optional) Set any special recognition module options ' Change the fill method for the first zone in the first page to be Omr Dim ocrZone As OcrZone = ocrDocument.Pages(0).Zones(0) ocrZone.FillMethod = OcrZoneFillMethod.Omr ocrDocument.Pages(0).Zones(0) = ocrZone ' *** Step 8: Recognize ocrDocument.Pages.Recognize(Nothing) ' *** Step 9: Save recognition results ' Save the results to a PDF file ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing) ocrDocument.Dispose() ' *** Step 10: Shut down the OCR 
engine when finished ocrEngine.Shutdown() ocrEngine.Dispose()
C#// Assuming you added "using Leadtools.Codecs;", "using Leadtools.Forms.Ocr;" and "using Leadtools.Forms.DocumentWriters;" at the beginning of this class // *** Step 1: Select the engine type and create an instance of the IOcrEngine interface. // We will use the LEADTOOLS OCR Advantage engine and use it in the same process IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false); // *** Step 2: Startup the engine. // Use the default parameters ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // *** Step 3: Create an OCR document with one or more pages. IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument(); // Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null); // *** Step 4: Establish zones on the page(s), either manually or automatically // Automatic zoning ocrDocument.Pages.AutoZone(null); // *** Step 5: (Optional) Set the active languages to be used by the OCR engine // Enable English and German languages ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" }); // *** Step 6: (Optional) Set the spell checking language // Enable the spell checking system and set English as the spell language ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native; ocrEngine.SpellCheckManager.SpellLanguage = "en"; // *** Step 7: (Optional) Set any special recognition module options // Change the fill method for the first zone in the first page to be default OcrZone ocrZone = ocrDocument.Pages[0].Zones[0]; ocrZone.FillMethod = OcrZoneFillMethod.Default; ocrDocument.Pages[0].Zones[0] = ocrZone; // *** Step 8: Recognize ocrDocument.Pages.Recognize(null); // *** Step 9: Save recognition results // Save the results to a PDF file ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null); ocrDocument.Dispose(); // *** Step 
10: Shut down the OCR engine when finished ocrEngine.Shutdown(); ocrEngine.Dispose();
The following example shows the minimum necessary steps required to perform the same task, this time using the proper C# and VB.NET "using" keywords:
Visual Basic' Create the engine instance Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' Startup the engine ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' Create the document Using ocrDocument as IOcrDocument = ocrEngine.DocumentManager.CreateDocument() ' Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing) ' Recognize the pages ocrDocument.Pages.Recognize(Nothing) 'Save recognition results ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing) End Using ' The engine will automatically shuts down when Dispose is called End Using
C#
```csharp
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
   // Start up the engine
   ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

   // Create the OCR document
   using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
   {
      // Add all the pages of a multi-page TIF image to the document
      ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);

      // Recognize the pages
      ocrDocument.Pages.Recognize(null);

      // Save recognition results
      ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
   }
   // The engine automatically shuts down when Dispose is called
}
```
Finally, the following example shows how to perform the same task using the one-shot, "fire and forget" IOcrAutoRecognizeManager interface:
Visual Basic
```vb
' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
   ' Start up the engine
   ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

   ' Convert the multi-page TIF image to a PDF document
   ocrEngine.AutoRecognizeManager.Run( _
      "C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", _
      "C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", _
      DocumentFormat.Pdf, _
      Nothing, _
      Nothing)
End Using
```
C#
```csharp
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
   // Start up the engine
   ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

   // Convert the multi-page TIF image to a PDF document
   ocrEngine.AutoRecognizeManager.Run(
      @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif",
      @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf",
      DocumentFormat.Pdf,
      null,
      null);
}
```
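Because the engines are abstracted behind a common class library, the one-shot conversion can be written once and reused with any engine type. The following is a minimal sketch of that idea, not an official LEADTOOLS sample: the OcrConversion class and its ConvertToPdf method are hypothetical names, and the sketch assumes the LEADTOOLS 18 .NET assemblies are referenced, the OCR features are unlocked, and the runtime folder passed in is the one for the chosen engine (the examples above use OcrAdvantageRuntime for the Advantage engine).

```csharp
using Leadtools.Forms.DocumentWriters;
using Leadtools.Forms.Ocr;

public static class OcrConversion
{
   // Hypothetical helper (not part of LEADTOOLS): converts an image file to PDF
   // with whichever engine type the caller passes in. Only the engine type and
   // its runtime folder change when switching engines.
   public static void ConvertToPdf(OcrEngineType engineType, string engineRuntimeDirectory,
                                   string imageFileName, string pdfFileName)
   {
      // false: run the engine in the same process, as in the examples above
      using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(engineType, false))
      {
         // Default parameters (null) for everything except the engine runtime folder
         ocrEngine.Startup(null, null, null, engineRuntimeDirectory);

         // One-shot conversion of the whole image file to a PDF document
         ocrEngine.AutoRecognizeManager.Run(imageFileName, pdfFileName, DocumentFormat.Pdf, null, null);
      }
      // Disposing the engine also shuts it down
   }

   public static void Main()
   {
      // Same files and engine as the one-shot example above
      ConvertToPdf(
         OcrEngineType.Advantage,
         @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime",
         @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif",
         @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf");
   }
}
```

Keeping the engine type as a parameter confines an engine switch to the call site, which is the practical meaning of "virtually no changes in the application code".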
LEADTOOLS provides methods to:
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in a single or multi-threaded environment with optimization for server-based operations.
Work with multiple OCR engines through a common .NET class library that abstracts the engine from the application, so switching between engines requires virtually no changes to the application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds before recognition to control recognition accuracy.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, Microsoft Word and Microsoft Excel, as well as various flavors of ASCII and Unicode text.
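As an illustration of the PDF/A output mentioned above, here is a minimal sketch of how PDF/A might be selected before saving. It assumes the LEADTOOLS 18 document writer options API (IOcrEngine.DocumentWriterInstance, DocumentWriter.GetOptions and SetOptions, PdfDocumentOptions.DocumentType and PdfDocumentType.PdfA) and that documents are saved through the engine's DocumentWriterInstance; PdfAOutput and UsePdfA are hypothetical names. Verify the option names against the DocumentWriter reference for your version.

```csharp
using Leadtools.Forms.DocumentWriters;
using Leadtools.Forms.Ocr;

public static class PdfAOutput
{
   // Hypothetical helper: switches the engine's document writer to PDF/A so that
   // later Save(..., DocumentFormat.Pdf, ...) calls produce PDF/A files.
   // The engine must already be started.
   public static void UsePdfA(IOcrEngine ocrEngine)
   {
      DocumentWriter writer = ocrEngine.DocumentWriterInstance;

      // GetOptions returns the current options for the format; cast to the PDF-specific type
      PdfDocumentOptions pdfOptions = (PdfDocumentOptions)writer.GetOptions(DocumentFormat.Pdf);
      pdfOptions.DocumentType = PdfDocumentType.PdfA;

      // Store the modified options back on the writer
      writer.SetOptions(DocumentFormat.Pdf, pdfOptions);
   }
}
```

Call UsePdfA after ocrEngine.Startup and before ocrDocument.Save; the Save call itself still passes DocumentFormat.Pdf.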
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
1. Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
2. Start up the OCR engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
3. Create an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
4. Establish zones on the page(s), either manually or automatically. This step is optional; a page can be recognized with or without zones. For more information, refer to Working with OCR Zones.
5. Optional. Set the active languages to be used by the OCR engine (the default is English). For more information, refer to Working with OCR Languages.
6. Optional. Set the spell checking language (the default is English). For more information, refer to OCR Spell Language Dictionaries.
7. Optional. Set any special recognition module options. This is required only if the page contains zones, whether they were created automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules and Using OMR in LEADTOOLS .NET OCR.
8. Recognize the page(s). For more information, refer to Recognizing OCR Pages.
9. Save the recognition results, if desired. The results can be saved either to a file or to memory. For more information, refer to Recognizing OCR Pages.
10. Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Steps 4, 5, 6, and 7 can be performed in almost any order, as long as they are carried out after the OCR engine is started and before a page is recognized. The one constraint is that zone options (step 7) can only be set on zones that already exist, so when zones are used, step 7 follows step 4.
For more information on the engine assemblies, refer to OcrEngineType and Files To Be Included With Your Application.
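As a variation on the outline above, the following sketch validates the requested languages before enabling them (the language step) and returns the recognized text of the first page as a string instead of saving a file (saving the results to memory). It assumes an engine that has already been created and started, and it assumes IOcrLanguageManager.IsLanguageSupported and IOcrPage.GetText(int) behave as described in the LEADTOOLS 18 reference, with -1 requesting the text of all zones. OcrToText and RecognizeFirstPage are hypothetical names.

```csharp
using System;
using Leadtools.Forms.Ocr;

public static class OcrToText
{
   // Hypothetical helper: recognizes the first page of an image file with an
   // already started engine and returns the text in memory.
   public static string RecognizeFirstPage(IOcrEngine ocrEngine, string imageFileName, string[] languages)
   {
      // Language step: fail early if the engine does not support a requested language
      foreach (string language in languages)
      {
         if (!ocrEngine.LanguageManager.IsLanguageSupported(language))
            throw new ArgumentException("Language not supported by this engine: " + language);
      }
      ocrEngine.LanguageManager.EnableLanguages(languages);

      // Create a document holding only page 1 of the image file
      using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
      {
         ocrDocument.Pages.AddPages(imageFileName, 1, 1, null);

         // Automatic zoning, then recognition
         ocrDocument.Pages.AutoZone(null);
         ocrDocument.Pages.Recognize(null);

         // Results in memory: -1 requests the text of all zones on the page
         return ocrDocument.Pages[0].GetText(-1);
      }
   }
}
```

Checking the languages first turns an unsupported language code into a clear error before any recognition work is done.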
Visual Basic
The LEADTOOLS OCR features provide methods for incorporating optical character recognition (OCR) technology into an application. OCR is used to process bitmap document images into text.
Once the LEADTOOLS .NET OCR toolkit is installed to the system, the user is ready to begin programming with LEADTOOLS OCR. Please note that the OCR features must be unlocked before the user can actually use the OCR properties, methods, and events. For more information on unlocking LEAD features, refer to Unlocking Special LEAD Features.
You can start using LEADTOOLS for .NET OCR in your application by adding a references to the Leadtools.Forms.Ocr.dll and Leadtools.Forms.DocumentWriter.dll assemblies in your .NET application. These assemblies contain the various interfaces, classes, structures and delegates used to program with LEADTOOLS OCR.
Since the toolkit supports multiple engines, the actual code that interfaces with the engine is stored in a separate assembly that will be loaded dynamically once an instance of the IOcrEngine interface is created. Hence, you must make sure the engine assembly you are planning to use resides next to the Leadtools.Forms.Ocr.dll assembly. You can add the engine assembly as a reference to your project if desired to automatically detect dependencies, even though this is not required by LEADTOOLS.
LEADTOOLS provides methods to:
C#
The LEADTOOLS OCR features provide methods for incorporating optical character recognition (OCR) technology into an application. OCR is used to process bitmap document images into text.
Once the LEADTOOLS .NET OCR toolkit is installed to the system, the user is ready to begin programming with LEADTOOLS OCR. Please note that the OCR features must be unlocked before the user can actually use the OCR properties, methods, and events. For more information on unlocking LEAD features, refer to Unlocking Special LEAD Features.
You can start using LEADTOOLS for .NET OCR in your application by adding a references to the Leadtools.Forms.Ocr.dll and Leadtools.Forms.DocumentWriter.dll assemblies in your .NET application. These assemblies contain the various interfaces, classes, structures and delegates used to program with LEADTOOLS OCR.
Since the toolkit supports multiple engines, the actual code that interfaces with the engine is stored in a separate assembly that will be loaded dynamically once an instance of the IOcrEngine interface is created. Hence, you must make sure the engine assembly you are planning to use resides next to the Leadtools.Forms.Ocr.dll assembly. You can add the engine assembly as a reference to your project if desired to automatically detect dependencies, even though this is not required by LEADTOOLS.
LEADTOOLS provides methods to:
The following example shows the minimum necessary steps required to perform the same task, this time using the proper C# and VB.NET "using" keywords:
Visual Basic
The LEADTOOLS OCR features provide methods for incorporating optical character recognition (OCR) technology into an application. OCR is used to process bitmap document images into text.
Once the LEADTOOLS .NET OCR toolkit is installed to the system, the user is ready to begin programming with LEADTOOLS OCR. Please note that the OCR features must be unlocked before the user can actually use the OCR properties, methods, and events. For more information on unlocking LEAD features, refer to Unlocking Special LEAD Features.
You can start using LEADTOOLS for .NET OCR in your application by adding a references to the Leadtools.Forms.Ocr.dll and Leadtools.Forms.DocumentWriter.dll assemblies in your .NET application. These assemblies contain the various interfaces, classes, structures and delegates used to program with LEADTOOLS OCR.
Since the toolkit supports multiple engines, the actual code that interfaces with the engine is stored in a separate assembly that will be loaded dynamically once an instance of the IOcrEngine interface is created. Hence, you must make sure the engine assembly you are planning to use resides next to the Leadtools.Forms.Ocr.dll assembly. You can add the engine assembly as a reference to your project if desired to automatically detect dependencies, even though this is not required by LEADTOOLS.
LEADTOOLS provides methods to:
C#
The LEADTOOLS OCR features provide methods for incorporating optical character recognition (OCR) technology into an application. OCR is used to process bitmap document images into text.
Once the LEADTOOLS .NET OCR toolkit is installed to the system, the user is ready to begin programming with LEADTOOLS OCR. Please note that the OCR features must be unlocked before the user can actually use the OCR properties, methods, and events. For more information on unlocking LEAD features, refer to Unlocking Special LEAD Features.
You can start using LEADTOOLS for .NET OCR in your application by adding a references to the Leadtools.Forms.Ocr.dll and Leadtools.Forms.DocumentWriter.dll assemblies in your .NET application. These assemblies contain the various interfaces, classes, structures and delegates used to program with LEADTOOLS OCR.
Since the toolkit supports multiple engines, the actual code that interfaces with the engine is stored in a separate assembly that will be loaded dynamically once an instance of the IOcrEngine interface is created. Hence, you must make sure the engine assembly you are planning to use resides next to the Leadtools.Forms.Ocr.dll assembly. You can add the engine assembly as a reference to your project if desired to automatically detect dependencies, even though this is not required by LEADTOOLS.
LEADTOOLS provides methods to:
Finally, the following example shows how to perform the same task above using the one shot "fire and forget" IOcrAutoRecognizeManager interface:
Visual Basic
The LEADTOOLS OCR features provide methods for incorporating optical character recognition (OCR) technology into an application. OCR is used to process bitmap document images into text.
Once the LEADTOOLS .NET OCR toolkit is installed to the system, the user is ready to begin programming with LEADTOOLS OCR. Please note that the OCR features must be unlocked before the user can actually use the OCR properties, methods, and events. For more information on unlocking LEAD features, refer to Unlocking Special LEAD Features.
You can start using LEADTOOLS for .NET OCR in your application by adding a references to the Leadtools.Forms.Ocr.dll and Leadtools.Forms.DocumentWriter.dll assemblies in your .NET application. These assemblies contain the various interfaces, classes, structures and delegates used to program with LEADTOOLS OCR.
Since the toolkit supports multiple engines, the actual code that interfaces with the engine is stored in a separate assembly that will be loaded dynamically once an instance of the IOcrEngine interface is created. Hence, you must make sure the engine assembly you are planning to use resides next to the Leadtools.Forms.Ocr.dll assembly. You can add the engine assembly as a reference to your project if desired to automatically detect dependencies, even though this is not required by LEADTOOLS.
LEADTOOLS provides methods to:
C#// Create the engine instance using(IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false)) { // Startup the engine ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // Convert the multi-page TIF image to a PDF document ocrEngine.AutoRecognizeManager.Run( @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null, null); }
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in a single or multi-threaded environment with optimization for server-based operations.
Multiple OCR engines are supported and abstracted from the user through the use of a common .NET class library. Switching between the various engines requires virtually no changes in the application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds prior to recognition to control the accuracy of recognition.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, MS Word, MS Excel as well as various flavors of ASCII and UNICODE text.
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
Startup the OCR Engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
Establish an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
Establish zones on the page(s), either manually or automatically. (This is optional. A page can be recognized with or without zones.) For more information, refer to Working with OCR Zones.
Optional. Set the active languages to be used by the OCR engine. (The default is English). For more information, refer to Working with OCR Languages.
Optional. Set the spell checking language. (The default is English). For more information, refer to OCR Spell Language Dictionaries.
Optional. Set any special recognition module options. This is required only if the page contains zones, created either automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules and Using OMR in LEADTOOLS .NET OCR.
Recognize. For more information, refer to Recognizing OCR Pages.
Save recognition results, if desired. The results can be saved to either a file or to memory. For more information, refer to Recognizing OCR Pages.
Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Where steps 4, 5, 6 and 7 can pretty much be done in any order, as long as they are carried out after starting up the OCR engine and before recognizing a page.
For more information on the engine assemblies refer to OcrEngineType" and Files To Be Included With Your Application.
The following example shows how to perform the above steps in code:
Visual Basic
The LEADTOOLS OCR features provide methods for incorporating optical character recognition (OCR) technology into an application. OCR is used to process bitmap document images into text.
Once the LEADTOOLS .NET OCR toolkit is installed to the system, the user is ready to begin programming with LEADTOOLS OCR. Please note that the OCR features must be unlocked before the user can actually use the OCR properties, methods, and events. For more information on unlocking LEAD features, refer to Unlocking Special LEAD Features.
You can start using LEADTOOLS for .NET OCR in your application by adding a references to the Leadtools.Forms.Ocr.dll and Leadtools.Forms.DocumentWriter.dll assemblies in your .NET application. These assemblies contain the various interfaces, classes, structures and delegates used to program with LEADTOOLS OCR.
Since the toolkit supports multiple engines, the actual code that interfaces with the engine is stored in a separate assembly that will be loaded dynamically once an instance of the IOcrEngine interface is created. Hence, you must make sure the engine assembly you are planning to use resides next to the Leadtools.Forms.Ocr.dll assembly. You can add the engine assembly as a reference to your project if desired to automatically detect dependencies, even though this is not required by LEADTOOLS.
LEADTOOLS provides methods to:
C#
The LEADTOOLS OCR features provide methods for incorporating optical character recognition (OCR) technology into an application. OCR is used to process bitmap document images into text.
Once the LEADTOOLS .NET OCR toolkit is installed to the system, the user is ready to begin programming with LEADTOOLS OCR. Please note that the OCR features must be unlocked before the user can actually use the OCR properties, methods, and events. For more information on unlocking LEAD features, refer to Unlocking Special LEAD Features.
You can start using LEADTOOLS for .NET OCR in your application by adding a references to the Leadtools.Forms.Ocr.dll and Leadtools.Forms.DocumentWriter.dll assemblies in your .NET application. These assemblies contain the various interfaces, classes, structures and delegates used to program with LEADTOOLS OCR.
Since the toolkit supports multiple engines, the actual code that interfaces with the engine is stored in a separate assembly that will be loaded dynamically once an instance of the IOcrEngine interface is created. Hence, you must make sure the engine assembly you are planning to use resides next to the Leadtools.Forms.Ocr.dll assembly. You can add the engine assembly as a reference to your project if desired to automatically detect dependencies, even though this is not required by LEADTOOLS.
LEADTOOLS provides methods to:
The following example shows the minimum necessary steps required to perform the same task, this time using the proper C# and VB.NET "using" keywords:
Visual Basic
The LEADTOOLS OCR features provide methods for incorporating optical character recognition (OCR) technology into an application. OCR is used to process bitmap document images into text.
Once the LEADTOOLS .NET OCR toolkit is installed to the system, the user is ready to begin programming with LEADTOOLS OCR. Please note that the OCR features must be unlocked before the user can actually use the OCR properties, methods, and events. For more information on unlocking LEAD features, refer to Unlocking Special LEAD Features.
You can start using LEADTOOLS for .NET OCR in your application by adding a references to the Leadtools.Forms.Ocr.dll and Leadtools.Forms.DocumentWriter.dll assemblies in your .NET application. These assemblies contain the various interfaces, classes, structures and delegates used to program with LEADTOOLS OCR.
Since the toolkit supports multiple engines, the actual code that interfaces with the engine is stored in a separate assembly that will be loaded dynamically once an instance of the IOcrEngine interface is created. Hence, you must make sure the engine assembly you are planning to use resides next to the Leadtools.Forms.Ocr.dll assembly. You can add the engine assembly as a reference to your project if desired to automatically detect dependencies, even though this is not required by LEADTOOLS.
LEADTOOLS provides methods to:
C#
The LEADTOOLS OCR features provide methods for incorporating optical character recognition (OCR) technology into an application. OCR is used to process bitmap document images into text.
Once the LEADTOOLS .NET OCR toolkit is installed to the system, the user is ready to begin programming with LEADTOOLS OCR. Please note that the OCR features must be unlocked before the user can actually use the OCR properties, methods, and events. For more information on unlocking LEAD features, refer to Unlocking Special LEAD Features.
You can start using LEADTOOLS for .NET OCR in your application by adding a references to the Leadtools.Forms.Ocr.dll and Leadtools.Forms.DocumentWriter.dll assemblies in your .NET application. These assemblies contain the various interfaces, classes, structures and delegates used to program with LEADTOOLS OCR.
Since the toolkit supports multiple engines, the actual code that interfaces with the engine is stored in a separate assembly that will be loaded dynamically once an instance of the IOcrEngine interface is created. Hence, you must make sure the engine assembly you are planning to use resides next to the Leadtools.Forms.Ocr.dll assembly. You can add the engine assembly as a reference to your project if desired to automatically detect dependencies, even though this is not required by LEADTOOLS.
LEADTOOLS provides methods to:
Finally, the following example shows how to perform the same task above using the one shot "fire and forget" IOcrAutoRecognizeManager interface:
Visual Basic' Create the engine instance Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' Startup the engine ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' Convert the multi-page TIF image to a PDF document ocrEngine.AutoRecognizeManager.Run( _ "C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", _ "C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", _ DocumentFormat.Pdf, _ Nothing, _ Nothing) End Using
C#// Create the engine instance using(IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false)) { // Startup the engine ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // Convert the multi-page TIF image to a PDF document ocrEngine.AutoRecognizeManager.Run( @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null, null); }
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in a single or multi-threaded environment with optimization for server-based operations.
Multiple OCR engines are supported and abstracted from the user through the use of a common .NET class library. Switching between the various engines requires virtually no changes in the application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds prior to recognition to control the accuracy of recognition.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, MS Word, MS Excel as well as various flavors of ASCII and UNICODE text.
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
Startup the OCR Engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
Establish an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
Establish zones on the page(s), either manually or automatically. (This is optional. A page can be recognized with or without zones.) For more information, refer to Working with OCR Zones.
Optional. Set the active languages to be used by the OCR engine. (The default is English). For more information, refer to Working with OCR Languages.
Optional. Set the spell checking language. (The default is English). For more information, refer to OCR Spell Language Dictionaries.
Optional. Set any special recognition module options. This is required only if the page contains zones, created either automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules and Using OMR in LEADTOOLS .NET OCR.
Recognize. For more information, refer to Recognizing OCR Pages.
Save recognition results, if desired. The results can be saved to either a file or to memory. For more information, refer to Recognizing OCR Pages.
Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Where steps 4, 5, 6 and 7 can pretty much be done in any order, as long as they are carried out after starting up the OCR engine and before recognizing a page.
For more information on the engine assemblies refer to OcrEngineType" and Files To Be Included With Your Application.
The following example shows how to perform the above steps in code:
Visual Basic
The LEADTOOLS OCR features provide methods for incorporating optical character recognition (OCR) technology into an application. OCR is used to process bitmap document images into text.
Once the LEADTOOLS .NET OCR toolkit is installed to the system, the user is ready to begin programming with LEADTOOLS OCR. Please note that the OCR features must be unlocked before the user can actually use the OCR properties, methods, and events. For more information on unlocking LEAD features, refer to Unlocking Special LEAD Features.
You can start using LEADTOOLS for .NET OCR in your application by adding a references to the Leadtools.Forms.Ocr.dll and Leadtools.Forms.DocumentWriter.dll assemblies in your .NET application. These assemblies contain the various interfaces, classes, structures and delegates used to program with LEADTOOLS OCR.
Since the toolkit supports multiple engines, the actual code that interfaces with the engine is stored in a separate assembly that will be loaded dynamically once an instance of the IOcrEngine interface is created. Hence, you must make sure the engine assembly you are planning to use resides next to the Leadtools.Forms.Ocr.dll assembly. You can add the engine assembly as a reference to your project if desired to automatically detect dependencies, even though this is not required by LEADTOOLS.
LEADTOOLS provides methods to:
C#
The LEADTOOLS OCR features provide methods for incorporating optical character recognition (OCR) technology into an application. OCR is used to process bitmap document images into text.
Once the LEADTOOLS .NET OCR toolkit is installed to the system, the user is ready to begin programming with LEADTOOLS OCR. Please note that the OCR features must be unlocked before the user can actually use the OCR properties, methods, and events. For more information on unlocking LEAD features, refer to Unlocking Special LEAD Features.
You can start using LEADTOOLS for .NET OCR in your application by adding a references to the Leadtools.Forms.Ocr.dll and Leadtools.Forms.DocumentWriter.dll assemblies in your .NET application. These assemblies contain the various interfaces, classes, structures and delegates used to program with LEADTOOLS OCR.
Since the toolkit supports multiple engines, the actual code that interfaces with the engine is stored in a separate assembly that will be loaded dynamically once an instance of the IOcrEngine interface is created. Hence, you must make sure the engine assembly you are planning to use resides next to the Leadtools.Forms.Ocr.dll assembly. You can add the engine assembly as a reference to your project if desired to automatically detect dependencies, even though this is not required by LEADTOOLS.
LEADTOOLS provides methods to:
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in a single or multi-threaded environment with optimization for server-based operations.
Use multiple OCR engines through a common .NET class library that abstracts the engine from the user. Switching between engines requires virtually no changes to the application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds before recognition to control recognition accuracy.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, MS Word, MS Excel, and various flavors of ASCII and Unicode text.
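The examples in this topic enable languages by code rather than by name, passing "en" and "de" to LanguageManager.EnableLanguages. If your application lets users pick languages by name, a small lookup table can translate the names listed above. This is a minimal sketch: OcrLanguageCodes is a hypothetical helper, and apart from "en" and "de", the codes are standard ISO 639-1 codes that are assumed (not confirmed by this topic) to be the ones the engine expects; verify them against Working with OCR Languages.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of LEADTOOLS): maps language names to language codes.
public static class OcrLanguageCodes
{
    // ISO 639-1 codes for the languages listed in this topic.
    // Assumption: the engine accepts these codes; this topic only shows "en" and "de".
    private static readonly Dictionary<string, string> Codes =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "English", "en" }, { "Danish", "da" }, { "Dutch", "nl" },
            { "Finnish", "fi" }, { "French", "fr" }, { "German", "de" },
            { "Italian", "it" }, { "Norwegian", "no" }, { "Portuguese", "pt" },
            { "Russian", "ru" }, { "Spanish", "es" }, { "Swedish", "sv" },
        };

    // Converts names such as "English" into codes such as "en" (case-insensitive).
    // Throws ArgumentException for a name that is not in the table above.
    public static string[] ToCodes(params string[] languageNames)
    {
        return languageNames.Select(name =>
        {
            string code;
            if (!Codes.TryGetValue(name, out code))
                throw new ArgumentException("Unsupported OCR language: " + name, nameof(languageNames));
            return code;
        }).ToArray();
    }
}

// Usage (assumes a started engine, as in the examples in this topic):
// ocrEngine.LanguageManager.EnableLanguages(OcrLanguageCodes.ToCodes("English", "German"));
```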
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
1. Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
2. Start up the OCR engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
3. Create an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
4. Optional. Establish zones on the page(s), either manually or automatically. A page can be recognized with or without zones. For more information, refer to Working with OCR Zones.
5. Optional. Set the active languages to be used by the OCR engine (the default is English). For more information, refer to Working with OCR Languages.
6. Optional. Set the spell checking language (the default is English). For more information, refer to OCR Spell Language Dictionaries.
7. Optional. Set any special recognition module options. This is needed only if the page contains zones, created either automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules and Using OMR in LEADTOOLS .NET OCR.
8. Recognize the pages. For more information, refer to Recognizing OCR Pages.
9. Save the recognition results, if desired, to either a file or memory. For more information, refer to Recognizing OCR Pages.
10. Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Steps 4, 5, 6, and 7 can generally be performed in any order, as long as they are carried out after the OCR engine is started and before a page is recognized.
For more information on the engine assemblies, refer to OcrEngineType and Files To Be Included With Your Application.
The following example shows how to perform the above steps in code:
Visual Basic

```vb
' Assuming you added "Imports Leadtools.Forms.Ocr" and "Imports Leadtools.Forms.DocumentWriters" at the beginning of this class

' *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
' We will use the LEADTOOLS OCR Advantage engine and use it in the same process
Dim ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)

' *** Step 2: Start up the engine.
' Use the default parameters
ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

' *** Step 3: Create an OCR document with one or more pages.
Dim ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
' Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)

' *** Step 4: Establish zones on the page(s), either manually or automatically
' Automatic zoning
ocrDocument.Pages.AutoZone(Nothing)

' *** Step 5: (Optional) Set the active languages to be used by the OCR engine
' Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(New String() {"en", "de"})

' *** Step 6: (Optional) Set the spell checking language
' Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native
ocrEngine.SpellCheckManager.SpellLanguage = "en"

' *** Step 7: (Optional) Set any special recognition module options
' Change the fill method for the first zone in the first page to be Omr
Dim ocrZone As OcrZone = ocrDocument.Pages(0).Zones(0)
ocrZone.FillMethod = OcrZoneFillMethod.Omr
' Write the modified zone back to the page
ocrDocument.Pages(0).Zones(0) = ocrZone

' *** Step 8: Recognize
ocrDocument.Pages.Recognize(Nothing)

' *** Step 9: Save recognition results
' Save the results to a PDF file
ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
ocrDocument.Dispose()

' *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown()
ocrEngine.Dispose()
```
C#

```csharp
// Assuming you added "using Leadtools.Codecs;", "using Leadtools.Forms.Ocr;" and "using Leadtools.Forms.DocumentWriters;" at the beginning of this class

// *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
// We will use the LEADTOOLS OCR Advantage engine and use it in the same process
IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);

// *** Step 2: Start up the engine.
// Use the default parameters
ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

// *** Step 3: Create an OCR document with one or more pages.
IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument();
// Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);

// *** Step 4: Establish zones on the page(s), either manually or automatically
// Automatic zoning
ocrDocument.Pages.AutoZone(null);

// *** Step 5: (Optional) Set the active languages to be used by the OCR engine
// Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" });

// *** Step 6: (Optional) Set the spell checking language
// Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native;
ocrEngine.SpellCheckManager.SpellLanguage = "en";

// *** Step 7: (Optional) Set any special recognition module options
// Change the fill method for the first zone in the first page to be default
OcrZone ocrZone = ocrDocument.Pages[0].Zones[0];
ocrZone.FillMethod = OcrZoneFillMethod.Default;
// Write the modified zone back to the page
ocrDocument.Pages[0].Zones[0] = ocrZone;

// *** Step 8: Recognize
ocrDocument.Pages.Recognize(null);

// *** Step 9: Save recognition results
// Save the results to a PDF file
ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
ocrDocument.Dispose();

// *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown();
ocrEngine.Dispose();
```
The following example shows the minimum steps required to perform the same task, this time using the C# using statement and the VB.NET Using statement so that the objects are disposed of automatically:
Visual Basic

```vb
' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
   ' Start up the engine
   ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

   ' Create the OCR document
   Using ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
      ' Add all the pages of a multi-page TIF image to the document
      ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)

      ' Recognize the pages
      ocrDocument.Pages.Recognize(Nothing)

      ' Save recognition results
      ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
   End Using

   ' The engine is shut down automatically when Dispose is called
End Using
```
C#

```csharp
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
   // Start up the engine
   ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

   // Create the OCR document
   using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
   {
      // Add all the pages of a multi-page TIF image to the document
      ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);

      // Recognize the pages
      ocrDocument.Pages.Recognize(null);

      // Save recognition results
      ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
   }

   // The engine is shut down automatically when Dispose is called
}
```
Finally, the following example performs the same task with the one-shot ("fire and forget") IOcrAutoRecognizeManager interface:
Visual Basic

```vb
' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
   ' Start up the engine
   ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

   ' Convert the multi-page TIF image to a PDF document
   ocrEngine.AutoRecognizeManager.Run( _
      "C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", _
      "C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", _
      DocumentFormat.Pdf, _
      Nothing, _
      Nothing)
End Using
```
C#

```csharp
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
   // Start up the engine
   ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

   // Convert the multi-page TIF image to a PDF document
   ocrEngine.AutoRecognizeManager.Run(
      @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif",
      @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf",
      DocumentFormat.Pdf,
      null,
      null);
}
```
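Because each call to IOcrAutoRecognizeManager.Run converts one image file into one output document, converting a whole folder only needs a loop around it. The sketch below assumes a folder of TIF files; OcrBatch and ConvertFolder are hypothetical names, and the conversion itself is passed in as a delegate so that the helper has no LEADTOOLS dependency of its own.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of LEADTOOLS): runs a conversion for every TIF file in a folder.
public static class OcrBatch
{
    // Calls convert(inputPath, outputPath) once for each *.tif file in inputFolder,
    // writing "<name>.pdf" into outputFolder. Returns the output paths in processing order.
    public static List<string> ConvertFolder(string inputFolder, string outputFolder, Action<string, string> convert)
    {
        Directory.CreateDirectory(outputFolder);

        string[] inputs = Directory.GetFiles(inputFolder, "*.tif");
        // Directory.GetFiles does not guarantee an order, so sort for predictable results
        Array.Sort(inputs, StringComparer.OrdinalIgnoreCase);

        var outputs = new List<string>();
        foreach (string input in inputs)
        {
            string output = Path.Combine(outputFolder, Path.GetFileNameWithoutExtension(input) + ".pdf");
            convert(input, output);
            outputs.Add(output);
        }
        return outputs;
    }
}
```

In an application, start the engine once as shown above and pass a delegate such as (src, dst) => ocrEngine.AutoRecognizeManager.Run(src, dst, DocumentFormat.Pdf, null, null); the same started engine is then reused for every file instead of being created and started again for each one.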
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in single-threaded or multi-threaded environments, with optimizations for server-based operations.
Work with multiple OCR engines through a common .NET class library that abstracts the engine from the user. Switching between engines requires virtually no changes to the application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds before recognition to control recognition accuracy.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, Microsoft Word, Microsoft Excel, and various flavors of ASCII and Unicode text.
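To judge whether text in a scanned image falls within the quoted 5 to 72 point range, convert between points and pixels using the image resolution. A point is 1/72 inch, so a size of p points at d dots per inch spans p × d / 72 pixels. The sketch below (PointSize and its members are hypothetical helpers, not part of LEADTOOLS) does that arithmetic. A nominal point size describes the font's em height, so a height measured from individual glyphs gives only an approximate point size.

```csharp
using System;

public static class PointSize
{
    // Point-size range quoted in the feature list above.
    public const double MinPoints = 5;
    public const double MaxPoints = 72;

    // 1 point = 1/72 inch, so at `dpi` dots per inch a size of `points`
    // spans points * dpi / 72 pixels.
    public static double PointsToPixels(double points, double dpi) => points * dpi / 72.0;

    // Inverse conversion: an approximate point size from a measured pixel height.
    public static double PixelsToPoints(double pixels, double dpi) => pixels * 72.0 / dpi;

    public static bool IsInQuotedRange(double points) =>
        points >= MinPoints && points <= MaxPoints;
}
```

For example, 12-point text scanned at 300 DPI is 50 pixels per em, while 5-point text at 200 DPI is only about 14 pixels.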
LEADTOOLS uses an OCR handle to interact with the OCR engine and with the OCR document that contains the list of pages. The OCR handle represents a communication session between LEADTOOLS OCR and an OCR engine installed on the system. It is an internal structure that holds all the information needed for recognition, for getting and setting information, and for text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
1. Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
2. Start up the OCR engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
3. Create an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
4. Optional. Establish zones on the page(s), either manually or automatically. A page can be recognized with or without zones. For more information, refer to Working with OCR Zones.
5. Optional. Set the active languages to be used by the OCR engine (the default is English). For more information, refer to Working with OCR Languages.
6. Optional. Set the spell checking language (the default is English). For more information, refer to OCR Spell Language Dictionaries.
7. Optional. Set any special recognition module options. This applies only if the page contains zones, whether created automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules and Using OMR in LEADTOOLS .NET OCR.
8. Recognize the pages. For more information, refer to Recognizing OCR Pages.
9. Save the recognition results, if desired, to either a file or memory. For more information, refer to Recognizing OCR Pages.
10. Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
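Steps 4, 5, 6 and 7 can be performed in any order, as long as they are carried out after the OCR engine is started and before the pages are recognized. Zone-related work (steps 4 and 7) also needs the pages from step 3, and options for a specific zone (step 7) can only be set once that zone exists.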
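The ordering rule for steps 4 through 7 can be written down as a small check. This is an illustrative sketch only: OcrStepOrder and IsValid are hypothetical, the check works on plain step numbers rather than LEADTOOLS calls, and it models only the rule as stated (steps 4 to 7 after step 2 and before step 8, other steps in outline order, optional steps omitted freely). It does not model finer dependencies such as zone options needing existing zones.

```csharp
using System;
using System.Linq;

public static class OcrStepOrder
{
    // Checks a planned order of outline steps (numbered 1-10) against the rule:
    // steps 4-7 may appear in any order, but only after step 2 (engine started)
    // and before step 8 (recognition). All other steps keep their outline order.
    public static bool IsValid(int[] steps)
    {
        // Each step number must be in range and appear at most once.
        if (steps.Any(s => s < 1 || s > 10) || steps.Distinct().Count() != steps.Length)
            return false;

        // Steps outside 4-7 must appear in ascending (outline) order.
        int[] fixedSteps = steps.Where(s => s < 4 || s > 7).ToArray();
        for (int i = 1; i < fixedSteps.Length; i++)
        {
            if (fixedSteps[i] < fixedSteps[i - 1]) return false;
        }

        int startup = Array.IndexOf(steps, 2);    // -1 if missing
        int recognize = Array.IndexOf(steps, 8);  // -1 if missing
        for (int i = 0; i < steps.Length; i++)
        {
            if (steps[i] < 4 || steps[i] > 7) continue;
            if (startup < 0 || i < startup) return false;       // before the engine started
            if (recognize >= 0 && i > recognize) return false;  // after recognition
        }
        return true;
    }
}
```

For instance, setting the languages (5) right after startup and before creating the document (3) passes, while enabling a language before step 2 or zoning after step 8 fails.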
For more information on the engine assemblies, refer to OcrEngineType and Files To Be Included With Your Application.
The following example shows how to perform the outlined steps in code:
Visual Basic
' Assuming you added "Imports Leadtools.Forms.Ocr" and "Imports Leadtools.Forms.DocumentWriters" at the beginning of this class
' *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
' We will use the LEADTOOLS OCR Advantage engine and use it in the same process
Dim ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
' *** Step 2: Start up the engine.
' Pass Nothing for the codecs, document writer and work directory to use the defaults; the last argument points to the engine runtime files
ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")
' *** Step 3: Create an OCR document with one or more pages.
Dim ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
' Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)
' *** Step 4: Establish zones on the page(s), either manually or automatically
' Automatic zoning
ocrDocument.Pages.AutoZone(Nothing)
' *** Step 5: (Optional) Set the active languages to be used by the OCR engine
' Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(New String() {"en", "de"})
' *** Step 6: (Optional) Set the spell checking language
' Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native
ocrEngine.SpellCheckManager.SpellLanguage = "en"
' *** Step 7: (Optional) Set any special recognition module options
' Change the fill method for the first zone in the first page to be Omr (only if auto-zoning found a zone)
If ocrDocument.Pages(0).Zones.Count > 0 Then
   ' Modify a copy of the zone, then assign it back to the collection
   Dim ocrZone As OcrZone = ocrDocument.Pages(0).Zones(0)
   ocrZone.FillMethod = OcrZoneFillMethod.Omr
   ocrDocument.Pages(0).Zones(0) = ocrZone
End If
' *** Step 8: Recognize
ocrDocument.Pages.Recognize(Nothing)
' *** Step 9: Save recognition results
' Save the results to a PDF file
ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
ocrDocument.Dispose()
' *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown()
ocrEngine.Dispose()
C#
// Assuming you added "using Leadtools.Codecs;", "using Leadtools.Forms.Ocr;" and "using Leadtools.Forms.DocumentWriters;" at the beginning of this class
// *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
// We will use the LEADTOOLS OCR Advantage engine and use it in the same process
IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);
// *** Step 2: Start up the engine.
// Pass null for the codecs, document writer and work directory to use the defaults; the last argument points to the engine runtime files
ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");
// *** Step 3: Create an OCR document with one or more pages.
IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument();
// Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);
// *** Step 4: Establish zones on the page(s), either manually or automatically
// Automatic zoning
ocrDocument.Pages.AutoZone(null);
// *** Step 5: (Optional) Set the active languages to be used by the OCR engine
// Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" });
// *** Step 6: (Optional) Set the spell checking language
// Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native;
ocrEngine.SpellCheckManager.SpellLanguage = "en";
// *** Step 7: (Optional) Set any special recognition module options
// Change the fill method for the first zone in the first page to be default (only if auto-zoning found a zone)
if (ocrDocument.Pages[0].Zones.Count > 0)
{
   // Modify a copy of the zone, then assign it back to the collection
   OcrZone ocrZone = ocrDocument.Pages[0].Zones[0];
   ocrZone.FillMethod = OcrZoneFillMethod.Default;
   ocrDocument.Pages[0].Zones[0] = ocrZone;
}
// *** Step 8: Recognize
ocrDocument.Pages.Recognize(null);
// *** Step 9: Save recognition results
// Save the results to a PDF file
ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
ocrDocument.Dispose();
// *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown();
ocrEngine.Dispose();
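Step 7 reads the zone into a local variable, changes it and writes it back, rather than setting the fill method on the indexed zone directly. That is the pattern a value type (struct) requires when it is stored in an indexed collection. The plain C# sketch below assumes OcrZone behaves that way; Zone and ZoneFill are hypothetical stand-ins, not LEADTOOLS types, and List&lt;Zone&gt; stands in for the page's zone collection.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins shaped like OcrZoneFillMethod and OcrZone.
public enum ZoneFill { Default, Omr }

public struct Zone
{
    public ZoneFill FillMethod;
}

public static class ZoneCopyDemo
{
    public static (ZoneFill withoutAssign, ZoneFill withAssign) Run()
    {
        var zones = new List<Zone> { new Zone { FillMethod = ZoneFill.Default } };

        // Reading through the indexer returns a copy of the struct...
        Zone copy = zones[0];
        copy.FillMethod = ZoneFill.Omr;
        // ...so the list still holds the original value at this point.
        ZoneFill withoutAssign = zones[0].FillMethod;

        // Assigning the modified copy back updates the stored element.
        zones[0] = copy;
        ZoneFill withAssign = zones[0].FillMethod;

        // Writing "zones[0].FillMethod = ZoneFill.Omr;" would not compile (error CS1612).
        return (withoutAssign, withAssign);
    }
}
```

Run returns (Default, Omr): the change is only visible in the collection after the copy is assigned back.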
C#// Assuming you added "using Leadtools.Codecs;", "using Leadtools.Forms.Ocr;" and "using Leadtools.Forms.DocumentWriters;" at the beginning of this class // *** Step 1: Select the engine type and create an instance of the IOcrEngine interface. // We will use the LEADTOOLS OCR Advantage engine and use it in the same process IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false); // *** Step 2: Startup the engine. // Use the default parameters ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // *** Step 3: Create an OCR document with one or more pages. IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument(); // Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null); // *** Step 4: Establish zones on the page(s), either manually or automatically // Automatic zoning ocrDocument.Pages.AutoZone(null); // *** Step 5: (Optional) Set the active languages to be used by the OCR engine // Enable English and German languages ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" }); // *** Step 6: (Optional) Set the spell checking language // Enable the spell checking system and set English as the spell language ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native; ocrEngine.SpellCheckManager.SpellLanguage = "en"; // *** Step 7: (Optional) Set any special recognition module options // Change the fill method for the first zone in the first page to be default OcrZone ocrZone = ocrDocument.Pages[0].Zones[0]; ocrZone.FillMethod = OcrZoneFillMethod.Default; ocrDocument.Pages[0].Zones[0] = ocrZone; // *** Step 8: Recognize ocrDocument.Pages.Recognize(null); // *** Step 9: Save recognition results // Save the results to a PDF file ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null); ocrDocument.Dispose(); // *** Step 
10: Shut down the OCR engine when finished ocrEngine.Shutdown(); ocrEngine.Dispose();
The following example shows the minimum necessary steps required to perform the same task, this time using the proper C# and VB.NET "using" keywords:
Visual Basic' Create the engine instance Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' Startup the engine ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' Create the document Using ocrDocument as IOcrDocument = ocrEngine.DocumentManager.CreateDocument() ' Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing) ' Recognize the pages ocrDocument.Pages.Recognize(Nothing) 'Save recognition results ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing) End Using ' The engine will automatically shuts down when Dispose is called End Using
C#// Create the engine instance using(IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false)) { // Startup the engine ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // Create the OCR document using(IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument()) { // Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null); // Recognize the pages ocrDocument.Pages.Recognize(null); // Save recognition results ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null); } // The engine will automatically shuts down when Dispose is called }
Finally, the following example shows how to perform the same task above using the one shot "fire and forget" IOcrAutoRecognizeManager interface:
Visual Basic' Create the engine instance Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' Startup the engine ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' Convert the multi-page TIF image to a PDF document ocrEngine.AutoRecognizeManager.Run( _ "C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", _ "C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", _ DocumentFormat.Pdf, _ Nothing, _ Nothing) End Using
C#// Create the engine instance using(IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false)) { // Startup the engine ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // Convert the multi-page TIF image to a PDF document ocrEngine.AutoRecognizeManager.Run( @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null, null); }
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in a single or multi-threaded environment with optimization for server-based operations.
Multiple OCR engines are supported and abstracted from the user through the use of a common .NET class library. Switching between the various engines requires virtually no changes in the application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds prior to recognition to control the accuracy of recognition.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, MS Word, MS Excel as well as various flavors of ASCII and UNICODE text.
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
Startup the OCR Engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
Establish an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
Establish zones on the page(s), either manually or automatically. (This is optional. A page can be recognized with or without zones.) For more information, refer to Working with OCR Zones.
Optional. Set the active languages to be used by the OCR engine. (The default is English). For more information, refer to Working with OCR Languages.
Optional. Set the spell checking language. (The default is English). For more information, refer to OCR Spell Language Dictionaries.
Optional. Set any special recognition module options. This is required only if the page contains zones, created either automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules and Using OMR in LEADTOOLS .NET OCR.
Recognize. For more information, refer to Recognizing OCR Pages.
Save recognition results, if desired. The results can be saved to either a file or to memory. For more information, refer to Recognizing OCR Pages.
Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Where steps 4, 5, 6 and 7 can pretty much be done in any order, as long as they are carried out after starting up the OCR engine and before recognizing a page.
For more information on the engine assemblies refer to OcrEngineType" and Files To Be Included With Your Application.
The following example shows how to perform the above steps in code:
Visual Basic' Assuming you added "Imports Leadtools.Forms.Ocr" and "Imports Leadtools.Forms.DocumentWriter" at the beginning of this class ' *** Step 1: Select the engine type and create an instance of the IOcrEngine interface. ' We will use the LEADTOOLS OCR Advantage engine and use it in the same process Dim ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' *** Step 2: Startup the engine. ' Use the default parameters ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' *** Step 3: Create an OCR document with one or more pages. Dim ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument() ' Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing) ' *** Step 4: Establish zones on the page(s), either manually or automatically ' Automatic zoning ocrDocument.Pages.AutoZone(Nothing) ' *** Step 5: (Optional) Set the active languages to be used by the OCR engine ' Enable English and German languages ocrEngine.LanguageManager.EnableLanguages(New String() {"en", "de"}) ' *** Step 6: (Optional) Set the spell checking language ' Enable the spell checking system and set English as the spell language ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native ocrEngine.SpellCheckManager.SpellLanguage = "en" ' *** Step 7: (Optional) Set any special recognition module options ' Change the fill method for the first zone in the first page to be Omr Dim ocrZone As OcrZone = ocrDocument.Pages(0).Zones(0) ocrZone.FillMethod = OcrZoneFillMethod.Omr ocrDocument.Pages(0).Zones(0) = ocrZone ' *** Step 8: Recognize ocrDocument.Pages.Recognize(Nothing) ' *** Step 9: Save recognition results ' Save the results to a PDF file ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing) ocrDocument.Dispose() ' *** Step 10: Shut down the OCR 
engine when finished ocrEngine.Shutdown() ocrEngine.Dispose()
C#// Assuming you added "using Leadtools.Codecs;", "using Leadtools.Forms.Ocr;" and "using Leadtools.Forms.DocumentWriters;" at the beginning of this class // *** Step 1: Select the engine type and create an instance of the IOcrEngine interface. // We will use the LEADTOOLS OCR Advantage engine and use it in the same process IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false); // *** Step 2: Startup the engine. // Use the default parameters ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // *** Step 3: Create an OCR document with one or more pages. IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument(); // Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null); // *** Step 4: Establish zones on the page(s), either manually or automatically // Automatic zoning ocrDocument.Pages.AutoZone(null); // *** Step 5: (Optional) Set the active languages to be used by the OCR engine // Enable English and German languages ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" }); // *** Step 6: (Optional) Set the spell checking language // Enable the spell checking system and set English as the spell language ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native; ocrEngine.SpellCheckManager.SpellLanguage = "en"; // *** Step 7: (Optional) Set any special recognition module options // Change the fill method for the first zone in the first page to be default OcrZone ocrZone = ocrDocument.Pages[0].Zones[0]; ocrZone.FillMethod = OcrZoneFillMethod.Default; ocrDocument.Pages[0].Zones[0] = ocrZone; // *** Step 8: Recognize ocrDocument.Pages.Recognize(null); // *** Step 9: Save recognition results // Save the results to a PDF file ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null); ocrDocument.Dispose(); // *** Step 
10: Shut down the OCR engine when finished ocrEngine.Shutdown(); ocrEngine.Dispose();
The following example shows the minimum necessary steps required to perform the same task, this time using the proper C# and VB.NET "using" keywords:
Visual Basic' Create the engine instance Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' Startup the engine ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' Create the document Using ocrDocument as IOcrDocument = ocrEngine.DocumentManager.CreateDocument() ' Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing) ' Recognize the pages ocrDocument.Pages.Recognize(Nothing) 'Save recognition results ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing) End Using ' The engine will automatically shuts down when Dispose is called End Using
C#// Create the engine instance using(IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false)) { // Startup the engine ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // Create the OCR document using(IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument()) { // Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null); // Recognize the pages ocrDocument.Pages.Recognize(null); // Save recognition results ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null); } // The engine will automatically shuts down when Dispose is called }
Finally, the following example shows how to perform the same task above using the one shot "fire and forget" IOcrAutoRecognizeManager interface:
Visual Basic' Create the engine instance Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' Startup the engine ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' Convert the multi-page TIF image to a PDF document ocrEngine.AutoRecognizeManager.Run( _ "C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", _ "C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", _ DocumentFormat.Pdf, _ Nothing, _ Nothing) End Using
C#// Create the engine instance using(IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false)) { // Startup the engine ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // Convert the multi-page TIF image to a PDF document ocrEngine.AutoRecognizeManager.Run( @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null, null); }
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in a single or multi-threaded environment with optimization for server-based operations.
Multiple OCR engines are supported and abstracted from the user through the use of a common .NET class library. Switching between the various engines requires virtually no changes in the application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds prior to recognition to control the accuracy of recognition.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, Microsoft Word, Microsoft Excel, and various flavors of ASCII and Unicode text.
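For the multi-threaded, server-based scenario mentioned above, one common pattern is to give each worker thread its own engine instead of sharing a single IOcrEngine. The following sketch assumes that engine instances should not be shared across threads and that the Advantage engine allows several instances in the same process; neither point is stated in this section, so verify both for your engine before relying on this. ParallelOcr and ConvertFolder are hypothetical names; Parallel.ForEach with thread-local state is standard .NET.

```csharp
using System.IO;
using System.Threading.Tasks;
using Leadtools.Forms.DocumentWriters;
using Leadtools.Forms.Ocr;

// Hypothetical helper class (not part of LEADTOOLS).
public static class ParallelOcr
{
    // Converts each *.tif file in inputFolder to PDF using at most maxWorkers concurrent tasks.
    // Assumption: every task needs its own IOcrEngine; engines are never shared between tasks.
    public static void ConvertFolder(string inputFolder, string outputFolder, string runtimeFolder, int maxWorkers)
    {
        Directory.CreateDirectory(outputFolder);
        string[] images = Directory.GetFiles(inputFolder, "*.tif");
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxWorkers };

        Parallel.ForEach<string, IOcrEngine>(
            images,
            options,
            // localInit: runs once per task the loop starts (not once per file),
            // creating and starting that task's private engine
            () =>
            {
                IOcrEngine engine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);
                engine.Startup(null, null, null, runtimeFolder);
                return engine;
            },
            // body: recognize one image with the current task's engine
            (imagePath, loopState, engine) =>
            {
                string pdfPath = Path.Combine(outputFolder, Path.GetFileNameWithoutExtension(imagePath) + ".pdf");
                engine.AutoRecognizeManager.Run(imagePath, pdfPath, DocumentFormat.Pdf, null, null);
                return engine;
            },
            // localFinally: dispose the task's engine, which also shuts it down
            engine => engine.Dispose());
    }
}
```

Because the engine lives in the thread-local state, each engine is started once per task and reused for all the files that task processes, which amortizes the startup cost while keeping engines isolated.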
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
1. Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
2. Start up the OCR engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
3. Create an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
4. Establish zones on the page(s), either manually or automatically. This step is optional; a page can be recognized with or without zones. For more information, refer to Working with OCR Zones.
5. Optional. Set the active languages to be used by the OCR engine (the default is English). For more information, refer to Working with OCR Languages.
6. Optional. Set the spell checking language (the default is English). For more information, refer to OCR Spell Language Dictionaries.
7. Optional. Set any special recognition module options. This is needed only if the page contains zones, whether created automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules, and Using OMR in LEADTOOLS .NET OCR.
8. Recognize the pages. For more information, refer to Recognizing OCR Pages.
9. Save the recognition results, if desired. The results can be saved either to a file or to memory. For more information, refer to Recognizing OCR Pages.
10. Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Steps 4, 5, 6, and 7 can be performed in any order, as long as they are carried out after the OCR engine is started and before a page is recognized.
For more information on the engine assemblies, refer to OcrEngineType and Files To Be Included With Your Application.
The following example shows how to perform the above steps in code:
Visual Basic

```vb
' Assuming you added "Imports Leadtools.Forms.Ocr" and "Imports Leadtools.Forms.DocumentWriters" at the beginning of this class

' *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
' We will use the LEADTOOLS OCR Advantage engine and use it in the same process
Dim ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)

' *** Step 2: Start up the engine.
' Use the default parameters
ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

' *** Step 3: Create an OCR document with one or more pages.
Dim ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
' Add all the pages of a multi-page TIF image to the document (-1 means "up to the last page")
ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)

' *** Step 4: Establish zones on the page(s), either manually or automatically
' Automatic zoning
ocrDocument.Pages.AutoZone(Nothing)

' *** Step 5: (Optional) Set the active languages to be used by the OCR engine
' Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(New String() {"en", "de"})

' *** Step 6: (Optional) Set the spell checking language
' Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native
ocrEngine.SpellCheckManager.SpellLanguage = "en"

' *** Step 7: (Optional) Set any special recognition module options
' Change the fill method for the first zone in the first page to be Omr
' (the modified zone is assigned back to the collection)
Dim ocrZone As OcrZone = ocrDocument.Pages(0).Zones(0)
ocrZone.FillMethod = OcrZoneFillMethod.Omr
ocrDocument.Pages(0).Zones(0) = ocrZone

' *** Step 8: Recognize
ocrDocument.Pages.Recognize(Nothing)

' *** Step 9: Save recognition results
' Save the results to a PDF file
ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
ocrDocument.Dispose()

' *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown()
ocrEngine.Dispose()
```
C#

```csharp
// Assuming you added "using Leadtools.Codecs;", "using Leadtools.Forms.Ocr;" and "using Leadtools.Forms.DocumentWriters;" at the beginning of this class

// *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
// We will use the LEADTOOLS OCR Advantage engine and use it in the same process
IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);

// *** Step 2: Start up the engine.
// Use the default parameters
ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

// *** Step 3: Create an OCR document with one or more pages.
IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument();
// Add all the pages of a multi-page TIF image to the document (-1 means "up to the last page")
ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);

// *** Step 4: Establish zones on the page(s), either manually or automatically
// Automatic zoning
ocrDocument.Pages.AutoZone(null);

// *** Step 5: (Optional) Set the active languages to be used by the OCR engine
// Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" });

// *** Step 6: (Optional) Set the spell checking language
// Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native;
ocrEngine.SpellCheckManager.SpellLanguage = "en";

// *** Step 7: (Optional) Set any special recognition module options
// Change the fill method for the first zone in the first page to be default
// (the modified zone is assigned back to the collection)
OcrZone ocrZone = ocrDocument.Pages[0].Zones[0];
ocrZone.FillMethod = OcrZoneFillMethod.Default;
ocrDocument.Pages[0].Zones[0] = ocrZone;

// *** Step 8: Recognize
ocrDocument.Pages.Recognize(null);

// *** Step 9: Save recognition results
// Save the results to a PDF file
ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
ocrDocument.Dispose();

// *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown();
ocrEngine.Dispose();
```
The following example shows the minimum steps required to perform the same task, this time using the C# using statement and the VB.NET Using statement so that the document and the engine are disposed of automatically:
Visual Basic

```vb
' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
   ' Start up the engine
   ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

   ' Create the document
   Using ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
      ' Add all the pages of a multi-page TIF image to the document
      ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)

      ' Recognize the pages
      ocrDocument.Pages.Recognize(Nothing)

      ' Save recognition results
      ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
   End Using

   ' The engine automatically shuts down when Dispose is called
End Using
```
C#

```csharp
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
   // Start up the engine
   ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

   // Create the OCR document
   using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
   {
      // Add all the pages of a multi-page TIF image to the document
      ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);

      // Recognize the pages
      ocrDocument.Pages.Recognize(null);

      // Save recognition results
      ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
   }

   // The engine automatically shuts down when Dispose is called
}
```
C#
The LEADTOOLS OCR features provide methods for incorporating optical character recognition (OCR) technology into an application. OCR is used to process bitmap document images into text.
Once the LEADTOOLS .NET OCR toolkit is installed to the system, the user is ready to begin programming with LEADTOOLS OCR. Please note that the OCR features must be unlocked before the user can actually use the OCR properties, methods, and events. For more information on unlocking LEAD features, refer to Unlocking Special LEAD Features.
You can start using LEADTOOLS for .NET OCR in your application by adding a references to the Leadtools.Forms.Ocr.dll and Leadtools.Forms.DocumentWriter.dll assemblies in your .NET application. These assemblies contain the various interfaces, classes, structures and delegates used to program with LEADTOOLS OCR.
Since the toolkit supports multiple engines, the actual code that interfaces with the engine is stored in a separate assembly that will be loaded dynamically once an instance of the IOcrEngine interface is created. Hence, you must make sure the engine assembly you are planning to use resides next to the Leadtools.Forms.Ocr.dll assembly. You can add the engine assembly as a reference to your project if desired to automatically detect dependencies, even though this is not required by LEADTOOLS.
LEADTOOLS provides methods to:
The following example shows the minimum necessary steps required to perform the same task, this time using the proper C# and VB.NET "using" keywords:
Visual Basic' Create the engine instance Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' Startup the engine ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' Create the document Using ocrDocument as IOcrDocument = ocrEngine.DocumentManager.CreateDocument() ' Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing) ' Recognize the pages ocrDocument.Pages.Recognize(Nothing) 'Save recognition results ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing) End Using ' The engine will automatically shuts down when Dispose is called End Using
C#// Create the engine instance using(IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false)) { // Startup the engine ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // Create the OCR document using(IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument()) { // Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null); // Recognize the pages ocrDocument.Pages.Recognize(null); // Save recognition results ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null); } // The engine will automatically shuts down when Dispose is called }
Finally, the following example shows how to perform the same task above using the one shot "fire and forget" IOcrAutoRecognizeManager interface:
Visual Basic' Create the engine instance Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' Startup the engine ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' Convert the multi-page TIF image to a PDF document ocrEngine.AutoRecognizeManager.Run( _ "C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", _ "C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", _ DocumentFormat.Pdf, _ Nothing, _ Nothing) End Using
C#// Create the engine instance using(IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false)) { // Startup the engine ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // Convert the multi-page TIF image to a PDF document ocrEngine.AutoRecognizeManager.Run( @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null, null); }
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in a single or multi-threaded environment with optimization for server-based operations.
Multiple OCR engines are supported and abstracted from the user through the use of a common .NET class library. Switching between the various engines requires virtually no changes in the application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds prior to recognition to control the accuracy of recognition.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, MS Word, MS Excel as well as various flavors of ASCII and UNICODE text.
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
Startup the OCR Engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
Establish an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
Establish zones on the page(s), either manually or automatically. (This is optional. A page can be recognized with or without zones.) For more information, refer to Working with OCR Zones.
Optional. Set the active languages to be used by the OCR engine. (The default is English). For more information, refer to Working with OCR Languages.
Optional. Set the spell checking language. (The default is English). For more information, refer to OCR Spell Language Dictionaries.
Optional. Set any special recognition module options. This is required only if the page contains zones, created either automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules and Using OMR in LEADTOOLS .NET OCR.
Recognize. For more information, refer to Recognizing OCR Pages.
Save recognition results, if desired. The results can be saved to either a file or to memory. For more information, refer to Recognizing OCR Pages.
Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Where steps 4, 5, 6 and 7 can pretty much be done in any order, as long as they are carried out after starting up the OCR engine and before recognizing a page.
For more information on the engine assemblies refer to OcrEngineType" and Files To Be Included With Your Application.
The following example shows how to perform the above steps in code:
Visual Basic
The LEADTOOLS OCR features provide methods for incorporating optical character recognition (OCR) technology into an application. OCR is used to process bitmap document images into text.
Once the LEADTOOLS .NET OCR toolkit is installed to the system, the user is ready to begin programming with LEADTOOLS OCR. Please note that the OCR features must be unlocked before the user can actually use the OCR properties, methods, and events. For more information on unlocking LEAD features, refer to Unlocking Special LEAD Features.
You can start using LEADTOOLS for .NET OCR in your application by adding a references to the Leadtools.Forms.Ocr.dll and Leadtools.Forms.DocumentWriter.dll assemblies in your .NET application. These assemblies contain the various interfaces, classes, structures and delegates used to program with LEADTOOLS OCR.
Since the toolkit supports multiple engines, the actual code that interfaces with the engine is stored in a separate assembly that will be loaded dynamically once an instance of the IOcrEngine interface is created. Hence, you must make sure the engine assembly you are planning to use resides next to the Leadtools.Forms.Ocr.dll assembly. You can add the engine assembly as a reference to your project if desired to automatically detect dependencies, even though this is not required by LEADTOOLS.
LEADTOOLS provides methods to:
C#// Assuming you added "using Leadtools.Codecs;", "using Leadtools.Forms.Ocr;" and "using Leadtools.Forms.DocumentWriters;" at the beginning of this class // *** Step 1: Select the engine type and create an instance of the IOcrEngine interface. // We will use the LEADTOOLS OCR Advantage engine and use it in the same process IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false); // *** Step 2: Startup the engine. // Use the default parameters ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // *** Step 3: Create an OCR document with one or more pages. IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument(); // Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null); // *** Step 4: Establish zones on the page(s), either manually or automatically // Automatic zoning ocrDocument.Pages.AutoZone(null); // *** Step 5: (Optional) Set the active languages to be used by the OCR engine // Enable English and German languages ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" }); // *** Step 6: (Optional) Set the spell checking language // Enable the spell checking system and set English as the spell language ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native; ocrEngine.SpellCheckManager.SpellLanguage = "en"; // *** Step 7: (Optional) Set any special recognition module options // Change the fill method for the first zone in the first page to be default OcrZone ocrZone = ocrDocument.Pages[0].Zones[0]; ocrZone.FillMethod = OcrZoneFillMethod.Default; ocrDocument.Pages[0].Zones[0] = ocrZone; // *** Step 8: Recognize ocrDocument.Pages.Recognize(null); // *** Step 9: Save recognition results // Save the results to a PDF file ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null); ocrDocument.Dispose(); // *** Step 
10: Shut down the OCR engine when finished ocrEngine.Shutdown(); ocrEngine.Dispose();
The following example shows the minimum necessary steps required to perform the same task, this time using the proper C# and VB.NET "using" keywords:
Visual Basic' Create the engine instance Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' Startup the engine ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' Create the document Using ocrDocument as IOcrDocument = ocrEngine.DocumentManager.CreateDocument() ' Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing) ' Recognize the pages ocrDocument.Pages.Recognize(Nothing) 'Save recognition results ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing) End Using ' The engine will automatically shuts down when Dispose is called End Using
C#// Create the engine instance using(IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false)) { // Startup the engine ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // Create the OCR document using(IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument()) { // Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null); // Recognize the pages ocrDocument.Pages.Recognize(null); // Save recognition results ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null); } // The engine will automatically shuts down when Dispose is called }
Finally, the following example shows how to perform the same task above using the one shot "fire and forget" IOcrAutoRecognizeManager interface:
Visual Basic' Create the engine instance Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' Startup the engine ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' Convert the multi-page TIF image to a PDF document ocrEngine.AutoRecognizeManager.Run( _ "C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", _ "C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", _ DocumentFormat.Pdf, _ Nothing, _ Nothing) End Using
C#// Create the engine instance using(IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false)) { // Startup the engine ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // Convert the multi-page TIF image to a PDF document ocrEngine.AutoRecognizeManager.Run( @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null, null); }
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in a single or multi-threaded environment with optimization for server-based operations.
Multiple OCR engines are supported and abstracted from the user through the use of a common .NET class library. Switching between the various engines requires virtually no changes in the application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds prior to recognition to control the accuracy of recognition.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, MS Word, MS Excel as well as various flavors of ASCII and UNICODE text.
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
Startup the OCR Engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
Establish an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
Establish zones on the page(s), either manually or automatically. (This is optional. A page can be recognized with or without zones.) For more information, refer to Working with OCR Zones.
Optional. Set the active languages to be used by the OCR engine. (The default is English). For more information, refer to Working with OCR Languages.
Optional. Set the spell checking language. (The default is English). For more information, refer to OCR Spell Language Dictionaries.
Optional. Set any special recognition module options. This is required only if the page contains zones, created either automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules and Using OMR in LEADTOOLS .NET OCR.
Recognize. For more information, refer to Recognizing OCR Pages.
Save recognition results, if desired. The results can be saved to either a file or to memory. For more information, refer to Recognizing OCR Pages.
Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Steps 4, 5, 6, and 7 can generally be performed in any order, as long as they are carried out after the OCR engine has been started and before the pages are recognized.
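The ordering rule can be made concrete with a small, self-contained check. This is a hypothetical sketch, not LEADTOOLS code: the OcrStep enum and OcrWorkflowChecker class are illustrative names whose values map one-to-one to the ten steps above. It also assumes, more strictly than the outline, that zoning (step 4) and zone-specific module options (step 7) need the pages added in step 3, since zones are defined on pages.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical enum for illustration only; not part of the LEADTOOLS API.
// Values match the step numbers in the outline.
public enum OcrStep
{
    CreateEngine = 1,
    Startup = 2,
    AddPages = 3,
    Zone = 4,             // optional
    SetLanguages = 5,     // optional
    SetSpellLanguage = 6, // optional
    SetModuleOptions = 7, // optional
    Recognize = 8,
    Save = 9,             // optional
    Shutdown = 10
}

// Hypothetical checker that encodes the ordering rule from the outline.
public static class OcrWorkflowChecker
{
    // Steps 4 through 7 may be performed in any order relative to each other.
    private static bool IsFlexible(OcrStep step) =>
        step >= OcrStep.Zone && step <= OcrStep.SetModuleOptions;

    public static bool IsValid(IReadOnlyList<OcrStep> steps)
    {
        OcrStep lastFixed = 0;              // last fixed-position step seen so far
        var seen = new HashSet<OcrStep>();

        foreach (OcrStep step in steps)
        {
            if (!seen.Add(step))
                return false;               // each step at most once

            if (IsFlexible(step))
            {
                // Flexible steps must come after Startup (2) and before Recognize (8).
                if (lastFixed != OcrStep.Startup && lastFixed != OcrStep.AddPages)
                    return false;

                // Assumption: zone-related steps also need pages to exist (step 3).
                bool needsPages = step == OcrStep.Zone || step == OcrStep.SetModuleOptions;
                if (needsPages && lastFixed != OcrStep.AddPages)
                    return false;
            }
            else
            {
                // Fixed steps (1, 2, 3, 8, 9, 10) must appear in ascending order.
                if (step <= lastFixed)
                    return false;
                lastFixed = step;
            }
        }

        // Steps 1, 2, 3, 8, and 10 are required; step 9 (save) is optional.
        return seen.Contains(OcrStep.CreateEngine)
            && seen.Contains(OcrStep.Startup)
            && seen.Contains(OcrStep.AddPages)
            && seen.Contains(OcrStep.Recognize)
            && seen.Contains(OcrStep.Shutdown);
    }
}
```

For example, setting the languages and spell checking language (steps 5 and 6) before adding pages passes this check, while zoning after recognition does not.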
For more information on the engine assemblies, refer to OcrEngineType and Files To Be Included With Your Application.
The following example shows how to perform the above steps in code:
Visual Basic
' Assuming you added "Imports Leadtools.Forms.Ocr" and "Imports Leadtools.Forms.DocumentWriters" at the beginning of this class

' *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
' We will use the LEADTOOLS OCR Advantage engine and use it in the same process
Dim ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)

' *** Step 2: Start up the engine.
' Use the default parameters
ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

' *** Step 3: Create an OCR document with one or more pages.
Dim ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
' Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)

' *** Step 4: Establish zones on the page(s), either manually or automatically
' Automatic zoning
ocrDocument.Pages.AutoZone(Nothing)

' *** Step 5: (Optional) Set the active languages to be used by the OCR engine
' Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(New String() {"en", "de"})

' *** Step 6: (Optional) Set the spell checking language
' Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native
ocrEngine.SpellCheckManager.SpellLanguage = "en"

' *** Step 7: (Optional) Set any special recognition module options
' Change the fill method for the first zone in the first page to be Omr
Dim ocrZone As OcrZone = ocrDocument.Pages(0).Zones(0)
ocrZone.FillMethod = OcrZoneFillMethod.Omr
ocrDocument.Pages(0).Zones(0) = ocrZone

' *** Step 8: Recognize
ocrDocument.Pages.Recognize(Nothing)

' *** Step 9: Save recognition results
' Save the results to a PDF file
ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
ocrDocument.Dispose()

' *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown()
ocrEngine.Dispose()
C#
// Assuming you added "using Leadtools.Codecs;", "using Leadtools.Forms.Ocr;" and "using Leadtools.Forms.DocumentWriters;" at the beginning of this class

// *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
// We will use the LEADTOOLS OCR Advantage engine and use it in the same process
IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);

// *** Step 2: Start up the engine.
// Use the default parameters
ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

// *** Step 3: Create an OCR document with one or more pages.
IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument();
// Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);

// *** Step 4: Establish zones on the page(s), either manually or automatically
// Automatic zoning
ocrDocument.Pages.AutoZone(null);

// *** Step 5: (Optional) Set the active languages to be used by the OCR engine
// Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" });

// *** Step 6: (Optional) Set the spell checking language
// Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native;
ocrEngine.SpellCheckManager.SpellLanguage = "en";

// *** Step 7: (Optional) Set any special recognition module options
// Change the fill method for the first zone in the first page to be default
OcrZone ocrZone = ocrDocument.Pages[0].Zones[0];
ocrZone.FillMethod = OcrZoneFillMethod.Default;
ocrDocument.Pages[0].Zones[0] = ocrZone;

// *** Step 8: Recognize
ocrDocument.Pages.Recognize(null);

// *** Step 9: Save recognition results
// Save the results to a PDF file
ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
ocrDocument.Dispose();

// *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown();
ocrEngine.Dispose();
The following example shows the minimum steps required to perform the same task, this time using the C# using statement and the VB.NET Using statement so that the objects are disposed automatically:
Visual Basic
' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
    ' Start up the engine
    ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

    ' Create the document
    Using ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
        ' Add all the pages of a multi-page TIF image to the document
        ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)

        ' Recognize the pages
        ocrDocument.Pages.Recognize(Nothing)

        ' Save recognition results
        ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
    End Using

    ' The engine automatically shuts down when Dispose is called
End Using
C#
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
    // Start up the engine
    ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

    // Create the OCR document
    using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
    {
        // Add all the pages of a multi-page TIF image to the document
        ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);

        // Recognize the pages
        ocrDocument.Pages.Recognize(null);

        // Save recognition results
        ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
    }

    // The engine automatically shuts down when Dispose is called
}
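The using examples above rely on standard C# and VB.NET semantics: a using statement is shorthand for a try/finally block that calls Dispose, and nested blocks are disposed in reverse order. The document is therefore released before the engine, which, per the example's comment, shuts down when disposed, even if recognition throws. The sketch below illustrates this with hypothetical stand-in classes (FakeEngine and FakeDocument are illustrative names, not LEADTOOLS types) that only record their cleanup calls; RunExpanded shows a simplified version of the code the compiler generates.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for IOcrEngine: disposing it is treated as "shut down".
public sealed class FakeEngine : IDisposable
{
    private readonly List<string> _log;
    public FakeEngine(List<string> log) { _log = log; }
    public FakeDocument CreateDocument() => new FakeDocument(_log);
    public void Dispose() => _log.Add("engine disposed");
}

// Hypothetical stand-in for IOcrDocument.
public sealed class FakeDocument : IDisposable
{
    private readonly List<string> _log;
    public FakeDocument(List<string> log) { _log = log; }

    public void Recognize(bool fail)
    {
        if (fail) throw new InvalidOperationException("recognition failed");
        _log.Add("recognized");
    }

    public void Dispose() => _log.Add("document disposed");
}

public static class UsingDemo
{
    // Same shape as the using example: engine outside, document inside.
    public static List<string> Run(bool fail)
    {
        var log = new List<string>();
        try
        {
            using (var engine = new FakeEngine(log))
            using (var document = engine.CreateDocument())
            {
                document.Recognize(fail);
            }
        }
        catch (InvalidOperationException)
        {
            log.Add("error caught");
        }
        return log;
    }

    // Simplified expansion of the nested using statements in Run.
    // (The compiler also emits a null check before each Dispose call.)
    public static List<string> RunExpanded(bool fail)
    {
        var log = new List<string>();
        try
        {
            var engine = new FakeEngine(log);
            try
            {
                var document = engine.CreateDocument();
                try
                {
                    document.Recognize(fail);
                }
                finally
                {
                    document.Dispose();
                }
            }
            finally
            {
                engine.Dispose();
            }
        }
        catch (InvalidOperationException)
        {
            log.Add("error caught");
        }
        return log;
    }
}
```

When recognition fails, both versions still dispose the document and then the engine before the exception reaches the catch block, which is the main advantage over the step-by-step example, where an exception would skip the Dispose and Shutdown calls.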
Finally, the following example shows how to perform the same task using the one-shot, "fire and forget" IOcrAutoRecognizeManager interface:
Visual Basic
' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
    ' Start up the engine
    ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

    ' Convert the multi-page TIF image to a PDF document
    ocrEngine.AutoRecognizeManager.Run( _
        "C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", _
        "C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", _
        DocumentFormat.Pdf, _
        Nothing, _
        Nothing)
End Using
C#
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
    // Start up the engine
    ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

    // Convert the multi-page TIF image to a PDF document
    ocrEngine.AutoRecognizeManager.Run(
        @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif",
        @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf",
        DocumentFormat.Pdf,
        null,
        null);
}
C#
// Assuming you added "using Leadtools.Codecs;", "using Leadtools.Forms.Ocr;" and "using Leadtools.Forms.DocumentWriters;" at the beginning of this file

// *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
// We will use the LEADTOOLS OCR Advantage engine and run it in the same process
IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);

// *** Step 2: Start up the engine.
// Use the default parameters
ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

// *** Step 3: Create an OCR document with one or more pages.
IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument();
// Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);

// *** Step 4: Establish zones on the page(s), either manually or automatically
// Automatic zoning
ocrDocument.Pages.AutoZone(null);

// *** Step 5: (Optional) Set the active languages to be used by the OCR engine
// Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" });

// *** Step 6: (Optional) Set the spell checking language
// Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native;
ocrEngine.SpellCheckManager.SpellLanguage = "en";

// *** Step 7: (Optional) Set any special recognition module options
// Change the fill method for the first zone in the first page to be default
OcrZone ocrZone = ocrDocument.Pages[0].Zones[0];
ocrZone.FillMethod = OcrZoneFillMethod.Default;
ocrDocument.Pages[0].Zones[0] = ocrZone;

// *** Step 8: Recognize
ocrDocument.Pages.Recognize(null);

// *** Step 9: Save recognition results
// Save the results to a PDF file
ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
ocrDocument.Dispose();

// *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown();
ocrEngine.Dispose();
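Step 7 reads a zone into a local variable, changes its fill method, and then assigns it back to the collection. That write-back is what you would need if OcrZone is a value type (a struct), because indexing a collection of structs returns a copy. The minimal sketch below illustrates the behavior with a hypothetical ZoneSketch struct and a List<T>; these names are illustrative only and are not LEADTOOLS types.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins for a value-type zone; NOT LEADTOOLS types.
public enum FillMethodSketch { Default, Omr }

public struct ZoneSketch
{
    public FillMethodSketch FillMethod;
}

public static class ZoneWriteBackDemo
{
    // Modifies a copy only: the zone stored in the list is left unchanged.
    public static FillMethodSketch ModifyCopyOnly(List<ZoneSketch> zones)
    {
        ZoneSketch zone = zones[0];              // the indexer returns a copy
        zone.FillMethod = FillMethodSketch.Omr;  // changes the local copy only
        return zones[0].FillMethod;              // still the original value
    }

    // Copy, modify, and write back, the same pattern as step 7.
    public static FillMethodSketch ModifyAndWriteBack(List<ZoneSketch> zones)
    {
        ZoneSketch zone = zones[0];
        zone.FillMethod = FillMethodSketch.Omr;
        zones[0] = zone;                         // store the modified copy
        return zones[0].FillMethod;
    }
}
```

Writing zones[0].FillMethod = ... directly on a List<T> of structs does not compile in C# (error CS1612), which is one reason to read, modify, and write back.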
The following example shows the minimum steps required to perform the same task, this time using the C# using statement and the VB.NET Using statement so that the document and engine are disposed of automatically:
Visual Basic
' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
    ' Start up the engine
    ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")
    ' Create the OCR document
    Using ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
        ' Add all the pages of a multi-page TIF image to the document
        ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)
        ' Recognize the pages
        ocrDocument.Pages.Recognize(Nothing)
        ' Save recognition results
        ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
    End Using
    ' The engine shuts down automatically when Dispose is called
End Using
C#
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
    // Start up the engine
    ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");
    // Create the OCR document
    using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
    {
        // Add all the pages of a multi-page TIF image to the document
        ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);
        // Recognize the pages
        ocrDocument.Pages.Recognize(null);
        // Save recognition results
        ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
    }
    // The engine shuts down automatically when Dispose is called
}
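The nested using blocks in this example dispose of the document before the engine, and they do so even if an exception is thrown during recognition. The sketch below demonstrates that ordering with a hypothetical TrackedResource class that records when it is disposed; it stands in for IOcrDocument and IOcrEngine and is not part of LEADTOOLS.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical disposable that logs its disposal; NOT a LEADTOOLS type.
public sealed class TrackedResource : IDisposable
{
    private readonly string _name;
    private readonly List<string> _log;

    public TrackedResource(string name, List<string> log)
    {
        _name = name;
        _log = log;
    }

    public void Dispose() => _log.Add(_name + " disposed");
}

public static class UsingOrderDemo
{
    // Mirrors the engine/document nesting of the example above.
    // When 'fail' is true, an exception is thrown inside the inner block.
    public static List<string> Run(bool fail)
    {
        var log = new List<string>();
        try
        {
            using (var engine = new TrackedResource("engine", log))
            using (var document = new TrackedResource("document", log))
            {
                if (fail)
                    throw new InvalidOperationException("simulated recognition failure");
            }
        }
        catch (InvalidOperationException)
        {
            // Both resources have already been disposed when this runs.
            log.Add("exception caught");
        }
        return log;
    }
}
```

In both cases the inner resource (the document) is disposed first and the outer resource (the engine) last.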
Finally, the following example shows how to perform the same task using the one-shot, "fire and forget" IOcrAutoRecognizeManager interface:
Visual Basic
' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
    ' Start up the engine
    ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")
    ' Convert the multi-page TIF image to a PDF document
    ocrEngine.AutoRecognizeManager.Run( _
        "C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", _
        "C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", _
        DocumentFormat.Pdf, _
        Nothing, _
        Nothing)
End Using
C#
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
    // Start up the engine
    ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");
    // Convert the multi-page TIF image to a PDF document
    ocrEngine.AutoRecognizeManager.Run(
        @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif",
        @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf",
        DocumentFormat.Pdf,
        null,
        null);
}
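The Run call in this example turns one input image into one output document, so converting a whole folder comes down to calling it in a loop. The sketch below is one way to structure that loop, assuming every input is a .tif file and each output should be a .pdf with the same base name in the same folder. The conversion is passed in as a delegate so the loop can be exercised without an OCR engine; BatchOcrSketch and ConvertFolder are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class BatchOcrSketch
{
    // Hypothetical helper: calls 'convert' once per .tif file in
    // 'inputDirectory', passing the input path and an output path with the
    // same base name and a .pdf extension. Returns the output paths in the
    // order they were processed.
    public static List<string> ConvertFolder(string inputDirectory, Action<string, string> convert)
    {
        var outputs = new List<string>();
        string[] inputs = Directory.GetFiles(inputDirectory, "*.tif");
        Array.Sort(inputs, StringComparer.OrdinalIgnoreCase); // deterministic order

        foreach (string inputPath in inputs)
        {
            string outputPath = Path.ChangeExtension(inputPath, ".pdf");
            convert(inputPath, outputPath);
            outputs.Add(outputPath);
        }
        return outputs;
    }
}
```

With a started engine, the delegate could wrap the same call shown in the example, for instance (input, output) => ocrEngine.AutoRecognizeManager.Run(input, output, DocumentFormat.Pdf, null, null).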
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in a single-threaded or multithreaded environment, with optimizations for server-based operations.
Multiple OCR engines are supported and abstracted from the user through the use of a common .NET class library. Switching between the various engines requires virtually no changes in the application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds before recognition to control recognition accuracy.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, MS Word and MS Excel, as well as various flavors of ASCII and Unicode text.
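The code examples enable languages by two-letter code ("en" and "de"). Assuming the engine uses ISO 639-1 codes for the other listed languages as well, a small lookup like the one below can turn language names into the string array passed to LanguageManager.EnableLanguages. Verify each code against Working with OCR Languages before relying on it, especially Norwegian ("no" versus "nb"); OcrLanguageCodes and FromNames are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class OcrLanguageCodes
{
    // ISO 639-1 codes for the languages listed above.
    // ASSUMPTION: the OCR engine accepts these codes, as it does "en" and "de".
    private static readonly Dictionary<string, string> Codes =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["English"] = "en", ["Danish"] = "da", ["Dutch"] = "nl",
            ["Finnish"] = "fi", ["French"] = "fr", ["German"] = "de",
            ["Italian"] = "it", ["Norwegian"] = "no", ["Portuguese"] = "pt",
            ["Russian"] = "ru", ["Spanish"] = "es", ["Swedish"] = "sv",
        };

    // Converts language names (case-insensitive) to codes, preserving order.
    // Throws ArgumentException for a name that is not in the list.
    public static string[] FromNames(params string[] names)
    {
        var result = new string[names.Length];
        for (int i = 0; i < names.Length; i++)
        {
            if (!Codes.TryGetValue(names[i], out string code))
                throw new ArgumentException("Unsupported language: " + names[i]);
            result[i] = code;
        }
        return result;
    }
}
```

The result could then be passed as ocrEngine.LanguageManager.EnableLanguages(OcrLanguageCodes.FromNames("English", "German")).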
LEADTOOLS uses an OCR handle to interact with the OCR engine and with the OCR document that contains the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. It is an internal structure that holds all the information needed for recognition, for getting and setting engine information, and for text verification.
The following is an outline of the general steps involved in recognizing one or more pages:
1. Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
2. Start up the OCR engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
3. Create an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
4. Establish zones on the page(s), either manually or automatically. (This is optional. A page can be recognized with or without zones.) For more information, refer to Working with OCR Zones.
5. Optional. Set the active languages to be used by the OCR engine (the default is English). For more information, refer to Working with OCR Languages.
6. Optional. Set the spell checking language (the default is English). For more information, refer to OCR Spell Language Dictionaries.
7. Optional. Set any special recognition module options. This step applies only if the page contains zones, whether they were created automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules, and Using OMR in LEADTOOLS .NET OCR.
8. Recognize the pages. For more information, refer to Recognizing OCR Pages.
9. Save the recognition results, if desired. The results can be saved either to a file or to memory. For more information, refer to Recognizing OCR Pages.
10. Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Steps 4 through 7 can be performed in almost any order, as long as they are carried out after the OCR engine has been started and before a page is recognized.
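The ordering rule for steps 4 through 7 can be written as a check on a planned sequence of steps. The sketch below uses a hypothetical OcrStep enumeration that labels the ten outlined steps; it is not a LEADTOOLS type. It additionally assumes that zoning and module options, which act on page zones, need the pages to be added first, while languages and spell checking only need a started engine. Only the first occurrence of each step is considered.

```csharp
using System.Collections.Generic;

// Hypothetical labels for the ten outlined steps; NOT LEADTOOLS types.
public enum OcrStep
{
    CreateEngine, Startup, AddPages, Zones, Languages,
    SpellCheck, ModuleOptions, Recognize, Save, Shutdown
}

public static class OcrStepOrder
{
    // Returns true if 'steps' follows the outline:
    // - CreateEngine, Startup, AddPages, Recognize and Shutdown are required, in that order;
    // - Save is optional but must fall between Recognize and Shutdown;
    // - Zones, Languages, SpellCheck and ModuleOptions are optional, may appear in any
    //   order among themselves, and must fall after Startup and before Recognize;
    // - ASSUMPTION: Zones and ModuleOptions also require AddPages to come first.
    public static bool IsValid(IList<OcrStep> steps)
    {
        int create = steps.IndexOf(OcrStep.CreateEngine);
        int startup = steps.IndexOf(OcrStep.Startup);
        int pages = steps.IndexOf(OcrStep.AddPages);
        int recognize = steps.IndexOf(OcrStep.Recognize);
        int shutdown = steps.IndexOf(OcrStep.Shutdown);

        // Every required step must be present (index >= 0) and in order.
        if (create < 0 || !(create < startup && startup < pages && pages < recognize && recognize < shutdown))
            return false;

        int save = steps.IndexOf(OcrStep.Save);
        if (save >= 0 && !(recognize < save && save < shutdown))
            return false;

        foreach (OcrStep optional in new[] { OcrStep.Zones, OcrStep.Languages, OcrStep.SpellCheck, OcrStep.ModuleOptions })
        {
            int i = steps.IndexOf(optional);
            if (i < 0) continue; // optional step omitted
            int lowerBound = (optional == OcrStep.Zones || optional == OcrStep.ModuleOptions) ? pages : startup;
            if (!(lowerBound < i && i < recognize))
                return false;
        }
        return true;
    }
}
```

For example, enabling languages before the pages are added passes this check, while enabling them after Recognize does not.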
For more information on the engine assemblies, refer to OcrEngineType and Files To Be Included With Your Application.
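Because the engine assembly is loaded dynamically when the IOcrEngine instance is created, a missing engine file may not show up until run time. A minimal pre-flight check, sketched below, looks for the expected files in the folder that holds Leadtools.Forms.Ocr.dll. The file name Leadtools.Forms.Ocr.Advantage.dll used in the test is an assumption for the Advantage engine; take the exact names from Files To Be Included With Your Application. OcrDeploymentCheck and FindMissing are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;

public static class OcrDeploymentCheck
{
    // Hypothetical helper: returns the file names from 'expectedFileNames'
    // that do not exist in 'directory' (the folder that holds
    // Leadtools.Forms.Ocr.dll). An empty list means nothing is missing.
    public static List<string> FindMissing(string directory, params string[] expectedFileNames)
    {
        var missing = new List<string>();
        foreach (string name in expectedFileNames)
        {
            if (!File.Exists(Path.Combine(directory, name)))
                missing.Add(name);
        }
        return missing;
    }
}
```

In an application, the folder could be obtained with Path.GetDirectoryName(typeof(IOcrEngine).Assembly.Location), and the check run before calling OcrEngineManager.CreateEngine.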
Visual Basic
The LEADTOOLS OCR features provide methods for incorporating optical character recognition (OCR) technology into an application. OCR is used to process bitmap document images into text.
Once the LEADTOOLS .NET OCR toolkit is installed to the system, the user is ready to begin programming with LEADTOOLS OCR. Please note that the OCR features must be unlocked before the user can actually use the OCR properties, methods, and events. For more information on unlocking LEAD features, refer to Unlocking Special LEAD Features.
You can start using LEADTOOLS for .NET OCR in your application by adding a references to the Leadtools.Forms.Ocr.dll and Leadtools.Forms.DocumentWriter.dll assemblies in your .NET application. These assemblies contain the various interfaces, classes, structures and delegates used to program with LEADTOOLS OCR.
Since the toolkit supports multiple engines, the actual code that interfaces with the engine is stored in a separate assembly that will be loaded dynamically once an instance of the IOcrEngine interface is created. Hence, you must make sure the engine assembly you are planning to use resides next to the Leadtools.Forms.Ocr.dll assembly. You can add the engine assembly as a reference to your project if desired to automatically detect dependencies, even though this is not required by LEADTOOLS.
LEADTOOLS provides methods to:
C#
The LEADTOOLS OCR features provide methods for incorporating optical character recognition (OCR) technology into an application. OCR is used to process bitmap document images into text.
Once the LEADTOOLS .NET OCR toolkit is installed to the system, the user is ready to begin programming with LEADTOOLS OCR. Please note that the OCR features must be unlocked before the user can actually use the OCR properties, methods, and events. For more information on unlocking LEAD features, refer to Unlocking Special LEAD Features.
You can start using LEADTOOLS for .NET OCR in your application by adding a references to the Leadtools.Forms.Ocr.dll and Leadtools.Forms.DocumentWriter.dll assemblies in your .NET application. These assemblies contain the various interfaces, classes, structures and delegates used to program with LEADTOOLS OCR.
Since the toolkit supports multiple engines, the actual code that interfaces with the engine is stored in a separate assembly that will be loaded dynamically once an instance of the IOcrEngine interface is created. Hence, you must make sure the engine assembly you are planning to use resides next to the Leadtools.Forms.Ocr.dll assembly. You can add the engine assembly as a reference to your project if desired to automatically detect dependencies, even though this is not required by LEADTOOLS.
LEADTOOLS provides methods to:
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in a single-threaded or multi-threaded environment, with optimizations for server-based operations.
Use multiple OCR engines, all abstracted behind a common .NET class library, so that switching between engines requires virtually no changes to the application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds before recognition to control recognition accuracy.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, Microsoft Word, Microsoft Excel, and various flavors of ASCII and Unicode text.
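Language selection is done through IOcrEngine.LanguageManager.EnableLanguages, which the code example later in this topic calls with two-letter codes ("en", "de"). The sketch below maps the language names listed above to such codes before that call. It assumes the engine identifies every one of these languages by its ISO 639-1 code, which is inferred only from the "en" and "de" example; verify the codes against the languages your engine actually supports. OcrLanguageCodes is a hypothetical helper, not a LEADTOOLS API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of LEADTOOLS).
// ASSUMPTION: the OCR engine uses ISO 639-1 codes, as the "en"/"de" example suggests.
public static class OcrLanguageCodes
{
    private static readonly Dictionary<string, string> Codes =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["English"] = "en", ["Danish"] = "da", ["Dutch"] = "nl",
            ["Finnish"] = "fi", ["French"] = "fr", ["German"] = "de",
            ["Italian"] = "it", ["Norwegian"] = "no", ["Portuguese"] = "pt",
            ["Russian"] = "ru", ["Spanish"] = "es", ["Swedish"] = "sv",
        };

    // Converts language names (case-insensitive) to the codes passed to EnableLanguages.
    // Throws ArgumentException for a name that is not in the table.
    public static string[] ToCodes(params string[] languageNames)
    {
        return languageNames
            .Select(name => Codes.TryGetValue(name, out var code)
                ? code
                : throw new ArgumentException($"Unsupported language: {name}", nameof(languageNames)))
            .ToArray();
    }
}
```

For example, ocrEngine.LanguageManager.EnableLanguages(OcrLanguageCodes.ToCodes("English", "German")) would be equivalent to the explicit { "en", "de" } array used in the example.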
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
1. Select the engine type you want to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
2. Start up the OCR engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
3. Create an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
4. Optional. Establish zones on the page(s), either manually or automatically. A page can be recognized with or without zones. For more information, refer to Working with OCR Zones.
5. Optional. Set the active languages to be used by the OCR engine (the default is English). For more information, refer to Working with OCR Languages.
6. Optional. Set the spell checking language (the default is English). For more information, refer to OCR Spell Language Dictionaries.
7. Optional. Set any special recognition module options. This is required only if the page contains zones, whether they were created automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules, and Using OMR in LEADTOOLS .NET OCR.
8. Recognize the pages. For more information, refer to Recognizing OCR Pages.
9. Optionally, save the recognition results, either to a file or to memory. For more information, refer to Recognizing OCR Pages.
10. Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Steps 4, 5, 6, and 7 can be performed in any order, as long as they are carried out after the OCR engine is started and before the pages are recognized.
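The ordering rule can be captured in a small guard, for example to catch configuration code that accidentally runs before Startup or after Recognize. This is a hypothetical illustration that does not call LEADTOOLS; the class and method names are invented, and it takes the strict reading that configuration must not follow recognition.

```csharp
using System;

// Hypothetical illustration (not part of LEADTOOLS) of the ordering rule:
// steps 4-7 may run in any order, but only after Startup (step 2)
// and before Recognize (step 8).
public sealed class OcrStepOrderGuard
{
    private bool _started;
    private bool _recognized;

    // Call after step 2: the engine has been started.
    public void MarkStarted() => _started = true;

    // Call at the beginning of steps 4-7 (zoning, languages, spell checking, module options).
    public void CheckCanConfigure(string stepName)
    {
        if (!_started)
            throw new InvalidOperationException($"{stepName} must run after the engine is started.");
        if (_recognized)
            throw new InvalidOperationException($"{stepName} must run before the pages are recognized.");
    }

    // Call at step 8: recognition also requires a started engine.
    public void MarkRecognized()
    {
        if (!_started)
            throw new InvalidOperationException("Recognize must run after the engine is started.");
        _recognized = true;
    }
}
```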
For more information on the engine assemblies, refer to OcrEngineType and Files To Be Included With Your Application.
The following example shows how to perform the above steps in code:
Visual Basic
' Assuming you added "Imports Leadtools.Forms.Ocr" and "Imports Leadtools.Forms.DocumentWriters" at the top of the source file

' *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
' We will use the LEADTOOLS OCR Advantage engine and run it in the same process
Dim ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)

' *** Step 2: Start up the engine.
' Use the default parameters
ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

' *** Step 3: Create an OCR document with one or more pages.
Dim ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
' Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)

' *** Step 4: Establish zones on the page(s), either manually or automatically
' Automatic zoning
ocrDocument.Pages.AutoZone(Nothing)

' *** Step 5: (Optional) Set the active languages to be used by the OCR engine
' Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(New String() {"en", "de"})

' *** Step 6: (Optional) Set the spell checking language
' Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native
ocrEngine.SpellCheckManager.SpellLanguage = "en"

' *** Step 7: (Optional) Set any special recognition module options
' Change the fill method for the first zone in the first page to Omr
' (only if automatic zoning found at least one zone)
If ocrDocument.Pages(0).Zones.Count > 0 Then
   Dim ocrZone As OcrZone = ocrDocument.Pages(0).Zones(0)
   ocrZone.FillMethod = OcrZoneFillMethod.Omr
   ocrDocument.Pages(0).Zones(0) = ocrZone
End If

' *** Step 8: Recognize
ocrDocument.Pages.Recognize(Nothing)

' *** Step 9: Save recognition results
' Save the results to a PDF file
ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
ocrDocument.Dispose()

' *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown()
ocrEngine.Dispose()
C#
// Assuming you added "using Leadtools.Codecs;", "using Leadtools.Forms.Ocr;" and "using Leadtools.Forms.DocumentWriters;" at the top of the source file

// *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
// We will use the LEADTOOLS OCR Advantage engine and run it in the same process
IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);

// *** Step 2: Start up the engine.
// Use the default parameters
ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

// *** Step 3: Create an OCR document with one or more pages.
IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument();
// Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);

// *** Step 4: Establish zones on the page(s), either manually or automatically
// Automatic zoning
ocrDocument.Pages.AutoZone(null);

// *** Step 5: (Optional) Set the active languages to be used by the OCR engine
// Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" });

// *** Step 6: (Optional) Set the spell checking language
// Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native;
ocrEngine.SpellCheckManager.SpellLanguage = "en";

// *** Step 7: (Optional) Set any special recognition module options
// Change the fill method for the first zone in the first page to Default
// (only if automatic zoning found at least one zone)
if (ocrDocument.Pages[0].Zones.Count > 0)
{
   OcrZone ocrZone = ocrDocument.Pages[0].Zones[0];
   ocrZone.FillMethod = OcrZoneFillMethod.Default;
   ocrDocument.Pages[0].Zones[0] = ocrZone;
}

// *** Step 8: Recognize
ocrDocument.Pages.Recognize(null);

// *** Step 9: Save recognition results
// Save the results to a PDF file
ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
ocrDocument.Dispose();

// *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown();
ocrEngine.Dispose();
The following example shows the minimum steps required to perform the same task, this time using the C# using statement and the VB.NET Using statement so that the document and the engine are disposed of automatically:
Visual Basic
' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
   ' Start up the engine
   ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

   ' Create the OCR document
   Using ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
      ' Add all the pages of a multi-page TIF image to the document
      ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)

      ' Recognize the pages
      ocrDocument.Pages.Recognize(Nothing)

      ' Save the recognition results
      ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
   End Using
   ' The engine automatically shuts down when Dispose is called
End Using
C#
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
   // Start up the engine
   ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

   // Create the OCR document
   using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
   {
      // Add all the pages of a multi-page TIF image to the document
      ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);

      // Recognize the pages
      ocrDocument.Pages.Recognize(null);

      // Save the recognition results
      ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
   }
   // The engine automatically shuts down when Dispose is called
}
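In the using example above, the inner block disposes the OCR document before the outer block disposes (and thereby shuts down) the engine. The stand-alone sketch below uses placeholder classes instead of LEADTOOLS types to show that nested using statements dispose in reverse order of creation, even when an exception is thrown inside them. TrackedResource and DisposeOrderDemo are invented names for illustration only.

```csharp
using System;
using System.Collections.Generic;

// Placeholder type standing in for IOcrEngine / IOcrDocument (not LEADTOOLS).
public sealed class TrackedResource : IDisposable
{
    private readonly string _name;
    private readonly List<string> _log;

    public TrackedResource(string name, List<string> log)
    {
        _name = name;
        _log = log;
        _log.Add("create " + name);
    }

    public void Dispose() => _log.Add("dispose " + _name);
}

public static class DisposeOrderDemo
{
    // Mirrors the structure of the OCR example: engine outside, document inside.
    public static List<string> Run(bool failInside)
    {
        var log = new List<string>();
        try
        {
            using (var engine = new TrackedResource("engine", log))
            using (var document = new TrackedResource("document", log))
            {
                if (failInside)
                    throw new InvalidOperationException("recognition failed");
            }
        }
        catch (InvalidOperationException)
        {
            // Both resources have already been disposed when control reaches here.
            log.Add("caught");
        }
        return log;
    }
}
```

DisposeOrderDemo.Run(false) records "create engine", "create document", "dispose document", "dispose engine"; with failInside set to true the same four entries are followed by "caught".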
Finally, the following example shows how to perform the same task using the one-shot, "fire and forget" IOcrAutoRecognizeManager interface:
Visual Basic
' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
   ' Startup the engine
   ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

   ' Convert the multi-page TIF image to a PDF document
   ocrEngine.AutoRecognizeManager.Run( _
      "C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", _
      "C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", _
      DocumentFormat.Pdf, _
      Nothing, _
      Nothing)
End Using
C#
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
   // Startup the engine
   ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

   // Convert the multi-page TIF image to a PDF document
   ocrEngine.AutoRecognizeManager.Run(
      @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif",
      @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf",
      DocumentFormat.Pdf,
      null,
      null);
}
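Because AutoRecognizeManager.Run takes an input file name and an output file name, a single started engine can presumably be reused to convert a whole folder of images. The sketch below keeps the folder logic free of LEADTOOLS types by receiving the conversion as a delegate. OcrBatch, the *.tif filter, and the output naming scheme are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical batch helper (not part of LEADTOOLS). The actual conversion is
// supplied as a delegate, for example a call to AutoRecognizeManager.Run.
public static class OcrBatch
{
    // Converts every *.tif file in inputFolder and returns the output paths.
    public static List<string> ConvertFolder(
        string inputFolder, string outputFolder, string outputExtension,
        Action<string, string> convert)
    {
        if (convert == null) throw new ArgumentNullException(nameof(convert));
        Directory.CreateDirectory(outputFolder);

        var outputs = new List<string>();
        foreach (string input in Directory.GetFiles(inputFolder, "*.tif"))
        {
            // For example, Ocr.tif becomes <outputFolder>\Ocr.pdf when outputExtension is ".pdf"
            string output = Path.Combine(outputFolder,
                Path.GetFileNameWithoutExtension(input) + outputExtension);
            convert(input, output);
            outputs.Add(output);
        }
        return outputs;
    }
}
```

Inside the using block of the C# example above, the delegate could be (src, dst) => ocrEngine.AutoRecognizeManager.Run(src, dst, DocumentFormat.Pdf, null, null), with ".pdf" as the output extension.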
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, MS Word, MS Excel as well as various flavors of ASCII and UNICODE text.
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
Startup the OCR Engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
Establish an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
Establish zones on the page(s), either manually or automatically. (This is optional. A page can be recognized with or without zones.) For more information, refer to Working with OCR Zones.
Optional. Set the active languages to be used by the OCR engine. (The default is English). For more information, refer to Working with OCR Languages.
Optional. Set the spell checking language. (The default is English). For more information, refer to OCR Spell Language Dictionaries.
Optional. Set any special recognition module options. This is required only if the page contains zones, created either automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules and Using OMR in LEADTOOLS .NET OCR.
Recognize. For more information, refer to Recognizing OCR Pages.
Save recognition results, if desired. The results can be saved to either a file or to memory. For more information, refer to Recognizing OCR Pages.
Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Where steps 4, 5, 6 and 7 can pretty much be done in any order, as long as they are carried out after starting up the OCR engine and before recognizing a page.
For more information on the engine assemblies refer to OcrEngineType" and Files To Be Included With Your Application.
The following example shows how to perform the above steps in code:
Visual Basic
The LEADTOOLS OCR features provide methods for incorporating optical character recognition (OCR) technology into an application. OCR is used to process bitmap document images into text.
Once the LEADTOOLS .NET OCR toolkit is installed on the system, you are ready to begin programming with LEADTOOLS OCR. Note that the OCR features must be unlocked before you can use the OCR properties, methods, and events. For more information on unlocking LEAD features, refer to Unlocking Special LEAD Features.
You can start using LEADTOOLS for .NET OCR in your application by adding references to the Leadtools.Forms.Ocr.dll and Leadtools.Forms.DocumentWriters.dll assemblies. These assemblies contain the interfaces, classes, structures, and delegates used to program with LEADTOOLS OCR.
Because the toolkit supports multiple engines, the code that interfaces with a specific engine is stored in a separate assembly. That assembly is loaded dynamically when an instance of the IOcrEngine interface is created. Therefore, make sure the engine assembly you plan to use resides in the same folder as the Leadtools.Forms.Ocr.dll assembly. You can also add the engine assembly as a reference to your project so that its dependencies are detected automatically, although LEADTOOLS does not require this.
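Because the engine assembly is loaded dynamically, a missing file only surfaces at run time. The following is a minimal sketch of a deployment check an application might run at startup. It only tests whether a file with a given name sits in the same folder as Leadtools.Forms.Ocr.dll. The helper name `IsEngineAssemblyDeployed` is hypothetical. The engine file name `Leadtools.Forms.Ocr.Advantage.dll` is an assumption; check OcrEngineType and Files To Be Included With Your Application for the actual names in your version.

```csharp
using System;
using System.IO;

// Hypothetical helper: returns true when a file named engineAssemblyFileName
// exists in the same folder as the given Leadtools.Forms.Ocr.dll path.
bool IsEngineAssemblyDeployed(string ocrAssemblyPath, string engineAssemblyFileName)
{
    string? folder = Path.GetDirectoryName(ocrAssemblyPath);
    if (string.IsNullOrEmpty(folder))
        return false; // no folder information in the path, nothing to check against
    return File.Exists(Path.Combine(folder, engineAssemblyFileName));
}

// Usage: here the OCR assembly is assumed to sit in the application folder.
string ocrAssemblyPath = Path.Combine(AppContext.BaseDirectory, "Leadtools.Forms.Ocr.dll");
// Assumed engine file name; verify it against your LEADTOOLS version.
bool deployed = IsEngineAssemblyDeployed(ocrAssemblyPath, "Leadtools.Forms.Ocr.Advantage.dll");
Console.WriteLine(deployed ? "Engine assembly found." : "Engine assembly missing.");
```

Checking for the file up front lets the application report a clear deployment error instead of failing later, when the engine instance is created.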
LEADTOOLS provides methods to:
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in a single-threaded or multi-threaded environment, with optimizations for server-based operations.
Use any of the supported OCR engines through a common .NET class library that abstracts them from the user. Switching between engines requires virtually no changes to the application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds prior to recognition to control the accuracy of recognition.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, MS Word, and MS Excel, as well as various flavors of ASCII and Unicode text.
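The 5 to 72 point range above is expressed in typographic points (1 point = 1/72 inch). The nominal text size in pixels therefore depends on the scan resolution. The following arithmetic sketch converts between the two. It is not a LEADTOOLS API, and the resolutions used are only examples.

```csharp
using System;

// Converts a nominal font size in points (1 pt = 1/72 inch) to pixels
// at the given scan resolution in dots per inch.
double PointsToPixels(double points, double dpi) => points * dpi / 72.0;

foreach (double dpi in new[] { 150.0, 300.0 })
{
    Console.WriteLine($"{dpi} DPI: 5 pt = {PointsToPixels(5, dpi):F1} px, 72 pt = {PointsToPixels(72, dpi):F1} px");
}
```

At 300 DPI, for instance, 5-point text is about 21 pixels in nominal size and 72-point text is 300 pixels.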
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
1. Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
2. Start up the OCR engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
3. Create an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
4. Establish zones on the page(s), either manually or automatically. (This is optional; a page can be recognized with or without zones.) For more information, refer to Working with OCR Zones.
5. Optional. Set the active languages to be used by the OCR engine. (The default is English.) For more information, refer to Working with OCR Languages.
6. Optional. Set the spell checking language. (The default is English.) For more information, refer to OCR Spell Language Dictionaries.
7. Optional. Set any special recognition module options. This is required only if the page contains zones, whether they were created automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules, and Using OMR in LEADTOOLS .NET OCR.
8. Recognize the pages. For more information, refer to Recognizing OCR Pages.
9. Save the recognition results, if desired. The results can be saved either to a file or to memory. For more information, refer to Recognizing OCR Pages.
10. Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Steps 4, 5, 6, and 7 can be performed in largely any order, as long as they are carried out after the OCR engine is started and before a page is recognized.
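The ordering rule above can be written down as a small check over step numbers. This is a library-independent sketch, and the `IsValidStepOrder` helper is hypothetical. It goes slightly beyond the rule as stated: it also requires steps 4 and 7 to follow step 3, because zoning and zone options operate on the pages that step 3 adds.

```csharp
using System;
using System.Linq;

// Checks a sequence of step numbers (1-10, as in the outline) against the ordering rules.
// This illustrates the rules only; it is not a LEADTOOLS API.
bool IsValidStepOrder(int[] steps)
{
    // Every step number must be between 1 and 10 and appear at most once.
    if (steps.Any(x => x < 1 || x > 10) || steps.Distinct().Count() != steps.Length)
        return false;

    // Position of a step in the sequence, or -1 if the step is absent.
    int Pos(int n) => Array.IndexOf(steps, n);

    // Steps 1, 2, 3, 8 and 10 are required and must appear in this relative order.
    int[] required = { 1, 2, 3, 8, 10 };
    for (int i = 0; i < required.Length; i++)
    {
        if (Pos(required[i]) < 0) return false;
        if (i > 0 && Pos(required[i]) < Pos(required[i - 1])) return false;
    }

    // Steps 4-7 may come in any order, but only after startup (2) and before recognition (8).
    foreach (int step in steps.Where(x => x >= 4 && x <= 7))
        if (Pos(step) < Pos(2) || Pos(step) > Pos(8)) return false;

    // Zoning (4) and zone options (7) act on the document's pages, so they must also follow step 3.
    foreach (int step in new[] { 4, 7 })
        if (Pos(step) >= 0 && Pos(step) < Pos(3)) return false;

    // Saving (9) is optional; if present it must come after recognition (8) and before shutdown (10).
    if (Pos(9) >= 0 && (Pos(9) < Pos(8) || Pos(9) > Pos(10))) return false;

    return true;
}

Console.WriteLine(IsValidStepOrder(new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 })); // True
Console.WriteLine(IsValidStepOrder(new[] { 1, 2, 5, 6, 3, 4, 7, 8, 10 }));    // True
Console.WriteLine(IsValidStepOrder(new[] { 1, 2, 3, 8, 5, 10 }));             // False: languages set after recognition
```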
For more information on the engine assemblies, refer to OcrEngineType and Files To Be Included With Your Application.
The following example shows how to perform the above steps in code:
Visual Basic
```vb
' Assuming you added "Imports Leadtools.Forms.Ocr" and "Imports Leadtools.Forms.DocumentWriters" at the beginning of this class

' *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
' We will use the LEADTOOLS OCR Advantage engine and use it in the same process
Dim ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)

' *** Step 2: Startup the engine.
' Use the default parameters
ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

' *** Step 3: Create an OCR document with one or more pages.
Dim ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
' Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)

' *** Step 4: Establish zones on the page(s), either manually or automatically
' Automatic zoning
ocrDocument.Pages.AutoZone(Nothing)

' *** Step 5: (Optional) Set the active languages to be used by the OCR engine
' Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(New String() {"en", "de"})

' *** Step 6: (Optional) Set the spell checking language
' Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native
ocrEngine.SpellCheckManager.SpellLanguage = "en"

' *** Step 7: (Optional) Set any special recognition module options
' Change the fill method for the first zone in the first page to be Omr.
' OcrZone is a structure, so modify a copy and assign it back.
' Check the zone count first: automatic zoning may find no zones.
If ocrDocument.Pages(0).Zones.Count > 0 Then
    Dim ocrZone As OcrZone = ocrDocument.Pages(0).Zones(0)
    ocrZone.FillMethod = OcrZoneFillMethod.Omr
    ocrDocument.Pages(0).Zones(0) = ocrZone
End If

' *** Step 8: Recognize
ocrDocument.Pages.Recognize(Nothing)

' *** Step 9: Save recognition results
' Save the results to a PDF file
ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
ocrDocument.Dispose()

' *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown()
ocrEngine.Dispose()
```
C#
```csharp
// Assuming you added "using Leadtools.Codecs;", "using Leadtools.Forms.Ocr;" and "using Leadtools.Forms.DocumentWriters;" at the beginning of this class

// *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
// We will use the LEADTOOLS OCR Advantage engine and use it in the same process
IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);

// *** Step 2: Startup the engine.
// Use the default parameters
ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

// *** Step 3: Create an OCR document with one or more pages.
IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument();
// Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);

// *** Step 4: Establish zones on the page(s), either manually or automatically
// Automatic zoning
ocrDocument.Pages.AutoZone(null);

// *** Step 5: (Optional) Set the active languages to be used by the OCR engine
// Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" });

// *** Step 6: (Optional) Set the spell checking language
// Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native;
ocrEngine.SpellCheckManager.SpellLanguage = "en";

// *** Step 7: (Optional) Set any special recognition module options
// Change the fill method for the first zone in the first page to be default.
// OcrZone is a structure, so modify a copy and assign it back.
// Check the zone count first: automatic zoning may find no zones.
if (ocrDocument.Pages[0].Zones.Count > 0)
{
    OcrZone ocrZone = ocrDocument.Pages[0].Zones[0];
    ocrZone.FillMethod = OcrZoneFillMethod.Default;
    ocrDocument.Pages[0].Zones[0] = ocrZone;
}

// *** Step 8: Recognize
ocrDocument.Pages.Recognize(null);

// *** Step 9: Save recognition results
// Save the results to a PDF file
ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
ocrDocument.Dispose();

// *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown();
ocrEngine.Dispose();
```
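Step 7 reads a zone, changes it, and writes it back, rather than setting the FillMethod of `Zones[0]` directly. A value type (structure) such as OcrZone requires this pattern. The sketch below shows the same behavior with a plain List of named value tuples standing in for a page's zone collection. The tuple and its fields are hypothetical stand-ins, not LEADTOOLS types.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for a page's zone collection: a named value tuple (a value type)
// plays the role of OcrZone here; it is not the LEADTOOLS type.
var zones = new List<(string Name, string FillMethod)>
{
    ("Zone 1", "Default"),
    ("Zone 2", "Default"),
};

// zones[0].FillMethod = "Omr"; would not compile (CS1612): the indexer returns a copy.

// Modifying a copy without assigning it back leaves the list unchanged.
var copy = zones[0];
copy.FillMethod = "Omr";
Console.WriteLine(zones[0].FillMethod); // Default

// Read, modify, and assign back, as Step 7 does with OcrZone.
var zone = zones[0];
zone.FillMethod = "Omr";
zones[0] = zone;
Console.WriteLine(zones[0].FillMethod); // Omr
```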
The following example shows the minimum steps required to perform the same task, this time using the C# using statement and the VB.NET Using statement so that the objects are disposed automatically:
Visual Basic
```vb
' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
    ' Startup the engine
    ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

    ' Create the document
    Using ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
        ' Add all the pages of a multi-page TIF image to the document
        ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)

        ' Recognize the pages
        ocrDocument.Pages.Recognize(Nothing)

        ' Save recognition results
        ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
    End Using

    ' The engine shuts down automatically when Dispose is called
End Using
```
C#
```csharp
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
    // Startup the engine
    ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

    // Create the OCR document
    using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
    {
        // Add all the pages of a multi-page TIF image to the document
        ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);

        // Recognize the pages
        ocrDocument.Pages.Recognize(null);

        // Save recognition results
        ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
    }

    // The engine shuts down automatically when Dispose is called
}
```
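The nested using blocks above dispose objects in the reverse order of their creation. The OCR document is therefore disposed before the engine it was created from. The sketch below makes that order visible. It uses System.ComponentModel.Component, a standard .NET class whose Disposed event fires when it is disposed, as a stand-in for both objects. The `Tracked` helper is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;

var disposalLog = new List<string>();

// Creates a disposable stand-in that records its name when disposed.
// Component is a plain .NET class used only for illustration.
Component Tracked(string name)
{
    var component = new Component();
    component.Disposed += (_, _) => disposalLog.Add(name);
    return component;
}

using (Component engine = Tracked("engine"))
{
    using (Component document = Tracked("document"))
    {
        // Adding pages, recognizing, and saving would happen here.
    }
    // The document has been disposed at this point; the engine has not.
}

Console.WriteLine(string.Join(" -> ", disposalLog)); // document -> engine
```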
Finally, the following example shows how to perform the same task using the one-shot ("fire and forget") IOcrAutoRecognizeManager interface:
Visual Basic
```vb
' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
    ' Startup the engine
    ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

    ' Convert the multi-page TIF image to a PDF document
    ocrEngine.AutoRecognizeManager.Run( _
        "C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", _
        "C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", _
        DocumentFormat.Pdf, _
        Nothing, _
        Nothing)
End Using
```
C#
```csharp
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
    // Startup the engine
    ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

    // Convert the multi-page TIF image to a PDF document
    ocrEngine.AutoRecognizeManager.Run(
        @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif",
        @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf",
        DocumentFormat.Pdf,
        null,
        null);
}
```
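As used above, AutoRecognizeManager.Run takes one input file and one output file per call, so converting several images means calling it in a loop. The sketch below only builds the input/output path pairs. The `BuildJobs` helper and the file names are hypothetical. The Run call is left as a comment because it needs a started engine.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: pairs each input image with an output path that has the
// same base name and a .pdf extension inside outputFolder.
List<(string Input, string Output)> BuildJobs(IEnumerable<string> inputFiles, string outputFolder)
{
    var result = new List<(string Input, string Output)>();
    foreach (string input in inputFiles)
    {
        string output = Path.Combine(outputFolder, Path.GetFileNameWithoutExtension(input) + ".pdf");
        result.Add((input, output));
    }
    return result;
}

var jobs = BuildJobs(new[] { "Ocr.tif", "Invoice.tif" }, "Output");
foreach (var job in jobs)
{
    // With a started engine, each pair would be passed to:
    // ocrEngine.AutoRecognizeManager.Run(job.Input, job.Output, DocumentFormat.Pdf, null, null);
    Console.WriteLine($"{job.Input} -> {job.Output}");
}
```

Keeping path construction separate from the OCR call makes the naming logic easy to test without an OCR engine.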
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in a single or multi-threaded environment with optimization for server-based operations.
Multiple OCR engines are supported and abstracted from the user through the use of a common .NET class library. Switching between the various engines requires virtually no changes in the application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds prior to recognition to control the accuracy of recognition.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, MS Word, MS Excel as well as various flavors of ASCII and UNICODE text.
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
Startup the OCR Engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
Establish an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
Establish zones on the page(s), either manually or automatically. (This is optional. A page can be recognized with or without zones.) For more information, refer to Working with OCR Zones.
Optional. Set the active languages to be used by the OCR engine. (The default is English). For more information, refer to Working with OCR Languages.
Optional. Set the spell checking language. (The default is English). For more information, refer to OCR Spell Language Dictionaries.
Optional. Set any special recognition module options. This is required only if the page contains zones, created either automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules and Using OMR in LEADTOOLS .NET OCR.
Recognize. For more information, refer to Recognizing OCR Pages.
Save recognition results, if desired. The results can be saved to either a file or to memory. For more information, refer to Recognizing OCR Pages.
Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Where steps 4, 5, 6 and 7 can pretty much be done in any order, as long as they are carried out after starting up the OCR engine and before recognizing a page.
For more information on the engine assemblies refer to OcrEngineType" and Files To Be Included With Your Application.
The following example shows how to perform the above steps in code:
Visual Basic' Assuming you added "Imports Leadtools.Forms.Ocr" and "Imports Leadtools.Forms.DocumentWriter" at the beginning of this class ' *** Step 1: Select the engine type and create an instance of the IOcrEngine interface. ' We will use the LEADTOOLS OCR Advantage engine and use it in the same process Dim ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' *** Step 2: Startup the engine. ' Use the default parameters ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' *** Step 3: Create an OCR document with one or more pages. Dim ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument() ' Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing) ' *** Step 4: Establish zones on the page(s), either manually or automatically ' Automatic zoning ocrDocument.Pages.AutoZone(Nothing) ' *** Step 5: (Optional) Set the active languages to be used by the OCR engine ' Enable English and German languages ocrEngine.LanguageManager.EnableLanguages(New String() {"en", "de"}) ' *** Step 6: (Optional) Set the spell checking language ' Enable the spell checking system and set English as the spell language ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native ocrEngine.SpellCheckManager.SpellLanguage = "en" ' *** Step 7: (Optional) Set any special recognition module options ' Change the fill method for the first zone in the first page to be Omr Dim ocrZone As OcrZone = ocrDocument.Pages(0).Zones(0) ocrZone.FillMethod = OcrZoneFillMethod.Omr ocrDocument.Pages(0).Zones(0) = ocrZone ' *** Step 8: Recognize ocrDocument.Pages.Recognize(Nothing) ' *** Step 9: Save recognition results ' Save the results to a PDF file ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing) ocrDocument.Dispose() ' *** Step 10: Shut down the OCR 
engine when finished ocrEngine.Shutdown() ocrEngine.Dispose()
C#// Assuming you added "using Leadtools.Codecs;", "using Leadtools.Forms.Ocr;" and "using Leadtools.Forms.DocumentWriters;" at the beginning of this class // *** Step 1: Select the engine type and create an instance of the IOcrEngine interface. // We will use the LEADTOOLS OCR Advantage engine and use it in the same process IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false); // *** Step 2: Startup the engine. // Use the default parameters ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // *** Step 3: Create an OCR document with one or more pages. IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument(); // Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null); // *** Step 4: Establish zones on the page(s), either manually or automatically // Automatic zoning ocrDocument.Pages.AutoZone(null); // *** Step 5: (Optional) Set the active languages to be used by the OCR engine // Enable English and German languages ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" }); // *** Step 6: (Optional) Set the spell checking language // Enable the spell checking system and set English as the spell language ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native; ocrEngine.SpellCheckManager.SpellLanguage = "en"; // *** Step 7: (Optional) Set any special recognition module options // Change the fill method for the first zone in the first page to be default OcrZone ocrZone = ocrDocument.Pages[0].Zones[0]; ocrZone.FillMethod = OcrZoneFillMethod.Default; ocrDocument.Pages[0].Zones[0] = ocrZone; // *** Step 8: Recognize ocrDocument.Pages.Recognize(null); // *** Step 9: Save recognition results // Save the results to a PDF file ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null); ocrDocument.Dispose(); // *** Step 
10: Shut down the OCR engine when finished ocrEngine.Shutdown(); ocrEngine.Dispose();
The following example shows the minimum necessary steps required to perform the same task, this time using the proper C# and VB.NET "using" keywords:
Visual Basic' Create the engine instance Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' Startup the engine ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' Create the document Using ocrDocument as IOcrDocument = ocrEngine.DocumentManager.CreateDocument() ' Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing) ' Recognize the pages ocrDocument.Pages.Recognize(Nothing) 'Save recognition results ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing) End Using ' The engine will automatically shuts down when Dispose is called End Using
C#// Create the engine instance using(IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false)) { // Startup the engine ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // Create the OCR document using(IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument()) { // Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null); // Recognize the pages ocrDocument.Pages.Recognize(null); // Save recognition results ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null); } // The engine will automatically shuts down when Dispose is called }
Finally, the following example shows how to perform the same task above using the one shot "fire and forget" IOcrAutoRecognizeManager interface:
Visual Basic' Create the engine instance Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' Startup the engine ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' Convert the multi-page TIF image to a PDF document ocrEngine.AutoRecognizeManager.Run( _ "C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", _ "C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", _ DocumentFormat.Pdf, _ Nothing, _ Nothing) End Using
C#// Create the engine instance using(IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false)) { // Startup the engine ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // Convert the multi-page TIF image to a PDF document ocrEngine.AutoRecognizeManager.Run( @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null, null); }
LEADTOOLS provides methods to:
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in a single-threaded or multi-threaded environment, with optimizations for server-based operations.
Use any of the supported OCR engines through a common .NET class library that abstracts engine differences, so switching between engines requires virtually no changes to application code.
Select the language of the documents to be recognized: English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages, manually or automatically, into text zones, image zones, table zones, lines, headers, and footers.
Set accuracy thresholds before recognition to control recognition accuracy.
Recognize text in sizes from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, Microsoft Word, Microsoft Excel, and various flavors of ASCII and Unicode text.
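Languages are enabled by code rather than by name. The complete example in this topic passes "en" and "de" to LanguageManager.EnableLanguages. If an application lets users choose languages by name, a small lookup can build that array.

The helper below is hypothetical and not part of LEADTOOLS. It assumes that the engine accepts ISO 639-1 codes for every language listed above. Confirm the exact codes against the engine's supported languages (see Working with OCR Languages).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not a LEADTOOLS API): maps the language names listed
// above to ISO 639-1 codes. That the engine accepts these exact codes is an
// assumption based on the "en"/"de" example.
public static class OcrLanguageCodes
{
    private static readonly Dictionary<string, string> Codes =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["English"] = "en", ["Danish"] = "da", ["Dutch"] = "nl",
            ["Finnish"] = "fi", ["French"] = "fr", ["German"] = "de",
            ["Italian"] = "it", ["Norwegian"] = "no", ["Portuguese"] = "pt",
            ["Russian"] = "ru", ["Spanish"] = "es", ["Swedish"] = "sv",
        };

    // Converts display names (case-insensitive) to codes. Unknown names throw
    // instead of silently enabling fewer languages than requested.
    public static string[] ToCodes(params string[] names)
    {
        var result = new string[names.Length];
        for (int i = 0; i < names.Length; i++)
        {
            if (!Codes.TryGetValue(names[i], out var code))
                throw new ArgumentException($"Unsupported language: {names[i]}", nameof(names));
            result[i] = code;
        }
        return result;
    }
}
```

Usage, under the same assumption: `ocrEngine.LanguageManager.EnableLanguages(OcrLanguageCodes.ToCodes("English", "German"));`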
LEADTOOLS uses an internal OCR handle to interact with the OCR engine and with the OCR document that holds the list of pages. The handle represents a communication session between LEADTOOLS OCR and an OCR engine installed on the system. It contains all the information needed for recognition, for getting and setting options, and for text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
1. Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
2. Start up the OCR engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
3. Create an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
4. Optional. Establish zones on the page(s), either manually or automatically; a page can be recognized with or without zones. For more information, refer to Working with OCR Zones.
5. Optional. Set the active languages to be used by the OCR engine (the default is English). For more information, refer to Working with OCR Languages.
6. Optional. Set the spell checking language (the default is English). For more information, refer to OCR Spell Language Dictionaries.
7. Optional. Set any special recognition module options. This is relevant only if the page contains zones, whether created automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules, and Using OMR in LEADTOOLS .NET OCR.
8. Recognize the pages. For more information, refer to Recognizing OCR Pages.
9. Optional. Save the recognition results to a file or to memory. For more information, refer to Recognizing OCR Pages.
10. Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Steps 4, 5, 6 and 7 can generally be performed in any order, as long as they are carried out after the OCR engine is started and before a page is recognized.
For more information on the engine assemblies, refer to OcrEngineType and Files To Be Included With Your Application.
The following example shows how to perform the above steps in code:
Visual Basic
```vb
' Assuming you added "Imports Leadtools.Forms.Ocr" and "Imports Leadtools.Forms.DocumentWriters" at the beginning of this class

' *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
' We will use the LEADTOOLS OCR Advantage engine and use it in the same process
Dim ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)

' *** Step 2: Start up the engine.
' Use the default parameters; the last argument is the location of the OCR engine runtime files
ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

' *** Step 3: Create an OCR document with one or more pages.
Dim ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
' Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)

' *** Step 4: Establish zones on the page(s), either manually or automatically
' Automatic zoning
ocrDocument.Pages.AutoZone(Nothing)

' *** Step 5: (Optional) Set the active languages to be used by the OCR engine
' Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(New String() {"en", "de"})

' *** Step 6: (Optional) Set the spell checking language
' Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native
ocrEngine.SpellCheckManager.SpellLanguage = "en"

' *** Step 7: (Optional) Set any special recognition module options
' Change the fill method for the first zone in the first page to be Omr
' (guarded, since automatic zoning may not find any zones)
If ocrDocument.Pages(0).Zones.Count > 0 Then
    Dim ocrZone As OcrZone = ocrDocument.Pages(0).Zones(0)
    ocrZone.FillMethod = OcrZoneFillMethod.Omr
    ' OcrZone is a value type, so write the modified copy back to the collection
    ocrDocument.Pages(0).Zones(0) = ocrZone
End If

' *** Step 8: Recognize
ocrDocument.Pages.Recognize(Nothing)

' *** Step 9: Save recognition results
' Save the results to a PDF file
ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
ocrDocument.Dispose()

' *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown()
ocrEngine.Dispose()
```
C#
```csharp
// Assuming you added "using Leadtools.Codecs;", "using Leadtools.Forms.Ocr;" and "using Leadtools.Forms.DocumentWriters;" at the beginning of this class

// *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
// We will use the LEADTOOLS OCR Advantage engine and use it in the same process
IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);

// *** Step 2: Start up the engine.
// Use the default parameters; the last argument is the location of the OCR engine runtime files
ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

// *** Step 3: Create an OCR document with one or more pages.
IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument();
// Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);

// *** Step 4: Establish zones on the page(s), either manually or automatically
// Automatic zoning
ocrDocument.Pages.AutoZone(null);

// *** Step 5: (Optional) Set the active languages to be used by the OCR engine
// Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" });

// *** Step 6: (Optional) Set the spell checking language
// Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native;
ocrEngine.SpellCheckManager.SpellLanguage = "en";

// *** Step 7: (Optional) Set any special recognition module options
// Change the fill method for the first zone in the first page to be Default
// (guarded, since automatic zoning may not find any zones)
if (ocrDocument.Pages[0].Zones.Count > 0)
{
    OcrZone ocrZone = ocrDocument.Pages[0].Zones[0];
    ocrZone.FillMethod = OcrZoneFillMethod.Default;
    // OcrZone is a value type, so write the modified copy back to the collection
    ocrDocument.Pages[0].Zones[0] = ocrZone;
}

// *** Step 8: Recognize
ocrDocument.Pages.Recognize(null);

// *** Step 9: Save recognition results
// Save the results to a PDF file
ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
ocrDocument.Dispose();

// *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown();
ocrEngine.Dispose();
```
The following example shows the minimum steps required to perform the same task, this time using the C# "using" statement and the VB.NET "Using" statement so that the document and the engine are disposed of automatically:
Visual Basic
```vb
' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
    ' Start up the engine
    ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")
    ' Create the document
    Using ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
        ' Add all the pages of a multi-page TIF image to the document
        ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)
        ' Recognize the pages
        ocrDocument.Pages.Recognize(Nothing)
        ' Save recognition results
        ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
    End Using
    ' The engine automatically shuts down when Dispose is called
End Using
```
C#
```csharp
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
    // Start up the engine
    ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");
    // Create the OCR document
    using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
    {
        // Add all the pages of a multi-page TIF image to the document
        ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);
        // Recognize the pages
        ocrDocument.Pages.Recognize(null);
        // Save recognition results
        ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
    }
    // The engine automatically shuts down when Dispose is called
}
```
Finally, the following example performs the same task with the one-shot, "fire and forget" IOcrAutoRecognizeManager interface:
Visual Basic
```vb
' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
    ' Start up the engine
    ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")
    ' Convert the multi-page TIF image to a PDF document
    ocrEngine.AutoRecognizeManager.Run( _
        "C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", _
        "C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", _
        DocumentFormat.Pdf, _
        Nothing, _
        Nothing)
End Using
```
C#
```csharp
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
    // Start up the engine
    ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");
    // Convert the multi-page TIF image to a PDF document
    ocrEngine.AutoRecognizeManager.Run(
        @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif",
        @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf",
        DocumentFormat.Pdf,
        null,
        null);
}
```
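The note on step order says that steps 4 to 7 only need to happen after startup and before recognition. As an illustration, the C# sketch below configures the languages and spell checking (steps 5 and 6) immediately after Startup, before any document exists. This assumes that, because these are engine settings rather than page settings, they only require a started engine; zoning (step 4) and zone options (step 7) work on a page's zones, so they still have to wait until pages have been added. Only calls shown in the examples on this page are used.

```csharp
using Leadtools.Forms.Ocr;
using Leadtools.Forms.DocumentWriters;

static class OcrStepOrderSketch
{
    static void Main()
    {
        using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
        {
            // Step 2: the engine must be started before any of steps 4 to 7
            ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

            // Steps 5 and 6, moved ahead of document creation (assumed valid: engine-level settings)
            ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" });
            ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native;
            ocrEngine.SpellCheckManager.SpellLanguage = "en";

            // Step 3: create the document and add the pages
            using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
            {
                ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);

                // Step 4: zoning needs pages, so it comes after AddPages but still before recognition
                ocrDocument.Pages.AutoZone(null);

                // Steps 8 and 9: recognize and save
                ocrDocument.Pages.Recognize(null);
                ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
            }
        } // Step 10: disposing the engine shuts it down
    }
}
```

Configuring the engine first can be convenient when one started engine processes many documents, since the language and spell-check settings are then applied once rather than per document.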
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in a single or multi-threaded environment with optimization for server-based operations.
Multiple OCR engines are supported and abstracted from the user through the use of a common .NET class library. Switching between the various engines requires virtually no changes in the application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds prior to recognition to control the accuracy of recognition.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, MS Word, MS Excel as well as various flavors of ASCII and UNICODE text.
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
Startup the OCR Engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
Establish an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
Establish zones on the page(s), either manually or automatically. (This is optional. A page can be recognized with or without zones.) For more information, refer to Working with OCR Zones.
Optional. Set the active languages to be used by the OCR engine. (The default is English). For more information, refer to Working with OCR Languages.
Optional. Set the spell checking language. (The default is English). For more information, refer to OCR Spell Language Dictionaries.
Optional. Set any special recognition module options. This is required only if the page contains zones, created either automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules and Using OMR in LEADTOOLS .NET OCR.
Recognize. For more information, refer to Recognizing OCR Pages.
Save recognition results, if desired. The results can be saved to either a file or to memory. For more information, refer to Recognizing OCR Pages.
Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Where steps 4, 5, 6 and 7 can pretty much be done in any order, as long as they are carried out after starting up the OCR engine and before recognizing a page.
For more information on the engine assemblies refer to OcrEngineType" and Files To Be Included With Your Application.
The following example shows how to perform the above steps in code:
Visual Basic
The LEADTOOLS OCR features provide methods for incorporating optical character recognition (OCR) technology into an application. OCR is used to process bitmap document images into text.
Once the LEADTOOLS .NET OCR toolkit is installed to the system, the user is ready to begin programming with LEADTOOLS OCR. Please note that the OCR features must be unlocked before the user can actually use the OCR properties, methods, and events. For more information on unlocking LEAD features, refer to Unlocking Special LEAD Features.
You can start using LEADTOOLS for .NET OCR in your application by adding a references to the Leadtools.Forms.Ocr.dll and Leadtools.Forms.DocumentWriter.dll assemblies in your .NET application. These assemblies contain the various interfaces, classes, structures and delegates used to program with LEADTOOLS OCR.
Since the toolkit supports multiple engines, the actual code that interfaces with the engine is stored in a separate assembly that will be loaded dynamically once an instance of the IOcrEngine interface is created. Hence, you must make sure the engine assembly you are planning to use resides next to the Leadtools.Forms.Ocr.dll assembly. You can add the engine assembly as a reference to your project if desired to automatically detect dependencies, even though this is not required by LEADTOOLS.
LEADTOOLS provides methods to:
C#// Assuming you added "using Leadtools.Codecs;", "using Leadtools.Forms.Ocr;" and "using Leadtools.Forms.DocumentWriters;" at the beginning of this class // *** Step 1: Select the engine type and create an instance of the IOcrEngine interface. // We will use the LEADTOOLS OCR Advantage engine and use it in the same process IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false); // *** Step 2: Startup the engine. // Use the default parameters ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // *** Step 3: Create an OCR document with one or more pages. IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument(); // Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null); // *** Step 4: Establish zones on the page(s), either manually or automatically // Automatic zoning ocrDocument.Pages.AutoZone(null); // *** Step 5: (Optional) Set the active languages to be used by the OCR engine // Enable English and German languages ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" }); // *** Step 6: (Optional) Set the spell checking language // Enable the spell checking system and set English as the spell language ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native; ocrEngine.SpellCheckManager.SpellLanguage = "en"; // *** Step 7: (Optional) Set any special recognition module options // Change the fill method for the first zone in the first page to be default OcrZone ocrZone = ocrDocument.Pages[0].Zones[0]; ocrZone.FillMethod = OcrZoneFillMethod.Default; ocrDocument.Pages[0].Zones[0] = ocrZone; // *** Step 8: Recognize ocrDocument.Pages.Recognize(null); // *** Step 9: Save recognition results // Save the results to a PDF file ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null); ocrDocument.Dispose(); // *** Step 
10: Shut down the OCR engine when finished ocrEngine.Shutdown(); ocrEngine.Dispose();
The following example shows the minimum necessary steps required to perform the same task, this time using the proper C# and VB.NET "using" keywords:
Visual Basic' Create the engine instance Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' Startup the engine ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' Create the document Using ocrDocument as IOcrDocument = ocrEngine.DocumentManager.CreateDocument() ' Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing) ' Recognize the pages ocrDocument.Pages.Recognize(Nothing) 'Save recognition results ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing) End Using ' The engine will automatically shuts down when Dispose is called End Using
C#// Create the engine instance using(IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false)) { // Startup the engine ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // Create the OCR document using(IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument()) { // Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null); // Recognize the pages ocrDocument.Pages.Recognize(null); // Save recognition results ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null); } // The engine will automatically shuts down when Dispose is called }
Finally, the following example shows how to perform the same task above using the one shot "fire and forget" IOcrAutoRecognizeManager interface:
Visual Basic' Create the engine instance Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' Startup the engine ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' Convert the multi-page TIF image to a PDF document ocrEngine.AutoRecognizeManager.Run( _ "C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", _ "C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", _ DocumentFormat.Pdf, _ Nothing, _ Nothing) End Using
C#// Create the engine instance using(IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false)) { // Startup the engine ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // Convert the multi-page TIF image to a PDF document ocrEngine.AutoRecognizeManager.Run( @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null, null); }
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in a single or multi-threaded environment with optimization for server-based operations.
Multiple OCR engines are supported and abstracted from the user through the use of a common .NET class library. Switching between the various engines requires virtually no changes in the application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds prior to recognition to control the accuracy of recognition.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, MS Word, MS Excel as well as various flavors of ASCII and UNICODE text.
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
Startup the OCR Engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
Establish an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
Establish zones on the page(s), either manually or automatically. (This is optional. A page can be recognized with or without zones.) For more information, refer to Working with OCR Zones.
Optional. Set the active languages to be used by the OCR engine. (The default is English). For more information, refer to Working with OCR Languages.
Optional. Set the spell checking language. (The default is English). For more information, refer to OCR Spell Language Dictionaries.
Optional. Set any special recognition module options. This is required only if the page contains zones, created either automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules and Using OMR in LEADTOOLS .NET OCR.
Recognize. For more information, refer to Recognizing OCR Pages.
Save recognition results, if desired. The results can be saved to either a file or to memory. For more information, refer to Recognizing OCR Pages.
Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Where steps 4, 5, 6 and 7 can pretty much be done in any order, as long as they are carried out after starting up the OCR engine and before recognizing a page.
For more information on the engine assemblies refer to OcrEngineType" and Files To Be Included With Your Application.
The following example shows how to perform the above steps in code:
Visual Basic' Assuming you added "Imports Leadtools.Forms.Ocr" and "Imports Leadtools.Forms.DocumentWriter" at the beginning of this class ' *** Step 1: Select the engine type and create an instance of the IOcrEngine interface. ' We will use the LEADTOOLS OCR Advantage engine and use it in the same process Dim ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' *** Step 2: Startup the engine. ' Use the default parameters ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' *** Step 3: Create an OCR document with one or more pages. Dim ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument() ' Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing) ' *** Step 4: Establish zones on the page(s), either manually or automatically ' Automatic zoning ocrDocument.Pages.AutoZone(Nothing) ' *** Step 5: (Optional) Set the active languages to be used by the OCR engine ' Enable English and German languages ocrEngine.LanguageManager.EnableLanguages(New String() {"en", "de"}) ' *** Step 6: (Optional) Set the spell checking language ' Enable the spell checking system and set English as the spell language ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native ocrEngine.SpellCheckManager.SpellLanguage = "en" ' *** Step 7: (Optional) Set any special recognition module options ' Change the fill method for the first zone in the first page to be Omr Dim ocrZone As OcrZone = ocrDocument.Pages(0).Zones(0) ocrZone.FillMethod = OcrZoneFillMethod.Omr ocrDocument.Pages(0).Zones(0) = ocrZone ' *** Step 8: Recognize ocrDocument.Pages.Recognize(Nothing) ' *** Step 9: Save recognition results ' Save the results to a PDF file ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing) ocrDocument.Dispose() ' *** Step 10: Shut down the OCR 
engine when finished ocrEngine.Shutdown() ocrEngine.Dispose()
C#// Assuming you added "using Leadtools.Codecs;", "using Leadtools.Forms.Ocr;" and "using Leadtools.Forms.DocumentWriters;" at the beginning of this class // *** Step 1: Select the engine type and create an instance of the IOcrEngine interface. // We will use the LEADTOOLS OCR Advantage engine and use it in the same process IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false); // *** Step 2: Startup the engine. // Use the default parameters ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // *** Step 3: Create an OCR document with one or more pages. IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument(); // Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null); // *** Step 4: Establish zones on the page(s), either manually or automatically // Automatic zoning ocrDocument.Pages.AutoZone(null); // *** Step 5: (Optional) Set the active languages to be used by the OCR engine // Enable English and German languages ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" }); // *** Step 6: (Optional) Set the spell checking language // Enable the spell checking system and set English as the spell language ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native; ocrEngine.SpellCheckManager.SpellLanguage = "en"; // *** Step 7: (Optional) Set any special recognition module options // Change the fill method for the first zone in the first page to be default OcrZone ocrZone = ocrDocument.Pages[0].Zones[0]; ocrZone.FillMethod = OcrZoneFillMethod.Default; ocrDocument.Pages[0].Zones[0] = ocrZone; // *** Step 8: Recognize ocrDocument.Pages.Recognize(null); // *** Step 9: Save recognition results // Save the results to a PDF file ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null); ocrDocument.Dispose(); // *** Step 
10: Shut down the OCR engine when finished ocrEngine.Shutdown(); ocrEngine.Dispose();
The following example shows the minimum necessary steps required to perform the same task, this time using the proper C# and VB.NET "using" keywords:
Visual Basic' Create the engine instance Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' Startup the engine ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' Create the document Using ocrDocument as IOcrDocument = ocrEngine.DocumentManager.CreateDocument() ' Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing) ' Recognize the pages ocrDocument.Pages.Recognize(Nothing) 'Save recognition results ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing) End Using ' The engine will automatically shuts down when Dispose is called End Using
C#// Create the engine instance using(IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false)) { // Startup the engine ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // Create the OCR document using(IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument()) { // Add all the pages of a multi-page TIF image to the document ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null); // Recognize the pages ocrDocument.Pages.Recognize(null); // Save recognition results ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null); } // The engine will automatically shuts down when Dispose is called }
Finally, the following example shows how to perform the same task above using the one shot "fire and forget" IOcrAutoRecognizeManager interface:
Visual Basic' Create the engine instance Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False) ' Startup the engine ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime") ' Convert the multi-page TIF image to a PDF document ocrEngine.AutoRecognizeManager.Run( _ "C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", _ "C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", _ DocumentFormat.Pdf, _ Nothing, _ Nothing) End Using
C#// Create the engine instance using(IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false)) { // Startup the engine ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime"); // Convert the multi-page TIF image to a PDF document ocrEngine.AutoRecognizeManager.Run( @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null, null); }
Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Perform OCR processes in a single or multi-threaded environment with optimization for server-based operations.
Multiple OCR engines are supported and abstracted from the user through the use of a common .NET class library. Switching between the various engines requires virtually no changes in the application code.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones, table zones, lines, headers and footers.
Set accuracy thresholds prior to recognition to control the accuracy of recognition.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with built-in and user dictionaries.
Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, MS Word, MS Excel as well as various flavors of ASCII and UNICODE text.
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages.
Select the engine type you wish to use and create an instance of the IOcrEngine interface. For more information, refer to Creating an OCR Engine Instance.
Startup the OCR Engine with the IOcrEngine.Startup method. For more information, refer to Starting and Shutting down the Engine.
Establish an OCR document with one or more pages. For more information, refer to Working with OCR Pages.
Establish zones on the page(s), either manually or automatically. (This is optional. A page can be recognized with or without zones.) For more information, refer to Working with OCR Zones.
Optional. Set the active languages to be used by the OCR engine. (The default is English). For more information, refer to Working with OCR Languages.
Optional. Set the spell checking language. (The default is English). For more information, refer to OCR Spell Language Dictionaries.
Optional. Set any special recognition module options. This is required only if the page contains zones, created either automatically or manually. For more information, refer to Recognizing OCR Pages, An Overview of OCR Recognition Modules and Using OMR in LEADTOOLS .NET OCR.
Recognize. For more information, refer to Recognizing OCR Pages.
Save recognition results, if desired. The results can be saved to either a file or to memory. For more information, refer to Recognizing OCR Pages.
Shut down the OCR engine when finished. For more information, refer to Starting and Shutting down the Engine.
Steps 4, 5, 6 and 7 can generally be performed in any order, as long as they are carried out after starting up the OCR engine and before recognizing a page.
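The ordering rule can be made concrete with a small checklist model. The following is a minimal C# sketch. OcrStep and OcrWorkflowChecker are hypothetical names invented for this illustration; they are not part of the LEADTOOLS API, and the sketch never calls the OCR engine. It also assumes that the page-level steps (zoning and zone options) need a document with pages, which the rule above does not state explicitly.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration only; these are NOT LEADTOOLS types.
// The numeric values match the ten steps listed above.
public enum OcrStep
{
    CreateEngine = 1,
    Startup = 2,
    CreateDocument = 3,
    Zone = 4,
    SetLanguages = 5,
    SetSpellLanguage = 6,
    SetModuleOptions = 7,
    Recognize = 8,
    Save = 9,
    Shutdown = 10
}

// Records which steps have run and rejects a step that is out of order.
// It only models the ordering rules; it does not talk to an OCR engine.
public sealed class OcrWorkflowChecker
{
    private readonly HashSet<OcrStep> _done = new HashSet<OcrStep>();

    private bool Done(OcrStep step) => _done.Contains(step);

    // True once the engine has been started and not yet shut down.
    private bool EngineRunning => Done(OcrStep.Startup) && !Done(OcrStep.Shutdown);

    // Returns true if 'step' may run now, given the steps already recorded.
    public bool CanRun(OcrStep step)
    {
        switch (step)
        {
            case OcrStep.CreateEngine:
                return _done.Count == 0;
            case OcrStep.Startup:
                return Done(OcrStep.CreateEngine) && !Done(OcrStep.Startup);
            case OcrStep.CreateDocument:
                return EngineRunning;
            case OcrStep.SetLanguages:
            case OcrStep.SetSpellLanguage:
                // Engine-wide settings (steps 5 and 6): after startup, before recognition.
                return EngineRunning && !Done(OcrStep.Recognize);
            case OcrStep.Zone:
            case OcrStep.SetModuleOptions:
                // Page-level settings (steps 4 and 7): assumed to also need a document.
                return EngineRunning && Done(OcrStep.CreateDocument) && !Done(OcrStep.Recognize);
            case OcrStep.Recognize:
                return EngineRunning && Done(OcrStep.CreateDocument);
            case OcrStep.Save:
                return EngineRunning && Done(OcrStep.Recognize);
            case OcrStep.Shutdown:
                return EngineRunning;
            default:
                throw new ArgumentOutOfRangeException(nameof(step));
        }
    }

    // Marks 'step' as done, or throws if it is not allowed yet.
    public void Record(OcrStep step)
    {
        if (!CanRun(step))
            throw new InvalidOperationException($"Step {(int)step} ({step}) is out of order.");
        _done.Add(step);
    }
}
```

For example, recording OcrStep.SetLanguages before OcrStep.Startup throws an InvalidOperationException. Recording steps 4 through 7 in any order between CreateDocument and Recognize succeeds. After Recognize, any further zoning or language change is rejected.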
For more information on the engine assemblies, refer to OcrEngineType and Files To Be Included With Your Application.
The following example shows how to perform the above steps in code:
Visual Basic
' Assuming you added "Imports Leadtools.Forms.Ocr" and "Imports Leadtools.Forms.DocumentWriters" at the top of this file

' *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
' We will use the LEADTOOLS OCR Advantage engine and use it in the same process
Dim ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)

' *** Step 2: Start up the engine.
' Use the default parameters
ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

' *** Step 3: Create an OCR document with one or more pages.
Dim ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
' Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)

' *** Step 4: Establish zones on the page(s), either manually or automatically
' Automatic zoning
ocrDocument.Pages.AutoZone(Nothing)

' *** Step 5: (Optional) Set the active languages to be used by the OCR engine
' Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(New String() {"en", "de"})

' *** Step 6: (Optional) Set the spell checking language
' Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native
ocrEngine.SpellCheckManager.SpellLanguage = "en"

' *** Step 7: (Optional) Set any special recognition module options
' Change the fill method for the first zone in the first page to be Omr
' (the zone is read, modified and then written back to the collection)
Dim ocrZone As OcrZone = ocrDocument.Pages(0).Zones(0)
ocrZone.FillMethod = OcrZoneFillMethod.Omr
ocrDocument.Pages(0).Zones(0) = ocrZone

' *** Step 8: Recognize
ocrDocument.Pages.Recognize(Nothing)

' *** Step 9: Save recognition results
' Save the results to a PDF file
ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
ocrDocument.Dispose()

' *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown()
ocrEngine.Dispose()
C#
// Assuming you added "using Leadtools.Codecs;", "using Leadtools.Forms.Ocr;" and "using Leadtools.Forms.DocumentWriters;" at the top of this file

// *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
// We will use the LEADTOOLS OCR Advantage engine and use it in the same process
IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);

// *** Step 2: Start up the engine.
// Use the default parameters
ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

// *** Step 3: Create an OCR document with one or more pages.
IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument();
// Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);

// *** Step 4: Establish zones on the page(s), either manually or automatically
// Automatic zoning
ocrDocument.Pages.AutoZone(null);

// *** Step 5: (Optional) Set the active languages to be used by the OCR engine
// Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" });

// *** Step 6: (Optional) Set the spell checking language
// Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native;
ocrEngine.SpellCheckManager.SpellLanguage = "en";

// *** Step 7: (Optional) Set any special recognition module options
// Change the fill method for the first zone in the first page to be default
// (the zone is read, modified and then written back to the collection)
OcrZone ocrZone = ocrDocument.Pages[0].Zones[0];
ocrZone.FillMethod = OcrZoneFillMethod.Default;
ocrDocument.Pages[0].Zones[0] = ocrZone;

// *** Step 8: Recognize
ocrDocument.Pages.Recognize(null);

// *** Step 9: Save recognition results
// Save the results to a PDF file
ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
ocrDocument.Dispose();

// *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown();
ocrEngine.Dispose();
The following example shows the minimum steps required to perform the same task, this time using the C# "using" and VB.NET "Using" statements so that the document and the engine are disposed of automatically:
Visual Basic
' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
   ' Start up the engine
   ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

   ' Create the OCR document
   Using ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
      ' Add all the pages of a multi-page TIF image to the document
      ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)

      ' Recognize the pages
      ocrDocument.Pages.Recognize(Nothing)

      ' Save recognition results
      ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
   End Using

   ' The engine automatically shuts down when Dispose is called
End Using
C#
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
   // Start up the engine
   ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

   // Create the OCR document
   using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
   {
      // Add all the pages of a multi-page TIF image to the document
      ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);

      // Recognize the pages
      ocrDocument.Pages.Recognize(null);

      // Save recognition results
      ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
   }

   // The engine automatically shuts down when Dispose is called
}
Finally, the following example performs the same task using the one-shot, "fire and forget" IOcrAutoRecognizeManager interface:
Visual Basic
' Create the engine instance
Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
   ' Start up the engine
   ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

   ' Convert the multi-page TIF image to a PDF document
   ocrEngine.AutoRecognizeManager.Run( _
      "C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", _
      "C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", _
      DocumentFormat.Pdf, _
      Nothing, _
      Nothing)
End Using
C#
// Create the engine instance
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
{
   // Start up the engine
   ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

   // Convert the multi-page TIF image to a PDF document
   ocrEngine.AutoRecognizeManager.Run(
      @"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif",
      @"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf",
      DocumentFormat.Pdf,
      null,
      null);
}