IOcrAutoRecognizeManager Interface

Summary

Provides support for the one-shot, "fire and forget" approach to OCR, suitable for unattended recognition.

Syntax

C#

public interface IOcrAutoRecognizeManager

VB

Public Interface IOcrAutoRecognizeManager

Objective-C

@interface LTOcrAutoRecognizeManagerJobError : NSObject

Java

public class OcrAutoRecognizeManager

C++

public interface class IOcrAutoRecognizeManager

Remarks

You can access the instance of the IOcrAutoRecognizeManager used by an IOcrEngine through the IOcrEngine.AutoRecognizeManager property.

The members of this interface let you create a document from an image file on disk, with optional progress and status monitors.

You can use the Run methods to convert an image on disk to a final document in a single line of code, using any of the document formats supported by this IOcrEngine.

You can also create jobs using the CreateJob method and then run them synchronously through RunJob or asynchronously through RunJobAsync.
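As a rough illustration of the job-based approach, the sketch below creates a job and runs it synchronously. It assumes the engine has already been started as in the Example section, and it assumes that the three-argument OcrAutoRecognizeJobData constructor (image file, format, document file) is available in your version of the toolkit. The RunJobSketch class and ConvertWithJob method are hypothetical names used only for this illustration.

```csharp
using System;
using Leadtools.Document.Writer;
using Leadtools.Ocr;

public static class RunJobSketch
{
   // Hypothetical helper: ocrEngine is assumed to be started already (see the Example section)
   public static void ConvertWithJob(IOcrEngine ocrEngine, string imageFileName, string documentFileName)
   {
      IOcrAutoRecognizeManager manager = ocrEngine.AutoRecognizeManager;

      // Describe the job: source image, output format and output file
      // Assumption: this constructor overload exists in your version
      OcrAutoRecognizeJobData jobData = new OcrAutoRecognizeJobData(imageFileName, DocumentFormat.Pdf, documentFileName);

      // Create the job from its data
      IOcrAutoRecognizeJob job = manager.CreateJob(jobData);

      // RunJob blocks until the job finishes and returns its status
      OcrAutoRecognizeManagerJobStatus status = manager.RunJob(job);
      Console.WriteLine("Job finished with status: {0}", status);
   }
}
```

To run the same job without blocking the caller, RunJobAsync would be called instead of RunJob, with completion reported through the JobCompleted event.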

The IOcrAutoRecognizeManager interface also has the following options to use with the Run, RunJob and RunJobAsync methods:

Member	Description
PreprocessPageCommands	Holds an array of OcrAutoPreprocessPageCommand items that control which auto-preprocess operations to perform on each document page prior to recognition.
JobErrorMode	Controls whether a job resumes after noncritical errors. For example, if a page in the source document cannot be recognized, the offending page is added to the final document as a graphics image and recognition continues with the next page.
JobStarted, JobProgress, JobOperation and JobCompleted events	Events that track when synchronous and asynchronous jobs have started, are running, and have completed.
AbortAllJobs	Aborts all running and pending jobs.
EnableTrace	Outputs debug messages to the standard .NET trace listeners.
MaximumPagesBeforeLtd	Important: not used by the LEADTOOLS OCR Module - LEAD Engine. Adds support for converting a document with an unlimited number of pages if the engine does not support it. An OCR recognition operation on a document that contains a large number of pages (10 or more) might result in an out-of-memory error. All of the LEADTOOLS OCR engines support saving the intermediate recognition results to a temporary LTD file (DocumentFormat.Ltd). The results of subsequent pages are appended to this temporary file. When all the pages of the document have been recognized, the engine converts the temporary LTD file to the desired output format. The LEADTOOLS OCR Module - LEAD Engine handles this operation internally by using a file-based document, does not load more than one page in memory at a time, and does not use the value of MaximumPagesBeforeLtd. For the other engines, the MaximumPagesBeforeLtd property defines the maximum number of pages processed as a whole. For example, if the original document has 20 pages and the value of this property is 8, the engine will recognize the first 8 pages and save the result to a temporary file, recognize the next 8 pages and append the results, and finally recognize the last 4 pages and convert the temporary document into the final format.
MaximumThreadsPerJob	Maximum number of threads to use per job. You can instruct IOcrAutoRecognizeManager to use all available machine CPUs/cores when recognizing a document, which can greatly reduce the time required to finish the OCR operation. The LEADTOOLS OCR Module - LEAD Engine uses the system thread pool and does not require a set number of threads. A value of 1 will disable threading and any other value will be treated as "use multi-threading".
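The page arithmetic in the MaximumPagesBeforeLtd description (20 pages with a value of 8) can be worked through with a small stand-alone sketch. This code does not call the OCR engine and is not how the toolkit is implemented internally; LtdBatchingSketch and SplitIntoBatches are hypothetical names that only illustrate how a page count divides into groups of at most a given size.

```csharp
using System;
using System.Collections.Generic;

public static class LtdBatchingSketch
{
   // Splits pages 1..pageCount into consecutive ranges of at most maxPagesPerBatch pages
   public static List<(int First, int Last)> SplitIntoBatches(int pageCount, int maxPagesPerBatch)
   {
      if (pageCount < 0) throw new ArgumentOutOfRangeException(nameof(pageCount));
      if (maxPagesPerBatch < 1) throw new ArgumentOutOfRangeException(nameof(maxPagesPerBatch));

      var batches = new List<(int First, int Last)>();
      for (int first = 1; first <= pageCount; first += maxPagesPerBatch)
      {
         // The last batch may be shorter than the others
         int last = Math.Min(first + maxPagesPerBatch - 1, pageCount);
         batches.Add((first, last));
      }
      return batches;
   }

   public static void Main()
   {
      // 20 pages in groups of at most 8, as in the description above
      foreach (var batch in SplitIntoBatches(20, 8))
      {
         Console.WriteLine("Pages {0}-{1} ({2} pages)", batch.First, batch.Last, batch.Last - batch.First + 1);
      }
   }
}
```

For 20 pages and a limit of 8, this yields three groups covering pages 1-8, 9-16 and 17-20, matching the 8, 8 and 4 page steps described for the property.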

Some OCR engine types support creating multi-threaded documents by creating one IOcrEngine and multiple IOcrDocument or IOcrAutoRecognizeJob objects, each in its own dedicated thread. For more information, refer to Multi-Threading with LEADTOOLS OCR.

Example

This example converts TIF files in a source folder to PDF files in a destination folder.

For an example of how to run multiple jobs simultaneously in multiple threads with synchronization and aborting support, refer to RunJob.

using System; 
using System.Collections.Generic; 
using System.IO; 
using Leadtools; 
using Leadtools.Codecs; 
using Leadtools.Ocr; 
using Leadtools.Document.Writer; 
using Leadtools.Forms.Common; 
using Leadtools.WinForms; 
 
public void OcrAutoRecognizeManagerExample() 
{ 
   Console.WriteLine("Preparing the source and destination directories..."); 
 
   string sourceDirectory = LEAD_VARS.ImagesDir; 
   string destinationDirectory = Path.Combine(LEAD_VARS.ImagesDir, "AutoRecognizeManagerExample"); 
 
   // Prepare the output directory 
   if (!Directory.Exists(destinationDirectory)) 
   { 
      Directory.CreateDirectory(destinationDirectory); 
   } 
 
   // OCR some images from the source directory into the destination directory: 
   IList<string> imageFiles = new List<string>(); 
 
   for (int i = 1; i <= 4; i++) 
   { 
      imageFiles.Add(Path.Combine(sourceDirectory, string.Format("Ocr{0}.tif", i))); 
   } 
 
   Console.WriteLine("Creating an instance of the engine..."); 
 
   // Create an instance of the engine 
   using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD, false)) 
   { 
      // Start the engine using default parameters 
      Console.WriteLine("Starting up the engine..."); 
      ocrEngine.Startup(null, null, null, LEAD_VARS.OcrLEADRuntimeDir); 
 
      IOcrAutoRecognizeManager ocrAutoRecognizeManager = ocrEngine.AutoRecognizeManager; 
 
      // Use LTD as a temporary format if a document has more than 4 pages to save memory 
      ocrAutoRecognizeManager.MaximumPagesBeforeLtd = 4; 
 
      // Use all CPUs/cores of the current machine to speed up recognition 
      // Pass either 0 or System.Environment.ProcessorCount 
      ocrAutoRecognizeManager.MaximumThreadsPerJob = 0; 
 
      // Deskew and auto-orient all pages before recognition 
      ocrAutoRecognizeManager.PreprocessPageCommands.Clear(); 
      ocrAutoRecognizeManager.PreprocessPageCommands.Add(OcrAutoPreprocessPageCommand.Deskew); 
      ocrAutoRecognizeManager.PreprocessPageCommands.Add(OcrAutoPreprocessPageCommand.Rotate); 
 
      // Create PDFs with Image/Text option 
      PdfDocumentOptions pdfOptions = ocrEngine.DocumentWriterInstance.GetOptions(DocumentFormat.Pdf) as PdfDocumentOptions; 
      pdfOptions.ImageOverText = true; 
      ocrEngine.DocumentWriterInstance.SetOptions(DocumentFormat.Pdf, pdfOptions); 
 
      // Loop through all the TIF files in the source directory, convert to PDF in the destination directory 
      foreach (string imageFile in imageFiles) 
      { 
         // Construct the name of the document file 
         string documentFileName = Path.Combine(destinationDirectory, Path.GetFileNameWithoutExtension(imageFile)); 
         documentFileName = Path.ChangeExtension(documentFileName, "pdf"); 
 
         // OCR the file 
         Console.WriteLine("Processing {0}", imageFile); 
         ocrAutoRecognizeManager.Run(imageFile, documentFileName, DocumentFormat.Pdf, null, null); 
         Console.WriteLine("Saved: {0}", documentFileName); 
      } 
   } 
} 
 
static class LEAD_VARS 
{ 
   public const string ImagesDir = @"C:\Users\Public\Documents\LEADTOOLS Images"; 
   public const string OcrLEADRuntimeDir = @"C:\LEADTOOLS 20\Bin\Common\OcrLEADRuntime"; 
}

Imports System 
Imports System.Collections.Generic 
Imports System.IO 
Imports Leadtools 
Imports Leadtools.Codecs 
Imports Leadtools.Ocr 
Imports Leadtools.Document.Writer 
Imports Leadtools.Forms.Common 
Imports Leadtools.WinForms 
 
Public Sub OcrAutoRecognizeManagerExample() 
   Console.WriteLine("Preparing the source and destination directories...") 
 
   Dim sourceDirectory As String = LEAD_VARS.ImagesDir 
   Dim destinationDirectory As String = Path.Combine(LEAD_VARS.ImagesDir, "AutoRecognizeManagerExample") 
 
   ' Prepare the output directory 
   If Not Directory.Exists(destinationDirectory) Then 
      Directory.CreateDirectory(destinationDirectory) 
   End If 
 
   ' OCR some images from the source directory into the destination directory: 
   Dim imageFiles As IList(Of String) = New List(Of String)() 
 
   For i As Integer = 1 To 4 
      imageFiles.Add(Path.Combine(sourceDirectory, String.Format("Ocr{0}.tif", i))) 
   Next 
 
   Console.WriteLine("Creating an instance of the engine...") 
 
   ' Create an instance of the engine 
   Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD, False) 
      ' Start the engine using default parameters 
      Console.WriteLine("Starting up the engine...") 
      ocrEngine.Startup(Nothing, Nothing, Nothing, LEAD_VARS.OcrLEADRuntimeDir) 
 
      Dim ocrAutoRecognizeManager As IOcrAutoRecognizeManager = ocrEngine.AutoRecognizeManager 
 
      ' Use LTD as a temporary format if a document has more than 4 pages to save memory 
      ocrAutoRecognizeManager.MaximumPagesBeforeLtd = 4 
 
      ' Use all CPUs/cores of the current machine to speed up recognition 
      ' Pass either 0 or System.Environment.ProcessorCount 
      ocrAutoRecognizeManager.MaximumThreadsPerJob = 0 
 
      ' Deskew and auto-orient all pages before recognition 
      ocrAutoRecognizeManager.PreprocessPageCommands.Clear() 
      ocrAutoRecognizeManager.PreprocessPageCommands.Add(OcrAutoPreprocessPageCommand.Deskew) 
      ocrAutoRecognizeManager.PreprocessPageCommands.Add(OcrAutoPreprocessPageCommand.Rotate) 
 
      ' Create PDFs with Image/Text option 
      Dim pdfOptions As PdfDocumentOptions = TryCast(ocrEngine.DocumentWriterInstance.GetOptions(DocumentFormat.Pdf), PdfDocumentOptions) 
      pdfOptions.ImageOverText = True 
      ocrEngine.DocumentWriterInstance.SetOptions(DocumentFormat.Pdf, pdfOptions) 
 
      ' Loop through all the TIF files in the source directory, convert to PDF in the destination directory 
      For Each imageFile As String In imageFiles 
         ' Construct the name of the document file 
         Dim documentFileName As String = Path.Combine(destinationDirectory, Path.GetFileNameWithoutExtension(imageFile)) 
         documentFileName = Path.ChangeExtension(documentFileName, "pdf") 
 
         ' OCR the file 
         Console.WriteLine("Processing {0}", imageFile) 
         ocrAutoRecognizeManager.Run(imageFile, documentFileName, DocumentFormat.Pdf, Nothing, Nothing) 
         Console.WriteLine("Saved: {0}", documentFileName) 
      Next 
   End Using 
End Sub 
 
Public NotInheritable Class LEAD_VARS 
   Public Const ImagesDir As String = "C:\Users\Public\Documents\LEADTOOLS Images" 
   Public Const OcrLEADRuntimeDir As String = "C:\LEADTOOLS 20\Bin\Common\OcrLEADRuntime" 
End Class