RunJob Method

Summary

Runs a job

Syntax

C#

public OcrAutoRecognizeManagerJobStatus RunJob( 
   IOcrAutoRecognizeJob job 
)

Objective-C

- (LTOcrAutoRecognizeManagerJobStatus)runJob:(LTOcrAutoRecognizeJob *)job error:(NSError **)error

Java

public OcrAutoRecognizeManagerJobStatus runJob(OcrAutoRecognizeJob job)

C++/CLI

OcrAutoRecognizeManagerJobStatus RunJob(  
   IOcrAutoRecognizeJob^ job 
)

Python

def RunJob(self,job):

Parameters

job
The IOcrAutoRecognizeJob to run. This parameter cannot be null (Nothing in VB). Use CreateJob to create a job.

Return Value

An OcrAutoRecognizeManagerJobStatus enumeration member that determines whether the job was completed successfully or aborted due to errors or user action.

Remarks

If you call this method from the same thread that created IOcrAutoRecognizeManager, then the current thread will block until this method returns. To run a job asynchronously, use RunJobAsync.

When this method returns, the IOcrAutoRecognizeJob.Errors member of job will contain any errors that might have occurred during the recognition process.

To use this method, initialize a new OcrAutoRecognizeJobData object with the job's parameters (input image file name, pages, output document format, output document name, optional zones file name, etc.), then use CreateJob to create the IOcrAutoRecognizeJob object passed as job to this method. Finally, call RunJob passing the IOcrAutoRecognizeJob object.

This method will perform the following operations:

The JobStarted event is triggered.
Creates one or more IOcrDocument objects to store the pages. The number of OCR documents created depends on MaximumThreadsPerJob. If this value is 0 (maximum CPUs/cores) or is greater than 1 and the engine supports multiple threads, then more than one document might be created to participate in the recognition process. The documents are created as disk-based.
Loops through the pages from OcrAutoRecognizeJobData.FirstPageNumber to OcrAutoRecognizeJobData.LastPageNumber in OcrAutoRecognizeJobData.ImageFileName or OcrAutoRecognizeJobData.ImageStream, and for each page:
- The page is created using IOcrEngine.CreatePage.
- If OcrAutoRecognizeJobData.ZonesFileName contains a valid multipage zone file name and has an entry for the current page, then the zones are loaded with IOcrPage.LoadZones(fileName, pageNumber) and applied to the page. If OcrAutoRecognizeJobData.ZonesFileName is a null (Nothing in VB) reference or the file does not contain an entry for the current page number, the page is auto-decomposed with IOcrPage.AutoZone instead. A valid multipage zone file that has entries for all document pages is generated by saving the document zones using IOcrDocument.SaveZones, not IOcrPage.SaveZones, since IOcrPage.SaveZones does not save the page number.
- IOcrPage.Recognize is called to get the OCR data of the page.
- For LEADTOOLS OCR Module - LEAD Engine, the page is added to the document using IOcrDocument.Pages.Add.
- For engines other than LEADTOOLS OCR Module - LEAD Engine: If multiple documents are used or the current number of recognized pages is greater than the maximum specified in MaximumPagesBeforeLtd, then the current recognition data is saved to a temporary LTD file and the OCR document is cleared.
When all pages are processed, they are saved to the result file name specified in OcrAutoRecognizeJobData.DocumentFileName using the format specified in OcrAutoRecognizeJobData.Format. If LTD was used, the temporary file is converted to the final document using DocumentWriter.Convert and optionally DocumentWriter.AppendLtd.
All OCR documents and temporary files are deleted.
The JobCompleted event is triggered.
You can use the JobProgress event to show the operation progress or to abort it if threading is not used. For more information and an example, refer to OcrProgressCallback.
You can use the JobOperation event to get information regarding the current operation being performed. For more information and an example, refer to JobOperation.
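
The per-page choice between loading zones and auto-zoning described above can be sketched in a few lines of plain Java. This is only an illustration of the decision rule as stated in this topic, not the engine's implementation: the class ZoneSourceSketch, the enum ZoneSource, and the pagesInZonesFile set (standing in for "the zone file has an entry for this page") are hypothetical names invented for the sketch and are not part of the LEADTOOLS API.

```java
import java.util.Set;

// Hypothetical sketch of the zone-source decision described in the Remarks.
// None of these names belong to the LEADTOOLS API.
final class ZoneSourceSketch {
    enum ZoneSource { LOAD_ZONES, AUTO_ZONE }

    // zonesFileName may be null; pagesInZonesFile is an assumed stand-in for
    // the set of page numbers that have an entry in the multipage zone file.
    static ZoneSource choose(String zonesFileName, Set<Integer> pagesInZonesFile, int pageNumber) {
        boolean hasEntry = zonesFileName != null
            && pagesInZonesFile != null
            && pagesInZonesFile.contains(pageNumber);
        // With an entry the zones would be loaded; otherwise the page is auto-zoned.
        return hasEntry ? ZoneSource.LOAD_ZONES : ZoneSource.AUTO_ZONE;
    }

    public static void main(String[] args) {
        Set<Integer> pages = Set.of(1, 3); // only pages 1 and 3 have saved zones
        for (int page = 1; page <= 3; page++) {
            System.out.println("Page " + page + ": " + choose("zones-file", pages, page));
        }
    }
}
```

In this sketch a page that is missing from the zone file falls back to auto-zoning on its own, so a partial zone file does not affect the other pages.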

The IOcrAutoRecognizeManager interface also has the following options to use with this method:

Option	Description
MaximumPagesBeforeLtd	Adds support for converting a document with an unlimited number of pages. An OCR recognition operation on a document that contains a large number of pages (10 or more) might result in an out-of-memory error. All of the LEADTOOLS OCR engines support saving the intermediate recognition results to a temporary LTD file (DocumentFormat.LTD). The results of subsequent pages are appended to this temporary file. When all the pages of the document have been recognized, the engine converts the temporary LTD file to the desired output format. The MaximumPagesBeforeLtd property defines the maximum number of pages processed as a whole. For example, if the original document has 20 pages and the value of this property is 8, the engine recognizes the first 8 pages and saves the results to a temporary file, recognizes the next 8 pages and appends the results, and finally recognizes the last 4 pages and converts the temporary document to the final format.
PreprocessPageCommands	Holds an array of OcrAutoPreprocessPageCommand items that control which auto-preprocess operations to perform on each page prior to recognition.
MaximumThreadsPerJob	Maximum number of threads to use per job. You can instruct IOcrAutoRecognizeManager to use all available machine CPUs/cores when recognizing a document. This can greatly reduce the time required to finish the OCR operation.
JobErrorMode	Ability to resume on non-critical errors. For example, if a source document has a page that could not be recognized, the offending page is added to the final document as a graphics image and recognition continues with the next page.
JobStarted, JobProgress, JobOperation and JobCompleted events	Events to track when both synchronous and asynchronous jobs have started, are running, and have completed.
AbortAllJobs	Aborts all running and pending jobs.
EnableTrace	Output debug messages to the standard .NET trace listeners.
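
The 20-page example given for MaximumPagesBeforeLtd is simple arithmetic, and the sketch below reproduces it. It only shows how a page count splits into consecutive batches under a given limit; it does not call the OCR engine. The class LtdBatchPlan and the method batchSizes are hypothetical names made up for this illustration, and rejecting a limit of zero or less is an assumption of the sketch, not documented behavior of the property.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of LEADTOOLS: splits a page count into
// consecutive batches of at most maxPagesBeforeLtd pages.
final class LtdBatchPlan {
    static List<Integer> batchSizes(int pageCount, int maxPagesBeforeLtd) {
        // Assumption of this sketch: the limit must be positive.
        if (pageCount < 0 || maxPagesBeforeLtd <= 0) {
            throw new IllegalArgumentException("pageCount must be >= 0 and maxPagesBeforeLtd must be > 0");
        }
        List<Integer> sizes = new ArrayList<>();
        for (int remaining = pageCount; remaining > 0; remaining -= maxPagesBeforeLtd) {
            // Each batch is a full batch, except possibly the last one.
            sizes.add(Math.min(remaining, maxPagesBeforeLtd));
        }
        return sizes;
    }

    public static void main(String[] args) {
        // 20 pages with a limit of 8 gives batches of 8, 8 and 4 pages.
        System.out.println(batchSizes(20, 8)); // prints [8, 8, 4]
    }
}
```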

Example

This example will OCR all the images in a given folder and convert them to PDF documents. It uses multiple threads to maximize recognition performance, supports aborting, and continues on non-critical errors. It can convert images with any number of pages.

C#

using System; 
using System.IO; 
using System.Threading; 
 
using Leadtools; 
using Leadtools.Codecs; 
using Leadtools.Ocr; 
using Leadtools.Document.Writer; 
using Leadtools.Forms.Common; 
using Leadtools.WinForms; 
 
public class RunJobExample 
{ 
   // Number of documents that are pending 
   private int _documentsPending; 
   // Event to trigger when all documents are finished 
   private AutoResetEvent _allDocumentsFinishedEvent; 
 
   public void Start() 
   { 
      string imagesDirectory = LEAD_VARS.ImagesDir; 
      string documentsDirectory = Path.Combine(LEAD_VARS.ImagesDir, "RunJobExample"); 
 
      // Create the output (documents) directory 
      if (!Directory.Exists(documentsDirectory)) 
      { 
         Directory.CreateDirectory(documentsDirectory); 
      } 
 
      // Get all TIF files in input (images) directory 
      string[] imageFileNames = Directory.GetFiles(imagesDirectory, "*.tif"); 
      if (imageFileNames.Length == 0) 
      { 
         Console.WriteLine("No images to OCR"); 
         return; 
      } 
 
      // Create a new OCR engine instance 
      OcrEngineType engineType = OcrEngineType.LEAD; 
      Console.WriteLine(string.Format("Starting up {0} engine", engineType)); 
      using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(engineType)) 
      { 
         ocrEngine.Startup(null, null, null, LEAD_VARS.OcrLEADRuntimeDir); 
 
         // Setup document PDF save options: Image/Text with CCITT G4 encoding for B/W 
         DocumentWriter docWriter = ocrEngine.DocumentWriterInstance; 
         PdfDocumentOptions pdfOptions = docWriter.GetOptions(DocumentFormat.Pdf) as PdfDocumentOptions; 
         pdfOptions.ImageOverText = true; 
         pdfOptions.DocumentType = PdfDocumentType.Pdf; 
         pdfOptions.FontEmbedMode = DocumentFontEmbedMode.None; 
         pdfOptions.OneBitImageCompression = OneBitImageCompressionType.FaxG4; 
         docWriter.SetOptions(DocumentFormat.Pdf, pdfOptions); 
 
         // We are going to use multiple-threads, so disable threading in 
         // IOcrAutoRecognizeManager 
         IOcrAutoRecognizeManager autoRecognizeManager = ocrEngine.AutoRecognizeManager; 
         autoRecognizeManager.MaximumThreadsPerJob = 1; 
 
         // Tell the recognize manager to continue on errors 
         autoRecognizeManager.JobErrorMode = OcrAutoRecognizeManagerJobErrorMode.Continue; 
 
         // Instead of using events to trigger when documents are done, 
         // we will use the JobCompleted events of IOcrAutoRecognizeManager 
         // to decrement a counter and trigger one event when the counter reaches 0 
         autoRecognizeManager.JobStarted += new EventHandler<OcrAutoRecognizeRunJobEventArgs>(autoRecognizeManager_JobStarted); 
         autoRecognizeManager.JobCompleted += new EventHandler<OcrAutoRecognizeRunJobEventArgs>(autoRecognizeManager_JobCompleted); 
 
         int count = imageFileNames.Length; 
         _documentsPending = count; 
         _allDocumentsFinishedEvent = new AutoResetEvent(false); 
 
         for (int i = 0; i < count; i++) 
         { 
            // Create the job data 
            string imageFileName = imageFileNames[i]; 
            string name = "Document " + (i + 1).ToString(); 
            Console.WriteLine("Queuing {0} file {1}", name, imageFileName); 
 
            JobData data = new JobData(); 
            data.AutoRecognizeManager = autoRecognizeManager; 
            data.ImageFileName = imageFileName; 
            data.DocumentFileName = Path.Combine(documentsDirectory, Path.GetFileNameWithoutExtension(imageFileName) + ".pdf"); 
            data.JobName = name; 
 
            // Queue this job 
            ThreadPool.QueueUserWorkItem(new WaitCallback(RunJob), data); 
         } 
 
         // Wait for all documents to finish 
         _allDocumentsFinishedEvent.WaitOne(); 
         _allDocumentsFinishedEvent.Close(); 
 
         autoRecognizeManager.JobStarted -= new EventHandler<OcrAutoRecognizeRunJobEventArgs>(autoRecognizeManager_JobStarted); 
         autoRecognizeManager.JobCompleted -= new EventHandler<OcrAutoRecognizeRunJobEventArgs>(autoRecognizeManager_JobCompleted); 
 
         Console.WriteLine("All documents finished, check the result files in {0}", documentsDirectory); 
      } 
   } 
   private void autoRecognizeManager_JobStarted(object sender, OcrAutoRecognizeRunJobEventArgs e) 
   { 
      // This is not strictly needed in this example, we will 
      // use it to show information 
      Console.WriteLine("{0} started...", e.Job.JobData.JobName); 
 
      // Check if we need to abort 
      if (AbortJobs(e.Job)) 
      { 
         // Yes, abort all jobs 
         e.Job.AutoRecognizeManager.AbortAllJobs(); 
      } 
   } 
 
   private void autoRecognizeManager_JobCompleted(object sender, OcrAutoRecognizeRunJobEventArgs e) 
   { 
      string message = string.Format("{0} completed ", e.Job.JobData.JobName); 
 
      IOcrAutoRecognizeJob job = e.Job; 
 
      // Show any errors 
      if (job.Errors.Count == 0) 
      { 
         message += "successfully..."; 
      } 
      else 
      { 
         message += "with errors, first error is " + job.Errors[0].Exception.Message; 
 
         // And save the errors to a text file in the document directory 
         string documentFileName = job.JobData.DocumentFileName; 
         string textPathName = Path.Combine(Path.GetDirectoryName(documentFileName), Path.GetFileNameWithoutExtension(documentFileName) + "_errors.txt"); 
         using (StreamWriter writer = File.CreateText(textPathName)) 
         { 
            writer.WriteLine(job.JobData.JobName); 
            writer.WriteLine("Data:"); 
            writer.WriteLine(" Image file name: " + job.JobData.ImageFileName); 
            writer.WriteLine(" First page number: " + job.JobData.FirstPageNumber); 
            writer.WriteLine(" Last page number: " + job.JobData.LastPageNumber); 
            writer.WriteLine(" Format:" + job.JobData.Format); 
            writer.WriteLine(" Document file name: " + job.JobData.DocumentFileName); 
            writer.WriteLine("Errors:"); 
 
            foreach (OcrAutoRecognizeManagerJobError error in job.Errors) 
            { 
               writer.WriteLine(" Page: {0} during {1}. Error: {2}", error.ImagePageNumber, error.Operation, error.Exception.Message); 
            } 
         } 
      } 
 
      Console.WriteLine(message); 
 
      // Decrement the documents count, when we reach 0, we are done 
      // Since this will be called from multiple threads, we need 
      // to use a thread-safety procedure 
      int pending = Interlocked.Decrement(ref _documentsPending); 
 
      // If we are the last document, wake up the main thread 
      if (pending == 0) 
      { 
         _allDocumentsFinishedEvent.Set(); 
      } 
   } 
 
   private class JobData 
   { 
      public IOcrAutoRecognizeManager AutoRecognizeManager; 
      public string ImageFileName; 
      public string DocumentFileName; 
      public string JobName; 
   } 
 
   private void RunJob(object state) 
   { 
      JobData data = state as JobData; 
 
      Console.WriteLine("Running {0}", data.JobName); 
 
      // Run it 
      OcrAutoRecognizeJobData jobData = new OcrAutoRecognizeJobData(data.ImageFileName, DocumentFormat.Pdf, data.DocumentFileName); 
      jobData.JobName = data.JobName; 
      IOcrAutoRecognizeJob job = data.AutoRecognizeManager.CreateJob(jobData); 
      data.AutoRecognizeManager.RunJob(job); 
   } 
 
   private bool AbortJobs(IOcrAutoRecognizeJob ocrJob) 
   { 
      // In your application, you can check if abortion is required, for example, if the user 
      // has pressed the Cancel button on a progress bar or if your service is shutting down. 
 
      // In this example, we will never abort, but you can change this code to return true 
      // upon any condition (or when a specific job is about to start) 
      // and the engine will abort all current and pending jobs 
      return false; 
   } 
} 
 
static class LEAD_VARS 
{ 
   public const string ImagesDir = @"C:\LEADTOOLS23\Resources\Images"; 
   public const string OcrLEADRuntimeDir = @"C:\LEADTOOLS23\Bin\Common\OcrLEADRuntime"; 
}

Java

import java.io.File; 
import java.io.FileNotFoundException; 
import java.io.FileWriter; 
import java.io.FilenameFilter; 
import java.io.IOException; 
import java.nio.file.Files; 
import java.nio.file.Path; 
import java.nio.file.Paths; 
import java.util.ArrayList; 
import java.util.concurrent.ExecutorService; 
import java.util.concurrent.Executors; 
import java.util.concurrent.atomic.AtomicInteger; 
 
import org.junit.*; 
import org.junit.runner.JUnitCore; 
import org.junit.runner.Result; 
import org.junit.runner.notification.Failure; 
 
import static org.junit.Assert.*; 
 
import leadtools.*; 
import leadtools.document.writer.*; 
import leadtools.internal.AutoResetEvent; 
import leadtools.ocr.*; 
 
 
// Number of documents that are pending 
private int _documentsPending; 
// Event to trigger when all documents are finished 
private AutoResetEvent _allDocumentsFinishedEvent; 
// Thread usage 
private final static AtomicInteger at = new AtomicInteger(); 
 
public void OcrAutoRecognizeManagerRunJobExample() throws IOException { 
   String LEAD_VARS_ImagesDir = "C:\\LEADTOOLS23\\Resources\\Images"; 
   String LEAD_VARS_OcrLEADRuntimeDir = "C:\\LEADTOOLS23\\Bin\\Common\\OcrLEADRuntime"; 
   String docsDir = combine(LEAD_VARS_ImagesDir, "RunJobExample"); 
   String imageDir = LEAD_VARS_ImagesDir; 
 
   // Create the output (documents) directory 
   Path docsPath = Paths.get(docsDir); 
   Files.createDirectories(docsPath); 
 
   // Get all TIF files in input (images) directory 
   Path imagePath = Paths.get(imageDir); 
   Files.createDirectories(imagePath); 
 
   FilenameFilter tifFileFilter = (d, s) -> { 
      return s.toLowerCase().endsWith(".tif"); 
   }; 
 
   File imageFolder = new File(imageDir); 
   String[] imageFileNames = imageFolder.list(tifFileFilter); 
   if (imageFileNames.length == 0) { 
      System.out.println("No images to OCR"); 
      return; 
   } 
 
   // Create a new OCR engine instance 
   OcrEngineType engineType = OcrEngineType.LEAD; 
   System.out.println("Starting up " + engineType + " engine"); 
   OcrEngine ocrEngine = OcrEngineManager.createEngine(engineType); 
 
   ocrEngine.startup(null, null, null, LEAD_VARS_OcrLEADRuntimeDir); 
 
   // Setup document PDF save options: Image/Text with CCITT G4 encoding for B/W 
   DocumentWriter docWriter = ocrEngine.getDocumentWriterInstance(); 
   PdfDocumentOptions pdfOptions = (PdfDocumentOptions) docWriter.getOptions(DocumentFormat.PDF); 
   pdfOptions.setImageOverText(true); 
   pdfOptions.setDocumentType(PdfDocumentType.PDF); 
   pdfOptions.setFontEmbedMode(DocumentFontEmbedMode.NONE); 
   pdfOptions.setOneBitImageCompression(OneBitImageCompressionType.FAX_G4); 
   docWriter.setOptions(DocumentFormat.PDF, pdfOptions); 
 
   // We are going to use multiple-threads, so disable threading in IOcrAutoRecognizeManager 
   OcrAutoRecognizeManager autoRecognizeManager = ocrEngine.getAutoRecognizeManager(); 
   autoRecognizeManager.setMaximumThreadsPerJob(1); 
 
   // Tell the recognize manager to continue on errors 
   autoRecognizeManager.setJobErrorMode(OcrAutoRecognizeManagerJobErrorMode.CONTINUE); 
 
   // Instead of using events to trigger when documents are done, 
   // we will use the JobCompleted events of IOcrAutoRecognizeManager 
   // to decrement a counter and trigger one event when the counter reaches 0 
   autoRecognizeManager.addJobStartedListener(autoRecognizeManager_JobStarted); 
   autoRecognizeManager.addJobCompletedListener(autoRecognizeManager_JobCompleted); 
   int count = imageFileNames.length; 
   _documentsPending = count; 
   at.set(_documentsPending); 
   _allDocumentsFinishedEvent = new AutoResetEvent(); 
 
   ExecutorService executorService = Executors.newFixedThreadPool(1); 
   System.out.println("Starting the threads and waiting..."); 
 
   for (int i = 0; i < count; i++) { 
      // Create the job data 
      String imageFileName = imageFileNames[i]; 
      String name = "Document " + (i + 1); 
      System.out.println("Queuing " + name + " file " + imageFileName); 
 
      JobData data = new JobData(); 
      data.AutoRecognizeManager = autoRecognizeManager; 
      data.ImageFileName = combine(LEAD_VARS_ImagesDir, imageFileName); 
      data.DocumentFileName = combine(docsDir,imageFileName.substring(0, imageFileName.indexOf(".")) + ".pdf"); 
      data.JobName = name; 
      File dataFile = new File(combine(docsDir,imageFileName.substring(0, imageFileName.indexOf(".")) + ".pdf")); 
      if (!dataFile.exists()) dataFile.createNewFile(); 
 
      Runnable runnableTask = new Runnable(){ 
 
         @Override 
         public void run(){ 
            RunJob(data); 
         } 
 
      }; 
  
      executorService.submit(runnableTask); 
   } 
 
   // Wait for all documents to finish 
   _allDocumentsFinishedEvent.waitOne(); 
   _allDocumentsFinishedEvent.close(); 
 
   // Stop the worker thread so the JVM is not kept alive by the pool 
   executorService.shutdown(); 
 
   System.out.println("All documents finished, check the result files in " + docsDir); 
   ocrEngine.dispose(); 
} 
 
OcrAutoRecognizeRunJobListener autoRecognizeManager_JobStarted = new OcrAutoRecognizeRunJobListener(){ 
 
   @Override public void onJob(OcrAutoRecognizeRunJobEvent e) { 
      // This is not strictly needed in this example, we will 
      // use it to show information 
      System.out.println(e.getJob().getJobData().getJobName()+" started..."); 
 
      // Check if we need to abort 
      if (AbortJobs(e.getJob())) { 
         // Yes, abort all jobs 
         e.getJob().getAutoRecognizeManager().abortAllJobs(); 
      } 
   } 
 
}; 
 
OcrAutoRecognizeRunJobListener autoRecognizeManager_JobCompleted = new OcrAutoRecognizeRunJobListener() { 
 
   @Override 
   public void onJob(OcrAutoRecognizeRunJobEvent e) { 
      OcrAutoRecognizeJob job = e.getJob(); 
      String message = job.getJobData().getJobName() + " completed "; 
 
      // Show any errors 
      if (job.getErrors().size()== 0) { 
         message += "successfully..."; 
      } 
      else { 
         message += " with errors, first error is " + job.getErrors().get(0).getException().getMessage(); 
 
         // And save the errors to a text file in the document directory 
         String documentFileName = job.getJobData().getDocumentFileName(); 
         File doc = new File(documentFileName); 
         // Build "<document name without extension>_errors.txt" next to the document 
         String baseName = doc.getName().substring(0, doc.getName().lastIndexOf(".")); 
         String textPathName = combine(doc.getParent(), baseName + "_errors.txt"); 
 
         // try-with-resources closes the writer automatically 
         try (FileWriter writer = new FileWriter(textPathName)) { 
            writer.write(job.getJobData().getJobName()+"\n"); 
            writer.write("Data:"+"\n"); 
            writer.write(" Image file name: " + job.getJobData().getImageFileName()+"\n"); 
            writer.write(" First page number: " + job.getJobData().getFirstPageNumber()+"\n"); 
            writer.write(" Last page number: " + job.getJobData().getLastPageNumber()+"\n"); 
            writer.write(" Format:" + job.getJobData().getFormat()+"\n"); 
            writer.write(" Document file name: " + job.getJobData().getDocumentFileName()+"\n"); 
            writer.write("Errors:"+"\n"); 
                
            for (OcrAutoRecognizeManagerJobError error : job.getErrors()) 
            { 
               writer.write(" Page: "+ error.getImagePageNumber() + " during " + error.getOperation() + ". Error: " + error.getException().getMessage() + "\n"); 
            } 
         } catch (IOException e1) { 
            e1.printStackTrace(); 
         } 
      } 
 
      System.out.println(message); 
 
      // Decrement the documents count, when we reach 0, we are done 
      // Since this will be called from multiple threads, we need 
      // to use a thread-safety procedure 
      int pending = at.decrementAndGet(); 
      System.out.println(pending); 
          
      // If we are the last document, wake up the main thread 
      if (pending == 0) 
         _allDocumentsFinishedEvent.set(); 
   } 
 
}; 
 
class JobData { 
   public OcrAutoRecognizeManager AutoRecognizeManager; 
   public String ImageFileName; 
   public String DocumentFileName; 
   public String JobName; 
} 
 
private void RunJob(JobData state) { 
   JobData data = state; 
   System.out.println("Running " + data.JobName); 
 
   // Run it 
   OcrAutoRecognizeJobData jobData = new OcrAutoRecognizeJobData( 
      data.ImageFileName,  
      DocumentFormat.PDF, 
      data.DocumentFileName 
   ); 
   jobData.setJobName(data.JobName); 
   OcrAutoRecognizeJob job = data.AutoRecognizeManager.createJob(jobData); 
   data.AutoRecognizeManager.runJob(job); 
} 
 
private boolean AbortJobs(OcrAutoRecognizeJob ocrJob) { 
   // In your application, you can check if abortion is required, for example, if the user 
   // has pressed the Cancel button on a progress bar or if your service is shutting down. 
 
   // In this example, we will never abort, but you can change this code to return true 
   // upon any condition (or when a specific job is about to start) 
   // and the engine will abort all current and pending jobs 
   return false; 
} 
 
public String combine(String path1, String path2) { 
   File file = new File(path1, path2); 
   return file.getPath(); 
}
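
Both examples above count completed jobs with an atomic counter and signal one event when the counter reaches zero. As a possible alternative in Java, the standard java.util.concurrent.CountDownLatch combines the counter and the event. The sketch below is self-contained and does not call LEADTOOLS: the class JobCompletionLatchSketch and the method runAll are hypothetical, and the task body is a stand-in for the call to runJob.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: waits for a fixed number of queued jobs to finish.
final class JobCompletionLatchSketch {
    // Runs jobCount stand-in jobs on a pool of the given size (must be > 0)
    // and returns how many of them ran to completion.
    static int runAll(int jobCount, int threads) {
        CountDownLatch allDone = new CountDownLatch(jobCount);
        AtomicInteger completed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (int i = 0; i < jobCount; i++) {
                pool.submit(() -> {
                    try {
                        // Stand-in for the real work (for example, running one OCR job)
                        completed.incrementAndGet();
                    } finally {
                        // Count down even if the job fails, so the waiter is never stuck
                        allDone.countDown();
                    }
                });
            }
            // Blocks until every job has counted down
            allDone.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for jobs", e);
        } finally {
            pool.shutdown();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("Jobs completed: " + runAll(5, 2));
    }
}
```

Counting down in a finally block mirrors the role of the JobCompleted handler in the examples: the waiting thread is released once every job has ended, whether or not it succeeded.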

Requirements

Target Platforms

Reference

IOcrAutoRecognizeManager Interface

IOcrAutoRecognizeManager Members

Programming with the LEADTOOLS .NET OCR

Multi-Threading with LEADTOOLS OCR

Help Version 23.0.2024.4.19