
Run(string,Stream,DocumentFormat,OcrProgressCallback) Method

Summary

Converts an image file on disk to a document file in the specified document format in an output stream.

Syntax

Parameters

imageFileName

The file containing the input image.

documentStream

The stream that will contain the resulting document file.

format

The output document format. If this parameter is DocumentFormat.User, then the document is saved using the native engine format set in IOcrDocumentManager.EngineFormat if the engine used supports native formats; otherwise, an exception will be thrown.

callback

Optional callback to show operation progress.

This method will perform the following operations:

  1. Trigger the JobStarted event.

  2. Create one or more IOcrDocument objects to store the pages into. The number of OCR documents created depends on the value of MaximumThreadsPerJob. If this value is 0 (maximum CPUs/cores) or is greater than 1 and multiple threads are supported by this engine, then more than one document might be created to participate in the recognition process. The document will be created as a disk-based document.

  3. Loop through all the pages in imageFileName and for each page:

    Create the page using IOcrEngine.CreatePage.

    Auto-zone the page using IOcrPage.AutoZone.

    Call IOcrPage.Recognize to get the OCR data of the page.

    For LEADTOOLS OCR Module - LEAD Engine, add the page to the document using IOcrDocument.Pages.Add.

    For engines other than LEADTOOLS OCR Module - LEAD Engine: If multiple documents are used or the current number of recognized pages is greater than the maximum specified in MaximumPagesBeforeLtd, then save the current recognition data to a temporary LTD file and clear the OCR document.

  4. After all pages are processed, they are saved to documentStream using the format specified in format. If LTD was used, the temporary file is converted to the final document using DocumentWriter.Convert and optionally DocumentWriter.AppendLtd.

  5. Delete all OCR documents and temporary files.

  6. Trigger the JobCompleted event.

  7. Use the JobProgress event or callback to show the operation progress or to abort it if threading is not used. For more information and an example, refer to OcrProgressCallback.

  8. Use the JobOperation event to get information regarding the current operation being performed. For more information and an example, refer to JobOperation.
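Step 2 above ties the number of OCR documents to MaximumThreadsPerJob. The following is a minimal sketch of one way to read that rule, not the engine's actual logic. The class WorkerCountSketch, the method maxWorkerDocuments, and the engineSupportsThreads flag are hypothetical names introduced for illustration, and the result is only an upper bound, because the text says more than one document "might" be created.

```java
// Hypothetical helper, not part of LEADTOOLS: models the rule described in step 2.
final class WorkerCountSketch {

    // Returns an upper bound on the number of OCR documents a job could use.
    // maximumThreadsPerJob: 0 is read as "use all CPUs/cores" (per the text above).
    // engineSupportsThreads: assumed flag for whether the engine can recognize in parallel.
    // availableCores: the number of CPUs/cores on the machine.
    static int maxWorkerDocuments(int maximumThreadsPerJob,
                                  boolean engineSupportsThreads,
                                  int availableCores) {
        if (maximumThreadsPerJob < 0 || availableCores < 1) {
            throw new IllegalArgumentException("Invalid thread or core count");
        }
        if (!engineSupportsThreads) {
            return 1; // single-threaded engine: one document
        }
        if (maximumThreadsPerJob == 0) {
            return availableCores; // 0 means maximum CPUs/cores
        }
        return maximumThreadsPerJob; // 1 stays single-document, larger values allow more
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Upper bound with MaximumThreadsPerJob = 0: "
                + maxWorkerDocuments(0, true, cores));
    }
}
```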

The IOcrAutoRecognizeManager interface also has the following options to use with this method:

Option Description
MaximumPagesBeforeLtd

Adds support for converting a document with an unlimited number of pages. An OCR recognition operation on a document that contains a large number of pages (10 or more) might result in an out-of-memory error.

All of the LEADTOOLS OCR engines support saving the intermediate recognition results to a temporary LTD file (DocumentFormat.LTD). Subsequent pages will be appended to this temporary file. When all the pages of the document have been recognized, the engine will convert the temporary LTD file to the desired output format.

The MaximumPagesBeforeLtd property defines the maximum number of pages to be processed at a time. For example, if the original document has 20 pages and the value of this property is 8, the engine will recognize the first 8 pages and save the results to a temporary file, recognize the second 8 pages and append the results, and finally, recognize the last 4 pages and convert the temporary document into the final format.

PreprocessPageCommands

Holds an array of OcrAutoPreprocessPageCommand items that control which auto-preprocess operations to perform on each page prior to recognition.

MaximumThreadsPerJob

Maximum number of threads to use per job. You can instruct IOcrAutoRecognizeManager to use all available machine CPUs/cores when recognizing a document. This can greatly reduce the time required to finish the OCR operation.

JobErrorMode

Ability to resume after non-critical errors. For example, if a source document has a page that could not be recognized, the offending page will be added to the final document as a graphics image and recognition will continue to the next page.

JobStarted, JobProgress, JobOperation and JobCompleted events

Events to track when both synchronous and asynchronous jobs have started, are being run and have completed.

AbortAllJobs

Aborts all running and pending jobs.

EnableTrace

Outputs debug messages to the standard .NET trace listeners.
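The MaximumPagesBeforeLtd example in the table above (a 20-page document with a limit of 8) comes down to simple batching arithmetic. The sketch below models only that arithmetic and does not call the OCR engine. The class LtdBatchPlan and the method batchSizes are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of LEADTOOLS: models only the batching arithmetic.
final class LtdBatchPlan {

    // Returns how many pages are recognized in each pass before the results
    // are saved or appended to the temporary LTD file.
    static List<Integer> batchSizes(int totalPages, int maximumPagesBeforeLtd) {
        if (totalPages < 0 || maximumPagesBeforeLtd < 1) {
            throw new IllegalArgumentException("Invalid page count or batch limit");
        }
        List<Integer> sizes = new ArrayList<>();
        for (int start = 0; start < totalPages; start += maximumPagesBeforeLtd) {
            // The last pass may hold fewer pages than the limit.
            sizes.add(Math.min(maximumPagesBeforeLtd, totalPages - start));
        }
        return sizes;
    }

    public static void main(String[] args) {
        // 20 pages with a limit of 8: two full passes, then the remaining 4 pages.
        System.out.println(batchSizes(20, 8)); // prints [8, 8, 4]
    }
}
```

Under this model, a smaller limit means more passes over the temporary file but fewer pages held in memory at once.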

Example
C#
using System;
using System.IO;
using Leadtools;
using Leadtools.Codecs; 
using Leadtools.Ocr; 
using Leadtools.Document.Writer; 
using Leadtools.Forms.Common; 
using Leadtools.WinForms; 
 
public void OcrAutoRecognizeManagerRun4Example() 
{ 
   string tifFileName = Path.Combine(LEAD_VARS.ImagesDir, "Ocr1.tif"); 
   string pdfFileName = Path.Combine(LEAD_VARS.ImagesDir, "Ocr1.pdf"); 
 
   // Create an instance of the engine 
   using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD)) 
   { 
      // Start the engine using default parameters 
      Console.WriteLine("Starting up the engine..."); 
      ocrEngine.Startup(null, null, null, LEAD_VARS.OcrLEADRuntimeDir); 
 
      IOcrAutoRecognizeManager ocrAutoRecognizeManager = ocrEngine.AutoRecognizeManager; 
 
      using (Stream outputStream = new MemoryStream()) 
      { 
         // Recognize the document 
         ocrAutoRecognizeManager.Run(tifFileName, outputStream, DocumentFormat.Pdf, null); 
 
         // Save the result into the output document file 
         outputStream.Seek(0, SeekOrigin.Begin); 
         using (var fileStream = File.Create(pdfFileName)) 
            outputStream.CopyTo(fileStream); 
      } 
   } 
} 
 
static class LEAD_VARS 
{ 
   public const string ImagesDir = @"C:\LEADTOOLS23\Resources\Images"; 
   public const string OcrLEADRuntimeDir = @"C:\LEADTOOLS23\Bin\Common\OcrLEADRuntime"; 
} 
 
Java
import java.io.File;
import java.io.FileNotFoundException; 
import java.io.FileWriter; 
import java.io.FilenameFilter; 
import java.io.IOException; 
import java.nio.file.Files; 
import java.nio.file.Path; 
import java.nio.file.Paths; 
import java.util.ArrayList; 
import java.util.concurrent.ExecutorService; 
import java.util.concurrent.Executors; 
import java.util.concurrent.atomic.AtomicInteger; 
 
import org.junit.*; 
import org.junit.runner.JUnitCore; 
import org.junit.runner.Result; 
import org.junit.runner.notification.Failure; 
 
import static org.junit.Assert.*; 
 
import leadtools.*; 
import leadtools.document.writer.*; 
import leadtools.internal.AutoResetEvent; 
import leadtools.ocr.*; 
 
 
public void OcrAutoRecognizeManagerRun4Example() { 
   String LEAD_VARS_ImagesDir = "C:\\LEADTOOLS23\\Resources\\Images"; 
   String LEAD_VARS_OcrLEADRuntimeDir = "C:\\LEADTOOLS23\\Bin\\Common\\OcrLEADRuntime"; 
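   // Build the input and output paths with the standard java.nio.file.Paths API
   String tifFileName = Paths.get(LEAD_VARS_ImagesDir, "Ocr1.tif").toString();
   String pdfFileName = Paths.get(LEAD_VARS_ImagesDir, "Ocr1.pdf").toString();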
 
   // Create an instance of the engine 
   OcrEngine ocrEngine = OcrEngineManager.createEngine(OcrEngineType.LEAD); 
 
   // Start the engine using default parameters 
   System.out.println("Starting up the engine..."); 
   ocrEngine.startup(null, null, null, LEAD_VARS_OcrLEADRuntimeDir); 
   assertTrue("OCR Engine unsuccessfully started", ocrEngine.isStarted()); 
 
   OcrAutoRecognizeManager ocrAutoRecognizeManager = ocrEngine.getAutoRecognizeManager(); 
 
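   // Recognize the document. Note that this call passes the output file name
   // directly; unlike the C# example, it does not write to a stream.
   ocrAutoRecognizeManager.run(tifFileName, pdfFileName, DocumentFormat.PDF, null);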
   ocrEngine.dispose(); 
} 
Requirements

Target Platforms

Help Version 23.0.2024.4.19
Products | Support | Contact Us | Intellectual Property Notices
© 1991-2024 LEAD Technologies, Inc. All Rights Reserved.

Leadtools.Ocr Assembly