
Run(string,string,DocumentFormat,string,OcrProgressCallback) Method

Summary

Converts an image file on disk to a document file in the specified document format, with an optional multipage zone file.

Syntax
VB
Overloads Sub Run( _ 
   ByVal imageFileName As String, _ 
   ByVal documentFileName As String, _ 
   ByVal format As DocumentFormat, _ 
   ByVal zonesFileName As String, _ 
   ByVal callback As OcrProgressCallback _ 
)  

Objective-C
- (BOOL)     run:(NSString *)imageFileName  
documentFileName:(NSString *)documentFileName  
   zonesFileName:(nullable NSString *)zonesFileName  
          format:(LTDocumentFormat)format  
           error:(NSError **)error 

Java
public void run(String imageFileName, 
                String documentFileName, 
                DocumentFormat format, 
                String zonesFileName) 

Parameters

imageFileName
The name of the file containing the image.

documentFileName
The name of the result document file.

format
The output document format. If this parameter is DocumentFormat.User, the document is saved using the native engine format set in IOcrDocumentManager.EngineFormat, provided the engine supports native formats; otherwise an exception is thrown.

zonesFileName
Optional name of a prepared multipage zone file. This parameter can be a null (Nothing in VB) reference.

callback
Optional callback to show operation progress.
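
As an illustration of the callback parameter, the following sketch passes an OcrProgressCallback that prints the reported percentage. It assumes an engine that has already been started and that IOcrProgressData exposes a Percentage property; the class and method names (RunCallbackSketch, RunWithProgress) are hypothetical and not part of the toolkit.

```csharp
using System;
using Leadtools.Document.Writer;
using Leadtools.Ocr;

public static class RunCallbackSketch
{
   // Hypothetical helper: ocrEngine is assumed to be already started with Startup
   public static void RunWithProgress(IOcrEngine ocrEngine, string imageFileName, string documentFileName)
   {
      // Assumption: IOcrProgressData.Percentage reports overall completion from 0 to 100
      OcrProgressCallback callback = (IOcrProgressData data) =>
      {
         Console.WriteLine("Progress: {0}%", data.Percentage);
      };

      // zonesFileName is null, so each page is auto-zoned instead of loading prepared zones
      ocrEngine.AutoRecognizeManager.Run(imageFileName, documentFileName, DocumentFormat.Pdf, null, callback);
   }
}
```

Passing null for callback, as the Example section does, runs the same conversion without progress reporting.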

Remarks

This method will perform the following operations:

  1. The JobStarted event is triggered.

  2. Creates one or more IOcrDocument objects to store the pages in. The number of OCR documents created depends on MaximumThreadsPerJob. If this value is 0 (maximum CPUs/cores), or is greater than 1 and multiple threads are supported by this engine, then more than one document might be created to participate in the recognition process. Each document is created as disk-based.

  3. Loops through all the pages in imageFileName and for each page:

    The page is created using IOcrEngine.CreatePage.

    If zonesFileName contains a valid multipage zone file name and has an entry for the current page, then the zones are loaded with IOcrPage.LoadZones(fileName, pageNumber) and applied to the page. If zonesFileName is a null (Nothing in VB) reference or it does not contain an equivalent page number, auto-decomposing of the page is performed instead with IOcrPage.AutoZone.

    IOcrPage.Recognize is called to get the OCR data of the page.

    For LEADTOOLS OCR Module - LEAD Engine, the page is added to the document using IOcrDocument.Pages.Add.

    For engines other than LEADTOOLS OCR Module - LEAD Engine: if multiple documents are used or the current number of recognized pages is greater than the maximum specified in MaximumPagesBeforeLtd, then the current recognition data is saved to a temporary LTD file and the OCR document is cleared.

  4. When all pages are processed, they are saved to the result file specified in documentFileName using the format specified in format. If LTD was used, the temporary file is converted to the final document using DocumentWriter.Convert and optionally DocumentWriter.AppendLtd.

  5. All OCR documents and temporary files are deleted.

  6. The JobCompleted event is triggered.

  7. You can use the JobProgress event or callback to show the operation progress or to abort it if threading is not used. For more information and an example, refer to OcrProgressCallback.

  8. You can use the JobOperation event to get information regarding the current operation being performed. For more information and an example, refer to JobOperation.
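
The per-page work in steps 2 through 4 can be approximated by hand as in the sketch below. This is a simplified, single-threaded illustration, not the actual implementation of Run: it uses one memory-based document, skips the LTD handling and the job events, and assumes that LoadZones leaves the page without zones when the zone file has no entry for that page. The class and method names are hypothetical.

```csharp
using Leadtools;
using Leadtools.Codecs;
using Leadtools.Document.Writer;
using Leadtools.Ocr;

public static class ManualRunSketch
{
   // Hypothetical helper: ocrEngine is assumed to be already started with Startup
   public static void Recognize(IOcrEngine ocrEngine, string imageFileName,
      string documentFileName, DocumentFormat format, string zonesFileName)
   {
      RasterCodecs codecs = ocrEngine.RasterCodecsInstance;
      int pageCount = codecs.GetTotalPages(imageFileName);

      // Step 2 (simplified): a single memory-based document instead of disk-based ones
      using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
      {
         // Step 3: loop through all the pages in the image file (page numbers are 1-based)
         for (int pageNumber = 1; pageNumber <= pageCount; pageNumber++)
         {
            RasterImage image = codecs.Load(imageFileName, 0, CodecsLoadByteOrder.BgrOrGray, pageNumber, pageNumber);
            IOcrPage ocrPage = ocrEngine.CreatePage(image, OcrImageSharingMode.AutoDispose);

            // Use prepared zones when a zone file is given
            if (zonesFileName != null)
               ocrPage.LoadZones(zonesFileName, pageNumber);

            // Assumption: no zones after loading means there was no entry for this page
            if (ocrPage.Zones.Count == 0)
               ocrPage.AutoZone(null);

            ocrPage.Recognize(null);
            ocrDocument.Pages.Add(ocrPage);
         }

         // Step 4: save all recognized pages in the requested format
         ocrDocument.Save(documentFileName, format, null);
      }
   }
}
```

In most cases calling Run directly is preferable, since it also handles multiple threads, temporary LTD files and cleanup.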

The IOcrAutoRecognizeManager interface also has the following options to use with this method:

Option Description
MaximumPagesBeforeLtd

Adds support for converting a document with an unlimited number of pages. An OCR recognition operation on a document that contains a large number of pages (10 or more) might result in an out-of-memory error.

All of the LEADTOOLS OCR engines support saving the intermediate recognition results to a temporary LTD file (DocumentFormat.LTD). The results of subsequent pages are appended to this temporary file. When all the pages of the document have been recognized, the engine converts the temporary LTD file to the desired output format.

The MaximumPagesBeforeLtd property defines the maximum number of pages processed as a whole. For example, if the original document has 20 pages and the value of this property is 8, the engine recognizes the first 8 pages and saves the results to a temporary file, recognizes the next 8 pages and appends the results, and finally recognizes the last 4 pages and converts the temporary document into the final format.

PreprocessPageCommands

Holds an array of OcrAutoPreprocessPageCommand items that control which auto-preprocess operations to perform on each document page prior to recognition.

MaximumThreadsPerJob

Maximum number of threads to use per job. You can instruct IOcrAutoRecognizeManager to use all available machine CPUs/cores when recognizing a document. On multi-core machines this can greatly reduce the time required to finish the OCR operation.

JobErrorMode

Ability to resume on non-critical errors. For example, if a source document has a page that could not be recognized, the offending page is added to the final document as a graphics image and recognition continues with the next page.

JobStarted, JobProgress, JobOperation and JobCompleted events

Events to track when both synchronous and asynchronous jobs have started, are running, and have completed.

AbortAllJobs

Aborts all running and pending jobs.

EnableTrace

Output debug messages to the standard .NET trace listeners.
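
The options above could be set on the manager before calling Run, roughly as follows. This is a sketch under assumptions: the enumeration members OcrAutoRecognizeManagerJobErrorMode.Continue and OcrAutoPreprocessPageCommand.Deskew, and the Clear and Add calls on PreprocessPageCommands, are written as they are believed to exist in LEADTOOLS 20 and should be checked against the reference for each member. The class and method names are hypothetical.

```csharp
using System;
using Leadtools.Ocr;

public static class AutoRecognizeOptionsSketch
{
   // Hypothetical helper: ocrEngine is assumed to be already started with Startup
   public static void Configure(IOcrEngine ocrEngine)
   {
      IOcrAutoRecognizeManager manager = ocrEngine.AutoRecognizeManager;

      // Flush results to the temporary LTD file every 8 pages (20 pages -> 8 + 8 + 4)
      manager.MaximumPagesBeforeLtd = 8;

      // 0 requests all available CPUs/cores
      manager.MaximumThreadsPerJob = 0;

      // Assumed enum member: keep going when a page cannot be recognized
      manager.JobErrorMode = OcrAutoRecognizeManagerJobErrorMode.Continue;

      // Assumed collection API and enum member: deskew each page before recognition
      manager.PreprocessPageCommands.Clear();
      manager.PreprocessPageCommands.Add(OcrAutoPreprocessPageCommand.Deskew);

      // Send debug messages to the standard .NET trace listeners
      manager.EnableTrace = true;

      // Track the job life cycle; the event argument contents are not used here
      manager.JobStarted += (sender, e) => Console.WriteLine("Job started");
      manager.JobCompleted += (sender, e) => Console.WriteLine("Job completed");
   }
}
```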

Example
C#
using System; 
using System.IO; 
using Leadtools; 
using Leadtools.Codecs; 
using Leadtools.Ocr; 
using Leadtools.Document.Writer; 
using Leadtools.Forms.Common; 
using Leadtools.WinForms; 
 
public void OcrAutoRecognizeManagerRun2Example() 
{ 
   string tifFileName = Path.Combine(LEAD_VARS.ImagesDir, "Ocr1.tif"); 
   string pdfFileName = Path.Combine(LEAD_VARS.ImagesDir, "Ocr1.pdf"); 
 
   // Create an instance of the engine 
   using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD, false)) 
   { 
      // Start the engine using default parameters 
      Console.WriteLine("Starting up the engine..."); 
      ocrEngine.Startup(null, null, null, LEAD_VARS.OcrLEADRuntimeDir); 
 
      IOcrAutoRecognizeManager ocrAutoRecognizeManager = ocrEngine.AutoRecognizeManager; 
 
      // Recognize the document 
      ocrAutoRecognizeManager.Run(tifFileName, pdfFileName, DocumentFormat.Pdf, null, null); 
   } 
} 
 
static class LEAD_VARS 
{ 
   public const string ImagesDir = @"C:\Users\Public\Documents\LEADTOOLS Images"; 
   public const string OcrLEADRuntimeDir = @"C:\LEADTOOLS 20\Bin\Common\OcrLEADRuntime"; 
} 
VB
Imports System 
Imports System.IO 
Imports Leadtools 
Imports Leadtools.Codecs 
Imports Leadtools.Ocr 
Imports Leadtools.Document.Writer 
Imports Leadtools.Forms.Common 
Imports Leadtools.WinForms 
 
Public Sub OcrAutoRecognizeManagerRun2Example() 
   Dim tifFileName As String = Path.Combine(LEAD_VARS.ImagesDir, "Ocr1.tif") 
   Dim pdfFileName As String = Path.Combine(LEAD_VARS.ImagesDir, "Ocr1.pdf") 
 
   ' Create an instance of the engine 
   Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD, False) 
      ' Start the engine using default parameters 
      Console.WriteLine("Starting up the engine...") 
      ocrEngine.Startup(Nothing, Nothing, Nothing, LEAD_VARS.OcrLEADRuntimeDir) 
 
      Dim ocrAutoRecognizeManager As IOcrAutoRecognizeManager = ocrEngine.AutoRecognizeManager 
 
      ' Recognize the document 
      ocrAutoRecognizeManager.Run(tifFileName, pdfFileName, DocumentFormat.Pdf, Nothing, Nothing) 
   End Using 
End Sub 
 
Public NotInheritable Class LEAD_VARS 
   Public Const ImagesDir As String = "C:\Users\Public\Documents\LEADTOOLS Images" 
   Public Const OcrLEADRuntimeDir As String = "C:\LEADTOOLS 20\Bin\Common\OcrLEADRuntime" 
End Class 

Requirements

Target Platforms

Help Version 20.0.2020.4.2
© 1991-2020 LEAD Technologies, Inc. All Rights Reserved.

Leadtools.Ocr Assembly