CreateDocument(string,OcrCreateDocumentOptions) Method

Summary

Creates a new OCR file or memory-based document object.

Syntax

C#

public IOcrDocument CreateDocument( 
   string documentFileName, 
   OcrCreateDocumentOptions options 
)

Objective-C

- (nullable LTOcrDocument *)createDocument:(nullable NSString *)ocrDocumentFilePath options:(LTOcrCreateDocumentOptions)options error:(NSError **)error NS_SWIFT_NAME(createDocument(filePath:options:));

Java

public OcrDocument createDocument(String ocrDocumentFilePath, 
                                  int options)

C++/CLI

IOcrDocument^ CreateDocument(  
   String^ documentFileName, 
   OcrCreateDocumentOptions options 
)

Python

def CreateDocument(self,options):

Parameters

documentFileName
The document file name. This value can be null.

options
Options to control how the document is created or loaded.

Return Value

An object implementing IOcrDocument that can participate in recognition and saving operations.

Remarks

This method can either create a file or memory-based OCR document, or load a previously created file-based document based on the values of documentFileName and options as follows:

To create a memory-based document, pass OcrCreateDocumentOptions.InMemory to options. documentFileName is not used and the engine will not use a disk file to store the document data.

To create a file-based document that will not be re-used, pass null to documentFileName and OcrCreateDocumentOptions.AutoDeleteFile to options. In this case, the engine will create a temporary file on disk to use as the store for the document file. The file is deleted when the IOcrDocument is disposed. Note that if you use your own file name in documentFileName along with OcrCreateDocumentOptions.AutoDeleteFile, the engine will overwrite this file if it exists and automatically delete it when the document is disposed.

To create a file-based document that will be re-used, pass a file name to documentFileName and OcrCreateDocumentOptions.None to options. In this case, the engine will overwrite this file if it exists but will not delete it when IOcrDocument is disposed.

To re-load a document that was created with the previous option, pass the same file name to documentFileName and OcrCreateDocumentOptions.LoadExisting to options. In this case, the engine will re-generate the document from data found in the file.

Use IOcrDocument.IsInMemory to test whether a document is memory-based or file-based, and IOcrDocument.FileName to get the name of the disk file used by a file-based document. FileName is set to the value passed to documentFileName or to the name of the temporary file the engine created.
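The four combinations above can be summarized as a small decision table. The following is only a sketch in plain Java that models the rules as stated in these remarks; it does not call LEADTOOLS. The class `CreateDocumentModes`, the `Option` enum, and the `Plan` type are hypothetical stand-ins invented for illustration. The remarks do not say what the engine does when a null file name is combined with None or LoadExisting, so the sketch rejects that case as an assumption.

```java
// Hypothetical model of the documentFileName/options rules described above.
// None of these types belong to LEADTOOLS; they only restate the remarks.
final class CreateDocumentModes {

    // Stand-in for OcrCreateDocumentOptions (illustrative names only).
    enum Option { NONE, IN_MEMORY, AUTO_DELETE_FILE, LOAD_EXISTING }

    // What the remarks say will happen for one combination of arguments.
    static final class Plan {
        final boolean inMemory;           // corresponds to IOcrDocument.IsInMemory
        final boolean usesTempFile;       // engine picks a temporary file name
        final boolean deletedOnDispose;   // backing file removed on dispose
        final boolean loadsExistingData;  // document rebuilt from the file

        Plan(boolean inMemory, boolean usesTempFile,
             boolean deletedOnDispose, boolean loadsExistingData) {
            this.inMemory = inMemory;
            this.usesTempFile = usesTempFile;
            this.deletedOnDispose = deletedOnDispose;
            this.loadsExistingData = loadsExistingData;
        }

        @Override
        public String toString() {
            return "inMemory=" + inMemory + ", usesTempFile=" + usesTempFile
                    + ", deletedOnDispose=" + deletedOnDispose
                    + ", loadsExistingData=" + loadsExistingData;
        }
    }

    static Plan plan(String documentFileName, Option option) {
        switch (option) {
            case IN_MEMORY:
                // The file name is ignored and no disk file is used.
                return new Plan(true, false, false, false);
            case AUTO_DELETE_FILE:
                // A null name means a temporary file; a caller-supplied name is
                // overwritten. Either way the file is deleted on dispose.
                return new Plan(false, documentFileName == null, true, false);
            case NONE:
                requireName(documentFileName, option);
                // The file is overwritten if it exists and kept after dispose.
                return new Plan(false, false, false, false);
            case LOAD_EXISTING:
                requireName(documentFileName, option);
                // The document is regenerated from data found in the file.
                return new Plan(false, false, false, true);
            default:
                throw new IllegalArgumentException("Unknown option: " + option);
        }
    }

    // Assumption of this sketch: a file name is required for these options.
    private static void requireName(String documentFileName, Option option) {
        if (documentFileName == null) {
            throw new IllegalArgumentException(option + " needs a file name");
        }
    }

    public static void main(String[] args) {
        System.out.println(plan(null, Option.IN_MEMORY));
        System.out.println(plan(null, Option.AUTO_DELETE_FILE));
        System.out.println(plan("doc.ocr", Option.NONE));
        System.out.println(plan("doc.ocr", Option.LOAD_EXISTING));
    }
}
```

In real code the same decision is made by the engine itself; the sketch is only meant to make it easier to see which argument pair matches a given storage requirement.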

For more information on memory and file-based documents, refer to Programming with the LEADTOOLS .NET OCR.

A typical OCR operation using IOcrEngine involves starting up the engine, creating an OCR document with the CreateDocument method, adding pages to it, and performing either automatic or manual zoning. Once this is done, IOcrPage.Recognize is called on each page to collect the recognition data and store it internally in the page. After the recognition data is collected, use the various IOcrDocument.Save or IOcrDocument.SaveXml methods to save the document in its final format.

When you are done using the IOcrDocument object created by this method, you should dispose it as soon as possible to free its resources. Disposing an IOcrDocument object will free all the pages stored inside its IOcrDocument.Pages collection.

Example

C#

using System; 
using System.Diagnostics; 
using System.IO; 
using Leadtools; 
using Leadtools.Codecs; 
using Leadtools.Ocr; 
using Leadtools.Document.Writer; 
 
public void StartupEngineExample() 
{ 
   // Use RasterCodecs to load an image file 
   // Note: You can let the engine load the image file directly as shown in the other examples 
   RasterCodecs codecs = new RasterCodecs(); 
   RasterImage image = codecs.Load(Path.Combine(LEAD_VARS.ImagesDir, "Ocr1.tif")); 
 
   // Assume you copied the engine runtime files to C:\MyApp\Ocr 
   string engineDir = @"C:\MyApp\Ocr"; 
 
   // Store the engine work directory into a path inside our application 
   string workDir = @"C:\MyApp\OcrTemp"; 
 
   // Delete all files in the work directory in case the previous version of our application exited abnormally and 
   // the engine did not get the chance to clean all of its temporary files (if any). 
   // Directory.Delete throws if the directory does not exist, so check first 
   if (Directory.Exists(workDir)) 
      Directory.Delete(workDir, true); 
 
   // Re-create the work directory 
   Directory.CreateDirectory(workDir); 
 
   // Create an instance of the engine 
   using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD)) 
   { 
      // Show that the engine has not been started yet 
      Console.WriteLine("Before calling Startup, IsStarted = " + ocrEngine.IsStarted); 
 
      // Start the engine using our parameters 
      // Since we already have a RasterCodecs object, we can re-use it to save memory and resources 
      ocrEngine.Startup(codecs, null, workDir, engineDir); 
 
      // Make sure the engine is using our working directory 
      Console.WriteLine("workDir passed is {0}, the value of WorkDirectory after Startup is {1}", workDir, ocrEngine.WorkDirectory); 
 
      // Show that the engine has started fine 
      Console.WriteLine("After calling Startup, EngineType is {0}, IsStarted = {1}", ocrEngine.EngineType, ocrEngine.IsStarted); 
 
      // Make sure the engine is using our own instance of RasterCodecs 
      Debug.Assert(codecs == ocrEngine.RasterCodecsInstance); 
 
      // Create an OCR page from the raster image 
      IOcrPage ocrPage = ocrEngine.CreatePage(image, OcrImageSharingMode.AutoDispose); 
      // The image now belongs to the page and will be disposed when the page is disposed 
 
      // Recognize the page 
      // Note: Recognize can be called without calling AutoZone or manually adding zones. The engine will 
      // check and automatically auto-zone the page 
      ocrPage.Recognize(null); 
 
      // Create a file-based document 
      using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument(null, OcrCreateDocumentOptions.AutoDeleteFile)) 
      { 
         // Add the page 
         ocrDocument.Pages.Add(ocrPage); 
         // No need for the page anymore 
         ocrPage.Dispose(); 
 
         // Save the document we have as PDF 
         string pdfFileName = Path.Combine(LEAD_VARS.ImagesDir, "Ocr1.pdf"); 
         ocrDocument.Save(pdfFileName, DocumentFormat.Pdf, null); 
      } 
 
      // Shutdown the engine 
      // Note: calling Dispose will also automatically shutdown the engine if it has been started 
      ocrEngine.Shutdown(); 
   } 
} 
 
static class LEAD_VARS 
{ 
   public const string ImagesDir = @"C:\LEADTOOLS23\Resources\Images"; 
}

Java

 
import java.io.File; 
import java.io.IOException; 
import java.nio.file.Files; 
import java.nio.file.Path; 
import java.nio.file.Paths; 
 
import org.junit.*; 
import org.junit.runner.JUnitCore; 
import org.junit.runner.Result; 
import org.junit.runner.notification.Failure; 
import static org.junit.Assert.assertTrue; 
 
import leadtools.*; 
import leadtools.codecs.*; 
import leadtools.document.writer.*; 
import leadtools.ocr.*; 
 
 
public void IOcrStartupEngineExample() throws IOException { 
   final String LEAD_VARS_IMAGES_DIR = "C:\\LEADTOOLS23\\Resources\\Images"; 
   final String OCR_LEAD_RUNTIME_DIR = "C:\\LEADTOOLS23\\Bin\\Common\\OcrLEADRuntime"; 
 
   // Use RasterCodecs to load an image file 
   // Note: You can let the engine load the image file directly as shown in the 
   // other examples 
   RasterCodecs codecs = new RasterCodecs(); 
   RasterImage image = codecs.load(Paths.get(LEAD_VARS_IMAGES_DIR, "Ocr1.tif").toString()); 
 
   // Store the engine work directory into a path inside our application 
   String workDir = "C:\\LEADTOOLS23\\Bin\\Common"; 
 
   // Make sure the work directory exists before starting the engine. 
   // This directory also holds the OCR runtime folder, so its contents are left 
   // in place 
   File directory = new File(workDir); 
   if (!directory.isDirectory() && !directory.mkdirs()) { 
      throw new IOException("Could not create work directory: " + workDir); 
   } 
 
   // Create an instance of the engine 
   OcrEngine ocrEngine = OcrEngineManager.createEngine(OcrEngineType.LEAD); 
 
   // Show that the engine has not been started yet 
   System.out.println("Before calling Startup, IsStarted = " + ocrEngine.isStarted()); 
 
   // Start the engine using our parameters 
   // Since we already have a RasterCodecs object, we can re-use it to save memory 
   // and resources 
   ocrEngine.startup(codecs, null, workDir, OCR_LEAD_RUNTIME_DIR); 
   assertTrue(ocrEngine.isStarted()); 
 
   // Make sure the engine is using our working directory 
   System.out.printf("workDir passed is %s, the value of WorkDirectory after Startup is %s%n", workDir, 
         ocrEngine.getWorkDirectory()); 
   assertTrue("Work directory was not changed", workDir.equals(ocrEngine.getWorkDirectory())); 
 
   // Show that the engine has started fine 
   System.out.printf("After calling Startup, EngineType is %s, IsStarted = %s%n", ocrEngine.getEngineType(), 
         ocrEngine.isStarted()); 
 
   // Make sure the engine is using our own instance of RasterCodecs 
   assertTrue("Engine is using incorrect RasterCodecs", codecs == ocrEngine.getRasterCodecsInstance()); 
 
   // Create an OCR page from the raster image 
   OcrPage ocrPage = ocrEngine.createPage(image, OcrImageSharingMode.AUTO_DISPOSE); 
   // The image now belongs to the page and will be disposed when the page is disposed 
 
   // Recognize the page 
   // Note: Recognize can be called without calling AutoZone or manually adding 
   // zones. The engine will check and automatically auto-zone the page 
   ocrPage.recognize(null); 
 
   // Create a file-based document 
   OcrDocument ocrDocument = ocrEngine.getDocumentManager().createDocument(null, 
         OcrCreateDocumentOptions.AUTO_DELETE_FILE.getValue()); 
 
   // Add the page 
   ocrDocument.getPages().add(ocrPage); 
   // No need for the page anymore 
   ocrPage.dispose(); 
 
   // Save the document we have as PDF 
   String pdfFileName = Paths.get(LEAD_VARS_IMAGES_DIR, "Ocr1.pdf").toString(); 
   ocrDocument.save(pdfFileName, DocumentFormat.PDF, null); 
 
   // Shutdown the engine 
   // Note: calling Dispose will also automatically shutdown the engine if it has 
   // been started 
   ocrEngine.shutdown(); 
}