IOcrPageCollection Interface

Summary

Represents the pages of an OCR document object.

Syntax

C#

[DefaultMemberAttribute("Item")] 
public interface IOcrPageCollection : ICollection<IOcrPage>, IEnumerable<IOcrPage>, IEnumerable, IList<IOcrPage>

Objective-C

@interface LTOcrPageCollection : NSObject<NSFastEnumeration>

Java

public class OcrPageCollection implements List<OcrPage>

C++/CLI

[DefaultMemberAttribute("Item")] 
public interface class IOcrPageCollection : public System.Collections.Generic.ICollection<IOcrPage>, System.Collections.Generic.IEnumerable<IOcrPage>, System.Collections.Generic.IList<IOcrPage>, System.Collections.IEnumerable

Python

class IOcrPageCollection(ICollection):

Remarks

IOcrPageCollection holds the pages currently added to an OCR document (IOcrDocument). An IOcrDocument exposes its collection of IOcrPage objects through the IOcrDocument.Pages property. Each IOcrPage object contains the raster image used to create it (the image used when the page was loaded or added) and a group of OCR zones for the page, added either manually or through auto-zoning.

In a memory-based IOcrDocument, the IOcrPageCollection holds the pages. The user can recognize any or all of the pages at any time, and pages can be added or removed at will.

In a file-based IOcrDocument, the IOcrPageCollection is a store-only view of the pages. When a page is added, a snapshot of its current recognition data is saved into the document. This data can no longer be modified, and the page itself is no longer needed. The user must recognize pages before adding them to the document, and pages can only be added, not removed. In this mode, you can only use IOcrPageCollection.Add and IOcrPageCollection.Count. No other method or property is supported.

The IOcrPageCollection interface implements the standard .NET ICollection<T>, IList<T>, and IEnumerable<T> interfaces, so you can use the members of these interfaces to add, remove, get, set, and iterate through the pages of the OCR document (if the document is memory-based).

Memory-Based Documents

The following list contains the major functionality of the IOcrPageCollection interface of a memory-based document:

Add new pages to an OCR document from raster image files. These images can be in disk files, in a .NET stream (memory or otherwise), or at a remote URL. The following table lists the page addition method groups:

Methods	Description
AddPage	Adds a single page from a RasterImage, a DIB, or an image file on disk, in a .NET stream, or at a remote URL.
AddPages	Adds multiple pages from a multipage RasterImage or an image file on disk, in a .NET stream, or at a remote URL.
InsertPage	Inserts a single page at a specific location from a RasterImage, a DIB, or an image file on disk, in a .NET stream, or at a remote URL.
InsertPages	Inserts multiple pages at a specific location from a multipage RasterImage or an image file on disk, in a .NET stream, or at a remote URL.

Export pages from the OCR document to raster image files. You can save the pages to disk files, to .NET streams, or as a single-page or multipage RasterImage object, using any of the file formats supported by LEADTOOLS. The following table lists the page exporting method groups:

Methods	Description
ExportPage	Saves a single page from the OCR document to a RasterImage object, an image file on disk, or a .NET stream.
ExportPages	Saves multiple pages from the OCR document to a multipage RasterImage object, an image file on disk, or a .NET stream.

Perform automatic image preprocessing on one or more pages in the OCR document through AutoPreprocess. These methods provide a shortcut for iterating through the pages in the collection and calling IOcrPage.AutoPreprocess on each page.
Perform auto-zoning on one or more pages in the OCR document through AutoZone. These methods provide a shortcut for iterating through the pages in the collection and calling IOcrPage.AutoZone on each page.
Recognize one or more pages in the OCR document through Recognize. These methods provide a shortcut for iterating through the pages in the collection and calling IOcrPage.Recognize on each page.
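
The shortcut pattern described above (a collection-level method that loops over the pages and calls the per-page method) can be sketched as follows. This is a minimal, self-contained illustration that does not use the LEADTOOLS library: SketchPage, SketchPageCollection, and recognizeAll are hypothetical names introduced only for this sketch, and the real per-page work is replaced by setting a flag.

```java
import java.util.ArrayList;

// Hypothetical stand-in for a page; NOT the LEADTOOLS OcrPage type.
class SketchPage {
    private boolean recognized;

    // Placeholder for the real per-page work; this sketch only records that it ran.
    void recognize() {
        recognized = true;
    }

    boolean isRecognized() {
        return recognized;
    }
}

// Hypothetical stand-in for a page collection that offers a bulk shortcut.
class SketchPageCollection extends ArrayList<SketchPage> {
    // Bulk shortcut: has the same effect as calling recognize() on each page in order.
    void recognizeAll() {
        for (SketchPage page : this) {
            page.recognize();
        }
    }
}

class ShortcutSketch {
    public static void main(String[] args) {
        SketchPageCollection pages = new SketchPageCollection();
        pages.add(new SketchPage());
        pages.add(new SketchPage());

        // One call on the collection instead of an explicit loop over the pages.
        pages.recognizeAll();

        for (SketchPage page : pages) {
            System.out.println("recognized: " + page.isRecognized());
        }
    }
}
```

Calling the per-page method directly remains useful when only some pages need processing or when each page needs different handling.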

File-Based Documents

Only the following members are supported in file-based documents:

Add: Adds an IOcrPage to the document by taking a snapshot of its current recognition data.
Count: Gets the number of pages in the document.
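
To make the store-only behavior concrete, here is a minimal sketch of a list that supports only adding and counting. It is an illustration of the concept, not the LEADTOOLS implementation: StoreOnlyPages is a hypothetical class, pages are represented by plain strings, and the step where a real engine would persist the recognition snapshot is reduced to incrementing a counter.

```java
import java.util.AbstractList;

// Hypothetical illustration only; NOT how LEADTOOLS implements file-based documents.
class StoreOnlyPages extends AbstractList<String> {
    private int count;

    @Override
    public boolean add(String pageSnapshot) {
        // A real engine would persist the snapshot here; this sketch only counts it.
        count++;
        return true;
    }

    @Override
    public int size() {
        return count;
    }

    @Override
    public String get(int index) {
        // Stored pages cannot be read back through a store-only view.
        throw new UnsupportedOperationException("store-only view: get is not supported");
    }
}

class StoreOnlySketch {
    public static void main(String[] args) {
        StoreOnlyPages pages = new StoreOnlyPages();
        pages.add("page 1");
        pages.add("page 2");
        System.out.println("count: " + pages.size());

        try {
            pages.remove(0); // AbstractList rejects removal unless a subclass overrides it
        } catch (UnsupportedOperationException e) {
            System.out.println("remove is not supported");
        }
    }
}
```

Code that must work with both document kinds can restrict itself to adding pages and reading the count, which is the subset both modes share according to the text above.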

The LEADTOOLS OCR engine supports pages with dots per inch (DPI) values of 150 and greater. If you try to add a page with a DPI of less than 150, the engine might not be able to recognize any data from this page.
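
An application may want to check the 150 DPI minimum before adding a page. The sketch below shows one way to express that check in plain Java. DpiCheckSketch and isResolutionSupported are hypothetical names, and requiring both the horizontal and the vertical resolution to meet the minimum is an assumption of this sketch, not something the text above specifies.

```java
class DpiCheckSketch {
    // Minimum resolution stated in the documentation text.
    static final int MIN_SUPPORTED_DPI = 150;

    // Returns true when both resolutions meet the minimum.
    // Assumption: both axes are checked; the source only says "DPI values of 150 and greater".
    static boolean isResolutionSupported(int xResolution, int yResolution) {
        return xResolution >= MIN_SUPPORTED_DPI && yResolution >= MIN_SUPPORTED_DPI;
    }

    public static void main(String[] args) {
        System.out.println(isResolutionSupported(300, 300));
        System.out.println(isResolutionSupported(96, 96));
    }
}
```

A page that fails the check could be resampled to a higher resolution or skipped, depending on the application.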

Example

This example loads multiple pages into an OCR document and saves the OCR result to a multipage PDF file.

C#

using System; 
using System.IO; 
using Leadtools; 
using Leadtools.Codecs; 
using Leadtools.Ocr; 
using Leadtools.Document.Writer; 
using Leadtools.Forms.Common; 
using Leadtools.ImageProcessing.Core; 
 
public void PageCollectionExamples() 
{ 
   // For this example, we need a multi-page TIF file. 
   // Create a multi-page TIF from Ocr1.tif, Ocr2.tif, Ocr3.tif and Ocr4.tif 
   string tifFileName = Path.Combine(LEAD_VARS.ImagesDir, "Ocr.tif"); 
   if (File.Exists(tifFileName)) 
      File.Delete(tifFileName); 
 
   using (RasterCodecs codecs = new RasterCodecs()) 
   { 
      for (int i = 0; i < 4; i++) 
      { 
         string pageFileName = Path.Combine(LEAD_VARS.ImagesDir, string.Format("Ocr{0}.tif", i + 1)); 
         using (RasterImage image = codecs.Load(pageFileName)) 
            codecs.Save(image, tifFileName, RasterImageFormat.CcittGroup4, 1, 1, 1, -1, CodecsSavePageMode.Append); 
      } 
   } 
 
   string pdfFileName = Path.Combine(LEAD_VARS.ImagesDir, "Ocr.pdf"); 
 
   // Create an instance of the engine 
   using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD)) 
   { 
      // Start the engine using default parameters 
      ocrEngine.Startup(null, null, null, LEAD_VARS.OcrLEADRuntimeDir); 
 
      // Create an OCR document 
      using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument()) 
      { 
         // Load all the pages of the multi-page TIF file we created into the document 
         ocrDocument.Pages.AddPages(tifFileName, 1, -1, null); 
         Console.WriteLine("{0} pages added to the document", ocrDocument.Pages.Count); 
 
         // Auto-zone 
         ocrDocument.Pages.AutoZone(null); 
 
         // Recognize 
         ocrDocument.Pages.Recognize(null); 
 
         // Save 
         ocrDocument.Save(pdfFileName, DocumentFormat.Pdf, null); 
      } 
 
      // Shutdown the engine 
      // Note: calling Dispose will also automatically shutdown the engine if it has been started 
      ocrEngine.Shutdown(); 
   } 
} 
 
static class LEAD_VARS 
{ 
   public const string ImagesDir = @"C:\LEADTOOLS23\Resources\Images"; 
   public const string OcrLEADRuntimeDir = @"C:\LEADTOOLS23\Bin\Common\OcrLEADRuntime"; 
}

Java

import java.io.File; 
import java.io.IOException; 
import java.net.URI; 
import java.net.URISyntaxException; 
 
import org.junit.*; 
import org.junit.runner.JUnitCore; 
import org.junit.runner.Result; 
import org.junit.runner.notification.Failure; 
import static org.junit.Assert.*; 
 
import leadtools.*; 
import leadtools.codecs.*; 
import leadtools.document.writer.DocumentFormat; 
import leadtools.imageprocessing.core.DeskewCommand; 
import leadtools.imageprocessing.core.DeskewCommandFlags; 
import leadtools.ocr.OcrDocument; 
import leadtools.ocr.OcrEngine; 
import leadtools.ocr.OcrEngineManager; 
import leadtools.ocr.OcrEngineType; 
 
 
public void IOcrPageCollectionsPageCollectionExamples() { 
   final String LEAD_VARS_IMAGES_DIR = "C:\\LEADTOOLS23\\Resources\\Images"; 
   final String LEAD_VARS_OCR_LEAD_RUNTIME_DIR = "C:\\LEADTOOLS23\\Bin\\Common\\OcrLEADRuntime"; 
 
   // For this example, we need a multi-page TIF file. 
   // Create a multi-page TIF from Ocr1.tif, Ocr2.tif, Ocr3.tif and Ocr4.tif 
   // Paths are built with java.io.File so the example needs no external helper 
   var tifFileName = new File(LEAD_VARS_IMAGES_DIR, "Ocr.tif").getPath(); 
   var file = new File(tifFileName); 
   if (file.exists()) 
      file.delete(); 
 
   RasterCodecs codecs = new RasterCodecs(); 
   for (var i = 0; i < 4; i++) { 
      var pageFileName = new File(LEAD_VARS_IMAGES_DIR, "Ocr" + (i + 1) + ".tif").getPath(); 
      var image = codecs.load(pageFileName); 
      codecs.save(image, tifFileName, RasterImageFormat.CCITT_GROUP4, 1, 1, 1, -1, CodecsSavePageMode.APPEND); 
   } 
   codecs = null; 
 
   var pdfFileName = new File(LEAD_VARS_IMAGES_DIR, "Ocr.pdf").getPath(); 
 
   codecs = new RasterCodecs(); 
   // Create an instance of the engine 
   OcrEngine ocrEngine = OcrEngineManager.createEngine(OcrEngineType.LEAD); 
 
   // Start the engine using default parameters 
   ocrEngine.startup(codecs, null, null, LEAD_VARS_OCR_LEAD_RUNTIME_DIR); 
   assertTrue("Engine unsuccessfully started", ocrEngine.isStarted()); 
 
   // Create an OCR document 
   OcrDocument ocrDocument = ocrEngine.getDocumentManager().createDocument(); 
 
   // Load all the pages of the multi-page TIF file we created into the document 
   ocrDocument.getPages().addPages(codecs.load(tifFileName), 1, -1, null); 
   System.out.println(ocrDocument.getPages().size() + " pages added to the document"); 
 
   // Auto-zone 
   ocrDocument.getPages().autoZone(null); 
 
   // Recognize 
   ocrDocument.getPages().recognize(null); 
 
   // Save 
   ocrDocument.save(pdfFileName, DocumentFormat.PDF, null); 
   // Verify that the PDF written by save() exists 
   assertTrue("File unsuccessfully saved", (new File(pdfFileName)).exists()); 
 
   // Shutdown the engine 
   // Note: calling Dispose will also automatically shutdown the engine if it has 
   // been started 
   ocrEngine.shutdown(); 
}