After starting the OCR engine, you can begin working with document pages. Pages can be used with or without an OCR document.
The LEADTOOLS OCR methods provide support for the following when working with OCR pages:
As described in Programming with LEADTOOLS .NET OCR, an IOcrPage can be created directly using CreatePage, without an IOcrDocument object. Such pages can be zoned and recognized, and the OCR results can be obtained directly using GetText or GetRecognizedCharacters.
If the pages are to be saved to a final document such as PDF or DOCX, then an IOcrDocument instance is required.
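The standalone-page workflow described above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: it assumes the LEADTOOLS license has already been set, that the OCR runtime files are in their default location (so `Startup` can take null arguments), and that the engine type is named `OcrEngineType.LEAD` in your SDK version. The class name, method name, and `imagePath` parameter are hypothetical.

```csharp
using Leadtools;
using Leadtools.Codecs;
using Leadtools.Ocr;

static class StandalonePageSketch
{
    // Hypothetical helper: recognizes one image and returns its text
    // without ever creating an IOcrDocument.
    public static string RecognizeText(string imagePath)
    {
        using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD))
        {
            // Assumption: default codecs, writer, work directory and runtime path.
            ocrEngine.Startup(null, null, null, null);

            string text;
            using (RasterCodecs codecs = new RasterCodecs())
            using (RasterImage image = codecs.Load(imagePath, 1))
            // None: the page uses the image but does not take ownership of it,
            // so the surrounding using block remains responsible for disposal.
            using (IOcrPage ocrPage = ocrEngine.CreatePage(image, OcrImageSharingMode.None))
            {
                ocrPage.Recognize(null);      // no progress callback
                text = ocrPage.GetText(-1);   // -1 requests the text of all zones
            }

            ocrEngine.Shutdown();
            return text;
        }
    }
}
```

Because no document is involved, nothing is written to disk; the recognized text is simply returned to the caller.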
An instance of IOcrDocument contains the pages of a document. You can create a new OCR document using the IOcrDocumentManager.CreateDocument method. This method allows creation of memory-based or file-based documents.
Each OCR document can have one or more pages (IOcrPage objects). Each IOcrDocument contains an IOcrDocument.Pages property of type IOcrPageCollection that you can use to access the pages of a document.
An IOcrDocument, through its IOcrDocument.Pages property, holds a collection of IOcrPage objects. Each IOcrPage contains the raster image used to create it (the image used when the page was loaded or added) and a group of OCR zones for the page, added either manually or through auto-zoning.
The IOcrPageCollection interface implements the standard .NET Collection{T}, IList{T}, and IEnumerable{T} interfaces. If the document is memory-based, you can therefore use the members of these interfaces to add, remove, get, set, and iterate through the pages of the document.
For file-based documents, adding a page takes a snapshot of the page's current recognition data and stores it internally. The page itself is not added to the collection and is not required to stay in memory. The collection acts as a read-only view of the document: you can add new pages, but you cannot remove them or iterate through them.
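As a rough sketch of the file-based workflow just described, each page can be created and recognized on its own, then handed to the document, which keeps only a snapshot of its recognition data. The example assumes an already started `IOcrEngine`; the class, method, and parameter names are hypothetical, and the namespace that defines `DocumentFormat` depends on the SDK version (`Leadtools.Document.Writer` in newer releases, `Leadtools.Forms.DocumentWriters` in older ones).

```csharp
using Leadtools;
using Leadtools.Codecs;
using Leadtools.Ocr;
using Leadtools.Document.Writer; // assumption: older SDKs use Leadtools.Forms.DocumentWriters

static class FileBasedDocumentSketch
{
    // Hypothetical helper: converts a set of single-page images to one PDF.
    public static void ConvertToPdf(IOcrEngine ocrEngine, string[] imagePaths, string pdfPath)
    {
        // Passing null for the file name with AutoDeleteFile asks for a temporary
        // backing file that is removed when the document is disposed.
        using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument(
                   null, OcrCreateDocumentOptions.AutoDeleteFile))
        using (RasterCodecs codecs = new RasterCodecs())
        {
            foreach (string imagePath in imagePaths)
            {
                RasterImage image = codecs.Load(imagePath, 1);

                // AutoDispose: the page owns the image and disposes it with itself.
                using (IOcrPage ocrPage = ocrEngine.CreatePage(image, OcrImageSharingMode.AutoDispose))
                {
                    ocrPage.Recognize(null);

                    // Stores a snapshot of the recognition data in the document.
                    // The page can be disposed right after this call.
                    ocrDocument.Pages.Add(ocrPage);
                }
            }

            ocrDocument.Save(pdfPath, DocumentFormat.Pdf, null);
        }
    }
}
```

Only one page is held in memory at a time, which is the main reason to prefer this mode for long documents.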
The following list contains the major functionality of the IOcrPageCollection interface:
Add new pages to the document from raster images. The images can come from disk files, a .NET stream (memory or otherwise), or a remote URL. The following table lists all the page addition method groups:
Methods | Description |
---|---|
IOcrPageCollection.Add | Adds the recognition data of an IOcrPage to a document. File-based documents only. |
IOcrPageCollection.AddPage | Adds a single page from a RasterImage, a DIB, or an image file on disk, in a .NET stream, or at a remote URL. Memory-based documents only. |
IOcrPageCollection.AddPages | Adds multiple pages from a multipage RasterImage or an image file on disk, in a .NET stream, or at a remote URL. Memory-based documents only. |
IOcrPageCollection.InsertPage | Inserts a single page at a specific location from a RasterImage, a DIB, or an image file on disk, in a .NET stream, or at a remote URL. Memory-based documents only. |
IOcrPageCollection.InsertPages | Inserts multiple pages at a specific location from a multipage RasterImage or an image file on disk, in a .NET stream, or at a remote URL. Memory-based documents only. |
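
For memory-based documents, the page addition methods in the table above can be used roughly as follows. This sketch assumes an already started `IOcrEngine` and the `AddPages` overload that takes a file name, a first and last page number, and a progress callback. The class, method, and parameter names are hypothetical, and the `DocumentFormat` namespace depends on the SDK version.

```csharp
using Leadtools;
using Leadtools.Ocr;
using Leadtools.Document.Writer; // assumption: older SDKs use Leadtools.Forms.DocumentWriters

static class MemoryBasedDocumentSketch
{
    // Hypothetical helper: loads every page of a multipage image file
    // into a memory-based document, recognizes them, and saves a PDF.
    public static void ConvertToPdf(IOcrEngine ocrEngine, string multipageImagePath, string pdfPath)
    {
        using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument(
                   null, OcrCreateDocumentOptions.InMemory))
        {
            // Page numbers are 1-based; -1 as the last page number is taken
            // here to mean "up to and including the last page in the file".
            ocrDocument.Pages.AddPages(multipageImagePath, 1, -1, null);

            // The collection of a memory-based document can be iterated.
            foreach (IOcrPage ocrPage in ocrDocument.Pages)
            {
                ocrPage.Recognize(null);
            }

            ocrDocument.Save(pdfPath, DocumentFormat.Pdf, null);
        }
    }
}
```

All pages stay in memory until the document is disposed, so this mode suits short documents or cases where pages must be reordered or removed before saving.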
Export pages from the OCR document to raster images. You can save the pages to disk files or .NET streams, or as a single-page or multipage RasterImage object, in any of the file formats supported by LEADTOOLS. Exporting a page is supported by memory-based documents only. The following table lists all the page exporting method groups:
Methods | Description |
---|---|
IOcrPageCollection.ExportPage | Saves a single page in the document to a RasterImage object, an image file on disk, or a .NET stream. |
IOcrPageCollection.ExportPages | Saves multiple pages in the document to a multipage RasterImage object, an image file on disk, or a .NET stream. |
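
Exporting a page from a memory-based document might look like the sketch below. It assumes the `ExportPage` overload that takes a zero-based page index, a file name, a `RasterImageFormat`, and a bits-per-pixel value. Check the overload list for your SDK version before relying on it. The class, method, and parameter names are hypothetical.

```csharp
using Leadtools;
using Leadtools.Ocr;

static class ExportPageSketch
{
    // Hypothetical helper: writes the first page of a memory-based
    // document to a PNG file, for example after preprocessing.
    public static void ExportFirstPage(IOcrDocument ocrDocument, string outputPath)
    {
        if (ocrDocument.Pages.Count == 0)
        {
            return; // nothing to export
        }

        // Assumption: a bits-per-pixel value of 0 lets the format's default
        // (or the page image's own depth) be used.
        ocrDocument.Pages.ExportPage(0, outputPath, RasterImageFormat.Png, 0);
    }
}
```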
Perform automatic image preprocessing on one or more pages in the document through IOcrPageCollection.AutoPreprocess. These methods provide a shortcut for iterating through the pages in the collection and calling IOcrPage.AutoPreprocess on each page. This is supported by memory-based documents only.
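The equivalence described above can be sketched as follows. This assumes a memory-based document and the overloads that take an `OcrAutoPreprocessPageCommand` and a progress callback. The class and method names are hypothetical.

```csharp
using Leadtools.Ocr;

static class AutoPreprocessSketch
{
    // Shortcut form: deskews every page through the collection.
    public static void DeskewAll(IOcrDocument ocrDocument)
    {
        ocrDocument.Pages.AutoPreprocess(OcrAutoPreprocessPageCommand.Deskew, null);
    }

    // Long form: the per-page loop that the shortcut stands in for.
    public static void DeskewAllManually(IOcrDocument ocrDocument)
    {
        foreach (IOcrPage ocrPage in ocrDocument.Pages)
        {
            ocrPage.AutoPreprocess(OcrAutoPreprocessPageCommand.Deskew, null);
        }
    }
}
```

Preprocessing is typically applied before zoning and recognition, since it changes the page image that those steps operate on.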
Getting Started (Guide to Example Programs)
Programming with LEADTOOLS .NET OCR
An Overview of OCR Recognition Modules
Creating an OCR Engine Instance
Starting and Shutting Down the OCR Engine
Multi-Threading with LEADTOOLS OCR
OCR Spell Language Dictionaries
Using OMR in LEADTOOLS .NET OCR
OCR Languages and Spell Checking
OCR Tutorial - Adding and Painting Zones
OCR Tutorial - Working with Pages
OCR Tutorial - Recognizing Pages
OCR Tutorial - Working with Recognition Results