Programming with LEADTOOLS OCR Module - LEAD Engine

The LEADTOOLS OCR features provide methods for incorporating optical character recognition (OCR) technology into an application. OCR is used to process bitmap document images into text.

Once the LEADTOOLS C API OCR toolkit is installed on the system, the user is ready to begin programming with LEADTOOLS OCR. Note that the OCR features must be unlocked before the user can use the OCR properties, methods, and events. For more information on unlocking LEAD features, refer to Setting a Runtime License.

LEADTOOLS provides methods to:

Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats.
Select the language of documents to be recognized. Choose from English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish.
Segment complex pages manually or automatically into text zones, image zones and table zones.
Recognize text from 5 to 72 points in virtually any typeface.
Increase recognition accuracy with user dictionaries.
Process both text and graphics. The recognition software's ability to distinguish halftone graphics from text can provide the basis of a compound document processing system.
Save the document in any of 40 formats, including Adobe PDF and PDF/A, MS Word, and MS Excel, as well as various flavors of ASCII and Unicode text.

The following is an outline of the general steps involved in recognizing one or more pages.

1. Select the engine type you wish to use and create the OCR engine using the L_OcrEngineManager_CreateEngine method.
2. Start up the OCR engine with the L_OcrEngine_Startup method. For more information, refer to Starting and Shutting down the Engine.
3. Create an OCR page by loading a bitmap and then passing the bitmap handle to the L_OcrPage_FromBitmap method.
4. Optional. Establish zones on the page(s), either manually or automatically. A page can be recognized with or without zones. For more information, refer to Working with OCR Zones.
5. Optional. Set the active languages to be used by the OCR engine. (The default is English.) For more information, refer to Working with OCR Languages.
6. Recognize the page. For more information, refer to Recognizing OCR Pages.
7. Create an OCR document (file-based or memory-based) and add the recognized page to it so that it can be saved later.
8. Save the recognition results, if desired. For more information, refer to Recognizing OCR Pages.
9. Shut down the OCR engine when finished by calling L_OcrEngine_Shutdown or L_OcrEngine_Destroy.

Steps 4 and 5 can be done in any order, as long as they are carried out after starting up the OCR engine and before recognizing a page.
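The ordering rule above can be illustrated with a small, self-contained sketch. It does not call LEADTOOLS at all: the OcrStep enum and the workflow_order_is_valid function are hypothetical names invented for this illustration, and the model assumes a single page. It only encodes the stated constraint that zoning and language selection may happen in either order, provided both come after engine startup and before recognition.

```c
#include <stdbool.h>

/* Hypothetical labels for the outline steps; these are NOT LEADTOOLS identifiers. */
typedef enum {
    STEP_CREATE_ENGINE = 1,
    STEP_STARTUP,
    STEP_CREATE_PAGE,
    STEP_SET_ZONES,
    STEP_SET_LANGUAGES,
    STEP_RECOGNIZE,
    STEP_CREATE_DOCUMENT,
    STEP_SAVE,
    STEP_SHUTDOWN
} OcrStep;

/* Returns true when zoning and language selection (in any relative order)
 * occur after startup and before recognition, and recognition itself
 * occurs after startup. Other steps are not checked in this sketch. */
bool workflow_order_is_valid(const OcrStep *steps, int count)
{
    bool started = false;
    bool recognized = false;

    for (int i = 0; i < count; i++) {
        switch (steps[i]) {
        case STEP_STARTUP:
            started = true;
            break;
        case STEP_SET_ZONES:
        case STEP_SET_LANGUAGES:
            if (!started || recognized)
                return false;
            break;
        case STEP_RECOGNIZE:
            if (!started)
                return false;
            recognized = true;
            break;
        default:
            break;
        }
    }
    return true;
}
```

Under this model, a sequence that sets languages before zones is accepted just like one that sets zones first, while a sequence that sets either one before startup or after recognition is rejected.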

The following example shows how to perform the above steps in code:

BITMAPHANDLE bitmap = { 0 }; 
L_OcrEngine ocrEngine = NULL; 
L_OcrPage ocrPage = NULL; 
L_OcrDocumentManager ocrDocumentManager = NULL; 
L_OcrDocument ocrDocument = NULL; 
// Create an instance of the engine 
L_INT retCode = L_OcrEngineManager_CreateEngine(L_OcrEngineType_LEAD, &ocrEngine); 
if(retCode != SUCCESS) 
   return retCode; 
// Start the engine using default parameters 
retCode = L_OcrEngine_Startup(ocrEngine, NULL, L_TEXT("C:\\LEADTOOLS21\\Bin\\Common\\OcrLEADRuntime")); 
if(retCode != SUCCESS) 
   goto CLEANUP; 
// Load an image to process 
retCode = L_LoadBitmap(L_TEXT("C:\\LEADTOOLS21\\Resources\\Images\\Ocr1.tif"), &bitmap, sizeof(BITMAPHANDLE), 0, ORDER_RGB, NULL, NULL); 
if(retCode != SUCCESS) 
   goto CLEANUP; 
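// Create an OCR page from the bitmap and transfer ownership of the bitmap to the page. 
// MyOcrProgressCallback is an application-defined progress callback (not shown here). 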
retCode = L_OcrPage_FromBitmap(ocrEngine, &ocrPage, &bitmap, L_OcrBitmapSharingMode_AutoFree, MyOcrProgressCallback, NULL); 
if(retCode != SUCCESS) 
   goto CLEANUP; 
// We have a valid page and bitmap ownership has transferred. So, we do not need to free the bitmap anymore. 
// Bitmap will be freed when ocrPage is destroyed. 
bitmap.Flags.Allocated = 0; 
// Automatically find areas/zones on the page where text is located 
retCode = L_OcrPage_AutoZone(ocrPage, MyOcrProgressCallback, NULL); 
if(retCode != SUCCESS) 
   goto CLEANUP; 
// Recognize the page 
// Note: Recognize can be called without calling AutoZone or manually adding zones. 
// The engine will check and automatically auto-zone the page 
retCode = L_OcrPage_Recognize(ocrPage, MyOcrProgressCallback, NULL); 
if(retCode != SUCCESS) 
   goto CLEANUP; 
retCode = L_OcrEngine_GetDocumentManager(ocrEngine, &ocrDocumentManager); 
if(retCode != SUCCESS) 
   goto CLEANUP; 
// Create an OCR document 
retCode = L_OcrDocumentManager_CreateDocument(ocrDocumentManager, &ocrDocument, L_OcrCreateDocumentOptions_AutoDeleteFile, NULL); 
if(retCode != SUCCESS) 
   goto CLEANUP; 
// In Document File Mode, add OcrPage to OcrDocument after recognition 
retCode = L_OcrDocument_AddPage(ocrDocument, ocrPage); 
if(retCode != SUCCESS) 
   goto CLEANUP; 
// Adding the page to a file-based document takes a snapshot of the recognition data and stores it in the document. 
// At this point the page is no longer needed, so destroy it to free memory that is no longer used. 
L_OcrPage_Destroy(ocrPage); 
// Set the handle to NULL so we do not free it in our clean-up code 
ocrPage = NULL; 
// Save the document we have as PDF 
retCode = L_OcrDocument_Save(ocrDocument, L_TEXT("C:\\Users\\Public\\Documents\\LEADTOOLS Image\\Ocr1.pdf"), DOCUMENTFORMAT_PDF, NULL, NULL); 
CLEANUP: 
if(bitmap.Flags.Allocated) 
   L_FreeBitmap(&bitmap); 
if(ocrPage != NULL) 
   L_OcrPage_Destroy(ocrPage); 
if(ocrDocument != NULL) 
   L_OcrDocument_Destroy(ocrDocument); 
if(ocrEngine != NULL) 
   L_OcrEngine_Destroy(ocrEngine); 
// Report the last status code to the caller 
return retCode; 
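The example above relies on two conventions that are easy to miss: a single clean-up label reached with goto on every failure, and clearing the bitmap's Allocated flag once the page has taken ownership so the bitmap is not freed twice. The sketch below isolates that pattern in plain C without LEADTOOLS. Every name in it (Image, Page, image_load, page_from_image, process, live_images, and the fail_at parameter used to simulate failures) is a hypothetical stand-in for illustration, not part of the toolkit.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for a bitmap handle and a page that can own it. */
typedef struct { int allocated; } Image;
typedef struct { Image image; } Page;

int live_images = 0; /* number of loaded images that have not been freed */

int image_load(Image *img, int fail)
{
    if (fail)
        return -1;
    img->allocated = 1;
    live_images++;
    return 0;
}

void image_free(Image *img)
{
    if (img->allocated) {
        img->allocated = 0;
        live_images--;
    }
}

/* The page copies the handle; from then on the page is responsible for it. */
Page *page_from_image(const Image *img, int fail)
{
    Page *p;
    if (fail)
        return NULL;
    p = malloc(sizeof *p);
    if (p != NULL)
        p->image = *img;
    return p;
}

void page_destroy(Page *p)
{
    image_free(&p->image);
    free(p);
}

/* fail_at: 0 = no failure, 1 = load fails, 2 = page creation fails,
 * 3 = a later step (for example recognition) fails. */
int process(int fail_at)
{
    Image img = { 0 };
    Page *page = NULL;
    int rc;

    rc = image_load(&img, fail_at == 1);
    if (rc != 0)
        goto cleanup;

    page = page_from_image(&img, fail_at == 2);
    if (page == NULL) {
        rc = -1;
        goto cleanup;
    }
    /* Ownership moved to the page: clear the flag so cleanup skips image_free. */
    img.allocated = 0;

    if (fail_at == 3) {
        rc = -1;
        goto cleanup;
    }

cleanup:
    if (img.allocated)
        image_free(&img);
    if (page != NULL)
        page_destroy(page);
    return rc;
}
```

Whichever step fails, each resource is released exactly once: the caller frees the image only while it still owns it, and the page frees it afterwards.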


Help Version 21.0.2021.7.2

© 1991-2021 Apryse Software Corp. All Rights Reserved.
