The LEADTOOLS OCR features provide methods for incorporating optical character recognition (OCR) technology into an application. OCR is used to process bitmap document images into text.
Once the LEADTOOLS C API OCR toolkit is installed on the system, you are ready to begin programming with LEADTOOLS OCR. Note that the OCR features must be unlocked before the OCR properties, methods, and events can be used. For more information on unlocking LEAD features, refer to Setting a Runtime License.
LEADTOOLS provides methods to:
The following is an outline of the general steps involved in recognizing one or more pages.
1. Select the engine type you wish to use and create the OCR engine using the L_OcrEngineManager_CreateEngine function.
2. Start up the OCR engine with the L_OcrEngine_Startup function. For more information, refer to Starting and Shutting down the Engine.
3. Create an OCR page by first loading a bitmap and then passing the bitmap handle to the L_OcrPage_FromBitmap function.
4. Establish zones on the page(s), either manually or automatically. (This is optional. A page can be recognized with or without zones.) For more information, refer to Working with OCR Zones.
5. Optional. Set the active languages to be used by the OCR engine. (The default is English.) For more information, refer to Working with OCR Languages.
6. Recognize. For more information, refer to Recognizing OCR Pages.
7. Create an OCR document (file-based or memory-based) and add the recognized page to it so that it can be saved later.
8. Save recognition results, if desired. For more information, refer to Recognizing OCR Pages.
9. Shut down the OCR engine when finished by calling L_OcrEngine_Shutdown or L_OcrEngine_Destroy.
Steps 4 and 5 can be done in any order, as long as they are carried out after starting up the OCR engine and before recognizing a page.
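The ordering rule above can be pictured as a small state check. The following is a minimal sketch, not LEADTOOLS code: the Workflow type and the workflow_* functions are hypothetical names invented for illustration, and the sketch tracks a single page only. It assumes nothing about how the engine itself enforces the order.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only; none of these names are LEADTOOLS API. */
typedef enum { STAGE_CREATED, STAGE_STARTED, STAGE_RECOGNIZED } Stage;

typedef struct {
   Stage stage;        /* where the single tracked page is in the workflow */
   bool zonesSet;      /* step 4 has been carried out */
   bool languagesSet;  /* step 5 has been carried out */
} Workflow;

/* Step 2: valid only once, directly after the engine is created. */
bool workflow_startup(Workflow *w)
{
   if (w->stage != STAGE_CREATED)
      return false;
   w->stage = STAGE_STARTED;
   return true;
}

/* Step 4: allowed only after startup and before recognition. */
bool workflow_set_zones(Workflow *w)
{
   if (w->stage != STAGE_STARTED)
      return false;
   w->zonesSet = true;
   return true;
}

/* Step 5: same window as step 4, in either order relative to it. */
bool workflow_set_languages(Workflow *w)
{
   if (w->stage != STAGE_STARTED)
      return false;
   w->languagesSet = true;
   return true;
}

/* Step 6: zones and languages are optional, so only the stage is checked. */
bool workflow_recognize(Workflow *w)
{
   if (w->stage != STAGE_STARTED)
      return false;
   w->stage = STAGE_RECOGNIZED;
   return true;
}
```

In this sketch, calling workflow_set_zones before workflow_startup, or workflow_set_languages after workflow_recognize, returns false, while the two optional steps succeed in either order between startup and recognition.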
The following example shows how to perform the above steps in code:
BITMAPHANDLE bitmap = { 0 };
L_OcrEngine ocrEngine = NULL;
L_OcrPage ocrPage = NULL;
L_OcrDocumentManager ocrDocumentManager = NULL;
L_OcrDocument ocrDocument = NULL;
// Create an instance of the engine
L_INT retCode = L_OcrEngineManager_CreateEngine(L_OcrEngineType_Advantage, &ocrEngine);
if(retCode != SUCCESS)
   return retCode;
// Start the engine using default parameters
retCode = L_OcrEngine_Startup(ocrEngine, NULL, L_TEXT("C:\\LEADTOOLS 19\\Bin\\Common\\OcrAdvantageRuntime"));
if(retCode != SUCCESS)
   goto CLEANUP;
// Load an image to process
retCode = L_LoadBitmap(L_TEXT("C:\\Users\\Public\\Documents\\LEADTOOLS Images\\Ocr1.tif"), &bitmap, sizeof(BITMAPHANDLE), 0, ORDER_RGB, NULL, NULL);
if(retCode != SUCCESS)
   goto CLEANUP;
// Create an OCR page from the bitmap and transfer ownership of the bitmap to the page.
// MyOcrProgressCallback is an application-defined progress callback that is not shown here.
retCode = L_OcrPage_FromBitmap(ocrEngine, &ocrPage, &bitmap, L_OcrBitmapSharingMode_AutoFree, MyOcrProgressCallback, NULL);
if(retCode != SUCCESS)
   goto CLEANUP;
// We have a valid page and bitmap ownership has been transferred, so we must not free the bitmap ourselves.
// The bitmap will be freed when ocrPage is destroyed.
bitmap.Flags.Allocated = 0;
// Automatically find the areas (zones) on the page where text is located
retCode = L_OcrPage_AutoZone(ocrPage, MyOcrProgressCallback, NULL);
if(retCode != SUCCESS)
   goto CLEANUP;
// Recognize the page
// Note: Recognize can be called without calling AutoZone or manually adding zones.
// In that case, the engine auto-zones the page itself before recognizing it.
retCode = L_OcrPage_Recognize(ocrPage, MyOcrProgressCallback, NULL);
if(retCode != SUCCESS)
   goto CLEANUP;
// Get the document manager from the engine
retCode = L_OcrEngine_GetDocumentManager(ocrEngine, &ocrDocumentManager);
if(retCode != SUCCESS)
   goto CLEANUP;
// Create an OCR document
retCode = L_OcrDocumentManager_CreateDocument(ocrDocumentManager, &ocrDocument, L_OcrCreateDocumentOptions_AutoDeleteFile, NULL);
if(retCode != SUCCESS)
   goto CLEANUP;
// In Document File Mode, add OcrPage to OcrDocument after recognition
retCode = L_OcrDocument_AddPage(ocrDocument, ocrPage);
if(retCode != SUCCESS)
   goto CLEANUP;
// Adding the page to a file-based document takes a snapshot of the recognition data and stores it in the document.
// At this point the page is no longer needed, so destroy it to free the memory it uses.
L_OcrPage_Destroy(ocrPage);
// Set the handle to NULL so we do not destroy it again in our clean-up code
ocrPage = NULL;
// Save the document we have as PDF
retCode = L_OcrDocument_Save(ocrDocument, L_TEXT("C:\\Users\\Public\\Documents\\LEADTOOLS Images\\Ocr1.pdf"), DOCUMENTFORMAT_PDF, NULL, NULL);
CLEANUP:
// Release whatever is still owned at this point
if(bitmap.Flags.Allocated)
   L_FreeBitmap(&bitmap);
if(ocrPage != NULL)
   L_OcrPage_Destroy(ocrPage);
if(ocrDocument != NULL)
   L_OcrDocument_Destroy(ocrDocument);
if(ocrEngine != NULL)
   L_OcrEngine_Destroy(ocrEngine);
return retCode;
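The example relies on two conventions that are easy to get wrong: a single clean-up label that releases only what is still owned, and clearing the bitmap's allocated flag once the page has taken ownership. The following self-contained sketch isolates that pattern so it can be compiled and checked without the toolkit. Everything in it (Image, Page, run_pipeline, the failAt parameter, the g_imagesFreed counter) is a hypothetical stand-in and not part of the LEADTOOLS API.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration only; not LEADTOOLS types. */
typedef struct { bool allocated; } Image;   /* plays the role of the bitmap */
typedef struct { Image owned; } Page;       /* plays the role of the OCR page */

int g_imagesFreed = 0;  /* counts releases so leaks and double frees are visible */

int image_load(Image *img)
{
   img->allocated = true;
   return 0;
}

void image_free(Image *img)
{
   if (img->allocated) {
      img->allocated = false;
      g_imagesFreed++;
   }
}

/* Creates a page that takes over the image, mimicking an auto-free sharing mode. */
int page_from_image(Page **out, const Image *img)
{
   Page *p = malloc(sizeof *p);
   if (p == NULL)
      return -1;
   p->owned = *img;
   *out = p;
   return 0;
}

/* Destroying the page also releases the image it owns. */
void page_destroy(Page *p)
{
   image_free(&p->owned);
   free(p);
}

/* failAt: 0 = no failure, 1 = fail before the page exists, 2 = fail after it exists. */
int run_pipeline(int failAt)
{
   Image img = { false };
   Page *page = NULL;
   int rc = image_load(&img);
   if (rc != 0)
      goto cleanup;
   if (failAt == 1) { rc = -1; goto cleanup; }
   rc = page_from_image(&page, &img);
   if (rc != 0)
      goto cleanup;
   /* Ownership has moved to the page, so the local handle must not be freed. */
   img.allocated = false;
   if (failAt == 2) { rc = -1; goto cleanup; }
cleanup:
   if (img.allocated)
      image_free(&img);
   if (page != NULL)
      page_destroy(page);
   return rc;
}
```

Whichever step fails in this sketch, the image is released exactly once: by image_free when the page was never created, and by page_destroy otherwise. Omitting the line that clears img.allocated would make the clean-up code attempt a second release of an image the page already owns.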