OCR Frequently Asked Questions

1. How do I define multiple zones on an image, OCR them all at once, and get the results for each zone separately so that I can save them to a database?

 Answer:

There is an example in the OCR (OCRutil) demo. You can open the project to see the source code for the demo. To run the demo:

1. Load an image.

2. Click Page | Insert Current Page.

3. Either click and drag to create zones manually and then click OCR | Recognize Page..., or click OCR | Recognize Page... without creating zones first. This will OCR all zones on the page.

4. Once the data is recognized, click OCR | Get Recognized Words to retrieve each word from each zone.

If you want the saved results for each zone to be separate, have only one zone on the page at recognition time. With more than one zone, the save results function writes the results for all zones to one file. Alternatively, to get each zone's result from a page with several zones, call the L_DocGetRecognizedCharacters function. It updates ppRecogChars with the recognized characters, and you can collect all the characters that belong to the same zone by reading consecutive entries for as long as the nZoneIndex member does not change.
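The zone-by-zone collection described above can be sketched as follows. This is a minimal illustration, not the demo's code: DemoZoneChar is a hypothetical stand-in for the RECOGCHARS structure, keeping only nZoneIndex (named above) plus an assumed chCode member for the character value, and CollectZoneText is a hypothetical helper. In a real program the array would come from L_DocGetRecognizedCharacters.

```c
#include <stddef.h>

/* Hypothetical stand-in for one recognized character. Only nZoneIndex is
   taken from the text above; chCode is an assumed name for the character. */
typedef struct {
    int  nZoneIndex;
    char chCode;
} DemoZoneChar;

/* Copies the run of characters starting at `start` that share one zone index
   into `out` (NUL-terminated; at most outSize - 1 characters are kept).
   Returns the index of the first character of the next zone, or `count`
   when the last zone has been consumed. */
size_t CollectZoneText(const DemoZoneChar *chars, size_t count, size_t start,
                       char *out, size_t outSize)
{
    size_t i = start;
    size_t n = 0;

    if (start < count) {
        int zone = chars[start].nZoneIndex;
        /* Keep reading while the zone index does not change. */
        while (i < count && chars[i].nZoneIndex == zone) {
            if (n + 1 < outSize)
                out[n++] = chars[i].chCode;
            i++;
        }
    }
    if (outSize > 0)
        out[n] = '\0';
    return i;
}
```

Calling CollectZoneText repeatedly, each time passing the previous return value as start, yields one string per zone that can then be written to its own database field.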

2. How do I define, load, and save templates for different types of forms?

 Answer:

There is an example in the OCR (OCRutil) demo. You can open the project to see the source code for the demo. To run the demo:

1. Load an image.

2. Click Page | Insert Current Page.

3. Either click and drag to create zones manually or click Zone | Find Zones..., which creates zones on the page automatically.

4. Once the image is zoned, click Zone | Export Zone File... to save the zones to a file, or Zone | Load Zone File... to load the zones from disk.

3. How do I support various European languages?

 Answer:

There is an example in the OCR (OCRutil) demo. You can open the project to see the source code for the demo. To run the demo:

Choose Language | Select Languages.

4. How do I get a confidence value for each character that is recognized?

 Answer:

Please refer to the following topic on the LEADTOOLS Support Forum:

http://support.leadtools.com/SupportPortal/cs/forums/9608/ShowPost.aspx

You can call L_DocGetRecognizedCharacters and check the nConfidence member of the RECOGCHARS structure for each character.
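One way to use the per-character confidence is to flag doubtful characters for review. The sketch below is only an illustration: DemoConfChar is a hypothetical stand-in for RECOGCHARS that keeps nConfidence (named above) and an assumed chCode member, MaskLowConfidence is a hypothetical helper, and the code assumes that a higher nConfidence value means a more reliable character. Check the RECOGCHARS reference for the actual scale before choosing a threshold.

```c
#include <stddef.h>

/* Hypothetical stand-in for one recognized character. nConfidence is the
   member named in the text above; chCode is an assumed name. */
typedef struct {
    int  nConfidence;
    char chCode;
} DemoConfChar;

/* Writes the recognized text into `out` (NUL-terminated), replacing every
   character whose confidence is below minConfidence with '?'.
   Assumes higher nConfidence means more reliable.
   Returns the number of characters that were replaced. */
size_t MaskLowConfidence(const DemoConfChar *chars, size_t count,
                         int minConfidence, char *out, size_t outSize)
{
    size_t masked = 0;
    size_t n = 0;
    size_t i;

    for (i = 0; i < count && n + 1 < outSize; i++) {
        if (chars[i].nConfidence < minConfidence) {
            out[n++] = '?';
            masked++;
        } else {
            out[n++] = chars[i].chCode;
        }
    }
    if (outSize > 0)
        out[n] = '\0';
    return masked;
}
```

A nonzero return value indicates that at least one character fell below the threshold and the text may need manual verification.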

5. How do I filter the OCR recognition results to eliminate false characters and increase accuracy?

Answer:

You can filter out false positives by setting the character filter in the ZONEDATA structure. For example, if you wish to recognize only numbers, you would set the character filter to recognize numbers only as follows:

 

/* hDoc, nPageIndex, nRet, and VerificationCB are assumed to be defined elsewhere. */
ZONEDATA ZoneData;
memset(&ZoneData, 0, sizeof(ZONEDATA));
ZoneData.uStructSize = sizeof(ZONEDATA);
/* Zone rectangle on the page */
ZoneData.rcArea.left = 100;
ZoneData.rcArea.top = 100;
ZoneData.rcArea.right = 200;
ZoneData.rcArea.bottom = 200;
ZoneData.FillMethod = FILL_DEFAULT;
ZoneData.RecogModule = RECOGMODULE_AUTO;
/* Restrict recognition in this zone to numbers */
ZoneData.CharFilter = ZONE_CHAR_FILTER_NUMBERS;
ZoneData.Type = ZONE_FLOWTEXT;
ZoneData.uFlags = 0;
ZoneData.pfnCallback = VerificationCB;
ZoneData.pUserData = NULL;
/* Add the zone to the page */
nRet = L_DocAddZone(hDoc, nPageIndex, 0, &ZoneData);

To update a zone that already exists on a specific page, call L_DocUpdateZone.

6. How do I get the coordinates of each recognized word, so that I can locate each recognized word on the image?

 Answer:

Please refer to the following topic on the LEADTOOLS Support Forum:

http://support.leadtools.com/SupportPortal/cs/forums/2788/ShowPost.aspx

You can call L_DocGetRecognizedCharacters to get each recognized character, or L_DocGetRecognizedWords to get a list of all recognized words.

7. How do I output OCR results to memory?

 Answer:

There is an example in the OCRMem demo. You can open the project to see the source code for the demo. The OCRMem demo will save recognition results to memory. To run the demo:

1. Run the OCR Memory demo.

2. Click File | Open and select the file to open.

3. Click OCR | Add Page to add the loaded image to the internal OCR document pages.

4. Click OCR | Recognize to recognize the added page.

5. Click OCR | Save Results to Memory. The demo will show all recognition results in a message box.

8. How do I output OCR results to XML?

 Answer:

There is an example in the OCR (OCRutil) demo. You can open the project to see the source code for the demo. To run the demo:

1. Load an image.

2. Click Page | Insert Current Page.

3. Either click and drag to create zones manually and then click OCR | Recognize Page..., or click OCR | Recognize Page... without creating zones first. This will OCR all zones on the page.

4. Once the data is recognized, click OCR | Save Results. Select XML in the File Formats ComboBox, choose a file name, then click OK.

9. How do I output OCR results to image-over-text PDF?

 Answer:

There is an example in the OCRFile demo. This demo saves to many different formats, one of which is image-over-text PDF. You can open the project to see the source code for the demo.

The OCRFile demo can save the recognition results to any of the supported output formats. Choose the desired output format.

10. How do I add user-defined words to the recognition library?

 Answer:

Please refer to the help topic: Working with a Dictionary

11. How do I recognize MICR code on check images?

 Answer:

There is an example in the MICR demo. You can open the project to see the source code for the demo. To run the demo:

1. Run the MICR demo.

2. Make sure that the zone coordinates that will be added to the page are based on "MICR_SAMPLE.tif" (shipped with the LEADTOOLS setup).

3. When loading other images, update the zone coordinates and build the demo again.

See Also

OCR API Demos

Sample Programs