How to OCR ID Document Images such as Passports Using Advantage OCR

This topic describes how to OCR ID document images using Advantage OCR to produce the best recognition results. ID documents, such as passports, often have a textured background. This texture is likely to interfere with text segmentation during OCR, leading to poor recognition results.

One of the best approaches to handling color ID document images for OCR is to convert them to black and white by thresholding with a low intensity threshold value. This works because the text is very dark in comparison to the textured background, so a low threshold keeps the text and discards most of the texture. The threshold value is estimated empirically by examining the distributions of text colors and background colors in these images, modeling each as a normal distribution, and finding the point where the two distributions intersect. To apply this thresholding method using the OCR Advantage demo, follow these steps:

1. Run the OCR Advantage demo. In the menu bar, go to Engine > Settings > Recognition > Preprocess settings > black and white conversion method and set its value to user.
2. Go to Engine > Settings > Recognition > Preprocess settings > black and white conversion threshold and set a low threshold value, for example, 110.
3. Open an ID document image (File > Open).
4. Perform the OCR task.
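
To illustrate the idea behind these settings outside the demo, the following is a minimal Python sketch, not the OCR Advantage implementation. It estimates a threshold as the intersection point of two normal distributions and then applies a fixed threshold to grayscale pixel values. The function names, the sample means and standard deviations, and the rule that pixels below the threshold become black are all assumptions made for illustration. The real statistics would have to be measured from your own images.

```python
import math


def normal_pdf(x, mean, sigma):
    """Probability density of a normal distribution at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def gaussian_intersection(mean1, sigma1, mean2, sigma2):
    """Return the point between the two means where both densities are equal.

    Setting the two log-densities equal gives a*x^2 + b*x + c = 0.
    """
    a = 1.0 / (2.0 * sigma1 ** 2) - 1.0 / (2.0 * sigma2 ** 2)
    b = mean2 / sigma2 ** 2 - mean1 / sigma1 ** 2
    c = (mean1 ** 2 / (2.0 * sigma1 ** 2)
         - mean2 ** 2 / (2.0 * sigma2 ** 2)
         + math.log(sigma1 / sigma2))
    if abs(a) < 1e-12:
        # Equal spreads: the equation is linear and the root is the midpoint.
        return -c / b
    root = math.sqrt(b * b - 4.0 * a * c)
    low, high = min(mean1, mean2), max(mean1, mean2)
    for x in ((-b + root) / (2.0 * a), (-b - root) / (2.0 * a)):
        if low <= x <= high:
            return x
    raise ValueError("the distributions do not intersect between their means")


def binarize(gray_rows, threshold=110):
    """Map each 0-255 intensity to 0 (black) if below the threshold, else 255."""
    return [[0 if px < threshold else 255 for px in row] for row in gray_rows]


# Hypothetical statistics: dark text around 60, lighter background around 170.
estimated = gaussian_intersection(60, 20, 170, 30)

# Hypothetical 2x4 grayscale patch: dark text pixels on a lighter texture.
patch = [[35, 58, 140, 190],
         [72, 101, 165, 210]]
black_and_white = binarize(patch, threshold=110)
```

With statistics like the hypothetical ones above, the estimate lands well below the middle of the 0 to 255 range, which is consistent with choosing a low value such as 110 in step 2.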

The steps outlined in this tutorial optimize the handling of color ID document images for OCR by converting them to black and white, which helps generate the best OCR results. When processing images with OCR Advantage, it is recommended to use an OCR zone of type “FieldData” to recognize the regions of the ID document image that contain the person’s information. This zone type restricts recognition to English capital letters, digits, and the symbols - , . /
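
As a small illustration of that character set, the sketch below checks recognized text against it in Python. This is a hypothetical post-processing helper, not part of OCR Advantage. The function name is invented, and ignoring whitespace is an assumption, since the description above does not say how spaces are treated.

```python
import string

# Character set described for the "FieldData" zone type:
# English capital letters, digits, and the symbols - , . /
FIELD_DATA_CHARS = set(string.ascii_uppercase + string.digits + "-,./")


def unexpected_chars(text):
    """Return the characters in text that fall outside the FieldData set.

    Whitespace is skipped (an assumption made for this sketch).
    """
    return [ch for ch in text
            if not ch.isspace() and ch not in FIELD_DATA_CHARS]


# Hypothetical recognized strings.
clean = unexpected_chars("DOE, JOHN 12/05/1990")
suspect = unexpected_chars("Doe<JOHN")
```

An empty result suggests the text is consistent with the zone type, while a non-empty one flags characters worth reviewing.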