Pre-processing Images for OCR

(Document/Medical) Scanned documents are not always straight. The scanner feeder can skew them, or they can be inserted in the wrong direction (upside down). For the best possible results, such images need correction before being processed by the OCR engine. LEADTOOLS provides a number of pre-processing command classes that can be used to correct the orientation of images. These include the ImageProcessing.Core.DeskewCommand, ImageProcessing.Core.SearchRegistrationMarksCommandData, ImageProcessing.Core.PerspectiveDeskewCommand, and ImageProcessing.Core.ManualPerspectiveDeskewCommand classes.

Pre-processing bitonal (1-Bit) images

For more information about pre-processing bitonal images, refer to Cleaning Up 1-Bit Images.

Pre-processing color images

Sometimes documents are scanned in color. Color images can be adjusted or enhanced using various LEADTOOLS commands. For more information, refer to Leadtools.ImageProcessing.Color.

Often, a more practical solution is to convert the color image to a bitonal (1-bit) image before OCR processing in order to achieve better results. However, the conversion can lose some important features, because a 1-bit image carries no color information.

Convert color images to bitonal images by using the ColorResolutionCommand, IntensityDetectCommand, or DynamicBinaryCommand class. ColorResolutionCommand uses a fixed intensity threshold value of 128 to convert the image to black and white. IntensityDetectCommand uses a user-specified intensity range: colors inside the range are mapped to white, while colors outside the range are mapped to black. DynamicBinaryCommand uses a dynamically calculated threshold: if the intensity of a pixel is higher than the dynamic threshold, the pixel is set to white; otherwise, it is set to black. Together, these commands let the user handle color images of different qualities for OCR.
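The three conversion strategies described above can be sketched in plain Python to make the differences concrete. This is a conceptual illustration only, not LEADTOOLS code: the function names are hypothetical, the image is a small list of grayscale rows, the tie rule for the fixed threshold (a pixel equal to 128 becomes white) is an assumption, and the "dynamic" threshold is assumed here to be the mean of a small neighborhood around each pixel, which may differ from what DynamicBinaryCommand actually computes.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale intensities

WHITE, BLACK = 1, 0


def fixed_threshold(img: Gray, threshold: int = 128) -> Gray:
    """One global cutoff. Assumption: pixels equal to the cutoff become white."""
    return [[WHITE if p >= threshold else BLACK for p in row] for row in img]


def intensity_range(img: Gray, low: int, high: int) -> Gray:
    """Pixels inside [low, high] become white; pixels outside become black."""
    return [[WHITE if low <= p <= high else BLACK for p in row] for row in img]


def local_mean_threshold(img: Gray, radius: int = 1) -> Gray:
    """Per-pixel cutoff. Assumption: the cutoff is the mean of the pixel's
    (2 * radius + 1)-wide neighborhood, clipped at the image border."""
    h, w = len(img), len(img[0])
    out: Gray = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            cutoff = sum(window) / len(window)
            # Strictly higher than the cutoff is white, otherwise black.
            row.append(WHITE if img[y][x] > cutoff else BLACK)
        out.append(row)
    return out


# Hypothetical 2x3 image: bright background with two dark "text" pixels.
sample = [
    [200, 210, 60],
    [190, 50, 220],
]

if __name__ == "__main__":
    print(fixed_threshold(sample))
    print(intensity_range(sample, 100, 255))
    print(local_mean_threshold(sample))
```

On a clean, evenly lit image such as this sample, the three approaches tend to agree. They diverge on unevenly lit or textured images, where a single global cutoff misclassifies regions that a per-pixel cutoff can still separate.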

The best color conversion method depends on the application. For example, on passport images, the information text is written in black over a textured background, and the text color is very dark in comparison to that background. To get the best OCR results on these images, it is recommended to use a low intensity threshold value to segment text pixels from background pixels. This segmentation is performed by IntensityDetectCommand with a range that starts at a low intensity value and ends at 255, e.g., 100 to 255. The low intensity threshold value is estimated empirically by examining the distributions of text colors and background colors on these images, then finding the intersection point between the normal distributions fitted to them.
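The empirical estimate described above can be worked through numerically. The sketch below fits a normal distribution to sampled text intensities and another to sampled background intensities, then solves for the point between the two means where the two density curves cross. The function names and the sample values are hypothetical, and the code is independent of LEADTOOLS; it only illustrates the arithmetic behind choosing the lower bound of the intensity range.

```python
import math
from statistics import mean, pstdev
from typing import Sequence


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a normal distribution N(mu, sigma) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def intersection_threshold(mu_text: float, sd_text: float,
                           mu_bg: float, sd_bg: float) -> float:
    """Return the x between the two means where both densities are equal.

    Equating the log-densities gives a*x^2 + b*x + c = 0.
    """
    if mu_text == mu_bg:
        raise ValueError("means coincide; no separating threshold")
    a = 1 / (2 * sd_bg ** 2) - 1 / (2 * sd_text ** 2)
    b = mu_text / sd_text ** 2 - mu_bg / sd_bg ** 2
    c = (mu_bg ** 2 / (2 * sd_bg ** 2)
         - mu_text ** 2 / (2 * sd_text ** 2)
         + math.log(sd_bg / sd_text))
    if abs(a) < 1e-12:
        # Equal spreads: the equation is linear and the root is the midpoint.
        return -c / b
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("the fitted densities do not intersect")
    root = math.sqrt(disc)
    lo, hi = sorted((mu_text, mu_bg))
    for x in ((-b + root) / (2 * a), (-b - root) / (2 * a)):
        if lo <= x <= hi:
            return x
    raise ValueError("no intersection between the two means")


def threshold_from_samples(text: Sequence[int], background: Sequence[int]) -> float:
    """Fit both normals from sampled pixel intensities, then intersect them."""
    return intersection_threshold(mean(text), pstdev(text),
                                  mean(background), pstdev(background))


if __name__ == "__main__":
    # Hypothetical intensity samples picked from text and background regions.
    low = threshold_from_samples([30, 40, 50], [170, 180, 190])
    print(f"suggested intensity range: {round(low)} to 255")
```

The resulting value would serve as the lower bound of the range, with 255 as the upper bound. In practice the samples should come from several representative images, since a threshold fitted to a single scan may not transfer to documents with different lighting or background texture.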

For a tutorial using the Advantage OCR demo, see How to OCR ID Document Images such as Passports Using Advantage OCR.

Auto Binarize

The most sophisticated way to convert color images to bitonal images is by using the AutoBinarizeCommand. The AutoBinarizeCommand applies several pre-processing and threshold operations in order to preserve the key features of a color image. It can be adjusted to the specific input device used, such as a scanner or camera, and is well suited to making unclear document images more readable.

The following before-and-after images show the AutoBinarizeCommand applied to a document image to make its text easier to detect:

Before

After

The AutoBinarizeCommand class works automatically, but it also allows customization of how the algorithm works. You can choose whether the command performs its internal pre-processing on the image, and which threshold method it uses.