Cleaning Up Color Images with LEADTOOLS Document Imaging

Posted on 2016-10-07 10:49:40 by Greg

Image cleanup (also called preprocessing) is one of the most fundamental features in document imaging. When paper documents are scanned, there are almost always imperfections: the paper can sit at an angle, hole punches leave large black dots, folds introduce lines, and at the very least dust leaves small, dark speckles throughout the image. All of these can degrade the downstream algorithms that depend on a clean image, such as OCR, forms recognition, barcode reading, compression and more.

There is one caveat with most document imaging libraries: the document images must be black and white. While that is technically true for LEADTOOLS as well, in practice it's not a limitation. Each of the LEADTOOLS document cleanup functions returns information on what it has done. For example, you can get the deskew angle, the rectangle to crop, or the region to fill, and then apply those same operations to a color image.
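
To make the idea concrete, here is a minimal sketch of the "analyze in black and white, apply in color" pattern. It is written in plain Python rather than against the LEADTOOLS API: nested lists stand in for images, and every helper name (to_black_and_white, find_content_rect, crop) is hypothetical and chosen only for illustration. Only the crop-rectangle case is shown; a deskew angle or a fill region would follow the same pattern.

```python
# Illustrative sketch only. Plain Python lists stand in for images, and none
# of these helper names are LEADTOOLS functions.

def to_black_and_white(color, threshold=128):
    """Return a 1-bit copy: True where a pixel is dark (ink), False otherwise."""
    return [[(r + g + b) / 3 < threshold for (r, g, b) in row] for row in color]


def find_content_rect(bw):
    """Analyze the 1-bit copy and report the bounding box of all ink pixels.

    The result is (left, top, right, bottom) with right and bottom exclusive.
    Nothing is modified; the function only returns what it found.
    """
    rows = [y for y, row in enumerate(bw) if any(row)]
    cols = [x for x in range(len(bw[0])) if any(row[x] for row in bw)]
    if not rows:
        # Blank page: keep the whole image.
        return (0, 0, len(bw[0]), len(bw))
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


def crop(color, rect):
    """Apply a rectangle found on the 1-bit copy to the original color image."""
    left, top, right, bottom = rect
    return [row[left:right] for row in color[top:bottom]]


WHITE = (255, 255, 255)
BLUE = (0, 0, 200)    # dark blue ink
BLACK = (10, 10, 10)  # near-black ink

# A 6x5 white page with a little colored content surrounded by margins.
page = [[WHITE] * 6 for _ in range(5)]
page[1][2] = BLUE
page[1][3] = BLUE
page[3][4] = BLACK

# Step 1: analyze a black-and-white copy. Step 2: apply the result in color.
rect = find_content_rect(to_black_and_white(page))
cleaned = crop(page, rect)
```

The color page is never converted in place: the black-and-white copy exists only to produce the rectangle, so the cropped result still contains the original blue pixels.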


Enhanced OCR Noise Removal Coming Soon

Posted on 2013-03-12 17:02:22 by Greg

While making my rounds through the engineering department, the OCR team showed me some really impressive enhancements coming soon to the Advantage OCR engine. They've accomplished a lot, but my personal favorite is what they've done to the engine's preprocessing. With much sweat, tears and coffee, they've fine-tuned the noise removal algorithm, and the results show. Other engines may have difficulty seeing between the lines (literally) when forms and documents use separator bars or boxes for individual characters. The LEADTOOLS Advantage OCR engine does a superb job of intelligently removing the noise and returning only the text of interest, rather than getting hung up on bars, dashes, speckles and other types of noise that should simply be ignored.



Beyond the obvious benefit of improved accuracy, this is especially helpful for customers using forms recognition, where character separators are prevalent. On documents where it might previously have been necessary to define a separate zone for each character and piece the results together after recognition, a single zone is now enough, since the separator bars and cells are no longer taken into account.
