How can I recognize blurry documents?

#1 Posted : Tuesday, April 18, 2017 9:12:18 AM(UTC)

Nick


Noise in a document image can cause OCR to misrecognize text or to miss certain words or characters entirely. Note that while an OCR engine does its best to recognize the glyph for each individual character, it lacks a human reader's ability to infer an illegible character from the context around it.

That said, there are image preprocessing commands that can detect blur in an image and reduce its effect when it is found. The BlurDetectionCommand determines whether an image is blurred, and the MaximumCommand can then erode the image to remove extraneous pixels. It replaces each pixel with the brightest value in its neighborhood, so with dark text on a light background the fringe of stray dark pixels that blur leaves around the characters gets stripped away. Here's a quick code snippet that shows how to use the two together.

Code:


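using Leadtools;        // RasterImage
using Leadtools.Codecs; // RasterCodecs
// Also add the namespace listed in the documentation for BlurDetectionCommand and MaximumCommand
using (RasterCodecs codecs = new RasterCodecs())
using (RasterImage image = codecs.Load(@"input.tif"))
{
    // Determine whether the loaded image is blurred
    BlurDetectionCommand blurDetectionCommand = new BlurDetectionCommand();
    blurDetectionCommand.Run(image);

    if (blurDetectionCommand.Blurred)
    {
        // Maximum filter with a neighborhood size of 3: erodes extraneous dark pixels
        MaximumCommand maximumCommand = new MaximumCommand(3);
        maximumCommand.Run(image);
    }
    // The preprocessed image can now be passed to the OCR engine
}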

Here are the documentation links for the RasterCommands used:
https://www.leadtools.com/help/sdk/dh/po/blurdetectioncommand.html
https://www.leadtools.com/help/sdk/dh/po/maximumcommand.html
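For anyone curious what the two steps do at the pixel level, here is a minimal plain C# sketch that does not use LEADTOOLS at all. It assumes an 8-bit grayscale image stored row by row in a byte array (0 = black ink, 255 = white paper). The blur check uses the variance of the Laplacian, a common sharpness heuristic; it is a swapped-in technique, not necessarily what BlurDetectionCommand does internally, and the threshold is a hypothetical value you would need to tune. The maximum filter illustrates the idea behind MaximumCommand(3): each pixel takes the brightest value in its 3x3 neighborhood. The class name BlurSketch and its methods are hypothetical.

```csharp
using System;

// Hypothetical helper, NOT part of LEADTOOLS. Works on an 8-bit grayscale image
// stored row by row, where 0 = black ink and 255 = white paper.
public static class BlurSketch
{
    // Variance of the 4-neighbor Laplacian over interior pixels.
    // Sharp edges produce large Laplacian responses (high variance); blur flattens them.
    public static double LaplacianVariance(byte[] pixels, int width, int height)
    {
        if (width < 3 || height < 3) return 0.0;
        double sum = 0.0, sumSq = 0.0;
        int count = 0;
        for (int y = 1; y < height - 1; y++)
        {
            for (int x = 1; x < width - 1; x++)
            {
                int i = y * width + x;
                double lap = pixels[i - 1] + pixels[i + 1] + pixels[i - width] + pixels[i + width]
                             - 4.0 * pixels[i];
                sum += lap;
                sumSq += lap * lap;
                count++;
            }
        }
        double mean = sum / count;
        return sumSq / count - mean * mean;
    }

    // Assumed decision rule: variance below a (hypothetical, tunable) threshold means blurred.
    public static bool IsBlurred(byte[] pixels, int width, int height, double threshold)
    {
        return LaplacianVariance(pixels, width, height) < threshold;
    }

    // Maximum filter: each output pixel is the brightest value in its size x size
    // neighborhood (coordinates clamped at the borders; use an odd size). With dark text
    // on a light background this erodes the text and removes thin stray dark pixels.
    public static byte[] MaximumFilter(byte[] pixels, int width, int height, int size)
    {
        int r = size / 2;
        byte[] result = new byte[pixels.Length];
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                byte max = 0;
                for (int dy = -r; dy <= r; dy++)
                {
                    int yy = Math.Min(Math.Max(y + dy, 0), height - 1);
                    for (int dx = -r; dx <= r; dx++)
                    {
                        int xx = Math.Min(Math.Max(x + dx, 0), width - 1);
                        byte v = pixels[yy * width + xx];
                        if (v > max) max = v;
                    }
                }
                result[y * width + x] = max;
            }
        }
        return result;
    }
}
```

For example, a white 5x5 image with a single black pixel in the center comes out of MaximumFilter(pixels, 5, 5, 3) entirely white, which is how stray dark pixels get stripped. A larger neighborhood removes more noise but also thins genuine character strokes more, so a small size such as 3 is a reasonable place to start.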

Here's some sample text that was initially difficult for OCR to process because of excess pixels, shown after some of those pixels were eroded away with the MaximumCommand as described above.

Edited by moderator Wednesday, December 27, 2023 3:21:15 PM(UTC) | Reason: Updated

Nick Crook
Developer Support Engineer
LEAD Technologies, Inc.


Try the latest version of LEADTOOLS for free for 60 days by downloading the evaluation: https://www.leadtools.com/downloads
