Posted: Tuesday, April 18, 2017 9:12:18 AM (UTC)
Noise in a document can sometimes cause OCR to recognize text incorrectly, or to miss certain words or characters altogether. Do note that while OCR technology does its best to recognize the glyphs representing individual characters, it lacks the human ability to infer an indecipherable character from its surrounding context.
That said, certain image preprocessing commands can be used to detect blur in an image and reduce its effect if found. The BlurDetectionCommand determines the amount of blur present in an image, and the MaximumCommand can then erode the dark objects in the image to remove extraneous pixels. Here's a quick code snippet which shows how to use the two in tandem.
Code:
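// Requires: using Leadtools; using Leadtools.Codecs;
// using Leadtools.ImageProcessing.Core; using Leadtools.ImageProcessing.Effects;
RasterCodecs codecs = new RasterCodecs();
RasterImage image = codecs.Load(@"input.tif");
// Check whether the loaded image is blurred.
BlurDetectionCommand blurDetectionCommand = new BlurDetectionCommand();
blurDetectionCommand.Run(image);
if (blurDetectionCommand.Blurred)
{
    // Erode dark objects using a 3x3 neighborhood to trim the extra pixels.
    MaximumCommand maximumCommand = new MaximumCommand(3);
    maximumCommand.Run(image);
}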
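To give a feel for why a maximum filter thins out dark text, here's a minimal sketch that does not use LEADTOOLS at all. It assumes a plain maximum filter over an 8-bit grayscale grid, which may differ from what MaximumCommand does internally. The names MaximumFilter and CountDark are hypothetical helpers made up for this illustration.

```csharp
using System;

// Hypothetical helper: replaces each pixel with the brightest value found in
// its size x size neighborhood. Dark strokes on a light background shrink.
static byte[,] MaximumFilter(byte[,] src, int size)
{
    int rows = src.GetLength(0), cols = src.GetLength(1);
    int radius = size / 2;
    var dst = new byte[rows, cols];
    for (int r = 0; r < rows; r++)
    {
        for (int c = 0; c < cols; c++)
        {
            byte max = 0;
            for (int dr = -radius; dr <= radius; dr++)
            {
                for (int dc = -radius; dc <= radius; dc++)
                {
                    // Clamp at the border so edge pixels reuse the nearest valid neighbor.
                    int rr = Math.Clamp(r + dr, 0, rows - 1);
                    int cc = Math.Clamp(c + dc, 0, cols - 1);
                    if (src[rr, cc] > max) max = src[rr, cc];
                }
            }
            dst[r, c] = max;
        }
    }
    return dst;
}

// Hypothetical helper: counts pixels darker than mid-gray.
static int CountDark(byte[,] img)
{
    int count = 0;
    foreach (byte value in img)
    {
        if (value < 128) count++;
    }
    return count;
}

// Build a 7x9 white page with a 3-pixel-wide dark bar and one stray dark speck.
var page = new byte[7, 9];
for (int r = 0; r < 7; r++)
{
    for (int c = 0; c < 9; c++)
    {
        page[r, c] = (c >= 3 && c <= 5) ? (byte)0 : (byte)255;
    }
}
page[3, 0] = 0; // isolated speck of noise

byte[,] eroded = MaximumFilter(page, 3);
int before = CountDark(page);
int after = CountDark(eroded);
Console.WriteLine($"dark pixels before: {before}, after: {after}");
// prints "dark pixels before: 22, after: 7"
```

In this toy case the stray speck disappears and the bar thins from three pixels wide to one. That is the same trade-off to keep in mind with real documents: a larger neighborhood removes more excess pixels, but it can also erase thin strokes entirely.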
Here are the documentation links for the RasterCommands used.
https://www.leadtools.com/help/sdk/dh/po/blurdetectioncommand.html
https://www.leadtools.com/help/sdk/dh/po/maximumcommand.html
Here's some text which is initially difficult for OCR to process due to the excess pixels, with some of them eroded away using the MaximumCommand as described.
Edited by moderator Wednesday, December 27, 2023 3:21:15 PM(UTC)
| Reason: Updated
Nick Crook
Developer Support Engineer
LEAD Technologies, Inc.