This topic and its replies were posted before the current version of LEADTOOLS was released and may no longer be applicable.
#1
Posted
:
Monday, February 25, 2008 12:44:16 PM(UTC)
Groups: Registered
Posts: 5
Hello all.
I have just started exploring the LEADTOOLS OCR utility, so far only through the OCR_Util application. I have a few observations about it:
1. It seems to be impossible to recognize light text on a dark background. For instance, inverting the colors in one of the example OCR images makes the text unrecognizable. Can this be alleviated by some parameter settings? If not, what can be done about it?
2. Sometimes, when loading an image, there is an error: "Can't add page to engine, Error = -1239", which according to the documentation means "Non-supported resolution". The image was a 400 x 300 bitmap with a single sentence in black on a white background, font size about 24 pt. Curiously, when the .bmp file was converted to .jpg, it loaded without any problem.
What are the possible reasons for such an error?
Best regards,
Gutek
#2
Posted
:
Tuesday, February 26, 2008 4:18:24 AM(UTC)
Groups: Guests
Posts: 3,022
Was thanked: 2 time(s) in 2 post(s)
You are correct. Our OCR engine can only read dark characters on a light background.
About the second issue, please send me the image in a ZIP or RAR file and I will test it for you. You can either post the files here or send them to support@leadtools.com
#3
Posted
:
Tuesday, February 26, 2008 7:37:31 AM(UTC)
Groups: Registered
Posts: 5
Hello,
Thanks for the reply. I have attached the bitmap.
Concerning the background issue, what can be done if we do not know a priori what the contrast between the text and the background will be? For instance, we can expect yellow text on a blue background or some other combination (a dark background is possible), which is easy to distinguish by eye but more difficult for an OCR engine.
#4
Posted
:
Wednesday, February 27, 2008 5:16:26 AM(UTC)
Groups: Guests
Posts: 3,022
Was thanked: 2 time(s) in 2 post(s)
First of all, I'm sorry I gave you an incorrect answer.
I tested some more and found out that the engine can actually recognize text from inverted parts. I'm attaching a sample TIFF that I tested with and that worked. If you have sample images that don't work the same way, send them over as well and we will check them for you.
About the image you sent, the resolution stored in it is 72 DPI. This is very low. I changed it and saved the image back, and it seems to be working. I'm also attaching the modified image.
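For readers who want to check or change the stored resolution themselves, here is a minimal Python sketch. It is not the LEADTOOLS API and not necessarily how the image above was modified; it assumes a standard BMP with a BITMAPINFOHEADER-style header, where the horizontal and vertical resolution are stored as pixels per meter at byte offsets 38 and 42. The function names `read_bmp_dpi` and `set_bmp_dpi` are made up for illustration.

```python
import struct

INCH_IN_METERS = 0.0254
X_PPM_OFFSET = 38  # biXPelsPerMeter (assumes a BITMAPINFOHEADER-style header)
Y_PPM_OFFSET = 42  # biYPelsPerMeter


def read_bmp_dpi(data):
    """Return (x_dpi, y_dpi) stored in the BMP header bytes."""
    if data[:2] != b"BM":
        raise ValueError("not a BMP file")
    x_ppm, y_ppm = struct.unpack_from("<ii", data, X_PPM_OFFSET)
    return x_ppm * INCH_IN_METERS, y_ppm * INCH_IN_METERS


def set_bmp_dpi(data, dpi):
    """Return a copy of the BMP bytes with the stored resolution changed.

    Only the two header fields are rewritten; the pixel data is untouched.
    """
    if data[:2] != b"BM":
        raise ValueError("not a BMP file")
    ppm = round(dpi / INCH_IN_METERS)  # BMP stores pixels per meter
    patched = bytearray(data)
    struct.pack_into("<ii", patched, X_PPM_OFFSET, ppm, ppm)
    return bytes(patched)
```

Typical use would be to read the file bytes, call `set_bmp_dpi(data, 150)`, and write the result to a new file. The pixels stay identical; only the metadata changes.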
#5
Posted
:
Saturday, March 1, 2008 9:56:59 AM(UTC)
Groups: Registered
Posts: 5
Hi again,
The image with the new resolution can be recognized. However, I wonder what difference the resolution makes if we are working with digital data. The old image (72 DPI) and the new one (150 DPI) are in fact the same. I believe that the DPI value only matters for printing. Are there any reasons to take it into consideration in OCR?
As far as white text on a black background is concerned, the inverted version of the image you sent me (150 DPI) can't be recognized (attached). The Licence Agreement sample with the inverted paragraph could be recognized, though.
It seems that light-on-dark recognition, even if possible, is not reliable. For a tool that should work automatically on different kinds of text without human supervision, I think it is better to perform some image preprocessing (such as inversion or contrast enhancement) to enable recognition.
Best regards,
Gutek
#6
Posted
:
Sunday, March 2, 2008 6:29:00 AM(UTC)
Groups: Guests
Posts: 3,022
Was thanked: 2 time(s) in 2 post(s)
I'm afraid that the engine will not work with this type of image and there is little that we can do about it. However, since the inverted image does work and produces correct results, you can use LEADTOOLS image processing functions to check whether an image is inverted and to invert it if necessary.
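The check-and-invert idea can be sketched in plain Python, independent of LEADTOOLS. This is only an illustration under stated assumptions: the image is given as rows of 8-bit grayscale values, the background covers more area than the text, and a fixed threshold of 128 separates dark from light. All function names here are hypothetical. For colored text such as yellow on blue, `to_gray` converts RGB to luminance first using the common Rec. 601 weights.

```python
def to_gray(rgb):
    """Convert an (r, g, b) pixel to 8-bit luminance (Rec. 601 weights)."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def is_light_on_dark(pixels, threshold=128):
    """Guess polarity: True when most pixels are darker than the threshold.

    Assumes the background occupies more of the image than the text.
    """
    flat = [p for row in pixels for p in row]
    dark = sum(1 for p in flat if p < threshold)
    return dark * 2 > len(flat)


def invert(pixels):
    """Return a new image with every 8-bit value inverted."""
    return [[255 - p for p in row] for row in pixels]


def normalize_polarity(pixels):
    """Return the image as dark text on a light background."""
    return invert(pixels) if is_light_on_dark(pixels) else pixels
```

Running `normalize_polarity` before handing the image to the OCR engine gives it dark-on-light input in both cases. A fixed threshold is a simplification; low-contrast images would likely need contrast enhancement or an adaptive threshold first.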
#7
Posted
:
Sunday, March 2, 2008 10:48:35 AM(UTC)
Groups: Registered
Posts: 5
Yes, that's what I was going to do.
What about the "high dpi" issue? Are there any reasons for not allowing low dpi images to be loaded?
Gutek
#8
Posted
:
Monday, March 3, 2008 5:38:27 AM(UTC)
Groups: Guests
Posts: 3,022
Was thanked: 2 time(s) in 2 post(s)
Gutek,
The OCR engine internally uses the DPI information when it attempts to figure out which characters correspond to the shapes in the image. One of its limitations is that it does not work properly with low-resolution images.
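To illustrate why the stored DPI can matter even though the pixels are unchanged: a point is 1/72 of an inch, so the DPI value determines what physical size a given pixel height is taken to represent. The small Python sketch below shows only that arithmetic. How the engine actually uses the value internally is not described in this thread, and the function name is hypothetical.

```python
POINTS_PER_INCH = 72  # typographic points per inch


def apparent_point_size(pixel_height, dpi):
    """Physical size, in points, that a glyph of the given pixel height
    represents when the image is declared to be at the given DPI."""
    return pixel_height * POINTS_PER_INCH / dpi


# The same 24-pixel-tall text, interpreted under two DPI declarations
size_at_72 = apparent_point_size(24, 72)    # 24 pt
size_at_150 = apparent_point_size(24, 150)  # about 11.5 pt
```

So identical pixel data is read as text of a different physical size depending on the DPI stored in the file, which is one plausible way the metadata could influence recognition.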