This topic and its replies were posted before the current version of LEADTOOLS was released and may no longer be applicable.
#1
Posted
:
Saturday, July 14, 2007 3:54:48 PM(UTC)
Groups: Registered
Posts: 23
Is there a preferred image format that performs better for OCR than another? Would PNG be better than TIFF? Would smaller file sizes run faster? Does one format produce more accurate results? I have seen that the optimal resolution is 300 DPI and that the image size should be as large as possible, but I haven't seen anything on image formats.
#2
Posted
:
Sunday, July 15, 2007 11:11:36 PM(UTC)
Groups: Guests
Posts: 3,022
Was thanked: 2 time(s) in 2 post(s)
The higher the resolution, the better the image quality, which yields better OCR performance. The downside is that the image size grows as the resolution increases.
We always recommend feeding the OCR engine images saved with the following specifications:
1- High resolution (300 DPI is good).
2- Saved in 1-bit (black and white) mode.
3- Saved in a lossless format, such as LZW TIFF or CCITT Group 4 TIFF.
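To put rough numbers on the resolution and bit-depth trade-off above, here is a minimal Python sketch that estimates the uncompressed size of a scanned page. The page size (US Letter, 8.5 x 11 inches) and the function name `uncompressed_size_bytes` are assumptions chosen for illustration; nothing here is LEADTOOLS API.

```python
def uncompressed_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Estimate pixel dimensions and raw (uncompressed) size of a scanned page.

    Rows are rounded up to whole bytes; any extra row padding that a
    particular library might add is ignored.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = (width_px * bits_per_pixel + 7) // 8  # round up to a full byte
    return width_px, height_px, bytes_per_row * height_px


# Assumed page size: US Letter. Compare 1-bit (black and white) with 24-bit color.
for dpi in (150, 300, 600):
    for bpp in (1, 24):
        w, h, size = uncompressed_size_bytes(8.5, 11, dpi, bpp)
        print(f"{dpi} DPI, {bpp}-bit: {w}x{h} px, {size / 1_000_000:.1f} MB")
```

Doubling the resolution roughly quadruples the pixel count, while dropping from 24-bit to 1-bit cuts the raw data by about a factor of 24, which is one reason a 300 DPI, 1-bit image stays manageable.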
#3
Posted
:
Monday, July 16, 2007 10:34:06 AM(UTC)
Groups: Registered
Posts: 23
Would JBIG2 work too? It looks like other OCR engines use this format. I looked in the Raster Image Help file and there is an example of the JBIG2 codec properties, but there is no glossary of what each property does. Could I get a glossary of the properties, and does LEADTOOLS have any examples of how to use JBIG2 with optimal properties for the OCR engine? See CVISION's product for usage of JBIG2 with OCR.
#4
Posted
:
Monday, July 16, 2007 11:44:01 AM(UTC)
Groups: Registered, Tech Support, Administrators
Posts: 764
Any image format is really fine. LEADTOOLS works with uncompressed image data, so the input format really doesn't matter that much. Qasem only recommended those formats because they are lossless: if you save them and reopen them later, they will look exactly the same. If you use something like JPEG, repeated load/save cycles will eventually degrade the data. The image format you use is completely subjective and depends on what you want to do.
The main factor in question is quality and clarity, and on that note 300 DPI and lossless are really the only "solid" benchmarks for optimal OCR images.
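As a rough illustration of the lossless versus lossy distinction, here is a small Python sketch using only the standard library. It is a toy under stated assumptions: `zlib` (DEFLATE) stands in for any lossless codec, the quantizing function is a made-up stand-in for a lossy one and is not a model of JPEG, and the pixel data is synthetic.

```python
import zlib

# Synthetic 8-bit grayscale data (64x64 values in 0-255), for illustration only.
pixels = bytes((x * 7 + y * 13) % 256 for y in range(64) for x in range(64))


def lossless_round_trip(data):
    """Compress and decompress with DEFLATE; the result is byte-identical."""
    return zlib.decompress(zlib.compress(data))


def toy_lossy_round_trip(data, step=16):
    """Toy stand-in for a lossy codec: snap each value down to a multiple of `step`."""
    return bytes((v // step) * step for v in data)


restored = lossless_round_trip(pixels)
degraded = toy_lossy_round_trip(pixels)

print("lossless copy identical:", restored == pixels)
changed = sum(a != b for a, b in zip(pixels, degraded))
print("toy lossy copy changed", changed, "of", len(pixels), "pixel values")
```

The lossless copy matches the original exactly no matter how many times it is saved and reopened, whereas the lossy stand-in alters pixel values on the first pass; that altered data is what the OCR engine would then see.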
#5
Posted
:
Monday, July 16, 2007 1:19:35 PM(UTC)
Groups: Registered
Posts: 23
Greg,
I completely understand that you definitely need 300 DPI and lossless images for OCR. I was reading the JBIG2 specs, and the format does some font replacement that is supposed to help with recognition, as seen in CVISION's OCR tools. I am just stumped trying to find in the LEADTOOLS documentation what the codecs.options.jbig2.save. properties for JBIG2 are. I see an example of how to use some of them, but it doesn't tell me what each property is doing. Is there full documentation on JBIG2 that isn't in the raster imaging help files?
Thanks,
Dave
#6
Posted
:
Monday, July 16, 2007 10:26:38 PM(UTC)
Groups: Guests
Posts: 3,022
Was thanked: 2 time(s) in 2 post(s)
The only documentation we have is in the help file. The help topic "CodecsJbig2SaveOptions Class Members" lists all related properties and contains links to their help topics.
#7
Posted
:
Friday, July 20, 2007 2:19:22 PM(UTC)
Groups: Registered
Posts: 23
Sorry, I was starting from the JBIG2 index and clicking on the codecs link inside it, so I didn't get to this page. Going directly to the CodecsJbig2SaveOptions Class Members index worked! Thanks so much.
DAVE