This topic and its replies were posted before the current version of LEADTOOLS was released and may no longer be applicable.
#1
Posted: Thursday, February 9, 2012 9:05:40 AM (UTC)
Good afternoon,
I am using LEADTOOLS version 17.5 with the Professional OCR engine.
I am having the following problem with some documents; I am sending example documents and the OCR results.
What strikes me is that each line runs into the next: characters from the upper and lower lines are merged together in the output. Is there a way to change the default line spacing used by the OCR engine?
I have found that this problem only happens with documents like these.
The tests were performed with VBOcrMainDemo.exe.
Regards,
Jesus Lizon
#2
Posted: Friday, February 10, 2012 6:25:18 AM (UTC)
I checked this with our main OCR demo in .NET using the Professional OCR engine, and the attached text file contains my results with formatting turned on. Turning formatting off returns the text as it appears on the document; this looks a little cleaner overall, but it may not be what you're looking for.
However, I did not get the same results you did. You may want to redownload the installer and reinstall LEADTOOLS 17.5 to make sure you have the latest version. Also, can you tell me your OS version? I would like to verify that it is not a factor.
Attached Files:
Results.txt - OCR with formatting.
ResultsNOFORMAT.txt - OCR without formatting.
Thanks,
Danny Helms
#3
Posted: Friday, February 10, 2012 8:20:03 AM (UTC)
Good afternoon,
My operating system is Windows XP Service Pack 3.
I did not mention in my previous post that when I export to PDF the result is correct; it is only when I export to TXT that the result goes wrong.
Where can I download the latest version?
Regards, and thanks
#4
Posted: Monday, February 13, 2012 1:15:44 AM (UTC)
Good morning, Danny Helms,
I upgraded to the latest version of LEADTOOLS.
I experimented with the image you attached, and the result is better than in my previous tests, but still not correct: some lines still run together. I converted the image to 1-bit, ran it through the main OCR demo in .NET, and saved the result as formatted text.
I have attached the image and the result.
Regards, and thanks
#5
Posted: Monday, February 13, 2012 4:11:58 AM (UTC)
The imperfections you're seeing, both in formatting and in individual characters, are mostly due to the size, resolution, and quality of the image. The image you're using is not in great shape, so the output cannot be expected to be either. The text formatting issues are more likely to correct themselves as the image size increases. If increasing the image size does not resolve the issue, you would most likely need to zone the documents manually.
I must ask, though: what are you using the text output for that makes the formatting so crucial?
#6
Posted: Tuesday, February 14, 2012 1:06:00 AM (UTC)
Good morning,
I use the formatted text output so that another program can read the result. Would the result change if I used another format?
Regards,
#7
Posted: Tuesday, February 14, 2012 4:12:59 AM (UTC)
Given that you want to read the content easily from another application, I would say text is probably your most suitable option. The alternative is exporting to XML and parsing the character data, but that returns significantly more detailed information, and without LEADTOOLS in your other application you would need to handle the extra information yourself, such as the character bounds.
More importantly, I just tested your most recently uploaded images using the native engine output for a formatted text file with the Professional engine. My results are attached, and you should see that they are fine. There may be ways to get the normal text files to output better, but if this is all you need, I'd suggest this approach over the alternatives.
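To illustrate what handling the character bounds yourself could involve, here is a minimal Python sketch. It assumes a hypothetical XML layout in which each recognized character is a `<char>` element with `code`, `left`, `top`, `right`, and `bottom` attributes; the actual LEADTOOLS XML output will differ, so the element and attribute names would need adjusting. It rebuilds lines by vertical overlap, which is also one way to keep characters from adjacent lines from being merged.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Char:
    code: str
    left: int
    top: int
    right: int
    bottom: int


def parse_chars(xml_text: str) -> list[Char]:
    """Read every <char> element. Element and attribute names are HYPOTHETICAL."""
    root = ET.fromstring(xml_text)
    return [
        Char(e.get("code", ""), int(e.get("left")), int(e.get("top")),
             int(e.get("right")), int(e.get("bottom")))
        for e in root.iter("char")
    ]


def group_into_lines(chars: list[Char], min_overlap: float = 0.5) -> list[str]:
    """Rebuild text lines from character bounds, top to bottom, left to right.

    A character joins an existing line only if at least `min_overlap` of its
    height overlaps that line's reference band (the bounds of the line's first
    character), so characters from adjacent lines are kept apart.
    """
    lines: list[tuple[int, int, list[Char]]] = []
    # Visit characters by vertical center so lines are created top to bottom.
    for ch in sorted(chars, key=lambda c: (c.top + c.bottom) / 2):
        height = ch.bottom - ch.top
        for top, bottom, members in lines:
            overlap = min(bottom, ch.bottom) - max(top, ch.top)
            if height > 0 and overlap / height >= min_overlap:
                members.append(ch)
                break
        else:
            lines.append((ch.top, ch.bottom, [ch]))
    # Within each line, order characters left to right. Word spacing is not rebuilt.
    return ["".join(c.code for c in sorted(members, key=lambda c: c.left))
            for _, _, members in lines]
```

The `min_overlap` threshold is an arbitrary choice for this sketch; a lower value tolerates skewed scans but makes merging adjacent lines more likely.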
http://www.leadtools.com/help/leadtools/v175/dh/fo/leadtools.forms.ocr~leadtools.forms.ocr.iocrdocumentmanager.html
The documentation page linked above includes a code example that uses the native engine format. That example saves output to every available format, which you obviously won't need.
You may need to double-check the exact format string, but it appears it should be AsciiTextFormatted or UnicodeTextFormatted, whichever you prefer.
Assign that string to IOcrDocumentManager.EngineFormat to make it active, then use DocumentFormat.User to specify that you want to use the native format.
Instructions from documentation:
To save the recognition results using the engine native format:
- Obtain the engine native format name using GetSupportedEngineFormats.
- Set the engine format name with the EngineFormat property.
- Call the various save methods (IOcrDocument.Save or IAutoRecognizeManager.Run) using DocumentFormat.User for the format parameter.
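Because the exact format name needs double-checking, the first step amounts to looking the name up in the list the engine reports. The sketch below shows only that lookup, written in Python for illustration: the helper `pick_formatted_text_format` is hypothetical, the input list stands in for whatever GetSupportedEngineFormats returns in .NET, and the candidate names AsciiTextFormatted and UnicodeTextFormatted are the unverified guesses from this post. Setting EngineFormat and saving with DocumentFormat.User still happen in the LEADTOOLS .NET code.

```python
from __future__ import annotations


def pick_formatted_text_format(supported: list[str],
                               prefer_unicode: bool = False) -> str | None:
    """Return the native format name to assign to EngineFormat, or None.

    HYPOTHETICAL helper: `supported` stands in for the names returned by
    GetSupportedEngineFormats; the candidate names are unverified guesses.
    """
    candidates = ["AsciiTextFormatted", "UnicodeTextFormatted"]
    if prefer_unicode:
        candidates.reverse()
    # Match case-insensitively, but return the exact string the engine reported.
    by_lower = {name.lower(): name for name in supported}
    for wanted in candidates:
        if wanted.lower() in by_lower:
            return by_lower[wanted.lower()]
    return None
```

Returning None instead of a default makes it obvious when neither name is supported, so the caller can inspect the reported list rather than silently saving in some other format.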