
This topic and its replies were posted before the current version of LEADTOOLS was released and may no longer be applicable.

#1 Posted : Thursday, February 9, 2012 9:05:40 AM(UTC)

jlizon

Groups: Registered
Posts: 4

Good afternoon,

I am using LEADTOOLS version 17.5 with the Professional OCR engine.

I am having the following problem with some documents. I am attaching sample documents and the OCR results.

What strikes me is that the upper line gets merged with the line below it, with characters from both lines mixed together. Is there a way to modify the OCR engine's default line spacing?

I have found that this problem only happens with documents like these.

The tests were performed with VBOcrMainDemo.exe.

Regards,
Jesus Lizon

File Attachment(s):

ejemplo.rar (259kb) downloaded 33 time(s).



#2 Posted : Friday, February 10, 2012 6:25:18 AM(UTC)

Danny H

Groups: Guests
Posts: 3,022

Was thanked: 2 time(s) in 2 post(s)

I checked this with our main .NET OCR demo using the Professional OCR engine; the attached text file contains my results with formatting turned on. Turning formatting off returns the text as it is found in the document. That output appears a little cleaner overall, but it may not be what you're looking for.

However, I did not get the same results you did. You may want to download the installer again and reinstall LEADTOOLS 17.5 to make sure you have the latest build. Also, could you tell me your OS version? I would like to rule out any connection there.

Attached Files:
Results.txt - OCR with formatting.
ResultsNOFORMAT.txt - OCR without formatting.

Thanks,
Danny Helms

File Attachment(s):

Results.zip (1kb) downloaded 33 time(s).

#3 Posted : Friday, February 10, 2012 8:20:03 AM(UTC)

jlizon

Groups: Registered
Posts: 4

Good afternoon,

My operating system is Windows XP Service Pack 3.

I did not mention in my previous post that when I export to PDF the result is correct; however, when I export to TXT the result is wrong.

Where can I download the latest version?

Regards, and thanks

#4 Posted : Monday, February 13, 2012 1:15:44 AM(UTC)

jlizon

Groups: Registered
Posts: 4

Good morning, Danny Helms,

I upgraded to the latest version of LEADTOOLS.

I experimented with the image you attached, and the result is better than in the previous tests, but still not correct: some lines still run together. I converted the image to 1-bit, ran it through the main .NET OCR demo, and saved the result as formatted text.

Attached are the image and the result.

Regards, and thanks

File Attachment(s):

pruebas.rar (7kb) downloaded 34 time(s).

#5 Posted : Monday, February 13, 2012 4:11:58 AM(UTC)

Danny H

Groups: Guests
Posts: 3,022

Was thanked: 2 time(s) in 2 post(s)

The imperfections you are seeing, both in the formatting and in individual characters, are mostly due to the size, resolution, and quality of the image. The image you're using is not in great shape, so the output cannot be expected to be either. The text formatting issues are more likely to correct themselves as the image size gets larger. If increasing the image size does not resolve the issue, you would most likely need to zone the documents manually.

I must ask, though: what are you using the text output for that makes the formatting so crucial?

#6 Posted : Tuesday, February 14, 2012 1:06:00 AM(UTC)

jlizon

Groups: Registered
Posts: 4

Good morning,

I use the formatted text output to read the results in another program. Would the result change if I used another format?

Regards

#7 Posted : Tuesday, February 14, 2012 4:12:59 AM(UTC)

Danny H

Groups: Guests
Posts: 3,022

Was thanked: 2 time(s) in 2 post(s)

Given that you want to read the content easily from another application, I would say text is probably your most suitable option. The alternative is to save to XML and parse the character data, but that returns significantly more detailed information, and without LEADTOOLS in your other application you would need to handle that extra information, such as the character bounds, yourself.

More importantly, I just tested your most recently uploaded images using the Professional engine's native formatted-text output. My results are attached, and you should see that they are fine. There may be ways to get better output from the normal text format, but if this is all you need, I'd suggest using the native format rather than the alternatives.

http://www.leadtools.com/help/leadtools/v175/dh/fo/leadtools.forms.ocr~leadtools.forms.ocr.iocrdocumentmanager.html

The documentation linked above contains a code example that uses the native engine format. That example saves the output in every available format, which is obviously more than you need.

You may need to double-check the exact string, but it appears it should be AsciiTextFormatted or UnicodeTextFormatted, whichever you prefer.

Assign that string to IOcrDocumentManager.EngineFormat to make it active, then pass DocumentFormat.User to specify that you want to use the native format.

Instructions from documentation:

To save the recognition results using the engine native format:

- Obtain the engine native format name using GetSupportedEngineFormats.
- Set the engine format name with the EngineFormat property.
- Call the various save methods (IOcrDocument.Save or IAutoRecognizeManager.Run) using DocumentFormat.User for the format parameter.
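The first step above, choosing which native format name to use, can be sketched as follows. This is only an illustration of the selection logic, written in Python because the thread contains no code; the real calls (GetSupportedEngineFormats, EngineFormat, DocumentFormat.User) belong to the LEADTOOLS .NET API and are not reproduced here. The helper name `pick_text_engine_format` is hypothetical. The preference for UnicodeTextFormatted (for example, to keep accented Spanish characters) is an assumption, and the format strings themselves should be checked against what GetSupportedEngineFormats actually returns.

```python
from typing import Iterable, Sequence

# Candidate native format names mentioned in this thread. ASSUMPTION: the exact
# strings must be confirmed against the output of GetSupportedEngineFormats.
DEFAULT_PREFERENCE: Sequence[str] = ("UnicodeTextFormatted", "AsciiTextFormatted")


def pick_text_engine_format(
    supported: Iterable[str],
    preference: Sequence[str] = DEFAULT_PREFERENCE,
) -> str:
    """Return the first preferred format name that the engine reports as supported.

    Hypothetical helper: `supported` stands in for the names returned by
    GetSupportedEngineFormats. The chosen name would then be assigned to
    EngineFormat, and the document saved with DocumentFormat.User.
    """
    available = set(supported)
    for name in preference:
        if name in available:
            return name
    # No formatted-text native format found: fail loudly instead of guessing.
    raise ValueError(
        f"none of {list(preference)} is supported; available: {sorted(available)}"
    )


if __name__ == "__main__":
    # Hypothetical list of names; a real engine may report different ones.
    reported = ["AsciiText", "AsciiTextFormatted", "Xml"]
    print(pick_text_engine_format(reported))
```

Failing with an error when neither name is present makes a wrong guess about the format string show up immediately, rather than silently saving in some other format.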
