Disable word hyphenation regrouping


This topic and its replies were posted before the current version of LEADTOOLS was released and may no longer be applicable.

#1 Posted : Tuesday, January 16, 2007 9:28:40 AM(UTC)

mcastro

Groups: Registered
Posts: 4

Hi.
Is there any way to disable or switch off word hyphenation regrouping?

I have developed an algorithm that groups characters inside words, using GetRecognizedCharacters and GetRecognizedWords and the RECT member that each one returns, and it works as required.

But when recognizing words that end with '-', the OCR engine applies some kind of hyphenation regrouping and builds a single word whose rect stretches from the hyphen to the next word it finds. As a result, the RECT of this word is wrong, because it contains characters that belong to other words.
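For readers who run into the same behavior, here is a minimal Python sketch of one way to flag such a word from the caller's side. It is not part of LEADTOOLS: the `Rect` type, the function name, and the `factor` threshold are all hypothetical, and it assumes the per-character rects are reliable even when the word rect is not.

```python
from statistics import median
from typing import NamedTuple, Sequence


class Rect(NamedTuple):
    """Hypothetical rectangle type; not the LEADTOOLS RECT structure."""
    left: int
    top: int
    right: int
    bottom: int


def looks_merged_across_lines(word_rect: Rect,
                              char_rects: Sequence[Rect],
                              factor: float = 1.8) -> bool:
    """Heuristic: flag a word whose box is much taller than its typical character.

    `factor` is an assumed threshold and would need tuning on real documents.
    """
    if not char_rects:
        return False
    # The median keeps a short glyph such as '-' from skewing the estimate.
    typical_height = median(r.bottom - r.top for r in char_rects)
    return (word_rect.bottom - word_rect.top) > factor * typical_height
```

A word flagged this way can then be handled separately instead of being trusted as a single rectangle.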

I will try to show it with the attached image.

It is a real scenario.

Best Regards.

mcastro attached the following image(s):



#2 Posted : Tuesday, January 16, 2007 11:29:45 PM(UTC)

Bashar

Groups: Guests
Posts: 3,022

Was thanked: 2 time(s) in 2 post(s)

Which version of LEADTOOLS are you using? Can you please post a sample document that shows this issue along with the resulting recognized data? Are you able to reproduce the issue with the OCR Demo that ships with LEADTOOLS?

#3 Posted : Thursday, January 25, 2007 8:17:49 AM(UTC)

mcastro

Groups: Registered
Posts: 4

I am using LEADTOOLS version 14.5 R.

I am attaching one of the actual images that broke my RECT algorithms.

I used the OCRUTL32.exe application that comes with LEADTOOLS.

After recognizing the document in this application, you can check the results of 'get recognized words', where the strange behavior shows up.

You will find a word like -A-SUBSCRIPTION there, even though these words are in separate regions and at very different Y coordinates.

I hope this helps.

Best regards.

mcastro attached the following image(s):

#4 Posted : Friday, January 26, 2007 7:18:45 AM(UTC)

Jordan_S

Groups: Registered
Posts: 7

I posted a similar problem a few months back (though I can't seem to find the post... maybe it was deleted). However, it was dealing with OCR returning a recognized word that ends in a hyphen (Leadtools.Ocr.RasterOcrRecognizedWords) and having its bounds (top, bottom, left and right) spread across two lines. So, instead of getting the actual bounds of the rectangle around that word, I get the area of two full lines.

Apparently, their OCR considers hyphens at the end of a line (or what it sees as a line) an indication of a continuation on the next recognized line.
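One possible caller-side workaround, sketched here in Python, is to rebuild the bounds from the character rectangles and split them wherever the characters stop overlapping vertically. This is an illustration only: the function name and the plain `(left, top, right, bottom)` tuples are assumptions, not LEADTOOLS types, and it presumes the per-character bounds are correct even when the word bounds are not.

```python
def split_rects_by_line(char_rects):
    """Group (left, top, right, bottom) rects into lines by vertical overlap.

    Returns one bounding rect per detected line, ordered top to bottom.
    """
    lines = []
    # Visit characters from top to bottom so each line is completed in turn.
    for left, top, right, bottom in sorted(char_rects, key=lambda r: r[1]):
        if lines and top < lines[-1][3]:
            # The character starts above the current line's bottom edge:
            # treat it as part of that line and grow the line's rect.
            l, t, r, b = lines[-1]
            lines[-1] = (min(l, left), min(t, top), max(r, right), max(b, bottom))
        else:
            # No vertical overlap with the current line: start a new one.
            lines.append((left, top, right, bottom))
    return lines
```

For a word that was joined across a line-ending hyphen, this would yield two rectangles, one per line, instead of a single box covering both lines. Skewed scans or tightly packed lines may need a more tolerant overlap test.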

I was never able to resolve the problem. Any progress in this area would be appreciated.

#5 Posted : Sunday, January 28, 2007 6:45:02 AM(UTC)

Bashar

Groups: Guests
Posts: 3,022

Was thanked: 2 time(s) in 2 post(s)

Yes, I got the same thing. I solved it by setting the Parser to Legacy and by enabling the "Enable Force Single Column" option when doing a Find Zones operation. In code, this can be accomplished by setting the ZoneParser property to PARSER_LEGACY and EnableZoneForceSingleColumn to True, then calling SetAutoZoneOptions.
