This topic and its replies were posted before the current version of LEADTOOLS was released and may no longer be applicable.
#1
Posted: Monday, February 22, 2010 8:36:59 PM(UTC)
Groups: Registered
Posts: 4
Hi,
I have problems when trying to parse OCR results from a Table type zone.
My goal is to build a table with columns and rows.
Here is what I get:
1. When using the OCR Plus engine, the last character of each word/cell has the following position information:
OcrCharacterPosition.EndOfLine | OcrCharacterPosition.EndOfParagraph | OcrCharacterPosition.EndOfWord | OcrCharacterPosition.EndOfCell
Because every word-ending character carries the same flags, this does not even allow me to differentiate lines in the table.
2. When using the OCR Professional engine, the last character of each word/cell has the following position information:
OcrCharacterPosition.EndOfLine | OcrCharacterPosition.EndOfWord | OcrCharacterPosition.EndOfCell
As with the OCR Plus engine, this information is useless for building the table.
Do you have any ideas to solve the problem?
Thanks in advance
#2
Posted: Tuesday, February 23, 2010 4:20:40 AM(UTC)
Groups: Guests
Posts: 3,022
Do you have a sample file that you performed OCR on and tried to build a table from?
If yes, please send it to us and explain what type of attributes you want to detect, and we will try to tell you whether that is possible or not.
If you want to submit an attachment, put it in a ZIP or RAR file and don't use the Preview feature. You can also send it in an email to support@leadtools.com and mention this forum post.
#3
Posted: Tuesday, February 23, 2010 5:38:31 AM(UTC)
Groups: Registered
Posts: 4
Hi,
Here are the test TIF image and the zones file I use.
There is only one zone, with ZoneType=Table.
Thanks in advance
#4
Posted: Wednesday, February 24, 2010 6:14:57 AM(UTC)
Groups: Guests
Posts: 3,022
The image you sent me does not have a drawn table in it. Our engines will recognize these numbers as numeric values and not as part of a table because there are no grid-lines.
If your table lines are not drawn on the image, you can check whether words are aligned by comparing the locations of their end-of-word characters using the OcrCharacter.Bounds member. If two such characters have almost the same horizontal position but different vertical positions, the words are aligned one below the other, which means they belong to the same column.
I modified your image by drawing a table on it then performed OCR on it. I am attaching the image and the resulting word document I got.
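The alignment check described above can be sketched roughly as follows. This is only an illustration in Python, not LEADTOOLS code: the `Box` tuple is a hypothetical stand-in for the rectangle you would read from OcrCharacter.Bounds for each end-of-word character, and `group_into_columns` and its `tolerance` parameter are made-up names. It assumes right-aligned columns, so it compares right edges.

```python
from typing import List, Tuple

# Hypothetical stand-in for the bounds of a word's last character:
# (left, top, right, bottom) in pixels. Not a LEADTOOLS type.
Box = Tuple[int, int, int, int]


def group_into_columns(boxes: List[Box], tolerance: int = 10) -> List[List[Box]]:
    """Group boxes whose right edges are nearly equal into columns."""
    columns: List[List[Box]] = []
    # Walk the boxes from left to right by their right edge.
    for box in sorted(boxes, key=lambda b: b[2]):
        # Stay in the current column if this right edge is within
        # `tolerance` pixels of the previously added box's right edge.
        if columns and abs(box[2] - columns[-1][-1][2]) <= tolerance:
            columns[-1].append(box)
        else:
            columns.append([box])
    # Inside each column, order the boxes from top to bottom.
    for column in columns:
        column.sort(key=lambda b: b[1])
    return columns
```

The tolerance would need tuning to the scan resolution and font size; 10 pixels is an arbitrary starting value.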
#5
Posted: Wednesday, February 24, 2010 7:33:57 AM(UTC)
Groups: Registered
Posts: 4
Thanks for the info,
I see that grid lines are required in order to retrieve data as a table.
In most cases I have to deal with images that are the result of template removal (the form is dropped out by the scanner based on color), so the lines are gone. As with some other OCR engines I use, I will have to go back to manually reconstructing the table based on coordinates.
Thanks for your time.
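For anyone landing on this thread, a manual reconstruction from coordinates along the lines mentioned above might look like the sketch below. It is a minimal Python illustration under stated assumptions, not the method either poster used and not LEADTOOLS code: the `Word` tuple, the helper names, and the tolerance values are all hypothetical, and it assumes right-aligned numeric columns, so words are assigned to columns by their right edge and to rows by their vertical center.

```python
from typing import List, Tuple

# Hypothetical word record: (text, left, top, right, bottom) in pixels.
Word = Tuple[str, int, int, int, int]


def cluster(values: List[float], tolerance: float) -> List[float]:
    """Group sorted values that lie within `tolerance` of the previous
    value and return the mean of each group as its anchor."""
    groups: List[List[float]] = []
    for value in sorted(values):
        if groups and value - groups[-1][-1] <= tolerance:
            groups[-1].append(value)
        else:
            groups.append([value])
    return [sum(group) / len(group) for group in groups]


def nearest(anchors: List[float], value: float) -> int:
    """Index of the anchor closest to `value`."""
    return min(range(len(anchors)), key=lambda i: abs(anchors[i] - value))


def build_table(words: List[Word], row_tol: float = 10,
                col_tol: float = 15) -> List[List[str]]:
    """Place each word into a row/column grid; missing cells stay empty."""
    if not words:
        return []
    # Row anchors come from vertical centers, column anchors from right edges.
    row_anchors = cluster([(w[2] + w[4]) / 2 for w in words], row_tol)
    col_anchors = cluster([w[3] for w in words], col_tol)
    table = [["" for _ in col_anchors] for _ in row_anchors]
    # Visit words left to right so multi-word cells keep reading order.
    for text, left, top, right, bottom in sorted(words, key=lambda w: w[1]):
        r = nearest(row_anchors, (top + bottom) / 2)
        c = nearest(col_anchors, right)
        table[r][c] = (table[r][c] + " " + text).strip()
    return table
```

Left-aligned text columns would need the left edge instead of the right edge as the column key, and skewed scans would need deskewing first.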