This topic and its replies were posted before the current version of LEADTOOLS was released and may no longer be applicable.
#1
Posted: Monday, April 14, 2008 11:22:12 AM(UTC)
Groups: Registered
Posts: 32
Hi Adnan,
We have a table on a page, and one of its column headers wraps onto three lines. For example, the header "Last Action Date" does not fit on one line, so it is wrapped within the same cell as follows:
Last
Action
Date
How can we make sure these three words are returned in the same order when we perform OCR on the page? At the moment they are not returned together: text from other column headers appears in between them.
Thanks
#2
Posted: Monday, April 14, 2008 11:25:12 AM(UTC)
Groups: Registered
Posts: 32
Please check attachment.
bsuresh attached the following image(s):
#3
Posted: Tuesday, April 15, 2008 3:25:37 AM(UTC)
Groups: Guests
Posts: 3,022
You can define a zone on each column header and perform OCR on that zone. The details depend on which version of LEADTOOLS you are using and which programming interface (API, COM objects, or .NET class library) you are developing with.
#4
Posted: Tuesday, April 15, 2008 6:59:40 AM(UTC)
Groups: Registered
Posts: 32
I am using version 15 of the SDK with C#.
Unfortunately, I cannot know in advance whether a document will contain a table or other data. All I want to achieve is to get the list of words on the page in the correct order; that is the requirement. For some pages that contain a table, the order of the words is not correct: in the case above, the words belong to the same column header, but because the text is wrapped within the cell they are not reported together. How can we get the correct order?
#5
Posted: Thursday, April 17, 2008 2:45:26 AM(UTC)
Groups: Guests
Posts: 3,022
I tested with your image and did not define my own zones. Instead, I used the default zones that the engine finds for itself. The three words "Last Action Date" were automatically grouped into one zone, and when I displayed the list of recognized words, they were listed in exactly this order.
Can you post or send us the actual full image you're trying to OCR, instead of the partial screen capture?
#6
Posted: Thursday, April 17, 2008 8:29:27 AM(UTC)
Groups: Registered
Posts: 32
Qasem,
Thanks for the response. Here I am attaching the original page file (TIF).
Thanks,
Suresh
#7
Posted: Saturday, April 19, 2008 10:28:35 AM(UTC)
Groups: Registered
Posts: 32
Hi Qasem,
Any update on this, please?
#8
Posted: Sunday, April 20, 2008 6:16:46 AM(UTC)
Groups: Guests
Posts: 3,022
I tested here with the full file and got the same results you did. One approach you can try is to compare the X coordinates of the recognized words: if they are close, the words belong to the same column, and you can arrange them accordingly.
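The X-coordinate comparison suggested above could look roughly like the following. This is only a sketch in plain Python, not LEADTOOLS code: the `Word` class, the `x_tolerance` value, and the sample coordinates are all made up for illustration, and it assumes you have already obtained each recognized word's text and the top-left corner of its bounding box from the engine.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word plus the top-left corner of its bounding box (pixels)."""
    text: str
    x: int
    y: int


def group_into_columns(words, x_tolerance=15):
    """Group words whose x coordinates are close, then order each group top to bottom."""
    columns = []  # each column is a list of Word objects
    for word in sorted(words, key=lambda w: w.x):
        # Join the current column if this word starts close to the column's first word.
        if columns and abs(word.x - columns[-1][0].x) <= x_tolerance:
            columns[-1].append(word)
        else:
            columns.append([word])
    # Within a column, reading order is top to bottom.
    return [sorted(column, key=lambda w: w.y) for column in columns]


def column_texts(words, x_tolerance=15):
    """Return one joined string per column, columns ordered left to right."""
    return [" ".join(w.text for w in column)
            for column in group_into_columns(words, x_tolerance)]


# Hypothetical header row, in the line-by-line order an engine might report it.
sample = [
    Word("Last", 100, 10), Word("Account", 200, 10),
    Word("Action", 102, 30), Word("Number", 201, 30),
    Word("Date", 101, 50),
]
print(column_texts(sample))  # ['Last Action Date', 'Account Number']
```

Note that this only handles headers with one word per line that share a left edge. If a header cell can hold two words on the same line, or the text is centered, you would probably need to compare against the cell's horizontal extent (or the word centers) instead, and you would want to apply the grouping only to the header region rather than to the whole page.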