#1 Posted : Friday, February 16, 2007 8:35:00 AM(UTC)
derevell

Groups: Registered
Posts: 23


I'm looking to highlight the words in a .tif fax that have a bad word confidence level with a yellow box and save the new .tif. This will allow me to tag a fax for review and then draw the reviewer's attention to the places on the fax with bad OCR results so they can be interpreted manually. Does anyone know how this can be done, or have a sample of how it would be accomplished?
 


#2 Posted : Wednesday, February 21, 2007 8:08:11 AM(UTC)

Adnan Ismail  
Guest

Groups: Guests
Posts: 3,022

Was thanked: 2 time(s) in 2 post(s)




You can highlight the poorly recognized words using one of the following approaches. First, though, you need to use the RasterOcrRecognizedCharacters class to get information about the recognized characters, such as their bounding rectangle (the RasterOcrRecognizedCharacters.Rectangle property).

Approach-1:
Use the annotation object AnnHiliteObject to draw a colored rectangle over the poorly recognized characters after getting their coordinates. Please note that annotation objects are only visible in a viewing application that supports annotations, such as "Microsoft Office Document Imaging".

Approach-2:
Convert your TIFF fax images into color images and draw a colored rectangle over the poorly recognized characters, using the coordinates returned by the RasterOcrRecognizedCharacters.Rectangle property.

 
#3 Posted : Wednesday, February 21, 2007 9:22:41 AM(UTC)
derevell

Groups: Registered
Posts: 23


Here is my code so far. I understand that I first need the confidence of each character in order to get to the word, but I can't work out why the highlighting of characters isn't working.

// Size the annotation container to cover the whole image.
annContainerObj.Bounds = new AnnRectangle(0, 0, rasterImageViewer1.Image.Width, rasterImageViewer1.Image.Height, AnnUnit.Pixel);
annContainerObj.Name = "Container";
annContainerObj.Visible = true;
annContainerObj.UnitConverter = new AnnUnitConverter(96, 96);

// Add a yellow hilite object over every character with a nonzero confidence value.
IList<RasterOcrRecognizedCharacters> recogChars = rasterDocument.GetRecognizedCharacters(0);
int charsCount = recogChars.Count;
for (int i = 0; i < charsCount; i++)
{
    if (recogChars[i].Confidence != 0)
    {
        AnnHiliteObject hilite = new AnnHiliteObject();
        hilite.Bounds = new AnnRectangle(recogChars[i].Rectangle.X, recogChars[i].Rectangle.Y, recogChars[i].Rectangle.Width, recogChars[i].Rectangle.Height, AnnUnit.Pixel);
        hilite.HiliteColor = Color.Yellow;
        annContainerObj.Objects.Add(hilite);
    }
}

rasterImageViewer1.Refresh();
 
#4 Posted : Thursday, February 22, 2007 12:32:21 PM(UTC)
derevell

Groups: Registered
Posts: 23


Okay, the code below works for characters, but I am having trouble doing the same for recognized words and relating the character confidence to the words. How would this be done? It would be nice to have a confidence flag on the word itself. Here is the code so far, if anyone else is interested.

// Turn the image into color
for (int i = 1; i < ... // the rest of this loop was cut off in the post; the full code is in the attached hilightOCR.txt

// Add a yellow hilite object over every character with a nonzero confidence value.
IList<RasterOcrRecognizedCharacters> recogChars = rasterDocument.GetRecognizedCharacters(0);
int charsCount = recogChars.Count;
for (int i = 0; i < charsCount; i++)
{
    if (recogChars[i].Confidence != 0)
    {
        AnnHiliteObject hilite = new AnnHiliteObject();
        hilite.Bounds = new AnnRectangle(recogChars[i].Rectangle.X, recogChars[i].Rectangle.Y, recogChars[i].Rectangle.Width, recogChars[i].Rectangle.Height, AnnUnit.Pixel);
        hilite.HiliteColor = Color.Yellow;
        annContainerObj.Objects.Add(hilite);
    }
}

rasterImageViewer1.Refresh();
File Attachment(s):
hilightOCR.txt (3kb) downloaded 30 time(s).
 
#5 Posted : Sunday, February 25, 2007 6:38:24 AM(UTC)

Adnan Ismail  
Guest

Groups: Guests
Posts: 3,022

Was thanked: 2 time(s) in 2 post(s)

One way to let the engine mark the suspect results for you is to use the RasterOcrMarkOptions class. You can find a code sample in the .NET help topic "RasterOcrMarkOptions".
 
#6 Posted : Monday, February 26, 2007 8:43:43 AM(UTC)
derevell

Groups: Registered
Posts: 23


I have used RasterOcrMarkOptions, but I'm trying to loop through the words and write the words with guesses, along with their confidence level, to a database so that an application can review them further. I would rather not output the RTF and then read fonts etc. to do this. Basically, I would like to loop through the words, tag each word that doesn't have 100% accuracy in all of its characters, and then divide the sum of the character accuracies by the number of characters to get an average accuracy. Example: the word "test" is OCR'ed. The first 't' is 100%, the 'e' is 50%, the 's' is 100% and the last 't' is 100%. I would take 350/400 to get 87.5% for the word. Any ideas on how? Will LEADTOOLS ever just have an accuracy value for a word? It looks like you already have to compute one for RasterOcrMarkOptions...
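
The averaging described in this post can be sketched in a few lines of C#. This is only an illustration under stated assumptions: the per-character values are accuracy percentages on the 0 to 100 scale used in the example (100 = certain), and the class and method names are hypothetical. Nothing here calls LEADTOOLS.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; not part of LEADTOOLS.
public static class WordAccuracy
{
    // Mean of the per-character accuracy percentages (0..100, 100 = certain).
    public static double Average(IReadOnlyList<int> charAccuracies)
    {
        if (charAccuracies == null || charAccuracies.Count == 0)
            throw new ArgumentException("A word needs at least one character.", nameof(charAccuracies));
        return charAccuracies.Average();
    }

    // A word is tagged for review when any of its characters is below 100%.
    public static bool NeedsReview(IReadOnlyList<int> charAccuracies)
    {
        return charAccuracies.Any(a => a < 100);
    }
}
```

For the word "test" with accuracies 100, 50, 100 and 100, `WordAccuracy.Average` gives 350/4 = 87.5 and `WordAccuracy.NeedsReview` is true.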
 
#7 Posted : Thursday, March 1, 2007 10:40:15 AM(UTC)

Adnan Ismail  
Guest

Groups: Guests
Posts: 3,022

Was thanked: 2 time(s) in 2 post(s)




You can get all characters and the confidence of each character (a number from 0 to 100, where 0 is the highest confidence) using GetRecognizedCharacters.

This also gives you the area of each character in the RasterOcrRecognizedCharacters[index].Rectangle property.

Similarly, GetRecognizedWords gives all words and their locations through the RasterOcrRecognizedWords.WordArea property.

What you can do is compare the locations of all character rectangles with the word rectangles to find out which word contains which characters, and calculate your own confidence value accordingly.
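
A rough sketch of that comparison in C#, assuming the character and word areas have already been copied into a plain rectangle type. The `Box` struct and `WordGrouping` class are hypothetical stand-ins rather than LEADTOOLS types, and testing the character's center point against the word area is just one possible matching rule.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a character or word area in pixels.
public readonly struct Box
{
    public readonly int X, Y, Width, Height;

    public Box(int x, int y, int width, int height)
    {
        X = x; Y = y; Width = width; Height = height;
    }

    // True when the point lies inside this box (right and bottom edges exclusive).
    public bool Contains(int px, int py)
    {
        return px >= X && px < X + Width && py >= Y && py < Y + Height;
    }
}

public static class WordGrouping
{
    // For each word area, collects the indices of the characters whose
    // center point falls inside it. A character is assigned to the first
    // word that contains it; characters outside every word are skipped.
    public static List<int>[] GroupByWord(IReadOnlyList<Box> words, IReadOnlyList<Box> chars)
    {
        var groups = new List<int>[words.Count];
        for (int w = 0; w < words.Count; w++)
            groups[w] = new List<int>();

        for (int c = 0; c < chars.Count; c++)
        {
            int cx = chars[c].X + chars[c].Width / 2;
            int cy = chars[c].Y + chars[c].Height / 2;
            for (int w = 0; w < words.Count; w++)
            {
                if (words[w].Contains(cx, cy))
                {
                    groups[w].Add(c);
                    break;
                }
            }
        }
        return groups;
    }
}
```

With the character indices grouped per word, the confidence values of those characters can be combined in whatever way suits the application.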

 
#8 Posted : Friday, March 2, 2007 7:44:59 AM(UTC)
derevell

Groups: Registered
Posts: 23


Does anyone have an example in C# of the comparison between character confidence and word position? I think this would be very helpful as part of the demo.
 
#9 Posted : Tuesday, March 6, 2007 2:47:28 PM(UTC)
derevell

Groups: Registered
Posts: 23


Please see the attached code. The C# method attached to this post tries to find bad characters and then finds the area of the containing word in order to highlight the bad word. It works with respect to the character confidence, but it looks like the 'suspect word' marking in RasterDocumentMarkOptions uses something else, because the words marked in the Word document that I save with RasterDocumentMarkOptions are different from the words highlighted by the character-confidence method. Is RasterDocumentMarkOptions using a dictionary lookup to override the character confidence? If so, how could I plug the dictionary lookup into this function to do the same check? Once again, it would have been nice to have this accessible from code to check word confidence levels.
Dave
File Attachment(s):
annotationBadWords.txt (4kb) downloaded 32 time(s).
 
#10 Posted : Sunday, March 11, 2007 6:31:09 AM(UTC)

Adnan Ismail  
Guest

Groups: Guests
Posts: 3,022

Was thanked: 2 time(s) in 2 post(s)


Dave,


The code snippet you attached only compares the confidence
value to zero. However, this value carries more information than that. To check
the word correctness, bitwise-AND this value with 0xFFFFFF80 and see if the resulting value is zero or not.


The help topic "Confidence Reporting" explains this in detail.
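
A minimal C# sketch of that bit test, assuming the confidence value is available as a 32-bit int. The class and method names are hypothetical, and what a nonzero result actually means is defined by the "Confidence Reporting" help topic, not by this sketch.

```csharp
using System;

// Hypothetical helper; not part of LEADTOOLS.
public static class ConfidenceBits
{
    // The mask quoted above: every bit except the low seven.
    private const int HighBitsMask = unchecked((int)0xFFFFFF80);

    // True when any bit above the low seven is set in the raw value.
    public static bool HasHighBits(int rawConfidence)
    {
        return (rawConfidence & HighBitsMask) != 0;
    }

    // The low seven bits (0..127), wide enough to hold the 0..100 range.
    public static int LowSevenBits(int rawConfidence)
    {
        return rawConfidence & 0x7F;
    }
}
```

A plain comparison against zero mixes the two parts together, which is why a test such as `Confidence != 0` cannot tell a character-level value from the extra information carried in the upper bits.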

 