This topic and its replies were posted before the current version of LEADTOOLS was released and may no longer be applicable.
#1 Posted: Friday, February 16, 2007 8:35:00 AM (UTC)
Groups: Registered
Posts: 23
I'm looking to highlight the words in a .tif fax that have a bad word confidence level with a yellow box and save the new .tif. This will let me tag a fax for review and then draw the reviewer's attention to the places on the fax with bad OCR results so they can be interpreted manually. Does anyone know how this can be done, or have a sample of how it would be accomplished?
#2 Posted: Wednesday, February 21, 2007 8:08:11 AM (UTC)
Groups: Guests
Posts: 3,022
You can highlight the badly recognized words using one of the
following approaches. First, though, you need to use the RasterOcrRecognizedCharacters
class to get information about the recognized characters, such as the
bounding rectangle of each one (the "RasterOcrRecognizedCharacters.Rectangle Property").
Approach-1:
Use the annotations AnnHiliteObject to draw a colored
rectangle over the badly recognized characters after getting their coordinates.
Please note that the annotation objects will only be
viewable in a viewing application that supports annotations, such as the
"Microsoft Office Document Imaging" software.
Approach-2:
Convert your TIFF fax images into color images and
draw a colored rectangle over the badly recognized characters, using the
coordinates returned by the "RasterOcrRecognizedCharacters.Rectangle
Property".
#3 Posted: Wednesday, February 21, 2007 9:22:41 AM (UTC)
Groups: Registered
Posts: 23
Here is my code so far. I understand that I need to find the character confidence first in order to get to the word, but I can't work out why the highlighting of characters isn't working.
annContainerObj.Bounds = new AnnRectangle(0, 0, rasterImageViewer1.Image.Width, rasterImageViewer1.Image.Height, AnnUnit.Pixel);
annContainerObj.Name = "Container";
annContainerObj.Visible = true;
annContainerObj.UnitConverter = new AnnUnitConverter(96, 96);
// The generic type argument was stripped when the post was rendered; restored from the class named in reply #2.
IList<RasterOcrRecognizedCharacters> recogChars = rasterDocument.GetRecognizedCharacters(0);
int charsCount = recogChars.Count;
for (int i = 0; i < charsCount; i++)
{
    // Highlight every character whose confidence value is not zero.
    if (recogChars[i].Confidence != 0)
    {
        AnnHiliteObject hilite = new AnnHiliteObject();
        hilite.Bounds = new AnnRectangle(recogChars[i].Rectangle.X, recogChars[i].Rectangle.Y, recogChars[i].Rectangle.Width, recogChars[i].Rectangle.Height, AnnUnit.Pixel);
        hilite.HiliteColor = Color.Yellow;
        annContainerObj.Objects.Add(hilite);
    }
}
rasterImageViewer1.Refresh();
#4 Posted: Thursday, February 22, 2007 12:32:21 PM (UTC)
Groups: Registered
Posts: 23
Okay, I have the code below, which works okay with the characters, but I am having trouble doing the same for recognized words and relating the character confidence to the words. How would this be done? It would be nice to have a confidence flag on the word. Here is the code so far, if anyone else is interested.
//Turn Image in to Color
for (int i = 1; i < ... // (the rest of the color-conversion loop is truncated in the original post)
IList<RasterOcrRecognizedCharacters> recogChars = rasterDocument.GetRecognizedCharacters(0);
int charsCount = recogChars.Count;
for (int i = 0; i < charsCount; i++)
{
    // Highlight every character whose confidence value is not zero.
    if (recogChars[i].Confidence != 0)
    {
        AnnHiliteObject hilite = new AnnHiliteObject();
        hilite.Bounds = new AnnRectangle(recogChars[i].Rectangle.X, recogChars[i].Rectangle.Y, recogChars[i].Rectangle.Width, recogChars[i].Rectangle.Height, AnnUnit.Pixel);
        hilite.HiliteColor = Color.Yellow;
        annContainerObj.Objects.Add(hilite);
    }
}
rasterImageViewer1.Refresh();
#5 Posted: Sunday, February 25, 2007 6:38:24 AM (UTC)
Groups: Guests
Posts: 3,022
One way to let the engine mark the suspect
results for you is to use the RasterOcrMarkOptions class. You can find a code
sample in the .NET help topic "RasterOcrMarkOptions".
#6 Posted: Monday, February 26, 2007 8:43:43 AM (UTC)
Groups: Registered
Posts: 23
I have used RasterOcrMarkOptions, but I'm trying to loop through the words and write the words, with their guesses and confidence levels, to a database so that an application can review them further. I would rather not output the RTF and then read fonts etc. to do this. Basically, I would like to loop through the words, tag each word that doesn't have 100% accuracy in all of its characters, and then divide the sum of the character accuracies by the number of characters to get the average accuracy. Example: the word "test" is OCR'ed. The 't' is 100%, the 'e' is 50%, the 's' is 100% and the final 't' is 100%. I would take 350/400 to get 87.5% for the word. Any ideas on how? Will LEADTOOLS ever just have an accuracy value for a word? It looks like you already have to compute one for RasterOcrMarkOptions.
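The averaging described above can be sketched as follows. This is only an illustration of the arithmetic, assuming each character's accuracy is already available as a percentage where 100 means fully confident (the poster's scale, which is not necessarily the scale LEADTOOLS reports). WordAccuracy and Average are hypothetical names and are not part of LEADTOOLS.

```csharp
using System;
using System.Linq;

public static class WordAccuracy
{
    // Average of per-character accuracy percentages (assumed scale: 0-100, 100 = certain).
    public static double Average(params int[] charAccuracies)
    {
        if (charAccuracies == null || charAccuracies.Length == 0)
            throw new ArgumentException("At least one character is required.");
        return charAccuracies.Average();
    }

    public static void Main()
    {
        // "test": t = 100, e = 50, s = 100, t = 100, so 350 / 4 = 87.5
        Console.WriteLine(Average(100, 50, 100, 100));
    }
}
```

A word would then be tagged for review whenever its average falls below whatever threshold the reviewing application chooses.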
#7 Posted: Thursday, March 1, 2007 10:40:15 AM (UTC)
Groups: Guests
Posts: 3,022
You can get all characters and the confidence of each
character (a number from 0 to 100, where 0 is the highest confidence) using
GetRecognizedCharacters.
This also gives you the area of each character in the RasterOcrRecognizedCharacters[index].Rectangle property.
Similarly, GetRecognizedWords gives all words and their
locations through the RasterOcrRecognizedWords.WordArea property.
What you can do is compare the locations of all
character rectangles with the word rectangles to find out which word contains
which characters, and calculate your own confidence value accordingly.
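The rectangle comparison suggested above could look roughly like this. It is a sketch under stated assumptions, not LEADTOOLS code: Box, CharResult and WordConfidence are hypothetical stand-ins for the rectangles and confidence values returned by GetRecognizedCharacters and GetRecognizedWords, and a character is assigned to a word when its centre point lies inside the word area, which is one possible matching rule among several.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the OCR results; these are not LEADTOOLS types.
public struct Box
{
    public int X, Y, Width, Height;
    public Box(int x, int y, int width, int height)
    { X = x; Y = y; Width = width; Height = height; }

    // True when the point lies inside the box (right and bottom edges exclusive).
    public bool Contains(int px, int py)
    {
        return px >= X && px < X + Width && py >= Y && py < Y + Height;
    }
}

public struct CharResult
{
    public Box Area;
    public int Confidence; // per the reply above: 0 to 100, where 0 is the highest confidence
    public CharResult(Box area, int confidence) { Area = area; Confidence = confidence; }
}

public static class WordConfidence
{
    // For each word area, average the confidence values of the characters whose
    // centre point falls inside it. A word with no matching characters gets null.
    public static List<double?> AveragePerWord(IList<Box> wordAreas, IList<CharResult> chars)
    {
        var result = new List<double?>();
        foreach (Box word in wordAreas)
        {
            List<int> inside = chars
                .Where(c => word.Contains(c.Area.X + c.Area.Width / 2,
                                          c.Area.Y + c.Area.Height / 2))
                .Select(c => c.Confidence)
                .ToList();
            result.Add(inside.Count == 0 ? (double?)null : inside.Average());
        }
        return result;
    }

    public static void Main()
    {
        // Made-up coordinates: one four-character word and one single-character word.
        var words = new List<Box> { new Box(0, 0, 40, 12), new Box(50, 0, 30, 12) };
        var chars = new List<CharResult>
        {
            new CharResult(new Box(0, 0, 10, 12), 0),
            new CharResult(new Box(10, 0, 10, 12), 50),
            new CharResult(new Box(20, 0, 10, 12), 0),
            new CharResult(new Box(30, 0, 10, 12), 0),
            new CharResult(new Box(50, 0, 10, 12), 0),
        };
        foreach (double? average in AveragePerWord(words, chars))
            Console.WriteLine(average);
    }
}
```

Matching on the centre point rather than on full containment tolerates character rectangles that poke slightly outside the word area. The sample data is invented purely to exercise the grouping.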
#8 Posted: Friday, March 2, 2007 7:44:59 AM (UTC)
Groups: Registered
Posts: 23
Does anyone have an example in C# of the comparison between character confidence and word position? I think this would be very helpful as part of the demo.
#9 Posted: Tuesday, March 6, 2007 2:47:28 PM (UTC)
Groups: Registered
Posts: 23
Please see the attached code. The C# method attached to this post tries to find bad characters and then finds the area of the containing word so that the bad word can be highlighted. It works with respect to the character confidence, but the 'suspect word' marking in RasterDocumentMarkOptions seems to use something else, because the words marked in the Word document I save through RasterDocumentMarkOptions are different from the words highlighted by the character-confidence method. Is RasterDocumentMarkOptions using a dictionary lookup to override the character confidence? If so, how could I plug the dictionary lookup into this function to do the same check? Once again, it would have been nice to have this accessible via code to check word confidence levels.
Dave
#10 Posted: Sunday, March 11, 2007 6:31:09 AM (UTC)
Groups: Guests
Posts: 3,022
Dave,
The code snippet you attached only compares the confidence
value to zero. However, this value carries more information than that. To check
the word correctness, bitwise-AND this value with 0xFFFFFF80 and see if the
resulting value is zero or not.
The help topic "Confidence Reporting" explains
this in detail.
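The masking step can be sketched like this. It is a minimal illustration assuming the confidence value is a 32-bit int. The reply above does not say which outcome (zero or non-zero) marks a suspect word, nor what the low seven bits mean, so the sketch only separates the two parts and leaves their interpretation to the "Confidence Reporting" help topic. ConfidenceBits and its members are hypothetical names.

```csharp
using System;

public static class ConfidenceBits
{
    // 0xFFFFFF80 keeps every bit above the low seven.
    private const int UpperBitsMask = unchecked((int)0xFFFFFF80);

    // The part of the value that the mask keeps.
    public static int UpperBits(int confidence)
    {
        return confidence & UpperBitsMask;
    }

    // The part of the value that the mask discards (range 0 to 127).
    public static int LowBits(int confidence)
    {
        return confidence & 0x7F;
    }

    public static void Main()
    {
        // Sample values chosen only to exercise both outcomes of the test.
        foreach (int value in new[] { 0, 50, 0x80 | 50 })
        {
            Console.WriteLine("{0}: masked value is zero = {1}, low bits = {2}",
                value, UpperBits(value) == 0, LowBits(value));
        }
    }
}
```

Comparing the raw value to zero, as the attached snippet does, mixes these two parts together, which may explain why its results differ from the words the engine marks.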