GetRecognizedCharacters Method

Summary

Gets the last recognized character data of this IOcrPage.

Syntax

C#

public IOcrPageCharacters GetRecognizedCharacters()

VB

Function GetRecognizedCharacters() As IOcrPageCharacters

Objective-C

- (nullable LTOcrPageCharacters *)recognizedCharacters:(NSError **)error

Java

public OcrPageCharacters getRecognizedCharacters()

C++

IOcrPageCharacters^ GetRecognizedCharacters();

Return Value

An instance of IOcrPageCharacters containing the last recognized characters data of this IOcrPage.

Remarks

Call this method only after the IOcrPage has been recognized with the Recognize method. If the value of the IsRecognized property of this page is false, calling this method throws an exception.

Use GetRecognizedCharacters to examine the recognized character data. This data contains the character codes, their confidence values, guess codes, location and position in the page, and font information. For more information, refer to OcrCharacter.

The GetRecognizedCharacters method returns an instance of IOcrPageCharacters, which is a collection of IOcrZoneCharacters. The IOcrZoneCharacters.ZoneIndex property contains the zero-based index of the zone. To get the zone information, use the same index with the Zones property of this IOcrPage.

To modify the recognition data and apply it back to the page, use SetRecognizedCharacters.

Use IOcrZoneCharacters.GetWords to get the recognized words of a zone.
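As one way to combine the word list with the character data, the following sketch flags words that contain at least one low-confidence character. It is a minimal illustration and not part of the LEADTOOLS API: WordConfidence and FindUncertainWords are hypothetical names, and the inputs are plain values that you would copy from OcrCharacter.Confidence and from OcrWord.FirstCharacterIndex and OcrWord.LastCharacterIndex. It assumes that LastCharacterIndex is inclusive and that the threshold uses the same scale as Confidence.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not a LEADTOOLS type.
public static class WordConfidence
{
   // Returns the indexes of the words whose weakest character is below minConfidence.
   // confidences[i] - confidence of character i in the zone
   // wordRanges[w]  - first and last character index of word w (last assumed inclusive)
   public static List<int> FindUncertainWords(
      IReadOnlyList<int> confidences,
      IReadOnlyList<(int First, int Last)> wordRanges,
      int minConfidence)
   {
      var uncertain = new List<int>();

      for (int w = 0; w < wordRanges.Count; w++)
      {
         int first = wordRanges[w].First;
         int last = wordRanges[w].Last;

         // Stop at the first weak character; one is enough to flag the word
         for (int i = first; i <= last && i < confidences.Count; i++)
         {
            if (confidences[i] < minConfidence)
            {
               uncertain.Add(w);
               break;
            }
         }
      }

      return uncertain;
   }
}
```

With real recognition data, fill confidences by looping over the IOcrZoneCharacters collection and fill wordRanges from the collection returned by GetWords, then review or correct the flagged words before calling SetRecognizedCharacters.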

Notes on spaces: The LEADTOOLS OCR Module - LEAD Engine will not return any space characters when using the GetRecognizedCharacters method.
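Because no space characters are returned, plain text has to be rebuilt from the position flags of each character, as the example on this page does when it tests OcrCharacterPosition.EndOfWord and OcrCharacterPosition.EndOfLine. The following is a minimal sketch of that idea. RecognizedChar and RecognizedText are hypothetical stand-in types, not LEADTOOLS types; in real code the three values come from OcrCharacter.Code and OcrCharacter.Position.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for the parts of OcrCharacter used by this sketch.
public sealed class RecognizedChar
{
   public RecognizedChar(char code, bool endOfWord, bool endOfLine)
   {
      Code = code;
      EndOfWord = endOfWord;
      EndOfLine = endOfLine;
   }

   public char Code { get; }
   public bool EndOfWord { get; }   // Position has the EndOfWord flag set
   public bool EndOfLine { get; }   // Position has the EndOfLine flag set
}

public static class RecognizedText
{
   // Rebuilds plain text: a line break after each end-of-line character,
   // otherwise a space after each end-of-word character.
   public static string Rebuild(IEnumerable<RecognizedChar> characters)
   {
      var text = new StringBuilder();

      foreach (RecognizedChar c in characters)
      {
         text.Append(c.Code);

         if (c.EndOfLine)
            text.Append('\n');
         else if (c.EndOfWord)
            text.Append(' ');
      }

      return text.ToString();
   }
}
```

The end-of-line test comes first so that a character flagged as both end of word and end of line produces only a line break, not a trailing space.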

The LEADTOOLS OCR Module - OmniPage Engine will not return space characters if the boolean Recognition.SpaceIsValidCharacter setting is false (the default). If you require space characters in the recognition results when using the LEADTOOLS OmniPage Engine, set this setting to true (ocrEngineInstance.SettingManager.SetBooleanValue("Recognition.SpaceIsValidCharacter", true)). For more information on OCR settings, refer to IOcrSettingManager and LEADTOOLS OCR Module - OmniPage Engine Settings.

The SetRecognizedCharacters method will accept space characters in the LEADTOOLS LEAD engine. However, these space characters will be used when generating the final document (PDF) and might affect the final output. Therefore, it is not recommended that you insert space characters when using the LEADTOOLS LEAD engine.

The LEADTOOLS OCR Module - OmniPage Engine will strip any space characters from the results passed to SetRecognizedCharacters if the boolean Recognition.SpaceIsValidCharacter setting is false (the default). If you require space characters in the recognition results when using the LEADTOOLS OmniPage Engine, set this setting to true before calling SetRecognizedCharacters.

If you use the GetRecognizedCharacters and SetRecognizedCharacters methods to modify the recognition result prior to saving to an output file, and you are planning on using the engine native save capability (through setting the IOcrDocumentManager.EngineFormat property and using DocumentFormat.User in the IOcrDocument.Save method), then you must change the boolean Recognition.SpaceIsValidCharacter setting to true.

The IOcrPageCharacters interface also contains the IOcrPageCharacters.UpdateWord method, which lets you modify the OCR recognition results by updating or deleting words before optionally saving the results to the final output document.

Example

This example will get the recognized characters of a page, modify them and set them back before saving the final document.

using System;
using System.IO;
using System.Collections.Generic;
using System.Drawing;
using Leadtools;
using Leadtools.Codecs; 
using Leadtools.Ocr; 
using Leadtools.Forms.Common; 
using Leadtools.Document.Writer; 
using Leadtools.WinForms; 
using Leadtools.Drawing; 
using Leadtools.ImageProcessing; 
using Leadtools.ImageProcessing.Color; 
 
public void RecognizedCharactersExample() 
{ 
   // Create an image with some text in it 
   RasterImage image = new RasterImage(RasterMemoryFlags.Conventional, 640, 200, 24, RasterByteOrder.Bgr, RasterViewPerspective.TopLeft, null, IntPtr.Zero, 0); 
   Rectangle imageRect = new Rectangle(0, 0, image.ImageWidth, image.ImageHeight); 
 
   IntPtr hdc = RasterImagePainter.CreateLeadDC(image); 
   using (Graphics g = Graphics.FromHdc(hdc)) 
   { 
      g.SmoothingMode = System.Drawing.Drawing2D.SmoothingMode.HighQuality; 
      g.FillRectangle(Brushes.White, imageRect); 
 
      using (Font f = new Font("Arial", 20, FontStyle.Regular)) 
         g.DrawString("Normal line", f, Brushes.Black, 0, 0); 
 
      using (Font f = new Font("Arial", 20, FontStyle.Bold)) 
         g.DrawString("Bold, italic and underline", f, Brushes.Black, 0, 40); 
 
      using (Font f = new Font("Courier New", 20, FontStyle.Regular)) 
         g.DrawString("Monospaced line", f, Brushes.Black, 0, 80); 
   } 
 
   RasterImagePainter.DeleteLeadDC(hdc); 
 
   string textFileName = Path.Combine(LEAD_VARS.ImagesDir, "MyImageWithTest.txt"); 
   string pdfFileName = Path.Combine(LEAD_VARS.ImagesDir, "MyImageWithTest.pdf"); 
 
   // Create an instance of the engine 
   using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD)) 
   { 
      // Start the engine using default parameters 
      ocrEngine.Startup(null, null, null, LEAD_VARS.OcrLEADRuntimeDir); 
 
      // Create an OCR page 
      IOcrPage ocrPage = ocrEngine.CreatePage(image, OcrImageSharingMode.AutoDispose); 
 
      // Recognize this page 
      ocrPage.Recognize(null); 
 
      // Dump the characters into a text file 
      using (StreamWriter writer = File.CreateText(textFileName)) 
      { 
         IOcrPageCharacters ocrPageCharacters = ocrPage.GetRecognizedCharacters(); 
         foreach (IOcrZoneCharacters ocrZoneCharacters in ocrPageCharacters) 
         { 
            // Show the words found in this zone
            ICollection<OcrWord> words = ocrZoneCharacters.GetWords(); 
            Console.WriteLine("Words:"); 
            foreach (OcrWord word in words) 
               Console.WriteLine("Word: {0}, at {1}, characters index from {2} to {3}", word.Value, word.Bounds, word.FirstCharacterIndex, word.LastCharacterIndex); 
 
            bool nextCharacterIsNewWord = true; 
 
            for (int i = 0; i < ocrZoneCharacters.Count; i++) 
            { 
               OcrCharacter ocrCharacter = ocrZoneCharacters[i]; 
 
               // Capitalize the first letter if this is a new word 
               if (nextCharacterIsNewWord) 
                  ocrCharacter.Code = Char.ToUpper(ocrCharacter.Code); 
 
               writer.WriteLine("Code: {0}, Confidence: {1}, WordIsCertain: {2}, Bounds: {3}, Position: {4}, FontSize: {5}, FontStyle: {6}", 
                  ocrCharacter.Code, 
                  ocrCharacter.Confidence, 
                  ocrCharacter.WordIsCertain, 
                  ocrCharacter.Bounds, 
                  ocrCharacter.Position, 
                  ocrCharacter.FontSize, 
                  ocrCharacter.FontStyle); 
 
               // If the character is bold, also make it italic and underlined
               if ((ocrCharacter.FontStyle & OcrCharacterFontStyle.Bold) == OcrCharacterFontStyle.Bold) 
               { 
                  ocrCharacter.FontStyle |= OcrCharacterFontStyle.Italic; 
                  ocrCharacter.FontStyle |= OcrCharacterFontStyle.Underline; 
               } 
 
               // Check if next character is the start of a new word 
               if ((ocrCharacter.Position & OcrCharacterPosition.EndOfWord) == OcrCharacterPosition.EndOfWord || 
                  (ocrCharacter.Position & OcrCharacterPosition.EndOfLine) == OcrCharacterPosition.EndOfLine) 
                  nextCharacterIsNewWord = true; 
               else 
                  nextCharacterIsNewWord = false; 
 
               ocrZoneCharacters[i] = ocrCharacter; 
            } 
         } 
 
         // Replace the characters with the modified one before we save 
         ocrPage.SetRecognizedCharacters(ocrPageCharacters); 
      } 
 
      // Create an OCR document so we can save the results 
      using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument(null, OcrCreateDocumentOptions.AutoDeleteFile)) 
      { 
         // Add the page and dispose it 
         ocrDocument.Pages.Add(ocrPage); 
         ocrPage.Dispose(); 
 
         // Set the PDF options to save as PDF/A text only
         PdfDocumentOptions pdfOptions = ocrEngine.DocumentWriterInstance.GetOptions(DocumentFormat.Pdf) as PdfDocumentOptions; 
         pdfOptions.DocumentType = PdfDocumentType.PdfA; 
         pdfOptions.ImageOverText = false; 
         ocrEngine.DocumentWriterInstance.SetOptions(DocumentFormat.Pdf, pdfOptions); 
 
         ocrDocument.Save(pdfFileName, DocumentFormat.Pdf, null); 
 
         // Open and check the result file, it should contain the following text
         // "Normal Line"
         // "Bold, Italic And Underline"
         // "Monospaced Line"
         // With the second line bold, italic and underlined now
      } 
 
      // Shutdown the engine 
      // Note: calling Dispose will also automatically shutdown the engine if it has been started 
      ocrEngine.Shutdown(); 
   } 
} 
 
static class LEAD_VARS 
{ 
   public const string ImagesDir = @"C:\LEADTOOLS21\Resources\Images"; 
   public const string OcrLEADRuntimeDir = @"C:\LEADTOOLS21\Bin\Common\OcrLEADRuntime"; 
}

Imports System
Imports System.IO
Imports System.Collections.Generic
Imports System.Drawing
Imports Leadtools
Imports Leadtools.Codecs 
Imports Leadtools.Ocr 
Imports Leadtools.Forms 
Imports Leadtools.Document.Writer 
Imports Leadtools.WinForms 
Imports Leadtools.Drawing 
Imports Leadtools.ImageProcessing 
Imports Leadtools.ImageProcessing.Color 
 
Public Sub RecognizedCharactersExample() 
   ' Create an image with some text in it 
   Dim image As New RasterImage(RasterMemoryFlags.Conventional, 
                                 640, 200, 24, 
                                 RasterByteOrder.Bgr, 
                                 RasterViewPerspective.TopLeft, 
                                 Nothing, IntPtr.Zero, 0) 
   Dim imageRect As New Rectangle(0, 0, image.ImageWidth, image.ImageHeight) 
 
   Dim hdc As IntPtr = RasterImagePainter.CreateLeadDC(image) 
   Using g As Graphics = Graphics.FromHdc(hdc) 
      g.SmoothingMode = System.Drawing.Drawing2D.SmoothingMode.HighQuality 
      g.FillRectangle(Brushes.White, imageRect) 
 
      Using f As New Font("Arial", 20, FontStyle.Regular) 
         g.DrawString("Normal line", f, Brushes.Black, 0, 0) 
      End Using 
 
      Using f As New Font("Arial", 20, FontStyle.Bold) 
         g.DrawString("Bold, italic and underline", f, Brushes.Black, 0, 40) 
      End Using 
 
      Using f As New Font("Courier New", 20, FontStyle.Regular) 
         g.DrawString("Monospaced line", f, Brushes.Black, 0, 80) 
      End Using 
   End Using 
 
   RasterImagePainter.DeleteLeadDC(hdc) 
 
   Dim textFileName As String = Path.Combine(LEAD_VARS.ImagesDir, "MyImageWithTest.txt") 
   Dim pdfFileName As String = Path.Combine(LEAD_VARS.ImagesDir, "MyImageWithTest.pdf") 
 
   ' Create an instance of the engine 
   Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD) 
      ' Start the engine using default parameters 
      ocrEngine.Startup(Nothing, Nothing, Nothing, LEAD_VARS.OcrLEADRuntimeDir) 
 
      ' Create an OCR page 
      Dim ocrPage As IOcrPage = ocrEngine.CreatePage(image, OcrImageSharingMode.AutoDispose) 
 
      ' Recognize this page 
      ocrPage.Recognize(Nothing) 
 
      ' Dump the characters into a text file 
      Using writer As StreamWriter = File.CreateText(textFileName) 
         Dim ocrPageCharacters As IOcrPageCharacters = ocrPage.GetRecognizedCharacters() 
         For Each ocrZoneCharacters As IOcrZoneCharacters In ocrPageCharacters 
            Dim words As ICollection(Of OcrWord) = ocrZoneCharacters.GetWords() 
            Console.WriteLine("Words:") 
            For Each word As OcrWord In words 
               Console.WriteLine("Word: {0}, at {1}, characters index from {2} to {3}", 
                                 word.Value, word.Bounds, word.FirstCharacterIndex, word.LastCharacterIndex) 
            Next 
 
            Dim nextCharacterIsNewWord As Boolean = True 
 
            For i As Integer = 0 To ocrZoneCharacters.Count - 1 
               Dim ocrCharacter As OcrCharacter = ocrZoneCharacters(i) 
 
               ' Capitalize the first letter if this is a new word 
               If nextCharacterIsNewWord Then 
                  ocrCharacter.Code = [Char].ToUpper(ocrCharacter.Code) 
               End If 
 
               writer.WriteLine("Code: {0}, Confidence: {1}, WordIsCertain: {2}, Bounds: {3}, Position: {4}, FontSize: {5}, FontStyle: {6}", 
                                 ocrCharacter.Code, 
                                 ocrCharacter.Confidence, 
                                 ocrCharacter.WordIsCertain, 
                                 ocrCharacter.Bounds, 
                                 ocrCharacter.Position, 
                                 ocrCharacter.FontSize, 
                                 ocrCharacter.FontStyle) 
 
               ' If the character is bold, also make it italic and underlined
               If (ocrCharacter.FontStyle And OcrCharacterFontStyle.Bold) = OcrCharacterFontStyle.Bold Then 
                  ocrCharacter.FontStyle = ocrCharacter.FontStyle Or OcrCharacterFontStyle.Italic 
                  ocrCharacter.FontStyle = ocrCharacter.FontStyle Or OcrCharacterFontStyle.Underline 
               End If 
 
               ' Check if next character is the start of a new word 
               If (ocrCharacter.Position And OcrCharacterPosition.EndOfWord) = OcrCharacterPosition.EndOfWord OrElse 
                  (ocrCharacter.Position And OcrCharacterPosition.EndOfLine) = OcrCharacterPosition.EndOfLine Then 
                  nextCharacterIsNewWord = True 
               Else 
                  nextCharacterIsNewWord = False 
               End If 
 
               ocrZoneCharacters(i) = ocrCharacter 
            Next 
         Next 
 
         ' Replace the characters with the modified one before we save 
         ocrPage.SetRecognizedCharacters(ocrPageCharacters) 
      End Using 
 
      ' Create an OCR document so we can save the results 
      Using ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument(Nothing, OcrCreateDocumentOptions.AutoDeleteFile) 
         ' Add the page and dispose it 
         ocrDocument.Pages.Add(ocrPage) 
         ocrPage.Dispose() 
 
         ' Set the PDF options to save as PDF/A text only
         Dim pdfOptions As PdfDocumentOptions = TryCast(ocrEngine.DocumentWriterInstance.GetOptions(DocumentFormat.Pdf), PdfDocumentOptions) 
         pdfOptions.DocumentType = PdfDocumentType.PdfA 
         pdfOptions.ImageOverText = False 
         ocrEngine.DocumentWriterInstance.SetOptions(DocumentFormat.Pdf, pdfOptions)

         ' Save the document. Open and check the result file, it should contain the following text
         ' "Normal Line"
         ' "Bold, Italic And Underline"
         ' "Monospaced Line"
         ' With the second line bold, italic and underlined now
         ocrDocument.Save(pdfFileName, DocumentFormat.Pdf, Nothing) 
      End Using 
 
      ' Shutdown the engine 
      ' Note: calling Dispose will also automatically shutdown the engine if it has been started 
      ocrEngine.Shutdown() 
   End Using 
End Sub 
 
Public NotInheritable Class LEAD_VARS 
   Public Const ImagesDir As String = "C:\LEADTOOLS21\Resources\Images" 
   Public Const OcrLEADRuntimeDir As String = "C:\LEADTOOLS21\Bin\Common\OcrLEADRuntime" 
End Class