GetRecognizedCharacters Method

Gets the last recognized character data of this IOcrPage.

Syntax

C#

IOcrPageCharacters GetRecognizedCharacters()

Visual Basic

'Declaration

Function GetRecognizedCharacters() As IOcrPageCharacters

'Usage

Dim instance As IOcrPage
Dim value As IOcrPageCharacters

value = instance.GetRecognizedCharacters()

WinRT C#

IOcrPageCharacters GetRecognizedCharacters()

Objective-C

- (nullable LTOcrPageCharacters *)recognizedCharacters:(NSError **)error

Java

public OcrPageCharacters getRecognizedCharacters()

WinRT JavaScript

function Leadtools.Forms.Ocr.IOcrPage.GetRecognizedCharacters()

C++

IOcrPageCharacters^ GetRecognizedCharacters();

Return Value

An instance of IOcrPageCharacters containing the last recognized character data of this IOcrPage.

Remarks

Call this method only after the IOcrPage has been recognized with the Recognize method. If the value of the IsRecognized property of this page is false, calling this method throws an exception.
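The requirement above can be guarded in code. The following is a minimal sketch, not part of the toolkit: the class and method names (RecognizedCharactersGuard, GetCharactersSafely) are hypothetical, and it assumes a project that references the Leadtools.Forms.Ocr assembly and an IOcrPage created from a started engine.

```csharp
using System;
using Leadtools.Forms.Ocr;

public static class RecognizedCharactersGuard
{
   // Hypothetical helper (not a LEADTOOLS API): returns the character data
   // of a page, running recognition first if the page has none yet.
   public static IOcrPageCharacters GetCharactersSafely(IOcrPage ocrPage)
   {
      if (ocrPage == null)
         throw new ArgumentNullException(nameof(ocrPage));

      // GetRecognizedCharacters throws when IsRecognized is false,
      // so recognize the page first (null = no progress callback).
      if (!ocrPage.IsRecognized)
         ocrPage.Recognize(null);

      return ocrPage.GetRecognizedCharacters();
   }
}
```

Whether to recognize on demand, as here, or to report an error to the caller instead is a design choice; recognition can be slow, so some applications may prefer to fail fast.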

You can use the GetRecognizedCharacters method to examine the recognized character data. This data contains information about the character codes, their confidence, guess codes, location and position in the page, as well as font information. For more information, refer to OcrCharacter.

The GetRecognizedCharacters method returns an instance of IOcrPageCharacters, which is a collection of IOcrZoneCharacters. The IOcrZoneCharacters.ZoneIndex property contains the zero-based index of the zone. You can get the zone information by using this index with the Zones property of this IOcrPage.
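The relationship between the character collections and the page zones might be sketched as follows. The class and method names (ZoneCharacterReport, PrintCharacterCountsPerZone) are hypothetical; the sketch assumes a recognized IOcrPage and a reference to the Leadtools.Forms.Ocr assembly, and uses only the members named in this topic plus OcrZone.Bounds.

```csharp
using System;
using Leadtools.Forms.Ocr;

public static class ZoneCharacterReport
{
   // Hypothetical helper: prints, for each zone that has character data,
   // the zone index, the zone bounds and the number of recognized characters.
   public static void PrintCharacterCountsPerZone(IOcrPage ocrPage)
   {
      IOcrPageCharacters pageCharacters = ocrPage.GetRecognizedCharacters();

      foreach (IOcrZoneCharacters zoneCharacters in pageCharacters)
      {
         // ZoneIndex is zero-based and indexes into the Zones collection of the same page
         int zoneIndex = zoneCharacters.ZoneIndex;
         OcrZone zone = ocrPage.Zones[zoneIndex];

         Console.WriteLine("Zone {0} at {1}: {2} character(s)",
            zoneIndex, zone.Bounds, zoneCharacters.Count);
      }
   }
}
```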

To modify the recognition data and apply it back to the page, use SetRecognizedCharacters.

Use IOcrZoneCharacters.GetWords to get the recognized words of a zone.

Notes on spaces: The LEADTOOLS Advantage OCR engine will not return any space characters when using the GetRecognizedCharacters method.

The LEADTOOLS Professional OCR engine will not return space characters if the boolean Recognition.SpaceIsValidCharacter setting is false (the default). If you absolutely require space characters in the recognition results when using the LEADTOOLS Professional engine, set the boolean Recognition.SpaceIsValidCharacter setting to true (ocrEngineInstance.SettingManager.SetBooleanValue("Recognition.SpaceIsValidCharacter", true)). For more information on OCR settings, refer to IOcrSettingManager and LEADTOOLS OCR Professional Engine Settings.

The SetRecognizedCharacters method will accept space characters in the LEADTOOLS Advantage engine. However, these space characters will be used when generating the final document (PDF) and might affect the final output. Therefore, it is not recommended that you insert space characters when using the LEADTOOLS Advantage engine.

The LEADTOOLS Professional OCR engine will strip any space characters from the results passed to SetRecognizedCharacters if the boolean Recognition.SpaceIsValidCharacter setting is false (the default). If you absolutely require space characters in the recognition results when using the LEADTOOLS Professional engine, set the boolean Recognition.SpaceIsValidCharacter setting to true before calling SetRecognizedCharacters.

If you use the GetRecognizedCharacters and SetRecognizedCharacters methods to modify the recognition results before saving to an output file, and you plan to use the engine's native save capability (by setting the IOcrDocumentManager.EngineFormat property and passing DocumentFormat.User to the IOcrDocument.Save method), then you must set the boolean Recognition.SpaceIsValidCharacter setting to true.
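As an illustration of the space-character setting discussed above, the following sketch turns the setting on for an engine that has already been started. The class and method names (SpaceCharacterSetting, EnableSpaceCharacters) are hypothetical, and the sketch assumes the LEADTOOLS Professional engine; an engine that does not support this setting name may reject the call.

```csharp
using Leadtools.Forms.Ocr;

public static class SpaceCharacterSetting
{
   private const string SettingName = "Recognition.SpaceIsValidCharacter";

   // Hypothetical helper: call after ocrEngine.Startup has succeeded and before
   // recognizing pages or calling SetRecognizedCharacters.
   public static void EnableSpaceCharacters(IOcrEngine ocrEngine)
   {
      IOcrSettingManager settings = ocrEngine.SettingManager;

      // The default is false; only change the setting when it is not already on
      if (!settings.GetBooleanValue(SettingName))
         settings.SetBooleanValue(SettingName, true);
   }
}
```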

The IOcrPageCharacters interface also contains the IOcrPageCharacters.UpdateWord method, which lets you modify the OCR recognition results by updating or deleting words before optionally saving the results to the final output document.

Example

This example will get the recognized characters of a page, modify them and set them back before saving the final document.

Visual Basic

Imports System
Imports System.Collections.Generic
Imports System.Drawing
Imports System.IO
Imports Leadtools
Imports Leadtools.Codecs
Imports Leadtools.Forms.Ocr
Imports Leadtools.Forms
Imports Leadtools.Forms.DocumentWriters
Imports Leadtools.WinForms
Imports Leadtools.Drawing
Imports Leadtools.ImageProcessing
Imports Leadtools.ImageProcessing.Color

<TestMethod>
Public Sub RecognizedCharactersExample()
   ' Create an image with some text in it
   Dim image As New RasterImage(RasterMemoryFlags.Conventional, _
                                 640, 200, 24, _
                                 RasterByteOrder.Bgr, _
                                 RasterViewPerspective.TopLeft, _
                                 Nothing, IntPtr.Zero, 0)
   Dim imageRect As New Rectangle(0, 0, image.ImageWidth, image.ImageHeight)
   Dim hdc As IntPtr = RasterImagePainter.CreateLeadDC(image)
   Using g As Graphics = Graphics.FromHdc(hdc)
      g.SmoothingMode = System.Drawing.Drawing2D.SmoothingMode.HighQuality
      g.FillRectangle(Brushes.White, imageRect)

      Using f As New Font("Arial", 20, FontStyle.Regular)
         g.DrawString("Normal line", f, Brushes.Black, 0, 0)
      End Using

      Using f As New Font("Arial", 20, FontStyle.Bold)
         g.DrawString("Bold, italic and underline", f, Brushes.Black, 0, 40)
      End Using

      Using f As New Font("Courier New", 20, FontStyle.Regular)
         g.DrawString("Monospaced line", f, Brushes.Black, 0, 80)
      End Using
   End Using

   RasterImagePainter.DeleteLeadDC(hdc)

   Dim textFileName As String = Path.Combine(LEAD_VARS.ImagesDir, "MyImageWithTest.txt")
   Dim pdfFileName As String = Path.Combine(LEAD_VARS.ImagesDir, "MyImageWithTest.pdf")

   ' Create an instance of the engine
   Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
      ' Start the engine using default parameters
      ocrEngine.Startup(Nothing, Nothing, Nothing, LEAD_VARS.OcrAdvantageRuntimeDir)

      ' Create an OCR page
      Dim ocrPage As IOcrPage = ocrEngine.CreatePage(image, OcrImageSharingMode.AutoDispose)

      ' Recognize this page
      ocrPage.Recognize(Nothing)

      ' Dump the characters into a text file
      Using writer As StreamWriter = File.CreateText(textFileName)
         Dim ocrPageCharacters As IOcrPageCharacters = ocrPage.GetRecognizedCharacters()
         For Each ocrZoneCharacters As IOcrZoneCharacters In ocrPageCharacters
            ' Show the words found in this zone. Get the word boundaries in inches
            Dim words As ICollection(Of OcrWord) = ocrZoneCharacters.GetWords(ocrPage.DpiX, ocrPage.DpiY, LogicalUnit.Inch)
            Console.WriteLine("Words:")
            For Each word As OcrWord In words
               Console.WriteLine("Word: {0}, at {1}, characters index from {2} to {3}", _
                                 word.Value, word.Bounds, word.FirstCharacterIndex, word.LastCharacterIndex)
            Next

            Dim nextCharacterIsNewWord As Boolean = True

            For i As Integer = 0 To ocrZoneCharacters.Count - 1
               Dim ocrCharacter As OcrCharacter = ocrZoneCharacters(i)

               ' Capitalize the first letter if this is a new word
               If nextCharacterIsNewWord Then
                  ocrCharacter.Code = [Char].ToUpper(ocrCharacter.Code)
               End If

               writer.WriteLine("Code: {0}, Confidence: {1}, WordIsCertain: {2}, Bounds: {3}, Position: {4}, FontSize: {5}, FontStyle: {6}", _
                                 ocrCharacter.Code, _
                                 ocrCharacter.Confidence, _
                                 ocrCharacter.WordIsCertain, _
                                 ocrCharacter.Bounds, _
                                 ocrCharacter.Position, _
                                 ocrCharacter.FontSize, _
                                 ocrCharacter.FontStyle)

               ' If the character is bold, also make it italic and underlined
               If (ocrCharacter.FontStyle And OcrCharacterFontStyle.Bold) = OcrCharacterFontStyle.Bold Then
                  ocrCharacter.FontStyle = ocrCharacter.FontStyle Or OcrCharacterFontStyle.Italic
                  ocrCharacter.FontStyle = ocrCharacter.FontStyle Or OcrCharacterFontStyle.Underline
               End If

               ' Check if next character is the start of a new word
               If (ocrCharacter.Position And OcrCharacterPosition.EndOfWord) = OcrCharacterPosition.EndOfWord OrElse _
                  (ocrCharacter.Position And OcrCharacterPosition.EndOfLine) = OcrCharacterPosition.EndOfLine Then
                  nextCharacterIsNewWord = True
               Else
                  nextCharacterIsNewWord = False
               End If

               ocrZoneCharacters(i) = ocrCharacter
            Next
         Next

         ' Replace the characters with the modified ones before we save
         ocrPage.SetRecognizedCharacters(ocrPageCharacters)
      End Using

      ' Create an OCR document so we can save the results
      Using ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument(Nothing, OcrCreateDocumentOptions.AutoDeleteFile)
         ' Add the page and dispose it
         ocrDocument.Pages.Add(ocrPage)
         ocrPage.Dispose()

         ' Set the PDF options to save as PDF/A text only
         Dim pdfOptions As PdfDocumentOptions = TryCast(ocrEngine.DocumentWriterInstance.GetOptions(DocumentFormat.Pdf), PdfDocumentOptions)
         pdfOptions.DocumentType = PdfDocumentType.PdfA
         pdfOptions.ImageOverText = False
         ocrEngine.DocumentWriterInstance.SetOptions(DocumentFormat.Pdf, pdfOptions)

         ' Open and check the result file, it should contain the following text
         ' "Normal Line"
         ' "Bold, Italic And Underline"
         ' "Monospaced Line"
         ' With the second line now bold, italic and underlined
         ocrDocument.Save(pdfFileName, DocumentFormat.Pdf, Nothing)
      End Using

      ' Shutdown the engine
      ' Note: calling Dispose will also automatically shutdown the engine if it has been started
      ocrEngine.Shutdown()
   End Using
End Sub

Public NotInheritable Class LEAD_VARS
   Public Const ImagesDir As String = "C:\Users\Public\Documents\LEADTOOLS Images"
   Public Const OcrAdvantageRuntimeDir As String = "C:\LEADTOOLS 19\Bin\Common\OcrAdvantageRuntime"
End Class

C#

using System;
using System.Collections.Generic;
using System.Drawing;
using System.IO;
using Leadtools;
using Leadtools.Codecs;
using Leadtools.Forms.Ocr;
using Leadtools.Forms;
using Leadtools.Forms.DocumentWriters;
using Leadtools.WinForms;
using Leadtools.Drawing;
using Leadtools.ImageProcessing;
using Leadtools.ImageProcessing.Color;

public void RecognizedCharactersExample()
{
   // Create an image with some text in it
   RasterImage image = new RasterImage(RasterMemoryFlags.Conventional, 640, 200, 24, RasterByteOrder.Bgr, RasterViewPerspective.TopLeft, null, IntPtr.Zero, 0);
   Rectangle imageRect = new Rectangle(0, 0, image.ImageWidth, image.ImageHeight);
   IntPtr hdc = RasterImagePainter.CreateLeadDC(image);
   using (Graphics g = Graphics.FromHdc(hdc))
   {
      g.SmoothingMode = System.Drawing.Drawing2D.SmoothingMode.HighQuality;
      g.FillRectangle(Brushes.White, imageRect);

      using (Font f = new Font("Arial", 20, FontStyle.Regular))
         g.DrawString("Normal line", f, Brushes.Black, 0, 0);

      using (Font f = new Font("Arial", 20, FontStyle.Bold))
         g.DrawString("Bold, italic and underline", f, Brushes.Black, 0, 40);

      using (Font f = new Font("Courier New", 20, FontStyle.Regular))
         g.DrawString("Monospaced line", f, Brushes.Black, 0, 80);
   }

   RasterImagePainter.DeleteLeadDC(hdc);

   string textFileName = Path.Combine(LEAD_VARS.ImagesDir, "MyImageWithTest.txt");
   string pdfFileName = Path.Combine(LEAD_VARS.ImagesDir, "MyImageWithTest.pdf");

   // Create an instance of the engine
   using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
   {
      // Start the engine using default parameters
      ocrEngine.Startup(null, null, null, LEAD_VARS.OcrAdvantageRuntimeDir);

      // Create an OCR page
      IOcrPage ocrPage = ocrEngine.CreatePage(image, OcrImageSharingMode.AutoDispose);

      // Recognize this page
      ocrPage.Recognize(null);

      // Dump the characters into a text file
      using (StreamWriter writer = File.CreateText(textFileName))
      {
         IOcrPageCharacters ocrPageCharacters = ocrPage.GetRecognizedCharacters();
         foreach (IOcrZoneCharacters ocrZoneCharacters in ocrPageCharacters)
         {
            // Show the words found in this zone. Get the word boundaries in inches
            ICollection<OcrWord> words = ocrZoneCharacters.GetWords(ocrPage.DpiX, ocrPage.DpiY, LogicalUnit.Inch);
            Console.WriteLine("Words:");
            foreach (OcrWord word in words)
               Console.WriteLine("Word: {0}, at {1}, characters index from {2} to {3}", word.Value, word.Bounds, word.FirstCharacterIndex, word.LastCharacterIndex);

            bool nextCharacterIsNewWord = true;

            for (int i = 0; i < ocrZoneCharacters.Count; i++)
            {
               OcrCharacter ocrCharacter = ocrZoneCharacters[i];

               // Capitalize the first letter if this is a new word
               if (nextCharacterIsNewWord)
                  ocrCharacter.Code = Char.ToUpper(ocrCharacter.Code);

               writer.WriteLine("Code: {0}, Confidence: {1}, WordIsCertain: {2}, Bounds: {3}, Position: {4}, FontSize: {5}, FontStyle: {6}",
                  ocrCharacter.Code,
                  ocrCharacter.Confidence,
                  ocrCharacter.WordIsCertain,
                  ocrCharacter.Bounds,
                  ocrCharacter.Position,
                  ocrCharacter.FontSize,
                  ocrCharacter.FontStyle);

               // If the character is bold, also make it italic and underlined
               if ((ocrCharacter.FontStyle & OcrCharacterFontStyle.Bold) == OcrCharacterFontStyle.Bold)
               {
                  ocrCharacter.FontStyle |= OcrCharacterFontStyle.Italic;
                  ocrCharacter.FontStyle |= OcrCharacterFontStyle.Underline;
               }

               // Check if next character is the start of a new word
               if ((ocrCharacter.Position & OcrCharacterPosition.EndOfWord) == OcrCharacterPosition.EndOfWord ||
                  (ocrCharacter.Position & OcrCharacterPosition.EndOfLine) == OcrCharacterPosition.EndOfLine)
                  nextCharacterIsNewWord = true;
               else
                  nextCharacterIsNewWord = false;

               ocrZoneCharacters[i] = ocrCharacter;
            }
         }

         // Replace the characters with the modified ones before we save
         ocrPage.SetRecognizedCharacters(ocrPageCharacters);
      }

      // Create an OCR document so we can save the results
      using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument(null, OcrCreateDocumentOptions.AutoDeleteFile))
      {
         // Add the page and dispose it
         ocrDocument.Pages.Add(ocrPage);
         ocrPage.Dispose();

         // Set the PDF options to save as PDF/A text only
         PdfDocumentOptions pdfOptions = ocrEngine.DocumentWriterInstance.GetOptions(DocumentFormat.Pdf) as PdfDocumentOptions;
         pdfOptions.DocumentType = PdfDocumentType.PdfA;
         pdfOptions.ImageOverText = false;
         ocrEngine.DocumentWriterInstance.SetOptions(DocumentFormat.Pdf, pdfOptions);

         ocrDocument.Save(pdfFileName, DocumentFormat.Pdf, null);

         // Open and check the result file, it should contain the following text
         // "Normal Line"
         // "Bold, Italic And Underline"
         // "Monospaced Line"
         // With the second line now bold, italic and underlined
      }

      // Shutdown the engine
      // Note: calling Dispose will also automatically shutdown the engine if it has been started
      ocrEngine.Shutdown();
   }
}

static class LEAD_VARS
{
   public const string ImagesDir = @"C:\Users\Public\Documents\LEADTOOLS Images";
   public const string OcrAdvantageRuntimeDir = @"C:\LEADTOOLS 19\Bin\Common\OcrAdvantageRuntime";
}

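WinRT C#

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;
using Windows.Storage;
using Windows.Storage.Streams;
using Leadtools;
using Leadtools.Codecs;
using Leadtools.Controls;
using Leadtools.Forms.Ocr;
using Leadtools.Forms;
using Leadtools.Forms.DocumentWriters;
using Leadtools.ImageProcessing;
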
public async Task RecognizedCharactersExample()
{
   string imageFileName = @"Assets\OCR1.TIF";
   string textFileName = "OCR1.txt";
   string pdfFileName = "OCR1.pdf";
   // Create an instance of the engine
   IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);

   // Start the engine using default parameters
   ocrEngine.Startup(null, null, String.Empty, Tools.OcrEnginePath);

   // Create an OCR document
   IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument();

   // Add this image to the document
   IOcrPage ocrPage = null;
   using (RasterCodecs codecs = new RasterCodecs())
   {
      StorageFile loadFile = await Tools.AppInstallFolder.GetFileAsync(imageFileName);
      using (RasterImage image = await codecs.LoadAsync(LeadStreamFactory.Create(loadFile)))
         ocrPage = ocrDocument.Pages.AddPage(image, null);
   }

   // Recognize this page
   ocrPage.Recognize(null);

   // Dump the characters into a text file
   StorageFile file = await Tools.AppLocalFolder.CreateFileAsync(textFileName);
   using (IRandomAccessStream fileStream = await file.OpenAsync(FileAccessMode.ReadWrite))
   {
      using (IOutputStream outputStream = fileStream.GetOutputStreamAt(0))
      {
         using (DataWriter writer = new DataWriter(outputStream))
         {
            IOcrPageCharacters ocrPageCharacters = ocrPage.GetRecognizedCharacters();
            foreach (IOcrZoneCharacters ocrZoneCharacters in ocrPageCharacters)
            {
               // Show the words found in this zone.
               ICollection<OcrWord> words = ocrZoneCharacters.GetWords();
               Debug.WriteLine("Words:");
               foreach (OcrWord word in words)
                  Debug.WriteLine("Word: {0}, at {1}, characters index from {2} to {3}", word.Value, word.Bounds, word.FirstCharacterIndex, word.LastCharacterIndex);

               bool nextCharacterIsNewWord = true;

               for (int i = 0; i < ocrZoneCharacters.Count; i++)
               {
                  OcrCharacter ocrCharacter = ocrZoneCharacters[i];

                  // Capitalize the first letter if this is a new word
                  if (nextCharacterIsNewWord)
                     ocrCharacter.Code = Char.ToUpper(ocrCharacter.Code);

                  writer.WriteString(string.Format("Code: {0}, Confidence: {1}, WordIsCertain: {2}, Bounds: {3}, Position: {4}, FontSize: {5}, FontStyle: {6}",
                     ocrCharacter.Code,
                     ocrCharacter.Confidence,
                     ocrCharacter.WordIsCertain,
                     ocrCharacter.Bounds,
                     ocrCharacter.Position,
                     ocrCharacter.FontSize,
                     ocrCharacter.FontStyle));

                  // If the character is bold, also make it italic and underlined
                  if ((ocrCharacter.FontStyle & OcrCharacterFontStyle.Bold) == OcrCharacterFontStyle.Bold)
                  {
                     ocrCharacter.FontStyle |= OcrCharacterFontStyle.Italic;
                     ocrCharacter.FontStyle |= OcrCharacterFontStyle.Underline;
                  }

                  // Check if next character is the start of a new word
                  if ((ocrCharacter.Position & OcrCharacterPosition.EndOfWord) == OcrCharacterPosition.EndOfWord ||
                     (ocrCharacter.Position & OcrCharacterPosition.EndOfLine) == OcrCharacterPosition.EndOfLine)
                     nextCharacterIsNewWord = true;
                  else
                     nextCharacterIsNewWord = false;

                  ocrZoneCharacters[i] = ocrCharacter;
               }
            }

            // Replace the characters with the modified ones before we save
            ocrPage.SetRecognizedCharacters(ocrPageCharacters);

            await writer.StoreAsync();
            writer.DetachStream();
         }

         await outputStream.FlushAsync();
      }
   }

   // Set the PDF options to save as PDF/A text only
   PdfDocumentOptions pdfOptions = ocrEngine.DocumentWriterInstance.GetOptions(DocumentFormat.Pdf) as PdfDocumentOptions;
   pdfOptions.DocumentType = PdfDocumentType.PdfA;
   pdfOptions.ImageOverText = false;
   ocrEngine.DocumentWriterInstance.SetOptions(DocumentFormat.Pdf, pdfOptions);

   StorageFile saveFile = await Tools.AppLocalFolder.CreateFileAsync(pdfFileName, CreationCollisionOption.ReplaceExisting);
   await ocrDocument.SaveAsync(LeadStreamFactory.Create(saveFile), DocumentFormat.Pdf, null);

   // Shutdown the engine
   ocrEngine.Shutdown();
}

Requirements

Target Platforms

Reference

IOcrPage Interface
IOcrPage Members
SetRecognizedCharacters Method
OcrCharacter Structure
IOcrPageCharacters Interface
IOcrZoneCharacters Interface
IOcrPageCollection Interface
IOcrZoneCollection Interface
OcrZone Structure
AutoZone
Programming with the LEADTOOLS .NET OCR
OCR Confidence Reporting