public IOcrPageCharacters GetRecognizedCharacters()
Function GetRecognizedCharacters() As IOcrPageCharacters
- (nullable LTOcrPageCharacters *)recognizedCharacters:(NSError **)error
public OcrPageCharacters getRecognizedCharacters()
IOcrPageCharacters^ GetRecognizedCharacters();
An instance of IOcrPageCharacters containing the character data from the most recent recognition of this IOcrPage.
You must call this method only after the IOcrPage has been recognized with the Recognize method. That is, if the IsRecognized property of this page is false, calling this method throws an exception.
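The rule above can be sketched in Java, one of the languages in the signature list. This is a minimal sketch, not the LEADTOOLS Java API: `RecognizablePage`, `FakePage`, and `RecognitionGuard` are hypothetical stand-ins, and the recognized characters are simplified to a `String`. It shows a check-then-recognize guard that avoids the "not recognized" exception.

```java
// Hypothetical stand-in for an OCR page; these names are NOT the LEADTOOLS Java API.
interface RecognizablePage {
    boolean isRecognized();
    void recognize();
    String recognizedCharacters(); // simplified: the real method returns a characters collection
}

final class RecognitionGuard {
    // Recognize the page first if needed, so fetching the characters cannot fail
    // because the page has not been recognized yet.
    static String charactersOf(RecognizablePage page) {
        if (!page.isRecognized()) {
            page.recognize();
        }
        return page.recognizedCharacters();
    }
}

// Hypothetical fake page mimicking the documented rule: fetching before recognition throws.
final class FakePage implements RecognizablePage {
    private boolean recognized;
    private int recognizeCalls;

    public boolean isRecognized() { return recognized; }

    public void recognize() {
        recognized = true;
        recognizeCalls++;
    }

    public String recognizedCharacters() {
        if (!recognized) {
            throw new IllegalStateException("page has not been recognized");
        }
        return "chars";
    }

    int recognizeCalls() { return recognizeCalls; }
}
```

In real code the guard would wrap the page's IsRecognized property and Recognize method; whether to recognize on demand or fail fast is a choice for the caller.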
You can use GetRecognizedCharacters to examine the recognized character data. This data contains information about the character codes, their confidence, guess codes, location and position in the page, as well as font information. For more information, refer to OcrCharacter.
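As an illustration of working with this data, the following Java sketch lists low-confidence characters and tests a font-style bit, the same kind of bit test the example below performs with `OcrCharacterFontStyle.Bold`. `CharInfo` and `CharReview` are hypothetical; the 0 to 100 confidence scale, the threshold, and the flag values are assumptions, not values taken from OcrCharacter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of per-character data; field names and flag values are
// illustrative assumptions, NOT the LEADTOOLS OcrCharacter definition.
final class CharInfo {
    static final int BOLD = 1, ITALIC = 2, UNDERLINE = 4; // assumed bit-flag values

    final char code;
    final int confidence; // assumed 0..100 scale
    final int fontStyle;  // bitwise OR of the flags above

    CharInfo(char code, int confidence, int fontStyle) {
        this.code = code;
        this.confidence = confidence;
        this.fontStyle = fontStyle;
    }

    // Same pattern as the example's (FontStyle & Bold) == Bold test
    boolean hasStyle(int flag) {
        return (fontStyle & flag) == flag;
    }
}

final class CharReview {
    // Returns the indices of characters whose confidence is below the threshold,
    // e.g. candidates for manual review before writing the data back.
    static List<Integer> lowConfidence(List<CharInfo> chars, int threshold) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < chars.size(); i++) {
            if (chars.get(i).confidence < threshold) {
                result.add(i);
            }
        }
        return result;
    }
}
```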
The GetRecognizedCharacters method returns an instance of IOcrPageCharacters, which is a collection of IOcrZoneCharacters. The IOcrZoneCharacters.ZoneIndex property contains the zero-based index of the zone; use the same index with the Zones property of this IOcrPage to get the zone information.
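The zone-index relationship can be modeled in a few lines of Java. `Zone`, `ZoneChars`, and `ZoneLookup` are hypothetical simplifications (each zone's characters are reduced to a string); the only point illustrated is that the zone index is used as a zero-based index into the page's zone list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a page's zones; NOT the LEADTOOLS API.
final class Zone {
    final String name;
    Zone(String name) { this.name = name; }
}

// Hypothetical per-zone character group, simplified to a string of text.
final class ZoneChars {
    final int zoneIndex; // zero-based index into the page's zone list
    final String text;
    ZoneChars(int zoneIndex, String text) {
        this.zoneIndex = zoneIndex;
        this.text = text;
    }
}

final class ZoneLookup {
    // Pair each character group with its zone by using zoneIndex
    // as an index into the page's zone list.
    static List<String> describe(List<Zone> zones, List<ZoneChars> pageChars) {
        List<String> out = new ArrayList<>();
        for (ZoneChars zc : pageChars) {
            Zone zone = zones.get(zc.zoneIndex);
            out.add(zone.name + ": " + zc.text);
        }
        return out;
    }
}
```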
To modify the recognition data and apply it back to the page, use SetRecognizedCharacters.
Use IOcrZoneCharacters.GetWords to get the recognized words of a zone.
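GetWords does this grouping for you. To show the rule involved, here is a Java sketch that splits characters into words at each end-of-word or end-of-line flag, the same test the example's `nextCharacterIsNewWord` logic uses. `PosChar` and `WordGrouper` are hypothetical, the flag values are assumptions, and this is not how GetWords is implemented internally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical character with position flags modeled on the example's
// EndOfWord / EndOfLine test; flag values are assumptions.
final class PosChar {
    static final int END_OF_WORD = 1;
    static final int END_OF_LINE = 2;

    final char code;
    final int position; // bitwise OR of the flags above

    PosChar(char code, int position) {
        this.code = code;
        this.position = position;
    }
}

final class WordGrouper {
    // Close the current word whenever a character is flagged as the end of a
    // word or the end of a line.
    static List<String> words(List<PosChar> chars) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (PosChar c : chars) {
            current.append(c.code);
            if ((c.position & (PosChar.END_OF_WORD | PosChar.END_OF_LINE)) != 0) {
                words.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            words.add(current.toString()); // trailing characters with no end flag
        }
        return words;
    }
}
```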
Notes on spaces: The LEADTOOLS OCR Module - LEAD Engine will not return any space characters when using the GetRecognizedCharacters method.
The LEADTOOLS OCR Module - OmniPage Engine will not return space characters if the boolean Recognition.SpaceIsValidCharacter setting is false (the default). If you absolutely require space characters in the recognition results when using the LEADTOOLS OmniPage Engine, set the boolean Recognition.SpaceIsValidCharacter setting to true (ocrEngineInstance.SettingManager.SetBooleanValue("Recognition.SpaceIsValidCharacter", true)). For more information on OCR settings, refer to IOcrSettingManager and LEADTOOLS OCR Module - OmniPage Engine Settings.
With the LEADTOOLS LEAD engine, the SetRecognizedCharacters method accepts space characters. However, these space characters are used when generating the final document (for example, PDF) and might affect the final output. Therefore, it is not recommended that you insert space characters when using the LEADTOOLS LEAD engine.
The LEADTOOLS OCR Module - OmniPage Engine will strip any space characters from the results passed to SetRecognizedCharacters if the boolean Recognition.SpaceIsValidCharacter setting is false (the default). If you absolutely require space characters in the recognition results when using the LEADTOOLS OmniPage Engine, set the boolean Recognition.SpaceIsValidCharacter setting to true before calling SetRecognizedCharacters.
If you use the GetRecognizedCharacters and SetRecognizedCharacters methods to modify the recognition result prior to saving to an output file, and you are planning on using the engine native save capability (through setting the IOcrDocumentManager.EngineFormat property and using DocumentFormat.User in the IOcrDocument.Save method), then you must change the boolean Recognition.SpaceIsValidCharacter setting to true.
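If edited characters might contain spaces and you are using the LEAD engine, one option is to remove them before calling SetRecognizedCharacters, in line with the recommendation above. The Java sketch below does that; because dropping a space would otherwise merge two words, it also sets an end-of-word flag on the preceding character. `EditChar`, `SpaceStripper`, and the flag value are hypothetical, and carrying word breaks as an end-of-word flag is an assumption based on the `EndOfWord` check in the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical character record; the flag value is an assumption, NOT the LEADTOOLS value.
final class EditChar {
    static final int END_OF_WORD = 1;

    final char code;
    final int position;

    EditChar(char code, int position) {
        this.code = code;
        this.position = position;
    }
}

final class SpaceStripper {
    // Drops space characters and, so the word break is not lost, marks the
    // character before each dropped space as the end of a word.
    static List<EditChar> strip(List<EditChar> chars) {
        List<EditChar> kept = new ArrayList<>();
        for (EditChar c : chars) {
            if (c.code == ' ') {
                int last = kept.size() - 1;
                if (last >= 0) {
                    EditChar prev = kept.get(last);
                    kept.set(last, new EditChar(prev.code, prev.position | EditChar.END_OF_WORD));
                }
            } else {
                kept.add(c);
            }
        }
        return kept;
    }
}
```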
The IOcrPageCharacters interface also contains the IOcrPageCharacters.UpdateWord method, which allows you to modify the OCR recognition results by updating or deleting words before optionally saving the results to the final output document.
This example will get the recognized characters of a page, modify them and set them back before saving the final document.
using System;
using System.Collections.Generic;
using System.Drawing;
using System.IO;
using Leadtools;
using Leadtools.Codecs;
using Leadtools.Ocr;
using Leadtools.Forms.Common;
using Leadtools.Document.Writer;
using Leadtools.WinForms;
using Leadtools.Drawing;
using Leadtools.ImageProcessing;
using Leadtools.ImageProcessing.Color;
public void RecognizedCharactersExample()
{
   // Create an image with some text in it
   RasterImage image = new RasterImage(RasterMemoryFlags.Conventional, 640, 200, 24, RasterByteOrder.Bgr, RasterViewPerspective.TopLeft, null, IntPtr.Zero, 0);
   Rectangle imageRect = new Rectangle(0, 0, image.ImageWidth, image.ImageHeight);
   IntPtr hdc = RasterImagePainter.CreateLeadDC(image);
   using (Graphics g = Graphics.FromHdc(hdc))
   {
      g.SmoothingMode = System.Drawing.Drawing2D.SmoothingMode.HighQuality;
      g.FillRectangle(Brushes.White, imageRect);
      using (Font f = new Font("Arial", 20, FontStyle.Regular))
         g.DrawString("Normal line", f, Brushes.Black, 0, 0);
      using (Font f = new Font("Arial", 20, FontStyle.Bold))
         g.DrawString("Bold, italic and underline", f, Brushes.Black, 0, 40);
      using (Font f = new Font("Courier New", 20, FontStyle.Regular))
         g.DrawString("Monospaced line", f, Brushes.Black, 0, 80);
   }
   RasterImagePainter.DeleteLeadDC(hdc);

   string textFileName = Path.Combine(LEAD_VARS.ImagesDir, "MyImageWithTest.txt");
   string pdfFileName = Path.Combine(LEAD_VARS.ImagesDir, "MyImageWithTest.pdf");

   // Create an instance of the engine
   using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD))
   {
      // Start the engine using default parameters
      ocrEngine.Startup(null, null, null, LEAD_VARS.OcrLEADRuntimeDir);

      // Create an OCR page
      IOcrPage ocrPage = ocrEngine.CreatePage(image, OcrImageSharingMode.AutoDispose);

      // Recognize this page
      ocrPage.Recognize(null);

      // Dump the characters into a text file
      using (StreamWriter writer = File.CreateText(textFileName))
      {
         IOcrPageCharacters ocrPageCharacters = ocrPage.GetRecognizedCharacters();
         foreach (IOcrZoneCharacters ocrZoneCharacters in ocrPageCharacters)
         {
            // Show the words found in this zone. Get the word boundaries in inches
            ICollection<OcrWord> words = ocrZoneCharacters.GetWords();
            Console.WriteLine("Words:");
            foreach (OcrWord word in words)
               Console.WriteLine("Word: {0}, at {1}, characters index from {2} to {3}", word.Value, word.Bounds, word.FirstCharacterIndex, word.LastCharacterIndex);

            bool nextCharacterIsNewWord = true;
            for (int i = 0; i < ocrZoneCharacters.Count; i++)
            {
               OcrCharacter ocrCharacter = ocrZoneCharacters[i];

               // Capitalize the first letter if this is a new word
               if (nextCharacterIsNewWord)
                  ocrCharacter.Code = Char.ToUpper(ocrCharacter.Code);

               writer.WriteLine("Code: {0}, Confidence: {1}, WordIsCertain: {2}, Bounds: {3}, Position: {4}, FontSize: {5}, FontStyle: {6}",
                  ocrCharacter.Code,
                  ocrCharacter.Confidence,
                  ocrCharacter.WordIsCertain,
                  ocrCharacter.Bounds,
                  ocrCharacter.Position,
                  ocrCharacter.FontSize,
                  ocrCharacter.FontStyle);

               // If the character is bold, also make it italic and underlined
               if ((ocrCharacter.FontStyle & OcrCharacterFontStyle.Bold) == OcrCharacterFontStyle.Bold)
               {
                  ocrCharacter.FontStyle |= OcrCharacterFontStyle.Italic;
                  ocrCharacter.FontStyle |= OcrCharacterFontStyle.Underline;
               }

               // Check if the next character is the start of a new word
               if ((ocrCharacter.Position & OcrCharacterPosition.EndOfWord) == OcrCharacterPosition.EndOfWord ||
                   (ocrCharacter.Position & OcrCharacterPosition.EndOfLine) == OcrCharacterPosition.EndOfLine)
                  nextCharacterIsNewWord = true;
               else
                  nextCharacterIsNewWord = false;

               // Write the modified character back into the collection
               ocrZoneCharacters[i] = ocrCharacter;
            }
         }

         // Replace the characters with the modified ones before we save
         ocrPage.SetRecognizedCharacters(ocrPageCharacters);
      }

      // Create an OCR document so we can save the results
      using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument(null, OcrCreateDocumentOptions.AutoDeleteFile))
      {
         // Add the page to the document, then dispose of it
         ocrDocument.Pages.Add(ocrPage);
         ocrPage.Dispose();

         // Set the PDF options to save as PDF/A text only
         PdfDocumentOptions pdfOptions = ocrEngine.DocumentWriterInstance.GetOptions(DocumentFormat.Pdf) as PdfDocumentOptions;
         pdfOptions.DocumentType = PdfDocumentType.PdfA;
         pdfOptions.ImageOverText = false;
         ocrEngine.DocumentWriterInstance.SetOptions(DocumentFormat.Pdf, pdfOptions);

         ocrDocument.Save(pdfFileName, DocumentFormat.Pdf, null);

         // Open and check the result file. It should contain the following text:
         // "Normal Line"
         // "Bold, Italic And Underline"
         // "Monospaced Line"
         // with the second line now bold, italic and underlined
      }

      // Shut down the engine
      // Note: calling Dispose will also automatically shut down the engine if it has been started
      ocrEngine.Shutdown();
   }
}
static class LEAD_VARS
{
   public const string ImagesDir = @"C:\LEADTOOLS21\Resources\Images";
   public const string OcrLEADRuntimeDir = @"C:\LEADTOOLS21\Bin\Common\OcrLEADRuntime";
}
Imports System.Collections.Generic
Imports System.Drawing
Imports System.IO
Imports Leadtools
Imports Leadtools.Codecs
Imports Leadtools.Ocr
Imports Leadtools.Forms
Imports Leadtools.Document.Writer
Imports Leadtools.WinForms
Imports Leadtools.Drawing
Imports Leadtools.ImageProcessing
Imports Leadtools.ImageProcessing.Color
Public Sub RecognizedCharactersExample()
   ' Create an image with some text in it
   Dim image As New RasterImage(RasterMemoryFlags.Conventional,
                                640, 200, 24,
                                RasterByteOrder.Bgr,
                                RasterViewPerspective.TopLeft,
                                Nothing, IntPtr.Zero, 0)
   Dim imageRect As New Rectangle(0, 0, image.ImageWidth, image.ImageHeight)
   Dim hdc As IntPtr = RasterImagePainter.CreateLeadDC(image)
   Using g As Graphics = Graphics.FromHdc(hdc)
      g.SmoothingMode = System.Drawing.Drawing2D.SmoothingMode.HighQuality
      g.FillRectangle(Brushes.White, imageRect)
      Using f As New Font("Arial", 20, FontStyle.Regular)
         g.DrawString("Normal line", f, Brushes.Black, 0, 0)
      End Using
      Using f As New Font("Arial", 20, FontStyle.Bold)
         g.DrawString("Bold, italic and underline", f, Brushes.Black, 0, 40)
      End Using
      Using f As New Font("Courier New", 20, FontStyle.Regular)
         g.DrawString("Monospaced line", f, Brushes.Black, 0, 80)
      End Using
   End Using
   RasterImagePainter.DeleteLeadDC(hdc)

   Dim textFileName As String = Path.Combine(LEAD_VARS.ImagesDir, "MyImageWithTest.txt")
   Dim pdfFileName As String = Path.Combine(LEAD_VARS.ImagesDir, "MyImageWithTest.pdf")

   ' Create an instance of the engine
   Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD)
      ' Start the engine using default parameters
      ocrEngine.Startup(Nothing, Nothing, Nothing, LEAD_VARS.OcrLEADRuntimeDir)

      ' Create an OCR page
      Dim ocrPage As IOcrPage = ocrEngine.CreatePage(image, OcrImageSharingMode.AutoDispose)

      ' Recognize this page
      ocrPage.Recognize(Nothing)

      ' Dump the characters into a text file
      Using writer As StreamWriter = File.CreateText(textFileName)
         Dim ocrPageCharacters As IOcrPageCharacters = ocrPage.GetRecognizedCharacters()
         For Each ocrZoneCharacters As IOcrZoneCharacters In ocrPageCharacters
            ' Show the words found in this zone
            Dim words As ICollection(Of OcrWord) = ocrZoneCharacters.GetWords()
            Console.WriteLine("Words:")
            For Each word As OcrWord In words
               Console.WriteLine("Word: {0}, at {1}, characters index from {2} to {3}",
                                 word.Value, word.Bounds, word.FirstCharacterIndex, word.LastCharacterIndex)
            Next

            Dim nextCharacterIsNewWord As Boolean = True
            For i As Integer = 0 To ocrZoneCharacters.Count - 1
               Dim ocrCharacter As OcrCharacter = ocrZoneCharacters(i)

               ' Capitalize the first letter if this is a new word
               If nextCharacterIsNewWord Then
                  ocrCharacter.Code = [Char].ToUpper(ocrCharacter.Code)
               End If

               writer.WriteLine("Code: {0}, Confidence: {1}, WordIsCertain: {2}, Bounds: {3}, Position: {4}, FontSize: {5}, FontStyle: {6}",
                                ocrCharacter.Code,
                                ocrCharacter.Confidence,
                                ocrCharacter.WordIsCertain,
                                ocrCharacter.Bounds,
                                ocrCharacter.Position,
                                ocrCharacter.FontSize,
                                ocrCharacter.FontStyle)

               ' If the character is bold, also make it italic and underlined
               If (ocrCharacter.FontStyle And OcrCharacterFontStyle.Bold) = OcrCharacterFontStyle.Bold Then
                  ocrCharacter.FontStyle = ocrCharacter.FontStyle Or OcrCharacterFontStyle.Italic
                  ocrCharacter.FontStyle = ocrCharacter.FontStyle Or OcrCharacterFontStyle.Underline
               End If

               ' Check if the next character is the start of a new word
               If (ocrCharacter.Position And OcrCharacterPosition.EndOfWord) = OcrCharacterPosition.EndOfWord OrElse
                  (ocrCharacter.Position And OcrCharacterPosition.EndOfLine) = OcrCharacterPosition.EndOfLine Then
                  nextCharacterIsNewWord = True
               Else
                  nextCharacterIsNewWord = False
               End If

               ' Write the modified character back into the collection
               ocrZoneCharacters(i) = ocrCharacter
            Next
         Next

         ' Replace the characters with the modified ones before we save
         ocrPage.SetRecognizedCharacters(ocrPageCharacters)
      End Using

      ' Create an OCR document so we can save the results
      Using ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument(Nothing, OcrCreateDocumentOptions.AutoDeleteFile)
         ' Add the page to the document, then dispose of it
         ocrDocument.Pages.Add(ocrPage)
         ocrPage.Dispose()

         ' Set the PDF options to save as PDF/A text only
         Dim pdfOptions As PdfDocumentOptions = TryCast(ocrEngine.DocumentWriterInstance.GetOptions(DocumentFormat.Pdf), PdfDocumentOptions)
         pdfOptions.DocumentType = PdfDocumentType.PdfA
         pdfOptions.ImageOverText = False
         ocrEngine.DocumentWriterInstance.SetOptions(DocumentFormat.Pdf, pdfOptions)

         ocrDocument.Save(pdfFileName, DocumentFormat.Pdf, Nothing)

         ' Open and check the result file. It should contain the following text:
         ' "Normal Line"
         ' "Bold, Italic And Underline"
         ' "Monospaced Line"
         ' with the second line now bold, italic and underlined
      End Using

      ' Shut down the engine
      ' Note: calling Dispose will also automatically shut down the engine if it has been started
      ocrEngine.Shutdown()
   End Using
End Sub
Public NotInheritable Class LEAD_VARS
   Public Const ImagesDir As String = "C:\LEADTOOLS21\Resources\Images"
   Public Const OcrLEADRuntimeDir As String = "C:\LEADTOOLS21\Bin\Common\OcrLEADRuntime"
End Class