Leadtools.Forms.Ocr Namespace : IOcrDocumentManager Interface
public interface IOcrDocumentManager
'Declaration
Public Interface IOcrDocumentManager
'Usage
Dim instance As IOcrDocumentManager
public interface IOcrDocumentManager
function Leadtools.Forms.Ocr.IOcrDocumentManager()
public interface class IOcrDocumentManager
You can access the instance of the IOcrDocumentManager used by an IOcrEngine through the IOcrEngine.DocumentManager property.
The IOcrDocumentManager interface allows you to create IOcrDocument objects that encapsulate an OCR'ed document. Each OCR document contains a collection of IOcrPage objects that you can use to add pages to and remove pages from the document. After you add the pages to the document and optionally manage the zones on the pages, you can call the IOcrPage.Recognize method on each page to obtain the recognition data, which is stored internally in the page. Once you are done, you can use the save methods of the IOcrDocument object to save the document to its final format.
LEADTOOLS supports saving to various standard document formats such as PDF, Microsoft Word, HTML and several others through the LEADTOOLS Document Writers engine. For more information, refer to IOcrDocument and Leadtools.Forms.DocumentWriters.DocumentFormat.
A typical OCR operation using the IOcrEngine involves starting up the engine, creating an IOcrDocument object using the IOcrDocumentManager.CreateDocument method, adding the pages to it, and performing either automatic or manual zoning. After the recognition data is collected using IOcrPage.Recognize, you use the various IOcrDocument.Save methods to save the document to its final format, such as PDF, DOC or HTML.
In addition to the above, you can use IOcrDocument.SaveXml to save the document as XML.
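The per-page flow described above (zone each page, then call IOcrPage.Recognize on it) can be sketched as follows. This is a minimal illustration, not the reference implementation: the class and method names (PerPageRecognitionSketch, RecognizeEachPage) and the parameters are hypothetical, and it assumes the engine passed in has already been started with IOcrEngine.Startup.

```csharp
using System;
using Leadtools.Forms.Ocr;
using Leadtools.Forms.DocumentWriters;

// Hypothetical helper class, for illustration only
static class PerPageRecognitionSketch
{
   // Assumption: ocrEngine has already been started with IOcrEngine.Startup
   public static void RecognizeEachPage(IOcrEngine ocrEngine, string[] imageFiles, string pdfFileName)
   {
      // Create the document through the engine's document manager
      using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
      {
         // Add one page per image file
         foreach (string imageFile in imageFiles)
            ocrDocument.Pages.AddPage(imageFile, null);

         // Zone and recognize each page individually
         foreach (IOcrPage ocrPage in ocrDocument.Pages)
         {
            ocrPage.AutoZone(null);   // automatic zoning; manual zoning would edit the page zones instead
            ocrPage.Recognize(null);  // recognition data is stored in the page
         }

         // Save the recognized document to its final format
         ocrDocument.Save(pdfFileName, DocumentFormat.Pdf, null);
      }
   }
}
```

Working page by page like this is mainly useful when different pages need different zoning; when every page is handled the same way, a single call on the page collection, as in the examples that follow, is shorter.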
Imports System
Imports System.IO
Imports Leadtools.Forms.Ocr
Imports Leadtools.Forms.DocumentWriters

Public Sub OcrDocumentManagerExample()
   Dim tifFileName1 As String = Path.Combine(LEAD_VARS.ImagesDir, "Ocr1.tif")
   Dim tifFileName2 As String = Path.Combine(LEAD_VARS.ImagesDir, "Ocr2.tif")
   ' Use a subdirectory for the output, so that re-creating it below does not delete the source images
   Dim outputDirectory As String = Path.Combine(LEAD_VARS.ImagesDir, "OutputDirectory")

   ' Create the output directory
   If (Directory.Exists(outputDirectory)) Then
      Directory.Delete(outputDirectory, True)
   End If
   Directory.CreateDirectory(outputDirectory)

   ' Create an instance of the engine
   Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Plus, False)
      ' Start the engine using default parameters
      Console.WriteLine("Starting up the engine...")
      ocrEngine.Startup(Nothing, Nothing, Nothing, Nothing)

      ' Create the OCR document
      Console.WriteLine("Creating the OCR document...")
      Dim ocrDocumentManager As IOcrDocumentManager = ocrEngine.DocumentManager
      Using ocrDocument As IOcrDocument = ocrDocumentManager.CreateDocument()
         ' Add the pages to the document
         Console.WriteLine("Adding the pages...")
         ocrDocument.Pages.AddPage(tifFileName1, Nothing)
         ocrDocument.Pages.AddPage(tifFileName2, Nothing)

         ' Recognize the pages in this document. Note, we did not call AutoZone; it will be called implicitly by Recognize
         Console.WriteLine("Recognizing all the pages...")
         ocrDocument.Pages.Recognize(Nothing)

         ' Save to all the formats supported by this OCR engine
         Dim formats As Array = System.Enum.GetValues(GetType(DocumentFormat))
         For Each format As DocumentFormat In formats
            Dim friendlyName As String = DocumentWriter.GetFormatFriendlyName(format)
            Console.WriteLine("Saving (using default options) to {0}...", friendlyName)

            ' Construct the output file name (output_directory + document_format_name + . + extension)
            Dim extension As String = DocumentWriter.GetFormatFileExtension(format)
            Dim outputFileName As String = Path.Combine(outputDirectory, format.ToString() + "." + extension)

            ' Save the document
            ocrDocument.Save(outputFileName, format, Nothing)

            ' If this is the LTD format, convert it to PDF
            If format = DocumentFormat.Ltd Then
               Console.WriteLine("Converting the LTD file to PDF...")
               Dim pdfFileName As String = Path.Combine(outputDirectory, format.ToString() + "_pdf.pdf")
               Dim docWriter As DocumentWriter = ocrEngine.DocumentWriterInstance
               docWriter.Convert(outputFileName, pdfFileName, DocumentFormat.Pdf)
            End If
         Next

         ' Now save to all the engine native formats (if any) supported by the engine
         Dim engineFormats() As String = ocrDocumentManager.GetSupportedEngineFormats()
         For Each engineFormat As String In engineFormats
            Dim friendlyName As String = ocrDocumentManager.GetEngineFormatFriendlyName(engineFormat)
            Console.WriteLine("Saving to engine native format {0}...", friendlyName)

            ' Construct the output file name (output_directory + "engine_" + engine_format_name + . + extension)
            Dim extension As String = ocrDocumentManager.GetEngineFormatFileExtension(engineFormat)
            Dim outputFileName As String = Path.Combine(outputDirectory, "engine_" + engineFormat + "." + extension)

            ' To use this format, set it in IOcrDocumentManager.EngineFormat and do a normal save using DocumentFormat.User
            ocrDocumentManager.EngineFormat = engineFormat
            ocrDocument.Save(outputFileName, DocumentFormat.User, Nothing)
         Next
      End Using

      ' Shut down the engine
      ' Note: calling Dispose will also automatically shut down the engine if it has been started
      Console.WriteLine("Shutting down...")
      ocrEngine.Shutdown()
   End Using
End Sub

Public NotInheritable Class LEAD_VARS
   Public Const ImagesDir As String = "C:\Users\Public\Documents\LEADTOOLS Images"
End Class
using System;
using System.IO;
using Leadtools.Forms.Ocr;
using Leadtools.Forms.DocumentWriters;

public void OcrDocumentManagerExample()
{
   string tifFileName1 = Path.Combine(LEAD_VARS.ImagesDir, "Ocr1.tif");
   string tifFileName2 = Path.Combine(LEAD_VARS.ImagesDir, "Ocr2.tif");
   // Use a subdirectory for the output, so that re-creating it below does not delete the source images
   string outputDirectory = Path.Combine(LEAD_VARS.ImagesDir, "OutputDirectory");

   // Create the output directory
   if(Directory.Exists(outputDirectory))
      Directory.Delete(outputDirectory, true);
   Directory.CreateDirectory(outputDirectory);

   // Create an instance of the engine
   using(IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Plus, false))
   {
      // Start the engine using default parameters
      Console.WriteLine("Starting up the engine...");
      ocrEngine.Startup(null, null, null, null);

      // Create the OCR document
      Console.WriteLine("Creating the OCR document...");
      IOcrDocumentManager ocrDocumentManager = ocrEngine.DocumentManager;
      using(IOcrDocument ocrDocument = ocrDocumentManager.CreateDocument())
      {
         // Add the pages to the document
         Console.WriteLine("Adding the pages...");
         ocrDocument.Pages.AddPage(tifFileName1, null);
         ocrDocument.Pages.AddPage(tifFileName2, null);

         // Recognize the pages in this document. Note, we did not call AutoZone; it will be called implicitly by Recognize
         Console.WriteLine("Recognizing all the pages...");
         ocrDocument.Pages.Recognize(null);

         // Save to all the formats supported by this OCR engine
         Array formats = Enum.GetValues(typeof(DocumentFormat));
         foreach(DocumentFormat format in formats)
         {
            string friendlyName = DocumentWriter.GetFormatFriendlyName(format);
            Console.WriteLine("Saving (using default options) to {0}...", friendlyName);

            // Construct the output file name (output_directory + document_format_name + . + extension)
            string extension = DocumentWriter.GetFormatFileExtension(format);
            string outputFileName = Path.Combine(outputDirectory, format.ToString() + "." + extension);

            // Save the document
            ocrDocument.Save(outputFileName, format, null);

            // If this is the LTD format, convert it to PDF
            if(format == DocumentFormat.Ltd)
            {
               Console.WriteLine("Converting the LTD file to PDF...");
               string pdfFileName = Path.Combine(outputDirectory, format.ToString() + "_pdf.pdf");
               DocumentWriter docWriter = ocrEngine.DocumentWriterInstance;
               docWriter.Convert(outputFileName, pdfFileName, DocumentFormat.Pdf);
            }
         }

         // Now save to all the engine native formats (if any) supported by the engine
         string[] engineFormats = ocrDocumentManager.GetSupportedEngineFormats();
         foreach(string engineFormat in engineFormats)
         {
            string friendlyName = ocrDocumentManager.GetEngineFormatFriendlyName(engineFormat);
            Console.WriteLine("Saving to engine native format {0}...", friendlyName);

            // Construct the output file name (output_directory + "engine_" + engine_format_name + . + extension)
            string extension = ocrDocumentManager.GetEngineFormatFileExtension(engineFormat);
            string outputFileName = Path.Combine(outputDirectory, "engine_" + engineFormat + "." + extension);

            // To use this format, set it in IOcrDocumentManager.EngineFormat and do a normal save using DocumentFormat.User
            ocrDocumentManager.EngineFormat = engineFormat;
            ocrDocument.Save(outputFileName, DocumentFormat.User, null);
         }
      }

      // Shut down the engine
      // Note: calling Dispose will also automatically shut down the engine if it has been started
      Console.WriteLine("Shutting down...");
      ocrEngine.Shutdown();
   }
}

static class LEAD_VARS
{
   public const string ImagesDir = @"C:\Users\Public\Documents\LEADTOOLS Images";
}
// Note: Tools (OcrEnginePath, AppInstallFolder, AppLocalFolder) is a helper class that is not shown on this page
[TestMethod]
public async Task OcrDocumentManagerExample()
{
   string tifFileName1 = @"Assets\Ocr1.tif";
   string tifFileName2 = @"Assets\Ocr2.tif";
   string[] sourceFiles = { tifFileName1, tifFileName2 };

   // Create an instance of the engine
   IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);

   // Start the engine using default parameters
   Debug.WriteLine("Starting up the engine...");
   ocrEngine.Startup(null, null, String.Empty, Tools.OcrEnginePath);

   // Create the OCR document
   Debug.WriteLine("Creating the OCR document...");
   IOcrDocumentManager ocrDocumentManager = ocrEngine.DocumentManager;
   IOcrDocument ocrDocument = ocrDocumentManager.CreateDocument();

   // Add the pages to the document
   Debug.WriteLine("Adding the pages...");
   using (RasterCodecs codecs = new RasterCodecs())
   {
      foreach (string fileName in sourceFiles)
      {
         StorageFile loadFile = await Tools.AppInstallFolder.GetFileAsync(fileName);
         using (RasterImage image = await codecs.LoadAsync(LeadStreamFactory.Create(loadFile)))
         {
            ocrDocument.Pages.AddPage(image, null);
         }
      }
   }

   // Recognize the pages in this document. Note, we did not call AutoZone; it will be called implicitly by Recognize
   Debug.WriteLine("Recognizing all the pages...");
   ocrDocument.Pages.Recognize(null);

   // Save to all the formats supported by this OCR engine
   DocumentFormat[] supportedFormats = DocumentWriter.GetSupportedFormats();
   foreach (DocumentFormat format in supportedFormats)
   {
      string friendlyName = DocumentWriter.GetFormatFriendlyName(format);
      Debug.WriteLine("Saving (using default options) to {0}...", friendlyName);

      // Construct the output file name (document_format_name + . + extension)
      string extension = DocumentWriter.GetFormatFileExtension(format);
      string outputFileName = format.ToString() + "." + extension;

      // Save the document
      StorageFile saveFile = await Tools.AppLocalFolder.CreateFileAsync(outputFileName, CreationCollisionOption.ReplaceExisting);
      await ocrDocument.SaveAsync(LeadStreamFactory.Create(saveFile), format, null);

      // If this is the LTD format, convert it to PDF
      if (format == DocumentFormat.Ltd)
      {
         Debug.WriteLine("Converting the LTD file to PDF...");
         string pdfFileName = Path.Combine(Tools.AppLocalFolder.Path, format.ToString() + "_pdf.pdf");
         DocumentWriter docWriter = ocrEngine.DocumentWriterInstance;
         // Pass the full path of the file saved above, since outputFileName is only a bare file name
         docWriter.Convert(saveFile.Path, pdfFileName, DocumentFormat.Pdf);
      }
   }

   // Shut down the engine
   Debug.WriteLine("Shutting down...");
   ocrEngine.Shutdown();
}
Target Platforms: Windows 7, Windows Vista SP1 or later, Windows XP SP3, Windows Server 2008 (Server Core not supported), Windows Server 2008 R2 (Server Core supported with SP1 or later), Windows Server 2003 SP2
IOcrDocumentManager Members
Leadtools.Forms.Ocr Namespace
Leadtools.Forms.DocumentWriters.DocumentFormat
IOcrDocument Interface
IOcrDocument.Save
IOcrDocument.SaveXml
IOcrPage.Recognize
IOcrEngine Interface
OcrEngineManager Class
OcrEngineType Enumeration
Programming with the LEADTOOLS .NET OCR
Files to be Included with Your Application
Recognizing OCR Pages