The AppendLtd Method is available as an add-on to the LEADTOOLS Document and Medical Imaging toolkits.
Visual Basic (Declaration) | |
---|---|
Public Sub AppendLtd(ByVal sourceFileName As String, ByVal destFileName As String) |
Visual Basic (Usage)

    Dim instance As DocumentWriter
    Dim sourceFileName As String
    Dim destFileName As String

    instance.AppendLtd(sourceFileName, destFileName)

C# | |
---|---|
public void AppendLtd( string sourceFileName, string destFileName ) |
C++/CLI | |
---|---|
public: void AppendLtd( String^ sourceFileName, String^ destFileName ) |
Parameters
- sourceFileName
- The source LTD file.
- destFileName
- The destination LTD file.
This example shows how to use multiple threads to speed up OCR recognition of a multi-page image file. It reproduces part of the functionality that the Leadtools.Forms.Ocr.IOcrAutoRecognizeManager class already provides internally.
Visual Basic

    Imports System
    Imports System.Collections.Generic
    Imports System.IO
    Imports System.Threading
    Imports Leadtools
    Imports Leadtools.Codecs
    Imports Leadtools.Forms.DocumentWriters
    Imports Leadtools.Forms.Ocr

    Public Shared Sub AppendLtdExample()
        ' Unlock the support needed for LEADTOOLS Document Writers (with PDF output)
        RasterSupport.Unlock(RasterSupportType.Document, "Replace with your own key here")
        RasterSupport.Unlock(RasterSupportType.OcrProfessional, "Replace with your own key here")
        RasterSupport.Unlock(RasterSupportType.OcrProfessionalPdfLeadOutput, "Replace with your own key here")

        ' Get a multi-page source file
        Dim sourceFileName As String = GetImageFileName()
        Dim destFileName As String = Path.Combine(LEAD_VARS.ImagesDir, "AppendLtdExample.pdf")
        File.Delete(destFileName)

        ' Use OCR Professional engine
        Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Professional, False)
            ocrEngine.Startup(Nothing, Nothing, Nothing, Nothing)

            ' Get the number of pages in the source file
            Dim pageCount As Integer
            Using imageInfo As CodecsImageInfo = ocrEngine.RasterCodecsInstance.GetInformation(sourceFileName, True)
                pageCount = imageInfo.TotalPages
            End Using
            Console.WriteLine("OCRing {0} pages", pageCount)

            ' Use all possible cores
            Dim maxThreads As Integer = Environment.ProcessorCount
            Console.WriteLine("Machine has {0} cores", maxThreads)
            Dim threadCount As Integer = Math.Min(maxThreads, pageCount)
            Console.WriteLine("Using {0} threads", threadCount)

            ' Initialize the thread parameters
            Dim threadParams(threadCount - 1) As MyThreadParam
            For i As Integer = 0 To threadCount - 1
                threadParams(i) = New MyThreadParam()
                threadParams(i).ThreadFinishedEvent = New AutoResetEvent(False)
                threadParams(i).OcrEngine = ocrEngine
                ' We can re-use the documents, so create them once here
                threadParams(i).OcrDocument = ocrEngine.DocumentManager.CreateDocument()
                threadParams(i).SourceFileName = sourceFileName
                ' We can re-use the LTD files, so get them here once
                threadParams(i).LtdFileName = Path.GetTempFileName()
            Next

            ' This is the LTD file we will use to append all recognition data
            ' from all threads
            Dim mainLtdFileName As String = Path.GetTempFileName()
            File.Delete(mainLtdFileName)

            Dim pageNumber As Integer = 1
            While pageNumber <= pageCount
                Dim queuedUpEvents As New List(Of AutoResetEvent)

                ' Queue up the threads, each thread will have its own
                ' IOcrDocument to recognize one page to a separate LTD file
                Dim ltdCounter As Integer = 0
                For threadIndex As Integer = 0 To threadCount - 1
                    If pageNumber <= pageCount Then
                        ' Queue up this page
                        Console.WriteLine("Queuing up page {0}", pageNumber)
                        threadParams(threadIndex).PageNumber = pageNumber
                        queuedUpEvents.Add(threadParams(threadIndex).ThreadFinishedEvent)
                        ThreadPool.QueueUserWorkItem(AddressOf MyThreadProc, threadParams(threadIndex))
                        pageNumber = pageNumber + 1
                        ltdCounter = ltdCounter + 1
                    End If
                Next

                ' Wait for the queued up threads to finish
                Console.WriteLine("Waiting on queued up pages to finish")
                WaitHandle.WaitAll(queuedUpEvents.ToArray())

                Console.WriteLine("Appending LTDs")
                For i As Integer = 0 To ltdCounter - 1
                    ' Notice, first time, the main LTD does not exist, AppendLtd will
                    ' just copy the data over from the source file
                    ocrEngine.DocumentWriterInstance.AppendLtd(threadParams(i).LtdFileName, mainLtdFileName)
                Next
            End While

            ' We are done, convert the LTD to final format, here, we will
            ' use PDF
            Console.WriteLine("Converting to final format")
            ocrEngine.DocumentWriterInstance.Convert(mainLtdFileName, destFileName, DocumentFormat.Pdf)

            ' Clean-up, including the temporary LTD files
            For i As Integer = 0 To threadCount - 1
                threadParams(i).ThreadFinishedEvent.Close()
                threadParams(i).OcrDocument.Dispose()
                File.Delete(threadParams(i).LtdFileName)
            Next
            File.Delete(mainLtdFileName)

            Console.WriteLine("Success, file {0} is created", destFileName)
        End Using
    End Sub

    Class MyThreadParam
        ' Event to trigger when recognition is done
        Public ThreadFinishedEvent As AutoResetEvent
        ' OCR engine to use
        Public OcrEngine As IOcrEngine
        ' OCR Document to use
        Public OcrDocument As IOcrDocument
        ' Source image file
        Public SourceFileName As String
        ' Page number to recognize by this thread
        Public PageNumber As Integer
        ' LTD file to save temporary recognition data
        Public LtdFileName As String
    End Class

    Private Shared Sub MyThreadProc(ByVal state As Object)
        Dim threadParams As MyThreadParam = CType(state, MyThreadParam)
        Console.WriteLine(" Thread {0} is working on page {1}", Thread.CurrentThread.GetHashCode(), threadParams.PageNumber)

        Try
            ' Delete the LTD file if it exists so we can put fresh data in it
            File.Delete(threadParams.LtdFileName)

            ' Clear any previous pages in this document
            threadParams.OcrDocument.Pages.Clear()

            ' Load the page
            Console.WriteLine(" Thread {0} is loading the page", Thread.CurrentThread.GetHashCode())
            threadParams.OcrDocument.Pages.AddPages(threadParams.SourceFileName, threadParams.PageNumber, threadParams.PageNumber, Nothing)

            ' Get it, it is the last page we added
            Dim ocrPage As IOcrPage = threadParams.OcrDocument.Pages(threadParams.OcrDocument.Pages.Count - 1)

            ' Auto-zone it
            Console.WriteLine(" Thread {0} is auto-zoning the page", Thread.CurrentThread.GetHashCode())
            ocrPage.AutoZone(Nothing)

            ' Recognize it
            Console.WriteLine(" Thread {0} is recognizing the page", Thread.CurrentThread.GetHashCode())
            ocrPage.Recognize(Nothing)

            ' Save it
            Console.WriteLine(" Thread {0} is saving the page", Thread.CurrentThread.GetHashCode())
            threadParams.OcrDocument.Save(threadParams.LtdFileName, DocumentFormat.Ltd, Nothing)
        Finally
            Console.WriteLine(" Thread {0} is done", Thread.CurrentThread.GetHashCode())
            threadParams.ThreadFinishedEvent.Set()
        End Try
    End Sub

    Private Shared Function GetImageFileName() As String
        Dim pageTileTemplate As String = Path.Combine(LEAD_VARS.ImagesDir, "Ocr{0}.tif")
        Dim multiPageImageFileName As String = Path.Combine(LEAD_VARS.ImagesDir, "AppendLtdExample.tif")
        File.Delete(multiPageImageFileName)

        ' Create a multi-page TIF file by stitching OCR1 to OCR4.tif shipped with LEADTOOLS
        Using codecs As New RasterCodecs()
            Dim finalImage As RasterImage = Nothing
            For page As Integer = 1 To 4
                Dim pageImage As RasterImage = codecs.Load(String.Format(pageTileTemplate, page))
                If IsNothing(finalImage) Then
                    finalImage = pageImage
                Else
                    finalImage.AddPage(pageImage)
                    pageImage.Dispose()
                End If
            Next

            ' Save the final image, then release it
            codecs.Save(finalImage, multiPageImageFileName, RasterImageFormat.CcittGroup4, 1)
            finalImage.Dispose()
        End Using

        Return multiPageImageFileName
    End Function

    Public NotInheritable Class LEAD_VARS
        Public Const ImagesDir As String = "C:\Users\Public\Documents\LEADTOOLS Images"
    End Class

C#

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Threading;
    using Leadtools;
    using Leadtools.Codecs;
    using Leadtools.Forms.DocumentWriters;
    using Leadtools.Forms.Ocr;

    public static void AppendLtdExample()
    {
        // Unlock the support needed for LEADTOOLS Document Writers (with PDF output)
        RasterSupport.Unlock(RasterSupportType.Document, "Replace with your own key here");
        RasterSupport.Unlock(RasterSupportType.OcrProfessional, "Replace with your own key here");
        RasterSupport.Unlock(RasterSupportType.OcrProfessionalPdfLeadOutput, "Replace with your own key here");

        // Get a multi-page source file
        string sourceFileName = GetImageFileName();
        string destFileName = Path.Combine(LEAD_VARS.ImagesDir, "AppendLtdExample.pdf");
        File.Delete(destFileName);

        // Use OCR Professional engine
        using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Professional, false))
        {
            ocrEngine.Startup(null, null, null, null);

            // Get the number of pages in the source file
            int pageCount;
            using (CodecsImageInfo imageInfo = ocrEngine.RasterCodecsInstance.GetInformation(sourceFileName, true))
            {
                pageCount = imageInfo.TotalPages;
            }
            Console.WriteLine("OCRing {0} pages", pageCount);

            // Use all possible cores
            int maxThreads = Environment.ProcessorCount;
            Console.WriteLine("Machine has {0} cores", maxThreads);
            int threadCount = Math.Min(maxThreads, pageCount);
            Console.WriteLine("Using {0} threads", threadCount);

            // Initialize the thread parameters
            MyThreadParam[] threadParams = new MyThreadParam[threadCount];
            for (int i = 0; i < threadCount; i++)
            {
                threadParams[i] = new MyThreadParam();
                threadParams[i].ThreadFinishedEvent = new AutoResetEvent(false);
                threadParams[i].OcrEngine = ocrEngine;
                // We can re-use the documents, so create them once here
                threadParams[i].OcrDocument = ocrEngine.DocumentManager.CreateDocument();
                threadParams[i].SourceFileName = sourceFileName;
                // We can re-use the LTD files, so get them here once
                threadParams[i].LtdFileName = Path.GetTempFileName();
            }

            // This is the LTD file we will use to append all recognition data
            // from all threads
            string mainLtdFileName = Path.GetTempFileName();
            File.Delete(mainLtdFileName);

            int pageNumber = 1;
            while (pageNumber <= pageCount)
            {
                List<AutoResetEvent> queuedUpEvents = new List<AutoResetEvent>();

                // Queue up the threads, each thread will have its own
                // IOcrDocument to recognize one page to a separate LTD file
                int ltdCounter = 0;
                for (int threadIndex = 0; threadIndex < threadCount; threadIndex++)
                {
                    if (pageNumber <= pageCount)
                    {
                        // Queue up this page
                        Console.WriteLine("Queuing up page {0}", pageNumber);
                        threadParams[threadIndex].PageNumber = pageNumber;
                        queuedUpEvents.Add(threadParams[threadIndex].ThreadFinishedEvent);
                        ThreadPool.QueueUserWorkItem(new WaitCallback(MyThreadProc), threadParams[threadIndex]);
                        pageNumber++;
                        ltdCounter++;
                    }
                }

                // Wait for the queued up threads to finish
                Console.WriteLine("Waiting on queued up pages to finish");
                WaitHandle.WaitAll(queuedUpEvents.ToArray());

                Console.WriteLine("Appending LTDs");
                for (int i = 0; i < ltdCounter; i++)
                {
                    // Notice, first time, the main LTD does not exist, AppendLtd will
                    // just copy the data over from the source file
                    ocrEngine.DocumentWriterInstance.AppendLtd(threadParams[i].LtdFileName, mainLtdFileName);
                }
            }

            // We are done, convert the LTD to final format, here, we will
            // use PDF
            Console.WriteLine("Converting to final format");
            ocrEngine.DocumentWriterInstance.Convert(mainLtdFileName, destFileName, DocumentFormat.Pdf);

            // Clean-up, including the temporary LTD files
            for (int i = 0; i < threadCount; i++)
            {
                threadParams[i].ThreadFinishedEvent.Close();
                threadParams[i].OcrDocument.Dispose();
                File.Delete(threadParams[i].LtdFileName);
            }
            File.Delete(mainLtdFileName);

            Console.WriteLine("Success, file {0} is created", destFileName);
        }
    }

    class MyThreadParam
    {
        // Event to trigger when recognition is done
        public AutoResetEvent ThreadFinishedEvent;
        // OCR engine to use
        public IOcrEngine OcrEngine;
        // OCR Document to use
        public IOcrDocument OcrDocument;
        // Source image file
        public string SourceFileName;
        // Page number to recognize by this thread
        public int PageNumber;
        // LTD file to save temporary recognition data
        public string LtdFileName;
    }

    private static void MyThreadProc(object state)
    {
        MyThreadParam threadParams = state as MyThreadParam;
        Console.WriteLine(" Thread {0} is working on page {1}", Thread.CurrentThread.GetHashCode(), threadParams.PageNumber);

        try
        {
            // Delete the LTD file if it exists so we can put fresh data in it
            File.Delete(threadParams.LtdFileName);

            // Clear any previous pages in this document
            threadParams.OcrDocument.Pages.Clear();

            // Load the page
            Console.WriteLine(" Thread {0} is loading the page", Thread.CurrentThread.GetHashCode());
            threadParams.OcrDocument.Pages.AddPages(threadParams.SourceFileName, threadParams.PageNumber, threadParams.PageNumber, null);

            // Get it, it is the last page we added
            IOcrPage ocrPage = threadParams.OcrDocument.Pages[threadParams.OcrDocument.Pages.Count - 1];

            // Auto-zone it
            Console.WriteLine(" Thread {0} is auto-zoning the page", Thread.CurrentThread.GetHashCode());
            ocrPage.AutoZone(null);

            // Recognize it
            Console.WriteLine(" Thread {0} is recognizing the page", Thread.CurrentThread.GetHashCode());
            ocrPage.Recognize(null);

            // Save it
            Console.WriteLine(" Thread {0} is saving the page", Thread.CurrentThread.GetHashCode());
            threadParams.OcrDocument.Save(threadParams.LtdFileName, DocumentFormat.Ltd, null);
        }
        finally
        {
            Console.WriteLine(" Thread {0} is done", Thread.CurrentThread.GetHashCode());
            threadParams.ThreadFinishedEvent.Set();
        }
    }

    private static string GetImageFileName()
    {
        string pageTileTemplate = Path.Combine(LEAD_VARS.ImagesDir, "Ocr{0}.tif");
        string multiPageImageFileName = Path.Combine(LEAD_VARS.ImagesDir, "AppendLtdExample.tif");
        File.Delete(multiPageImageFileName);

        // Create a multi-page TIF file by stitching OCR1 to OCR4.tif shipped with LEADTOOLS
        using (RasterCodecs codecs = new RasterCodecs())
        {
            RasterImage finalImage = null;
            for (int page = 1; page <= 4; page++)
            {
                RasterImage pageImage = codecs.Load(string.Format(pageTileTemplate, page));
                if (finalImage == null)
                {
                    finalImage = pageImage;
                }
                else
                {
                    finalImage.AddPage(pageImage);
                    pageImage.Dispose();
                }
            }

            // Save the final image, then release it
            codecs.Save(finalImage, multiPageImageFileName, RasterImageFormat.CcittGroup4, 1);
            finalImage.Dispose();
        }

        return multiPageImageFileName;
    }

    static class LEAD_VARS
    {
        public const string ImagesDir = @"C:\Users\Public\Documents\LEADTOOLS Images";
    }

You can use this method to append several LTD files together before calling Convert. The example on this page shows how this can be useful in a multi-threaded OCR solution.
sourceFileName must exist on disk before calling this method; otherwise, an exception is thrown.
destFileName may or may not exist prior to calling this method. If it does not exist, a new file is created and the data is copied as is from the source file.
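
The two rules above can be combined into a small sequential merge routine. The following is only a sketch: `LtdMergeSketch` and `MergeAndConvert` are hypothetical names, the `Leadtools.Forms.DocumentWriters` namespace is assumed to be where `DocumentWriter` and `DocumentFormat` live, and the explicit `File.Exists` check is an added convenience, not something the toolkit requires.

```csharp
using System.Collections.Generic;
using System.IO;
using Leadtools.Forms.DocumentWriters; // assumed namespace for DocumentWriter and DocumentFormat

static class LtdMergeSketch
{
    // Hypothetical helper: appends each LTD file in order to mainLtdFileName,
    // then converts the merged LTD file to PDF.
    public static void MergeAndConvert(
        DocumentWriter writer,
        IEnumerable<string> ltdFileNames,
        string mainLtdFileName,
        string pdfFileName)
    {
        foreach (string ltdFileName in ltdFileNames)
        {
            // AppendLtd throws when the source is missing, so fail early
            // with a message that names the offending file.
            if (!File.Exists(ltdFileName))
                throw new FileNotFoundException("Source LTD file not found", ltdFileName);

            // On the first call mainLtdFileName does not need to exist;
            // AppendLtd creates it and copies the source data as is.
            writer.AppendLtd(ltdFileName, mainLtdFileName);
        }

        // Same Convert call that the example on this page uses.
        writer.Convert(mainLtdFileName, pdfFileName, DocumentFormat.Pdf);
    }
}
```

A caller would typically pass `ocrEngine.DocumentWriterInstance` as the writer, as the example on this page does.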
When this method returns, destFileName contains all the pages from the destination file (if it existed) followed by all the pages from the source file. For example, if the source file has 10 pages and the destination file has 20 pages, the result is a 30-page LTD file in destFileName, with the 10 source pages appended after the existing pages at zero-based page indexes 20, 21, 22, and so on.
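
The page arithmetic can be checked with a small model that does not touch LEADTOOLS at all. This is a sketch under stated assumptions: `AppendLtdPageModel` is a hypothetical stand-in that represents each LTD file as a list of page labels, with a `null` destination standing for a file that does not exist yet.

```csharp
using System;
using System.Collections.Generic;

static class AppendLtdPageModel
{
    // Hypothetical stand-in for AppendLtd: returns the pages the destination
    // would hold afterwards. The source list is never modified.
    public static List<string> Append(List<string> source, List<string> destination)
    {
        // A missing destination behaves like an empty one: the source is copied as is.
        List<string> result = destination == null
            ? new List<string>()
            : new List<string>(destination);

        // Source pages are added after the existing pages, in their original order.
        result.AddRange(source);
        return result;
    }

    // Builds labels such as "src0", "src1", ... for illustration only.
    public static List<string> MakePages(string prefix, int count)
    {
        List<string> pages = new List<string>();
        for (int i = 0; i < count; i++)
            pages.Add(prefix + i);
        return pages;
    }

    public static void Main()
    {
        List<string> source = MakePages("src", 10);
        List<string> destination = MakePages("dst", 20);
        List<string> merged = Append(source, destination);

        Console.WriteLine(merged.Count); // 30
        Console.WriteLine(merged[19]);   // dst19 (last original destination page)
        Console.WriteLine(merged[20]);   // src0  (first appended source page)
        Console.WriteLine(merged[29]);   // src9  (last appended source page)
    }
}
```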
The source file is never modified by this method.
Target Platforms: Microsoft .NET Framework 2.0, Windows 2000, Windows XP, Windows Server 2003 family, Windows Server 2008 family, Windows Vista, Windows 7