This topic and its replies were posted before the current version of LEADTOOLS was released and may no longer be applicable.
#1
Posted: Friday, May 2, 2014 5:04:50 AM(UTC)
Groups: Registered
Posts: 1
How can I make an image-only PDF searchable without changing its format?
Other details:
VS2010
.NET 3.5
#2
Posted: Friday, May 2, 2014 9:07:16 AM(UTC)
Groups: Manager, Tech Support, Administrators
Posts: 218
You can make a PDF searchable by using OCR (Optical Character Recognition). You can find more information at the following link:
http://www.leadtools.com/help/leadtools/v18/dh/to/leadtools.topics~leadtools.topics.ocr.html
You can load any supported file format into the engine to perform OCR on it. You can then save the results to any of the supported document formats listed here:
http://www.leadtools.com/help/leadtools/v18/dh/ft/leadtools.forms.documentwriters~leadtools.forms.documentwriters.documentformat.html
Here is a C# code snippet that shows how to load an image, recognize it, and save the result as a PDF:
using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, true))
{
    // Start the engine using default parameters
    ocrEngine.Startup(null, null, null, null);
    // Create an OCR document
    using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
    {
        // Add a page to the document (imageFileName is the path to the source image)
        IOcrPage ocrPage = ocrDocument.Pages.AddPage(imageFileName, null);
        // Recognize the page
        // Note: Recognize can be called without calling AutoZone or manually adding zones.
        // The engine will check and automatically auto-zone the page.
        ocrPage.Recognize(null);
        // Save the document as PDF. pdfFileName is the output path; keep it different
        // from imageFileName so that the source file is not overwritten.
        ocrDocument.Save(pdfFileName, DocumentFormat.Pdf, null);
    }
    // Shut down the engine
    // Note: calling Dispose will also automatically shut down the engine if it has been started
    ocrEngine.Shutdown();
}
For more information, I recommend that you check out our OCR tutorials located here:
http://www.leadtools.com/help/leadtools/v18/dh/to/leadtools.topics.forms.ocr~fo.topics.ocrtutorials.html
and the OCR namespace reference located here:
http://www.leadtools.com/help/leadtools/v18/dh/fo/leadtools.forms.ocr~leadtools.forms.ocr_namespace.html
Hadi Chami
Developer Support Manager
LEAD Technologies, Inc.
#3
Posted: Thursday, May 8, 2014 2:45:59 AM(UTC)
Groups: Registered
Posts: 1
Thanks for your reply.
Actually, the code given here converts a loaded image to PDF.
Our scenario is different: we want to convert a scanned PDF to a searchable PDF.
#4
Posted: Thursday, May 8, 2014 4:50:46 AM(UTC)
Groups: Manager, Tech Support, Administrators
Posts: 218
You can use this same code to load any supported image file (PDF included) and save it out as any of the supported document formats (PDF included).
In the sample I sent you above, imageFileName is the path to any image, so you can set it to the path of your scanned PDF.
You can find a demo to test this out at the following toolkit location: C:\LEADTOOLS 18\Shortcuts\.NET Class Libraries\.NET Framework\02 Document\03 OCR - ICR - OMR\01 Main Demo
You can use this demo to load any image (including PDF) into the viewer, then you can recognize and save it out as a searchable PDF.
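For a multipage scanned PDF, the approach above might be adapted roughly as in the following sketch. It is an illustration under assumptions, not a tested or official sample: the class name, method name, and the inputPdf and outputPdf parameters are hypothetical; it assumes the LEADTOOLS license has already been set, that the PDF codec assemblies are deployed so the engine can rasterize the input, and that the v18 toolkit in use exposes IOcrPageCollection.AddPages, IOcrPageCollection.Recognize, and PdfDocumentOptions.ImageOverText as written. Setting ImageOverText is intended to keep the original scanned image visible with the recognized text placed behind it, which is the closest match to "searchable without changing the format".

```csharp
using System;
using Leadtools.Forms.Ocr;
using Leadtools.Forms.DocumentWriters;

// Hypothetical helper class; the names are illustrative only.
static class SearchablePdfSketch
{
    // inputPdf: path to the scanned (image-only) PDF.
    // outputPdf: path for the searchable PDF; assumed to differ from inputPdf.
    public static void Convert(string inputPdf, string outputPdf)
    {
        using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, true))
        {
            // Start the engine using default parameters, as in the snippet above
            ocrEngine.Startup(null, null, null, null);

            // Assumption: ask the PDF writer to overlay the original page image
            // on top of the recognized text so the page keeps its scanned appearance
            PdfDocumentOptions pdfOptions =
                (PdfDocumentOptions)ocrEngine.DocumentWriterInstance.GetOptions(DocumentFormat.Pdf);
            pdfOptions.ImageOverText = true;
            ocrEngine.DocumentWriterInstance.SetOptions(DocumentFormat.Pdf, pdfOptions);

            using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
            {
                // Add every page of the source file (first page 1, last page -1 meaning "to the end")
                ocrDocument.Pages.AddPages(inputPdf, 1, -1, null);

                // Recognize all pages; the engine auto-zones pages that have no zones
                ocrDocument.Pages.Recognize(null);

                // Write the result as a PDF to the separate output path
                ocrDocument.Save(outputPdf, DocumentFormat.Pdf, null);
            }
            // Disposing the engine shuts it down if it has been started
        }
    }
}
```

A call such as SearchablePdfSketch.Convert(@"C:\scans\input.pdf", @"C:\scans\input_searchable.pdf") would then be the entry point; both paths are placeholders. Writing to a separate output file avoids overwriting the scanned original while it is still being read.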
Hadi Chami
Developer Support Manager
LEAD Technologies, Inc.