Convert Images to Searchable PDF with OCR in C#

Posted on 2020-05-19 Nick Villalobos

PDFs are used virtually everywhere and by everyone these days. Throughout most organizations, PDF documents are vital to business applications and workflows. Many organizations, such as insurance agencies, financial institutions, and legal practices, have standardized their document management systems on the PDF format because of its portability and versatility.

How a PDF can be consumed depends on its type. There are two main types of PDFs: image and searchable. For example, if you save a PDF from a word processor, it will most likely be a searchable PDF, and you can copy and paste the text within the document as you please. On the other hand, if you use a scanner to convert paper to PDF, the result will most likely be an image PDF, and you will not be able to search or select its text.

Even if you used a scanner to create an image PDF or were sent one by someone else, there is still a way to make it searchable: optical character recognition (OCR), and OCR is what LEADTOOLS does best! Developers can easily build automated OCR solutions and achieve these image-to-searchable-PDF conversions with as little as five lines of code thanks to LEAD's powerful OCR libraries. These solutions save people and companies their two most valuable resources: time and money.

The code below shows all that is needed to create a solution that converts images to searchable PDFs. If you want a complete step-by-step tutorial, check out our Convert Images to Searchable PDF with OCR tutorial.


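// Namespaces this snippet relies on (assumed: LEADTOOLS v20 or later)
using System;
using Leadtools.Ocr;
using Leadtools.Document.Writer;

static void OCR(string inputFile, string outputFile)
{
    // Create the LEAD OCR engine; the using block disposes it when the work is done
    using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD, false))
    {
        // Start up the LEADTOOLS OCR Engine with default settings
        ocrEngine.Startup(null, null, null, null);
        // Run the AutoRecognizeManager and specify PDF format
        ocrEngine.AutoRecognizeManager.Run(inputFile, outputFile, DocumentFormat.Pdf, null, null);
        Console.WriteLine($"OCR output saved to {outputFile}");
    }
}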
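
In practice, scanned images tend to arrive by the folder rather than one at a time. The following is a minimal sketch, using only standard .NET APIs, of how the OCR method above might be applied to every image in a directory. The BatchOcr class, its method names, and the list of file extensions are hypothetical choices made for illustration; they are not part of LEADTOOLS, and the extension list is not a statement of which formats the SDK supports.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper for illustration; not part of the LEADTOOLS SDK.
static class BatchOcr
{
    // Extensions treated as images here (an assumption; adjust to your own inputs).
    static readonly HashSet<string> ImageExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { ".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp" };

    // Pairs each image path with the PDF path it should be written to.
    public static List<(string Input, string Output)> PlanConversions(
        IEnumerable<string> files, string outputDir)
    {
        return files
            .Where(f => ImageExtensions.Contains(Path.GetExtension(f)))
            .Select(f => (Input: f,
                          Output: Path.Combine(outputDir,
                              Path.GetFileNameWithoutExtension(f) + ".pdf")))
            .ToList();
    }

    // Runs the supplied conversion on every image in inputDir
    // and returns how many files were converted without an exception.
    public static int ConvertFolder(
        string inputDir, string outputDir, Action<string, string> convert)
    {
        Directory.CreateDirectory(outputDir);
        int converted = 0;
        foreach (var (input, output) in
                 PlanConversions(Directory.EnumerateFiles(inputDir), outputDir))
        {
            try
            {
                convert(input, output);
                converted++;
            }
            catch (Exception ex)
            {
                // Keep going so one unreadable image does not stop the batch.
                Console.Error.WriteLine($"Skipped {input}: {ex.Message}");
            }
        }
        return converted;
    }
}
```

With the OCR method in scope, a call such as BatchOcr.ConvertFolder(@"C:\Scans", @"C:\Scans\Pdf", OCR) would process a whole folder (both paths are placeholders). Note that two images sharing a base name, such as page.png and page.tif, would map to the same output PDF in this sketch. Because OCR as written creates and starts the engine on every call, a large batch would likely benefit from starting the engine once and reusing it for all files.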

Try it out!

To test this for yourself, make sure to get the latest LEADTOOLS SDK evaluation for free from our site, if you have not already. This trial is good for 60 days and comes with unlimited chat and email support.

Support

Need help getting this sample up and going? Contact our support team for free technical support! For pricing or licensing questions, you can contact our sales team (sales@leadtools.com) or call us at 704-332-5532.


Stay tuned because, as promised in our previous post, "Detect and Extract MICR", we'll be featuring a lot more tutorials that programmers can use to develop applications that directly impact data capture, recognition, exchange, and other pressing business needs.
