Programming with LEADTOOLS PDF

Adobe Portable Document Format (PDF) was developed by Adobe to allow electronic documents to be exchanged and viewed easily and reliably, independent of the environment in which they were created. Because PDF supports compression, large documents can be reduced to a size small enough to download quickly. PDF is also widely used for distributing and reproducing documents over the web.

LEADTOOLS offers extensive support for reading and writing PDF documents. The following sections summarize LEADTOOLS support for the various PDF features, starting with those specific to the Leadtools.Pdf assembly:

PDF File Features such as Merging and Extraction of Pages
PDF Document Object Parsing
PDF as a Raster Image
Creating PDF Documents from Windows Metafiles
Creating PDF Documents from OCR Results
Creating Highly Compressed PDF Documents using MRC

PDF File Features such as Merging and Extraction of Pages

The Leadtools.Pdf.PDFFile class allows you to perform the following actions on PDF and PostScript (PS) files:

Get the PDF or PS version of a file
Check if a PDF file is encrypted and re-encrypt any PDF file
Get the number and size of pages in a PDF file
Quickly convert any PDF file to PDF/A
Linearize (optimize for Web viewing) any PDF file
Convert any PDF file from one version to another
Convert any PostScript file to PDF (distilling)
Page support: merge multiple existing PDF files into a single PDF file; extract, delete, insert or replace pages in existing PDF files
Update the Table of Contents (TOC) of existing PDF files
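
To illustrate what the version check reads, here is a library-independent sketch; it is not the LEADTOOLS implementation, and FileHeaderVersion.Detect is a hypothetical helper. It assumes the version appears on the first line of the file, as in the %PDF-1.7 header of a PDF file or the %!PS-Adobe-3.0 comment of a DSC-conforming PostScript file. Real files can deviate: a PDF 1.4 or later file may override the header version with a /Version entry in its document catalog, and some files carry stray bytes before the header, which is why a full parser such as PDFFile is the better tool in practice.

```csharp
using System;
using System.Text;

// Hypothetical helper, NOT part of LEADTOOLS: reads the version from the
// first line of a PDF ("%PDF-1.7") or PostScript ("%!PS-Adobe-3.0") file.
public static class FileHeaderVersion
{
    // Pass the first bytes of the file (for example the first 1024).
    // Returns e.g. "PDF 1.7" or "PS 3.0", or null if no known header is found.
    public static string? Detect(byte[] firstBytes)
    {
        // The header is plain ASCII, so ASCII decoding is sufficient.
        string text = Encoding.ASCII.GetString(firstBytes);
        int end = text.IndexOfAny(new[] { '\r', '\n' });
        string firstLine = end >= 0 ? text.Substring(0, end) : text;

        const string pdfPrefix = "%PDF-";
        const string psPrefix = "%!PS-Adobe-";

        if (firstLine.StartsWith(pdfPrefix, StringComparison.Ordinal))
        {
            // Header version only; a catalog /Version entry may override it.
            return "PDF " + firstLine.Substring(pdfPrefix.Length).Trim();
        }

        if (firstLine.StartsWith(psPrefix, StringComparison.Ordinal))
        {
            // The DSC version may be followed by a type, e.g. "%!PS-Adobe-3.0 EPSF-3.0".
            string rest = firstLine.Substring(psPrefix.Length);
            int space = rest.IndexOf(' ');
            return "PS " + (space >= 0 ? rest.Substring(0, space) : rest);
        }

        return null;
    }
}
```

For example, the bytes of "%PDF-1.7" followed by a newline give "PDF 1.7", and "%!PS-Adobe-3.0 EPSF-3.0" gives "PS 3.0".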

The C# and VB PDF Features Demo shipped with LEADTOOLS provides a wizard-style user interface for performing all of the actions above on existing PDF and PS files.

The following example will convert an existing PDF file to PDF/A:

             using Leadtools.Pdf;
             // Create a PDFFile object from the input PDF file
             PDFFile inputFile = new PDFFile("Input.pdf");
             // Convert it to PDF/A and save the result to a new file
             inputFile.ConvertToPDFA("OutputPDFA.pdf");

The following example merges four PDF files into a single output file:

             using Leadtools.Pdf;
             // Create a PDFFile object from the first PDF file
             PDFFile firstFile = new PDFFile("1.pdf");
             // Merge it with the second, third and fourth files and save the result to Output.pdf
             firstFile.MergeWith(new string[] { "2.pdf", "3.pdf", "4.pdf" }, "Output.pdf");

PDF Document Object Parsing

The Leadtools.Pdf.PDFDocument class encapsulates a PDF document on disk and supports the following functionality:

Load the PDF document at any resolution (dots per inch)
Get information on any page in the PDF document such as its size in PDF page units, inches or pixels
Get information about the PDF document, such as its properties or metadata (author, subject and keywords), whether it is encrypted and requires a password to open, and the PDF file type (or version)
Parse the document structure or Table of Contents (TOC) of a PDF document by reading the PDF bookmarks and the internal links (or jumps) between the pages
Read the objects found in the document such as text items (characters), images, rectangles, hyperlinks and fonts
Get a raster image render of any page or thumbnail of a page from the PDF document at any resolution
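
Page sizes in a PDF file are expressed in PDF units; by default one unit is 1/72 inch. The three measurements mentioned above are therefore related by inches = units / 72 and pixels = inches × DPI, where DPI is the resolution the document is loaded at. The following sketch shows only that arithmetic; PdfUnits is a hypothetical helper, not a LEADTOOLS API, and it ignores the optional /UserUnit page entry (PDF 1.6 and later), which can change the size of a unit.

```csharp
using System;

// Hypothetical helper, NOT part of LEADTOOLS: converts a page dimension in
// PDF units (1/72 inch by default) to inches and to pixels at a given DPI.
public static class PdfUnits
{
    // Default PDF user-space units per inch.
    public const double UnitsPerInch = 72.0;

    // inches = units / 72
    public static double ToInches(double pdfUnits) => pdfUnits / UnitsPerInch;

    // pixels = inches * dots per inch, rounded to the nearest whole pixel
    public static int ToPixels(double pdfUnits, int dpi) =>
        (int)Math.Round(ToInches(pdfUnits) * dpi, MidpointRounding.AwayFromZero);
}
```

For a US Letter page of 612 × 792 units, ToInches gives 8.5 × 11 inches and ToPixels at 300 DPI gives 2550 × 3300 pixels, the image size you would expect when rendering that page at 300 DPI.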

The following example will convert a multi-page PDF document to a multi-page TIFF file:

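             using System.IO;
             using Leadtools;
             using Leadtools.Codecs;
             using Leadtools.Pdf;
             // Start from a clean output file; otherwise re-running appends duplicate pages
             if(File.Exists("Output.tif"))
             {
                File.Delete("Output.tif");
             }
             // Load the input PDF document; both objects are disposed when done
             using(PDFDocument document = new PDFDocument("Input.pdf"))
             using(RasterCodecs codecs = new RasterCodecs())
             {
                // Loop through all the pages in the document (page numbers are 1-based)
                for(int pageNumber = 1; pageNumber <= document.Pages.Count; pageNumber++)
                {
                   // Render the page into a raster image
                   using(RasterImage image = document.GetPageImage(codecs, pageNumber))
                   {
                      // Append to (or create if it does not exist) a TIFF file
                      codecs.Save(image, "Output.tif", RasterImageFormat.TifJpeg, 24, 1, 1, -1, CodecsSavePageMode.Append);
                   }
                }
             }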

The following example parses the text of a PDF file and saves it to a text file on disk:

             using System.IO;
             using Leadtools.Pdf;
             // Load the input PDF document and create the output text file; both are
             // disposed (and the file closed) even if an exception occurs
             using(PDFDocument document = new PDFDocument("Input.pdf"))
             using(StreamWriter writer = File.CreateText("Output.txt"))
             {
                // Parse the objects (including text) in all pages, from page 1 to the last page (-1)
                document.ParsePages(PDFParsePagesOptions.Objects, 1, -1);
                // Loop through all the pages
                foreach(PDFDocumentPage page in document.Pages)
                {
                   // Loop through the objects of this page
                   foreach(PDFObject obj in page.Objects)
                   {
                      // Is this a text object (character)?
                      if(obj.ObjectType == PDFObjectType.Text)
                      {
                         // Yes, write it to the output file
                         writer.Write(obj.Code);

                         // Check if we need to move to a new line
                         if(obj.TextProperties.IsEndOfLine)
                         {
                            writer.WriteLine();
                         }
                      }
                   }
                   // End of page
                   writer.WriteLine();
                }
             }

PDF as a Raster Image

LEADTOOLS supports getting information, loading (rendering) and saving a PDF document as a raster image (Leadtools.RasterImage). Using the Leadtools.Codecs.RasterCodecs class, you can treat a PDF file just like any other image format such as TIFF or JPEG. You can query the size of a PDF page, its bits/pixel value, render a PDF page on the surface of an image, save any image to PDF on disk, convert PDF to TIFF or JPEG or any other supported format and back.

Refer to the following for more information:

Introduction to Leadtools
Introduction to Leadtools.Codecs
File Format PDF
Leadtools.RasterImage
Leadtools.Codecs.RasterCodecs
Leadtools.Codecs.CodecsRasterizeDocumentOptions
Leadtools.Codecs.CodecsPdfOptions

Creating PDF Documents from Windows Metafiles

The LEADTOOLS Document Writers can be used to create a searchable multi-page PDF document from one or more Windows Enhanced Metafiles (EMF). Refer to the following for more information:

Introduction to Leadtools.DocumentWriters.
Leadtools.Forms.DocumentWriters.DocumentWriter
Leadtools.Forms.DocumentWriters.PdfDocumentOptions

Creating PDF Documents from OCR Results

All the LEADTOOLS Optical Character Recognition (OCR) engines support saving the final document as PDF. With OCR, you can convert a scanned TIFF or JPEG file to a searchable PDF, or extract the text from a raster PDF document. For more information, refer to:

Introduction to Leadtools.Forms.Ocr.
Leadtools.Forms.Ocr.IOcrEngine

Creating Highly Compressed PDF Documents using MRC

The LEADTOOLS PDF Compressor supports saving files using Mixed Raster Content (MRC) technology. The MRC engine breaks a page (image) down into smaller segments and saves each segment using the compression best suited to its content. The result is a PDF file with higher compression and better quality than a standard raster PDF. Refer to the following for more information:

Introduction to Leadtools.PdfCompressor.
Leadtools.PdfCompressor.PdfCompressorEngine
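
To make the per-segment idea above concrete, the following toy sketch splits a grayscale page into square tiles and picks a compression strategy for each tile from the number of distinct gray levels it contains. It is NOT the algorithm used by the LEADTOOLS PDF Compressor, which this page does not describe; the tile grid, the thresholds and the names ToyMrc and SegmentType are hypothetical. Practical MRC implementations (see ITU-T T.44) also model a page as foreground, background and mask layers rather than only classifying whole tiles.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy classification, NOT the LEADTOOLS MRC segmentation.
public enum SegmentType { Uniform, Bitonal, ContinuousTone }

public static class ToyMrc
{
    // Splits an 8-bit grayscale page (pixels[y, x]) into tileSize x tileSize
    // tiles and classifies each tile by its number of distinct gray levels.
    public static SegmentType[,] Classify(byte[,] pixels, int tileSize)
    {
        if (tileSize <= 0) throw new ArgumentOutOfRangeException(nameof(tileSize));

        int height = pixels.GetLength(0);
        int width = pixels.GetLength(1);
        // Round up so partial tiles at the right and bottom edges are included.
        int rows = (height + tileSize - 1) / tileSize;
        int cols = (width + tileSize - 1) / tileSize;
        var result = new SegmentType[rows, cols];

        for (int r = 0; r < rows; r++)
        {
            for (int c = 0; c < cols; c++)
            {
                // Collect the distinct gray levels inside this tile.
                var levels = new HashSet<byte>();
                int yEnd = Math.Min(height, (r + 1) * tileSize);
                int xEnd = Math.Min(width, (c + 1) * tileSize);
                for (int y = r * tileSize; y < yEnd; y++)
                    for (int x = c * tileSize; x < xEnd; x++)
                        levels.Add(pixels[y, x]);

                result[r, c] = levels.Count switch
                {
                    1 => SegmentType.Uniform,       // solid area: store a single color
                    2 => SegmentType.Bitonal,       // e.g. text on paper: lossless bitonal (CCITT G4)
                    _ => SegmentType.ContinuousTone // photo-like area: lossy (JPEG)
                };
            }
        }
        return result;
    }
}
```

Choosing the method per tile, rather than compressing the whole page with one method, is what allows text to stay sharp under lossless bitonal compression while photographic areas shrink under lossy compression.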