This tutorial shows how to parse the structure of a PDF document, enumerate its PDF objects, and display information about those objects in a C# Windows Console application using the LEADTOOLS SDK.
Overview | |
---|---|
Summary | This tutorial covers how to parse the objects in a PDF document in a C# Windows Console application. |
Completion Time | 30 minutes |
Visual Studio Project | Download tutorial project (3 KB) |
Platform | C# Windows Console Application |
IDE | Visual Studio 2017, 2019 |
Development License | Download LEADTOOLS |
Get familiar with the basic steps of creating a project by reviewing the Add References and Set a License tutorial before working on this tutorial.
PDF Objects in the context of this tutorial include images, bookmarks, internal links, fonts and embedded files in a PDF file.
Start with a copy of the project created in the Add References and Set a License tutorial. If you do not have that project, follow the steps in that tutorial to create it.
The references needed depend upon the purpose of the project. References can be added by one or the other of the following two methods (but not both).
If using NuGet references, this tutorial requires the following NuGet package:
Leadtools.Pdf
If using local DLL references, the following DLLs are needed. They are located at <INSTALL_DIR>\LEADTOOLS22\Bin\Dotnet4\x64:
Leadtools.dll
Leadtools.Pdf.dll
For a complete list of which DLL files are required for your application, refer to Files to be Included in your Application.
The License unlocks the features needed for the project. It must be set before any toolkit function is called. For details, including tutorials for different platforms, refer to Setting a Runtime License.
There are two types of runtime licenses: evaluation and deployment.
Note
Adding LEADTOOLS NuGet and local references and setting a license are covered in more detail in the Add References and Set a License tutorial.
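The Main() method in this tutorial calls a SetLicense() helper that comes from the Add References and Set a License tutorial. For readers who do not have that project at hand, the following is a minimal sketch of what such a helper might look like. The license and key file paths are assumptions based on a default LEADTOOLS 22 installation, and the class name LicenseHelper is hypothetical; adjust both to match your setup.

```csharp
using System;
using System.IO;
using Leadtools;

internal static class LicenseHelper
{
    // Sketch only: both paths below are assumed defaults, not values from this tutorial.
    public static void SetLicense()
    {
        string licensePath = @"C:\LEADTOOLS22\Support\Common\License\LEADTOOLS.LIC";
        string keyPath = licensePath + ".KEY";

        // The developer key is stored as plain text next to the license file.
        string developerKey = File.ReadAllText(keyPath);

        // Must run before any other toolkit call.
        RasterSupport.SetLicense(licensePath, developerKey);

        if (RasterSupport.KernelExpired)
            Console.WriteLine("License file invalid or expired.");
        else
            Console.WriteLine("License file set successfully.");
    }
}
```

The body of this method can be moved into the Program class as a private static SetLicense() method so that the calls shown in this tutorial compile unchanged.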
With the project created, the references added, and the license set, coding can begin.
In the Solution Explorer, open Program.cs. Add the following statements to the using block at the top of Program.cs:
// Using block at the top
using System;
using System.IO;
using Leadtools;
using Leadtools.Pdf;
Add the following static field to the Program class:
private static PDFParseDocumentStructureOptions options;
Add five new methods to the Program class: ParseImages(PDFDocument document), ParseBookmarks(PDFDocument document), ParseInternalLinks(PDFDocument document), ParseFonts(PDFDocument document), and ParseEmbeddedFiles(PDFDocument document).
Each of these methods populates one property of a PDFDocument object and displays the information it contains. Call the methods inside the Main() method as shown below.
static void Main(string[] args)
{
    try
    {
        string pdfLocation = @"C:\LEADTOOLS22\Resources\Images\leadtools.pdf";
        SetLicense();
        // PDFDocument implements IDisposable, so release it once parsing is done.
        using (PDFDocument pdfDocument = new PDFDocument(pdfLocation))
        {
            ParseImages(pdfDocument);
            ParseBookmarks(pdfDocument);
            ParseInternalLinks(pdfDocument);
            ParseFonts(pdfDocument);
            ParseEmbeddedFiles(pdfDocument);
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.ToString());
    }
    Console.Write("Press ENTER to end program: ");
    Console.Read();
}
To load the file from a MemoryStream instead of a file path, replace the existing code in the Main() method with the following:
static void Main(string[] args)
{
    try
    {
        string pdfLocation = @"C:\LEADTOOLS22\Resources\Images\leadtools.pdf";
        byte[] data = File.ReadAllBytes(pdfLocation);
        SetLicense();
        // The stream must stay open for as long as the document is in use,
        // so the document is disposed first (inner using), then the stream.
        using (MemoryStream ms = new MemoryStream(data))
        using (PDFDocument pdfDocument = new PDFDocument(ms))
        {
            ParseImages(pdfDocument);
            ParseBookmarks(pdfDocument);
            ParseInternalLinks(pdfDocument);
            ParseFonts(pdfDocument);
            ParseEmbeddedFiles(pdfDocument);
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.ToString());
    }
    Console.Write("Press ENTER to end program: ");
    Console.Read();
}
Add the code below to the ParseImages method to populate the PDFDocument.Images property of the provided document and display the object information.
private static void ParseImages(PDFDocument document)
{
    options = PDFParseDocumentStructureOptions.Images;
    document.ParseDocumentStructure(options);
    Console.WriteLine("=======================IMAGES=======================\n");
    foreach (PDFImage image in document.Images)
    {
        Console.WriteLine("###########################################");
        Console.WriteLine($"BitsPerComponent: {image.BitsPerComponent} bits");
        Console.WriteLine($"BitsPerPixel: {image.BitsPerPixel} bits");
        Console.WriteLine($"ColorDevice: {image.ColorDevice}");
        Console.WriteLine($"ComponentCount: {image.ComponentCount}");
        Console.WriteLine($"Height: {image.Height}");
        Console.WriteLine($"ImageType: {image.ImageType}");
        Console.WriteLine($"ObjectNumber: {image.ObjectNumber}");
        Console.WriteLine($"PageNumber: {image.PageNumber}");
        Console.WriteLine($"StreamLength: {image.StreamLength}");
        Console.WriteLine($"StreamOffset: {image.StreamOffset}");
        Console.WriteLine($"Width: {image.Width}");
        Console.WriteLine("###########################################\n");
    }
    Console.WriteLine("\n====================================================\n");
}
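A foreach loop throws a NullReferenceException when the collection it iterates is null, and because every parse method runs inside one try block in Main(), a single failure would skip the remaining methods. Whether a parsed collection such as document.Images comes back empty or null for a PDF that has no objects of that kind is not stated here, so the following is a defensive sketch under the assumption that it may be null. The class name EnumerableGuards and the method name OrEmpty are hypothetical and not part of the LEADTOOLS SDK.

```csharp
using System.Collections.Generic;
using System.Linq;

internal static class EnumerableGuards
{
    // Returns the sequence unchanged, or an empty sequence when it is null,
    // so that a foreach over the result never dereferences null.
    public static IEnumerable<T> OrEmpty<T>(this IEnumerable<T> source)
    {
        return source ?? Enumerable.Empty<T>();
    }
}
```

With this helper in the project, a loop can be written as `foreach (PDFImage image in document.Images.OrEmpty())`, provided the collection implements the generic IEnumerable<T> interface.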
Add the code below to the ParseBookmarks method to display PDF bookmark information in the console.
private static void ParseBookmarks(PDFDocument document)
{
    options = PDFParseDocumentStructureOptions.Bookmarks;
    document.ParseDocumentStructure(options);
    Console.WriteLine("=======================BOOKMARKS=======================\n");
    foreach (PDFBookmark bookmark in document.Bookmarks)
    {
        Console.WriteLine("###########################################");
        Console.WriteLine($"BookmarkStyle: {bookmark.BookmarkStyle}");
        Console.WriteLine($"Level: {bookmark.Level}");
        Console.WriteLine($"TargetPageFitType: {bookmark.TargetPageFitType}");
        Console.WriteLine($"TargetPageNumber: {bookmark.TargetPageNumber}");
        Console.WriteLine($"TargetPosition: {bookmark.TargetPosition}");
        Console.WriteLine($"Title: {bookmark.Title}");
        Console.WriteLine("###########################################\n");
    }
    Console.WriteLine("\n====================================================\n");
}
Add the code below to the ParseInternalLinks method to display PDF internal link information in the console.
private static void ParseInternalLinks(PDFDocument document)
{
    options = PDFParseDocumentStructureOptions.InternalLinks;
    document.ParseDocumentStructure(options);
    Console.WriteLine("=======================INTERNAL LINKS=======================\n");
    foreach (PDFInternalLink internalLink in document.InternalLinks)
    {
        Console.WriteLine("###########################################");
        Console.WriteLine($"BorderColor: {internalLink.BorderColor}");
        Console.WriteLine($"BorderDashLength: {internalLink.BorderDashLength}");
        Console.WriteLine($"BorderWidth: {internalLink.BorderWidth}");
        Console.WriteLine($"SourceBounds: {internalLink.SourceBounds}");
        Console.WriteLine($"SourcePageNumber: {internalLink.SourcePageNumber}");
        Console.WriteLine($"TargetPageFitType: {internalLink.TargetPageFitType}");
        Console.WriteLine($"TargetPageNumber: {internalLink.TargetPageNumber}");
        Console.WriteLine($"TargetPosition: {internalLink.TargetPosition}");
        Console.WriteLine($"TargetZoomPercent: {internalLink.TargetZoomPercent}%");
        Console.WriteLine("###########################################\n");
    }
    Console.WriteLine("\n==========================================================\n");
}
Add the code below to the ParseFonts method to display PDF font information in the console.
private static void ParseFonts(PDFDocument document)
{
    options = PDFParseDocumentStructureOptions.Fonts;
    document.ParseDocumentStructure(options);
    Console.WriteLine("=======================FONTS=======================\n");
    foreach (PDFFont font in document.Fonts)
    {
        Console.WriteLine("###########################################");
        Console.WriteLine($"DescendantCID: {font.DescendantCID}");
        Console.WriteLine($"EmbeddingType: {font.EmbeddingType}");
        Console.WriteLine($"Encoding: {font.Encoding}");
        Console.WriteLine($"FaceName: {font.FaceName}");
        Console.WriteLine($"FontType: {font.FontType}");
        Console.WriteLine("###########################################\n");
    }
    Console.WriteLine("\n==================================================\n");
}
Add the code below to the ParseEmbeddedFiles method to populate the PDFDocument.EmbeddedFiles property of the provided document and display information about the embedded files in the console.
private static void ParseEmbeddedFiles(PDFDocument document)
{
    options = PDFParseDocumentStructureOptions.EmbeddedFiles;
    document.ParseDocumentStructure(options);
    Console.WriteLine("=======================EMBEDDED FILES=======================\n");
    foreach (PDFEmbeddedFile file in document.EmbeddedFiles)
    {
        Console.WriteLine("###########################################");
        Console.WriteLine($"Created: {file.Created}");
        Console.WriteLine($"Description: {file.Description}");
        Console.WriteLine($"FileName: {file.FileName}");
        Console.WriteLine($"FileNumber: {file.FileNumber}");
        Console.WriteLine($"FileSize: {file.FileSize}");
        Console.WriteLine($"Modified: {file.Modified}");
        Console.WriteLine($"ObjectNumber: {file.ObjectNumber}");
        Console.WriteLine($"SchemaValues: {file.SchemaValues}");
        Console.WriteLine("###########################################\n");
    }
    Console.WriteLine("\n==========================================================\n");
}
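Each of the five methods calls ParseDocumentStructure with a single option. As a possible alternative, the following sketch requests all five object types in one call and prints only how many of each were found. It assumes that the PDFParseDocumentStructureOptions values are bit flags that can be combined with the | operator and that each populated property implements IEnumerable; the class StructureSummary and its methods are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections;
using Leadtools.Pdf;

internal static class StructureSummary
{
    // Counts the items of any enumerable; treats null as "nothing found".
    private static int CountOf(IEnumerable items)
    {
        if (items == null)
            return 0;
        int count = 0;
        foreach (object unused in items)
            count++;
        return count;
    }

    // Parses five structure types in a single pass and prints one count per type.
    public static void PrintCounts(string pdfPath)
    {
        using (PDFDocument document = new PDFDocument(pdfPath))
        {
            // Assumption: the option values are flags and may be OR-ed together.
            PDFParseDocumentStructureOptions all =
                PDFParseDocumentStructureOptions.Images |
                PDFParseDocumentStructureOptions.Bookmarks |
                PDFParseDocumentStructureOptions.InternalLinks |
                PDFParseDocumentStructureOptions.Fonts |
                PDFParseDocumentStructureOptions.EmbeddedFiles;
            document.ParseDocumentStructure(all);

            Console.WriteLine($"Images: {CountOf(document.Images)}");
            Console.WriteLine($"Bookmarks: {CountOf(document.Bookmarks)}");
            Console.WriteLine($"Internal links: {CountOf(document.InternalLinks)}");
            Console.WriteLine($"Fonts: {CountOf(document.Fonts)}");
            Console.WriteLine($"Embedded files: {CountOf(document.EmbeddedFiles)}");
        }
    }
}
```

A call such as `StructureSummary.PrintCounts(pdfLocation);` placed after SetLicense() would give a quick overview before the detailed listings. The per-type methods remain useful when only one kind of object is of interest.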
Run the project by pressing F5, or by selecting Debug -> Start Debugging.
If the steps were followed correctly, the application runs, loads the specified PDF document from C:\LEADTOOLS22\Resources\Images\leadtools.pdf, parses its structure for all PDF objects, and displays the information in the console.
This tutorial showed how to load a PDF document, parse it for images, bookmarks, internal links, fonts, and embedded files, and display the information. It also covered how to use the PDFDocument and PDFEmbeddedFile classes, the PDFParseDocumentStructureOptions enumeration, and the PDFImage, PDFBookmark, PDFInternalLink, and PDFFont structures.