PDFParsePagesOptions Enumeration

Summary

Specifies which options to use when parsing the objects of a PDF document.

Syntax

C#

[SerializableAttribute()]
[FlagsAttribute()]
public enum PDFParsePagesOptions

Java

public final class PDFParsePagesOptions
    extends java.lang.Enum<PDFParsePagesOptions>

C++/CLI

[FlagsAttribute()]
[SerializableAttribute()]
public enum class PDFParsePagesOptions

Python

class PDFParsePagesOptions(Enum):
   None = 0
   Objects = 1
   Hyperlinks = 2
   Fonts = 4
   IgnoreWhiteSpaces = 8
   Annotations = 16
   RTLOriginal = 32
   RTLFlipBrackets = 64
   InternalLinks = 128
   FormFields = 256
   Signatures = 512
   All = 791
   AllIgnoreWhiteSpaces = 799

Members

Value	Member	Description
0x00000000	None	Do not parse any items.
0x00000001	Objects	Parse the objects of the page such as text items (characters), images, and rectangles. Specifying this member will populate the PDFDocumentPage.Objects collection with the objects found in the page.
0x00000002	Hyperlinks	Parse the hyperlinks found in the page. Specifying this member will populate the PDFDocumentPage.Hyperlinks collection with the hyperlinks found in the page.
0x00000008	IgnoreWhiteSpaces	Must be OR'ed with Objects (otherwise it is ignored). If specified, white space characters such as spaces or tabs will not be returned as items in the PDFDocumentPage.Objects collection. Use PDFTextProperties.IsEndOfWord and PDFTextProperties.IsEndOfLine to reconstruct the page words and lines as needed.
0x00000004	Fonts	Parse the fonts used by the items in the page.
0x00000010	Annotations	Parse the annotations found in the page. Specifying this member will populate the PDFDocumentPage.Annotations collection with any annotations found in the page.
0x00000020	RTLOriginal	Parse characters right to left as they are stored in the page.
0x00000040	RTLFlipBrackets	Flip bracket characters for right to left text when parsing the page.
0x00000080	InternalLinks	Parse all internal links found in the page. This is the equivalent of calling PDFDocument.ParseDocumentStructure with the PDFParseDocumentStructureOptions.InternalLinks option.
0x00000100	FormFields	Parse the form fields found in the page. Specifying this member will populate the PDFDocumentPage.FormFields collection with the PDF form fields found in the page.
0x00000200	Signatures	Parse the digital signatures found in the page. Specifying this member will populate the PDFDocumentPage.Signatures collection with the PDF digital signatures found in the page.
0x00000317	All	Parse all objects with white spaces. This is the equivalent of Objects | Hyperlinks | Fonts | Annotations | FormFields | Signatures.
0x0000031F	AllIgnoreWhiteSpaces	Parse all objects without white spaces. This is the equivalent of Objects | Hyperlinks | Fonts | Annotations | FormFields | Signatures | IgnoreWhiteSpaces.
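
The composite values in the table can be checked with plain flag arithmetic. The following is a minimal Python sketch and not the LEADTOOLS binding itself: ParseFlags is a hypothetical stand-in that mirrors the numeric values listed above, with NONE used in place of None because None is a reserved word in Python.

```python
from enum import IntFlag


class ParseFlags(IntFlag):
    """Hypothetical stand-in mirroring the values in the Members table."""
    NONE = 0
    OBJECTS = 0x001
    HYPERLINKS = 0x002
    FONTS = 0x004
    IGNORE_WHITE_SPACES = 0x008
    ANNOTATIONS = 0x010
    RTL_ORIGINAL = 0x020
    RTL_FLIP_BRACKETS = 0x040
    INTERNAL_LINKS = 0x080
    FORM_FIELDS = 0x100
    SIGNATURES = 0x200


# All: the six members named in its description, white space kept.
ALL = (ParseFlags.OBJECTS | ParseFlags.HYPERLINKS | ParseFlags.FONTS
       | ParseFlags.ANNOTATIONS | ParseFlags.FORM_FIELDS | ParseFlags.SIGNATURES)

# AllIgnoreWhiteSpaces: the same set plus the white space filter.
ALL_IGNORE_WHITE_SPACES = ALL | ParseFlags.IGNORE_WHITE_SPACES

print(hex(ALL), int(ALL))                                    # 0x317 791
print(hex(ALL_IGNORE_WHITE_SPACES), int(ALL_IGNORE_WHITE_SPACES))  # 0x31f 799
print(bool(ALL & ParseFlags.INTERNAL_LINKS))                 # False
```

As the last line shows, the arithmetic implies that All does not include InternalLinks (nor RTLOriginal or RTLFlipBrackets); those members have to be OR'ed in explicitly when needed.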

Remarks

The PDFParsePagesOptions enumeration is used as the type of the options parameter passed to the PDFDocument.ParsePages method.

When a PDFDocument object is created, the pages of the PDF document are already parsed and populated in the PDFDocument.Pages collection. Each page can contain other objects such as text items (characters), images, rectangles, hyperlinks, annotations, form fields, and digital signatures, as well as the fonts used in these items. These items are not parsed automatically for performance reasons. Instead, call the PDFDocument.ParsePages method with the page ranges you are interested in (or all pages), and the type of items to parse.

Initially, the values of the PDFDocumentPage.Objects, PDFDocumentPage.Hyperlinks, PDFDocumentPage.Annotations, PDFDocumentPage.FormFields, and PDFDocumentPage.Signatures lists of each PDFDocumentPage will be set to null. After the PDFDocument.ParsePages method is called, the corresponding list will be populated with the items found in the page.

Any type of item can be parsed. This is done through the options parameter of type PDFParsePagesOptions passed to PDFDocument.ParsePages. The different options and results are as follows:

If PDFParsePagesOptions.Objects is specified, then the PDFDocumentPage.Objects collection will be populated with a PDFObject object for each object item found in the page. These items can be text (characters), images, or rectangles. If there are no object items found in the page, then the PDFDocumentPage.Objects will be initialized with an empty collection (PDFDocumentPage.Objects.Count will be 0).
If PDFParsePagesOptions.Hyperlinks is specified, then the PDFDocumentPage.Hyperlinks collection will be populated with a PDFHyperlink object for each hyperlink item found in the page. If no hyperlinks are found in the page, PDFDocumentPage.Hyperlinks will be initialized with an empty collection (PDFDocumentPage.Hyperlinks.Count will be 0).
If PDFParsePagesOptions.Annotations is specified, then the PDFDocumentPage.Annotations collection will be populated with a PDFAnnotation object for each annotation item found in the page. If no annotations are found in the page, PDFDocumentPage.Annotations will be initialized with an empty collection (PDFDocumentPage.Annotations.Count will be 0).
If PDFParsePagesOptions.FormFields is specified, then the PDFDocumentPage.FormFields collection will be populated with a PDFFormField object for each form field item found in the page. If no form fields are found in the page, PDFDocumentPage.FormFields will be initialized with an empty collection (PDFDocumentPage.FormFields.Count will be 0).
If PDFParsePagesOptions.Signatures is specified, then the PDFDocumentPage.Signatures collection will be populated with a PDFSignature object for each digital signature item found in the page. If no signatures are found in the page, PDFDocumentPage.Signatures will be initialized with an empty collection (PDFDocumentPage.Signatures.Count will be 0).
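
The difference between a list that was never parsed (null) and one that was parsed but found nothing (empty) matters when reading results. A minimal Python sketch of that check follows; PageStub and describe are hypothetical names used only for illustration, and None plays the role of null.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PageStub:
    """Hypothetical stand-in for a page; None plays the role of null."""
    objects: Optional[List[str]] = None      # not parsed yet
    hyperlinks: Optional[List[str]] = None   # not parsed yet


def describe(items: Optional[List[str]]) -> str:
    """Tell 'never parsed' apart from 'parsed, nothing found'."""
    if items is None:
        return "not parsed"
    if len(items) == 0:
        return "parsed, none found"
    return f"parsed, {len(items)} item(s)"


page = PageStub()
page.objects = []  # simulates parsing with Objects on a page that has no objects

print(describe(page.objects))     # parsed, none found
print(describe(page.hyperlinks))  # not parsed
```

Checking for null before reading Count avoids mistaking an unrequested item type for an empty page.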

White space characters such as spaces or tabs are parsed by default and returned as individual objects. To suppress this behavior, OR the PDFParsePagesOptions.IgnoreWhiteSpaces member with PDFParsePagesOptions.Objects in the options parameter passed to PDFDocument.ParsePages. The words and lines of text in the page can still be reconstructed without white space characters by using the PDFTextProperties.IsEndOfWord and PDFTextProperties.IsEndOfLine properties; the PDFTextProperties example shows how to do that.
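
The reconstruction idea can be sketched independently of the library. The Python below is an illustration only: CharItem is a hypothetical stand-in for a parsed text object, and it assumes the end-of-word and end-of-line flags are set on the last character of each word or line.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class CharItem:
    """Hypothetical stand-in for one parsed text object (a character)."""
    char: str
    is_end_of_word: bool = False
    is_end_of_line: bool = False


def rebuild_lines(items: Iterable[CharItem]) -> List[str]:
    """Join characters into words and words into lines using the flags."""
    lines: List[str] = []
    words: List[str] = []
    word = ""
    for item in items:
        word += item.char
        if item.is_end_of_word or item.is_end_of_line:
            words.append(word)
            word = ""
        if item.is_end_of_line:
            lines.append(" ".join(words))
            words = []
    # Flush trailing text that was not terminated by an end-of-line flag.
    if word:
        words.append(word)
    if words:
        lines.append(" ".join(words))
    return lines


items = [
    CharItem("H"), CharItem("i", is_end_of_word=True),
    CharItem("P"), CharItem("D"),
    CharItem("F", is_end_of_word=True, is_end_of_line=True),
    CharItem("o"), CharItem("k"),
]
print(rebuild_lines(items))  # ['Hi PDF', 'ok']
```

No white space items are needed in the input: the single space between words is inserted by the join, which is why IgnoreWhiteSpaces does not lose word or line boundaries.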

The values of PDFParsePagesOptions can be OR'ed together.

Note on using PDFParsePagesOptions.Signatures: PDFDocument.ParsePages automatically calls PDFDocument.GetDigitalSignatureSupportStatus to query whether PDF digital signatures can be read. If this method indicates that digital signatures are not available or not supported, the PDFParsePagesOptions.Signatures flag is removed from the options and the signatures are not read.
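
In flag terms, "removed" means the Signatures bit is masked out while every other requested bit is kept. The library does this internally; the Python sketch below only illustrates the arithmetic. ParseFlags, effective_options, and the signatures_supported boolean are hypothetical stand-ins, the last one representing the outcome of the support status query.

```python
from enum import IntFlag


class ParseFlags(IntFlag):
    """Hypothetical stand-in with a subset of the documented values."""
    OBJECTS = 0x001
    HYPERLINKS = 0x002
    SIGNATURES = 0x200


def effective_options(requested: ParseFlags, signatures_supported: bool) -> ParseFlags:
    """Drop the Signatures bit when signature reading is unavailable."""
    if signatures_supported:
        return requested
    return requested & ~ParseFlags.SIGNATURES


requested = ParseFlags.OBJECTS | ParseFlags.HYPERLINKS | ParseFlags.SIGNATURES
print(int(effective_options(requested, signatures_supported=True)))   # 515
print(int(effective_options(requested, signatures_supported=False)))  # 3
```

Under this reading, the remaining options still take effect; only the signature parsing is skipped.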

Example

using System;
using System.Collections.Generic;
using System.IO;
using Leadtools;
using Leadtools.Codecs;
using Leadtools.Controls;
using Leadtools.Pdf;
using Leadtools.Svg;
using Leadtools.WinForms;
 
 
public void PDFDocumentParsePagesExample() 
{ 
   string pdfFileName = Path.Combine(LEAD_VARS.ImagesDir, @"Leadtools.pdf"); 
   string txtFileName = Path.Combine(LEAD_VARS.ImagesDir, @"LEAD_pdf.txt"); 
 
   // Open the document 
   using (PDFDocument document = new PDFDocument(pdfFileName)) 
   { 
      // Parse everything, for all pages
      PDFParsePagesOptions options = PDFParsePagesOptions.All; 
      document.ParsePages(options, 1, -1); 
 
      // Save the results to the text file for examining 
      using (StreamWriter writer = File.CreateText(txtFileName)) 
      { 
         foreach (PDFDocumentPage page in document.Pages) 
         { 
            writer.WriteLine("Page {0}", page.PageNumber); 
 
            IList<PDFObject> objects = page.Objects; 
            writer.WriteLine("Objects: {0}", objects.Count); 
            foreach (PDFObject obj in objects) 
            { 
               writer.WriteLine("  ObjectType: {0}", obj.ObjectType.ToString()); 
               writer.WriteLine("  Bounds: {0}, {1}, {2}, {3}", obj.Bounds.Left, obj.Bounds.Top, obj.Bounds.Right, obj.Bounds.Bottom); 
               WriteTextProperties(writer, obj.TextProperties); 
               writer.WriteLine("  Code: {0}", obj.Code); 
               writer.WriteLine("------"); 
            } 
            writer.WriteLine("---------------------"); 
 
            IList<PDFHyperlink> hyperlinks = page.Hyperlinks; 
            writer.WriteLine("Hyperlinks: {0}", hyperlinks.Count); 
            foreach (PDFHyperlink hyperlink in hyperlinks) 
            { 
               writer.WriteLine("  Hyperlink: {0}", hyperlink.Hyperlink); 
               writer.WriteLine("  Bounds: {0}, {1}, {2}, {3}", hyperlink.Bounds.Left, hyperlink.Bounds.Top, hyperlink.Bounds.Right, hyperlink.Bounds.Bottom); 
               WriteTextProperties(writer, hyperlink.TextProperties); 
            } 
            writer.WriteLine("---------------------"); 
         } 
      } 
   } 
} 
 
private static void WriteTextProperties(StreamWriter writer, PDFTextProperties textProperties) 
{ 
   writer.WriteLine("  TextProperties.FontHeight: {0}", textProperties.FontHeight.ToString()); 
   writer.WriteLine("  TextProperties.FontWidth: {0}", textProperties.FontWidth.ToString()); 
   writer.WriteLine("  TextProperties.FontIndex: {0}", textProperties.FontIndex.ToString()); 
   writer.WriteLine("  TextProperties.IsEndOfWord: {0}", textProperties.IsEndOfWord.ToString()); 
   writer.WriteLine("  TextProperties.IsEndOfLine: {0}", textProperties.IsEndOfLine.ToString()); 
   writer.WriteLine("  TextProperties.Color: {0}", textProperties.Color.ToString()); 
} 
 
static class LEAD_VARS 
{ 
   public const string ImagesDir = @"C:\LEADTOOLS22\Resources\Images"; 
}

Requirements

Target Platforms

Reference

Leadtools.Pdf Namespace

Help Version 22.0.2023.7.10
