
ParsePages Method

Summary

Parses objects such as text items (characters), images, rectangles, annotations, form fields, digital signatures, hyperlinks and fonts from one or more PDF pages.

Syntax

C#

public void ParsePages(
   Leadtools.Pdf.PDFParsePagesOptions options,
   int firstPageNumber,
   int lastPageNumber
)

VB

Public Sub ParsePages( _
   ByVal options As Leadtools.Pdf.PDFParsePagesOptions, _
   ByVal firstPageNumber As Integer, _
   ByVal lastPageNumber As Integer _
)

Java

public void parsePages(int pdfParsePagesOptions, int firstPageNumber, int lastPageNumber)

C++

public:
void ParsePages(
   Leadtools::Pdf::PDFParsePagesOptions options,
   int firstPageNumber,
   int lastPageNumber
)

Parameters

options
One or more PDFParsePagesOptions enumeration members that specify the types of objects to parse. Members can be combined with a bitwise OR.

firstPageNumber
The 1-based number of the first page to parse. Must be greater than or equal to 1 and less than or equal to the number of pages in the document.

lastPageNumber
The 1-based number of the last page to parse. Must be greater than or equal to firstPageNumber and less than or equal to the number of pages in the document. Use the special value -1 to refer to the last page in the document.
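The page-range rules above can be sketched in Java, the language of the Java syntax shown earlier. This is a hypothetical helper, not part of LEADTOOLS: the class `PageRange` and its `resolve` method are illustrative names, and the sketch assumes only the documented rules (1-based page numbers, with -1 meaning the last page).

```java
// Hypothetical helper, NOT part of the LEADTOOLS API. It only mirrors the
// documented validation rules for firstPageNumber / lastPageNumber.
final class PageRange {
    final int first;
    final int last;

    private PageRange(int first, int last) {
        this.first = first;
        this.last = last;
    }

    static PageRange resolve(int firstPageNumber, int lastPageNumber, int pageCount) {
        // -1 is the documented shortcut for "the last page in the document"
        int last = (lastPageNumber == -1) ? pageCount : lastPageNumber;
        if (firstPageNumber < 1 || firstPageNumber > pageCount) {
            throw new IllegalArgumentException("firstPageNumber out of range: " + firstPageNumber);
        }
        if (last < firstPageNumber || last > pageCount) {
            throw new IllegalArgumentException("lastPageNumber out of range: " + lastPageNumber);
        }
        return new PageRange(firstPageNumber, last);
    }

    // Number of pages covered by the resolved (inclusive) range
    int count() {
        return last - first + 1;
    }
}
```

For example, `PageRange.resolve(1, -1, 10)` covers pages 1 through 10, which is the same range the examples below request with `ParsePages(options, 1, -1)`.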

Remarks

When a PDFDocument object is created, the pages of the PDF document are already parsed and the PDFDocument.Pages collection is populated. Each page may also contain other items: text items (characters), images, rectangles, hyperlinks, annotations, form fields and digital signatures, as well as the fonts used by these items. For performance reasons, these items are not parsed automatically. Instead, call the ParsePages method with the range of pages you are interested in (or all pages) and the types of items to parse.

Initially, the PDFDocumentPage.Fonts, PDFDocumentPage.Objects, PDFDocumentPage.Hyperlinks, PDFDocumentPage.Annotations, PDFDocumentPage.FormFields and PDFDocumentPage.Signatures lists of each PDFDocumentPage are null. When ParsePages is called, each list whose item type was requested is populated with the items found on the page.
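Calling code can tell the two documented states apart: null means the list was not parsed, and an empty list means it was parsed but nothing was found. The Java sketch below assumes only those rules; `ListState.describe` is a hypothetical helper, not a LEADTOOLS method.

```java
import java.util.List;

// Hypothetical helper, NOT part of LEADTOOLS: reports which documented state
// a page list (Fonts, Objects, Hyperlinks, ...) is in.
final class ListState {
    static String describe(List<?> items) {
        if (items == null) {
            // ParsePages has not been called with the matching option
            return "not parsed";
        }
        if (items.isEmpty()) {
            // Parsed, but the page has no items of this type
            return "parsed, none found";
        }
        return "parsed, " + items.size() + " item(s)";
    }
}
```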

You can parse any type of item you are interested in by specifying it in the options parameter:

If PDFParsePagesOptions.Objects is specified, then the PDFDocumentPage.Objects collection will be populated with a PDFObject object for each object item found in the page. These items can be text (characters), images or rectangles. If no object items are found in the page, PDFDocumentPage.Objects will be initialized with an empty collection (PDFDocumentPage.Objects.Count will be 0).
If PDFParsePagesOptions.Hyperlinks is specified, then the PDFDocumentPage.Hyperlinks collection will be populated with a PDFHyperlink object for each hyperlink item found in the page. If no hyperlinks are found in the page, PDFDocumentPage.Hyperlinks will be initialized with an empty collection (PDFDocumentPage.Hyperlinks.Count will be 0).
If PDFParsePagesOptions.Annotations is specified, then the PDFDocumentPage.Annotations collection will be populated with a PDFAnnotation object for each annotation item found in the page. If no annotations are found in the page, PDFDocumentPage.Annotations will be initialized with an empty collection (PDFDocumentPage.Annotations.Count will be 0).
If PDFParsePagesOptions.Fonts is specified, then the PDFDocumentPage.Fonts collection will be populated with a PDFFont object for each font item found in the page. If no fonts are found in the page, PDFDocumentPage.Fonts will be initialized with an empty collection (PDFDocumentPage.Fonts.Count will be 0).
If PDFParsePagesOptions.FormFields is specified, then the PDFDocumentPage.FormFields collection will be populated with a PDFFormField object for each form field item found in the page. If no form fields are found in the page, PDFDocumentPage.FormFields will be initialized with an empty collection (PDFDocumentPage.FormFields.Count will be 0).
If PDFParsePagesOptions.Signatures is specified, then the PDFDocumentPage.Signatures collection will be populated with a PDFSignature object for each digital signature item found in the page. If no signatures are found in the page, PDFDocumentPage.Signatures will be initialized with an empty collection (PDFDocumentPage.Signatures.Count will be 0).
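The population rules listed above can be modeled with a small Java sketch. Everything here is hypothetical: `ParsedPage`, its string lists, and the flag constants are placeholders, and the bit values are not the real PDFParsePagesOptions values. The sketch only mirrors the documented behavior that a list whose option was not requested stays null, while a requested list with no matching items becomes empty.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one page's lazily parsed lists, NOT the LEADTOOLS API.
final class ParsedPage {
    static final int OBJECTS = 1 << 0;    // placeholder for PDFParsePagesOptions.Objects
    static final int HYPERLINKS = 1 << 1; // placeholder for PDFParsePagesOptions.Hyperlinks
    static final int FONTS = 1 << 2;      // placeholder for PDFParsePagesOptions.Fonts

    // null means "this item type has not been parsed yet"
    List<String> objects;
    List<String> hyperlinks;
    List<String> fonts;

    // Populates only the lists whose flag is set in options. A requested list
    // with no items found becomes an empty list rather than staying null.
    void parse(int options, List<String> foundObjects,
               List<String> foundHyperlinks, List<String> foundFonts) {
        if ((options & OBJECTS) != 0) {
            objects = new ArrayList<>(foundObjects);
        }
        if ((options & HYPERLINKS) != 0) {
            hyperlinks = new ArrayList<>(foundHyperlinks);
        }
        if ((options & FONTS) != 0) {
            fonts = new ArrayList<>(foundFonts);
        }
    }
}
```

Requesting `OBJECTS | FONTS` on a page without fonts leaves `fonts` empty and `hyperlinks` null, matching the rules above.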

White-space characters such as spaces and tabs are parsed by default and returned as individual objects. To prevent this, OR the PDFParsePagesOptions.IgnoreWhiteSpaces enumeration member with PDFParsePagesOptions.Objects in the options parameter.

The values of PDFParsePagesOptions can be OR'ed together.
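Because the options are flags that can be OR'ed together (the Java syntax takes them as an `int`), the white-space behavior can be sketched as follows. The constants `OBJECTS` and `IGNORE_WHITE_SPACES` are hypothetical placeholders rather than the real enumeration values, and `collectCharacters` only illustrates the documented effect on a plain string; it does not parse a PDF.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT the LEADTOOLS API: combining option flags with a
// bitwise OR and honoring an "ignore white spaces" flag when collecting text.
final class WhiteSpaceFilter {
    static final int OBJECTS = 1 << 0;             // placeholder for PDFParsePagesOptions.Objects
    static final int IGNORE_WHITE_SPACES = 1 << 8; // placeholder for PDFParsePagesOptions.IgnoreWhiteSpaces

    // Returns the characters that would be reported as text objects,
    // or null when objects were not requested (like an unparsed page list).
    static List<Character> collectCharacters(String pageText, int options) {
        if ((options & OBJECTS) == 0) {
            return null;
        }
        boolean ignoreWhiteSpaces = (options & IGNORE_WHITE_SPACES) != 0;
        List<Character> result = new ArrayList<>();
        for (char c : pageText.toCharArray()) {
            if (ignoreWhiteSpaces && Character.isWhitespace(c)) {
                continue; // skip spaces, tabs and other white space
            }
            result.add(c);
        }
        return result;
    }
}
```

With `OBJECTS | IGNORE_WHITE_SPACES`, the text `"a b\tc"` yields three characters; with `OBJECTS` alone, the space and the tab are kept as individual objects and five characters are returned.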

Example

This example parses all the objects from a PDF document and writes them to a text file. Refer to PDFObject for an example of how to draw the objects of a PDF page to an image, and to PDFTextProperties for an example of how to write the text of a PDF page to an external file.

Imports System.Collections.Generic
Imports System.IO
Imports Leadtools
Imports Leadtools.Pdf
 
Public Sub PDFDocumentParsePagesExample() 
  Dim pdfFileName As String = Path.Combine(LEAD_VARS.ImagesDir, "Leadtools.pdf") 
  Dim txtFileName As String = Path.Combine(LEAD_VARS.ImagesDir, "LEAD_pdf.txt") 
  ' Open the document 
  Using document As PDFDocument = New PDFDocument(pdfFileName) 
    ' Parse everything and for all pages 
    Dim options As PDFParsePagesOptions = PDFParsePagesOptions.All 
    document.ParsePages(options, 1, -1) 
 
    ' Save the results to the text file for examining 
    Using writer As StreamWriter = File.CreateText(txtFileName) 
       For Each page As PDFDocumentPage In document.Pages 
         writer.WriteLine("Page {0}", page.PageNumber) 
 
         Dim fonts As IList(Of PDFFont) = page.Fonts 
         ' No need to check whether fonts is Nothing because we passed All:
         ' each requested list is either populated or empty. The same
         ' applies to the other lists below.
         writer.WriteLine("Fonts: {0}", fonts.Count) 
         For Each font As PDFFont In fonts 
           writer.WriteLine("  FaceName: {0}", font.FaceName) 
           writer.WriteLine("  FontStyle: {0}", font.FontStyle.ToString()) 
           writer.WriteLine("------") 
         Next font 
         writer.WriteLine("---------------------") 
 
         Dim objects As IList(Of PDFObject) = page.Objects 
         writer.WriteLine("Objects: {0}", objects.Count) 
         For Each obj As PDFObject In objects 
           writer.WriteLine("  ObjectType: {0}", obj.ObjectType.ToString()) 
           writer.WriteLine("  Bounds: {0}, {1}, {2}, {3}", obj.Bounds.Left, obj.Bounds.Top, obj.Bounds.Right, obj.Bounds.Bottom) 
           WriteTextProperties(writer, obj.TextProperties) 
           writer.WriteLine("  Code: {0}", obj.Code) 
           writer.WriteLine("------") 
         Next obj 
         writer.WriteLine("---------------------") 
 
         Dim hyperlinks As IList(Of PDFHyperlink) = page.Hyperlinks 
         writer.WriteLine("Hyperlinks: {0}", hyperlinks.Count) 
         For Each hyperlink As PDFHyperlink In hyperlinks 
           writer.WriteLine("  Hyperlink: {0}", hyperlink.Hyperlink) 
           writer.WriteLine("  Bounds: {0}, {1}, {2}, {3}", hyperlink.Bounds.Left, hyperlink.Bounds.Top, hyperlink.Bounds.Right, hyperlink.Bounds.Bottom) 
           WriteTextProperties(writer, hyperlink.TextProperties) 
         Next hyperlink 
         writer.WriteLine("---------------------") 
       Next page 
    End Using 
  End Using 
End Sub 
 
Private Shared Sub WriteTextProperties(ByVal writer As StreamWriter, ByVal textProperties As PDFTextProperties) 
  writer.WriteLine("  TextProperties.FontHeight: {0}", textProperties.FontHeight.ToString()) 
  writer.WriteLine("  TextProperties.FontWidth: {0}", textProperties.FontWidth.ToString()) 
  writer.WriteLine("  TextProperties.FontIndex: {0}", textProperties.FontIndex.ToString()) 
  writer.WriteLine("  TextProperties.IsEndOfWord: {0}", textProperties.IsEndOfWord.ToString()) 
  writer.WriteLine("  TextProperties.IsEndOfLine: {0}", textProperties.IsEndOfLine.ToString()) 
  writer.WriteLine("  TextProperties.Color: {0}", textProperties.Color.ToString()) 
End Sub 
 
Public NotInheritable Class LEAD_VARS
   Public Const ImagesDir As String = "C:\Users\Public\Documents\LEADTOOLS Images"
End Class

using System.Collections.Generic;
using System.IO;
using Leadtools;
using Leadtools.Pdf;
 
public void PDFDocumentParsePagesExample() 
{ 
   string pdfFileName = Path.Combine(LEAD_VARS.ImagesDir, @"Leadtools.pdf"); 
   string txtFileName = Path.Combine(LEAD_VARS.ImagesDir, @"LEAD_pdf.txt"); 
   // Open the document 
   using(PDFDocument document = new PDFDocument(pdfFileName)) 
   { 
      // Parse everything and for all pages 
      PDFParsePagesOptions options = PDFParsePagesOptions.All; 
      document.ParsePages(options, 1, -1); 
 
      // Save the results to the text file for examining 
      using(StreamWriter writer = File.CreateText(txtFileName)) 
      { 
         foreach(PDFDocumentPage page in document.Pages) 
         { 
            writer.WriteLine("Page {0}", page.PageNumber); 
 
            IList<PDFFont> fonts = page.Fonts; 
            // No need to check whether fonts is null because we passed All:
            // each requested list is either populated or empty. The same
            // applies to the other lists below.
            writer.WriteLine("Fonts: {0}", fonts.Count); 
            foreach(PDFFont font in fonts) 
            { 
               writer.WriteLine("  FaceName: {0}", font.FaceName); 
               writer.WriteLine("  FontStyle: {0}", font.FontStyle.ToString()); 
               writer.WriteLine("------"); 
            } 
            writer.WriteLine("---------------------"); 
 
            IList<PDFObject> objects = page.Objects; 
            writer.WriteLine("Objects: {0}", objects.Count); 
            foreach(PDFObject obj in objects) 
            { 
               writer.WriteLine("  ObjectType: {0}", obj.ObjectType.ToString()); 
               writer.WriteLine("  Bounds: {0}, {1}, {2}, {3}", obj.Bounds.Left, obj.Bounds.Top, obj.Bounds.Right, obj.Bounds.Bottom); 
               WriteTextProperties(writer, obj.TextProperties); 
               writer.WriteLine("  Code: {0}", obj.Code); 
               writer.WriteLine("------"); 
            } 
            writer.WriteLine("---------------------"); 
 
            IList<PDFHyperlink> hyperlinks = page.Hyperlinks; 
            writer.WriteLine("Hyperlinks: {0}", hyperlinks.Count); 
            foreach(PDFHyperlink hyperlink in hyperlinks) 
            { 
               writer.WriteLine("  Hyperlink: {0}", hyperlink.Hyperlink); 
               writer.WriteLine("  Bounds: {0}, {1}, {2}, {3}", hyperlink.Bounds.Left, hyperlink.Bounds.Top, hyperlink.Bounds.Right, hyperlink.Bounds.Bottom); 
               WriteTextProperties(writer, hyperlink.TextProperties); 
            } 
            writer.WriteLine("---------------------"); 
         } 
      } 
   } 
} 
 
private static void WriteTextProperties(StreamWriter writer, PDFTextProperties textProperties) 
{ 
   writer.WriteLine("  TextProperties.FontHeight: {0}", textProperties.FontHeight.ToString()); 
   writer.WriteLine("  TextProperties.FontWidth: {0}", textProperties.FontWidth.ToString()); 
   writer.WriteLine("  TextProperties.FontIndex: {0}", textProperties.FontIndex.ToString()); 
   writer.WriteLine("  TextProperties.IsEndOfWord: {0}", textProperties.IsEndOfWord.ToString()); 
   writer.WriteLine("  TextProperties.IsEndOfLine: {0}", textProperties.IsEndOfLine.ToString()); 
   writer.WriteLine("  TextProperties.Color: {0}", textProperties.Color.ToString()); 
} 
 
static class LEAD_VARS
{
   public const string ImagesDir = @"C:\Users\Public\Documents\LEADTOOLS Images";
}