
ParsePages Method

Summary

Parses objects such as text items (characters), images, rectangles, annotations, form fields, digital signatures, hyperlinks and fonts from one or more PDF pages.

Syntax
C#
public void ParsePages(
   PDFParsePagesOptions options,
   int firstPageNumber,
   int lastPageNumber
)

Java
public void parsePages(
   int options,
   int firstPageNumber,
   int lastPageNumber
);

C++/CLI
public:
void ParsePages(
   PDFParsePagesOptions options,
   int firstPageNumber,
   int lastPageNumber
)

Python
def ParsePages(self,options,firstPageNumber,lastPageNumber):

Parameters

options
One or more PDFParsePagesOptions enumeration members that specify the types of objects to parse.

firstPageNumber
1-based number of the first page to parse. Must be greater than or equal to 1 and less than or equal to the number of pages in the document.

lastPageNumber
1-based number of the last page to parse. Must be greater than or equal to firstPageNumber and less than or equal to the number of pages in the document. Use the special value of -1 to represent the last page in the document.
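
The page range rules above can be illustrated with a small, self-contained sketch. The class PageRangeSketch and its resolve method are hypothetical helpers written for this illustration; they are not part of the LEADTOOLS API, and the actual method may report an invalid range differently.

```java
// Hypothetical helper for illustration only; not part of the LEADTOOLS API.
final class PageRangeSketch {
    /**
     * Applies the documented range rules and returns {first, last},
     * with the special value -1 resolved to the last page of the document.
     */
    static int[] resolve(int firstPageNumber, int lastPageNumber, int pageCount) {
        // -1 stands for "last page in the document"
        int last = (lastPageNumber == -1) ? pageCount : lastPageNumber;

        if (firstPageNumber < 1 || firstPageNumber > pageCount) {
            throw new IllegalArgumentException(
                "firstPageNumber must be between 1 and " + pageCount);
        }
        if (last < firstPageNumber || last > pageCount) {
            throw new IllegalArgumentException(
                "lastPageNumber must be between firstPageNumber and " + pageCount + ", or -1");
        }
        return new int[] { firstPageNumber, last };
    }

    public static void main(String[] args) {
        // A 10-page document, parsed from page 1 to the end
        int[] range = resolve(1, -1, 10);
        System.out.println(range[0] + ".." + range[1]); // prints 1..10
    }
}
```

Checking the range before calling ParsePages is optional; the sketch only makes the meaning of -1 and the two bounds explicit.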

Remarks

When a PDFDocument object is created, the pages of the PDF document are already parsed and populated in the PDFDocument.Pages collection. Each page can contain other objects such as text items (characters), images, rectangles, hyperlinks, annotations, form fields, and digital signatures, as well as the fonts used in these items. These items are not parsed automatically for performance reasons. Instead, you must call the ParsePages method with the page range you are interested in (or all pages) and the types of items to parse.

Initially, the values of the PDFDocumentPage.Objects, PDFDocumentPage.Hyperlinks, PDFDocumentPage.Annotations, PDFDocumentPage.FormFields and PDFDocumentPage.Signatures lists of each PDFDocumentPage will be set to null. When the ParsePages method is called, the corresponding list will be populated with the items found on the page.
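
Because a list stays null until the matching item type has been parsed, code that reads a list it did not request should guard against null. The following is a minimal sketch using plain java.util.List values as stand-ins for the page lists; ParsedListSketch and orEmpty are hypothetical names, and the sketch assumes the Java getters return null in the same way as the properties described above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not part of the LEADTOOLS API.
final class ParsedListSketch {
    // Treats a list that has not been parsed yet (null) as an empty list.
    static <T> List<T> orEmpty(List<T> parsedList) {
        return parsedList != null ? parsedList : Collections.<T>emptyList();
    }

    public static void main(String[] args) {
        // Stand-in for a list whose item type was never requested in ParsePages
        List<String> notParsedYet = null;
        // Stand-in for a list that was populated by ParsePages
        List<String> parsed = Arrays.asList("first item", "second item");

        System.out.println(orEmpty(notParsedYet).size()); // prints 0
        System.out.println(orEmpty(parsed).size());       // prints 2
    }
}
```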

The types of items to parse are selected by the options parameter of type PDFParsePagesOptions passed to ParsePages.

White space characters such as spaces and tabs are parsed by default and returned as individual objects. This behavior can be turned off by OR'ing the PDFParsePagesOptions.IgnoreWhiteSpaces enumeration member with PDFParsePagesOptions.Objects in the options parameter passed to PDFDocument.ParsePages. Note that the words and lines of text on the page can be reconstructed without white space characters by using the PDFTextProperties.IsEndOfWord and PDFTextProperties.IsEndOfLine properties. The example of PDFTextProperties shows how to do that.
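
As a rough illustration of how the two flags allow text to be rebuilt when white space objects are ignored, the sketch below uses a stand-in Glyph class that carries only a character code and the two flags. Glyph, WordRebuildSketch and rebuildLines are hypothetical names, not LEADTOOLS types; with the real API the same values would presumably come from each text object and its PDFTextProperties.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in types for illustration only; not part of the LEADTOOLS API.
final class WordRebuildSketch {
    // One parsed text object: its character code plus the two end-of-word/line flags.
    static final class Glyph {
        final char code;
        final boolean endOfWord;
        final boolean endOfLine;

        Glyph(char code, boolean endOfWord, boolean endOfLine) {
            this.code = code;
            this.endOfWord = endOfWord;
            this.endOfLine = endOfLine;
        }
    }

    // Rebuilds lines of text; words inside a line are separated by a single space.
    static List<String> rebuildLines(List<Glyph> glyphs) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (Glyph g : glyphs) {
            line.append(g.code);
            if (g.endOfLine) {
                // End of line: emit the collected text and start a new line
                lines.add(line.toString());
                line.setLength(0);
            } else if (g.endOfWord) {
                // End of word inside a line: re-insert the ignored space
                line.append(' ');
            }
        }
        if (line.length() > 0) {
            // Flush text that was not terminated by an end-of-line flag
            lines.add(line.toString().trim());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = Arrays.asList(
            new Glyph('H', false, false), new Glyph('i', true, false),
            new Glyph('y', false, false), new Glyph('o', true, true),
            new Glyph('o', false, false), new Glyph('k', true, false));
        System.out.println(rebuildLines(glyphs)); // prints [Hi yo, ok]
    }
}
```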

The values of PDFParsePagesOptions can be OR'ed together.
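
OR'ing works because each option behaves as a bit flag. The sketch below shows the mechanics with stand-in integer constants; the bit values are hypothetical placeholders chosen for this illustration and are not the real values of PDFParsePagesOptions. In the Java binding, where parsePages takes an int, the combination would presumably be built the same way from the getValue() results of the enumeration members.

```java
// Hypothetical flag values for illustration only; the real values are
// defined by the PDFParsePagesOptions enumeration.
final class OptionFlagsSketch {
    static final int OBJECTS = 0x01;             // placeholder value
    static final int HYPERLINKS = 0x02;          // placeholder value
    static final int IGNORE_WHITE_SPACES = 0x10; // placeholder value

    // Returns true if every bit of 'flag' is set in 'options'.
    static boolean has(int options, int flag) {
        return (options & flag) == flag;
    }

    public static void main(String[] args) {
        // Parse objects, but skip white space characters
        int options = OBJECTS | IGNORE_WHITE_SPACES;

        System.out.println(has(options, OBJECTS));             // prints true
        System.out.println(has(options, IGNORE_WHITE_SPACES)); // prints true
        System.out.println(has(options, HYPERLINKS));          // prints false
    }
}
```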

Example

This example parses all the objects from a PDF document. Refer to PDFObject for an example of how to draw the objects of a PDF page to an image, and to PDFTextProperties for an example of how to write the text of a PDF page to an external file.

C#
using System;
using System.Collections.Generic; // IList<T>
using System.IO;                  // Path, File, StreamWriter
using Leadtools;
using Leadtools.Pdf;
 
 
public void PDFDocumentParsePagesExample() 
{ 
   string pdfFileName = Path.Combine(LEAD_VARS.ImagesDir, @"Leadtools.pdf"); 
   string txtFileName = Path.Combine(LEAD_VARS.ImagesDir, @"LEAD_pdf.txt"); 
 
   // Open the document 
   using (PDFDocument document = new PDFDocument(pdfFileName)) 
   { 
      // Parse everything and for all pages 
      PDFParsePagesOptions options = PDFParsePagesOptions.All; 
      document.ParsePages(options, 1, -1); 
 
      // Save the results to the text file for examining 
      using (StreamWriter writer = File.CreateText(txtFileName)) 
      { 
         foreach (PDFDocumentPage page in document.Pages) 
         { 
            writer.WriteLine("Page {0}", page.PageNumber); 
 
            IList<PDFObject> objects = page.Objects; 
            writer.WriteLine("Objects: {0}", objects.Count); 
            foreach (PDFObject obj in objects) 
            { 
               writer.WriteLine("  ObjectType: {0}", obj.ObjectType.ToString()); 
               writer.WriteLine("  Bounds: {0}, {1}, {2}, {3}", obj.Bounds.Left, obj.Bounds.Top, obj.Bounds.Right, obj.Bounds.Bottom); 
               WriteTextProperties(writer, obj.TextProperties); 
               writer.WriteLine("  Code: {0}", obj.Code); 
               writer.WriteLine("------"); 
            } 
            writer.WriteLine("---------------------"); 
 
            IList<PDFHyperlink> hyperlinks = page.Hyperlinks; 
            writer.WriteLine("Hyperlinks: {0}", hyperlinks.Count); 
            foreach (PDFHyperlink hyperlink in hyperlinks) 
            { 
               writer.WriteLine("  Hyperlink: {0}", hyperlink.Hyperlink); 
               writer.WriteLine("  Bounds: {0}, {1}, {2}, {3}", hyperlink.Bounds.Left, hyperlink.Bounds.Top, hyperlink.Bounds.Right, hyperlink.Bounds.Bottom); 
               WriteTextProperties(writer, hyperlink.TextProperties); 
            } 
            writer.WriteLine("---------------------"); 
         } 
      } 
   } 
} 
 
private static void WriteTextProperties(StreamWriter writer, PDFTextProperties textProperties) 
{ 
   writer.WriteLine("  TextProperties.FontHeight: {0}", textProperties.FontHeight.ToString()); 
   writer.WriteLine("  TextProperties.FontWidth: {0}", textProperties.FontWidth.ToString()); 
   writer.WriteLine("  TextProperties.FontIndex: {0}", textProperties.FontIndex.ToString()); 
   writer.WriteLine("  TextProperties.IsEndOfWord: {0}", textProperties.IsEndOfWord.ToString()); 
   writer.WriteLine("  TextProperties.IsEndOfLine: {0}", textProperties.IsEndOfLine.ToString()); 
   writer.WriteLine("  TextProperties.Color: {0}", textProperties.Color.ToString()); 
} 
 
static class LEAD_VARS 
{ 
   public const string ImagesDir = @"C:\LEADTOOLS23\Resources\Images"; 
} 
 
Java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Paths;
import java.util.List;

import static org.junit.Assert.assertTrue;

import leadtools.*;
import leadtools.pdf.*;
 
 
public void pdfDocumentParsePagesExample() { 
   String LEAD_VARS_ImagesDir = "C:\\LEADTOOLS23\\Resources\\Images"; 
   // Build the file paths with the standard library
   String pdfFileName = Paths.get(LEAD_VARS_ImagesDir, "Leadtools.pdf").toString();
   String txtFileName = Paths.get(LEAD_VARS_ImagesDir, "LEAD_pdf.txt").toString();
 
   PDFDocument document = new PDFDocument(pdfFileName); 
   PDFParsePagesOptions options = PDFParsePagesOptions.ALL; 
   document.parsePages(options.getValue(), 1, -1); 
 
   try (BufferedWriter writer = new BufferedWriter(new FileWriter(txtFileName))) { 
      for (PDFDocumentPage page : document.getPages()) { 
         writer.write("Page " + page.getPageNumber()); 
         writer.newLine(); 
 
         List<PDFObject> objects = page.getObjects(); 
         writer.write("Objects: " + objects.size()); 
         writer.newLine(); 
 
         for (PDFObject obj : objects) { 
            writer.write("  ObjectType: " + obj.getObjectType().toString()); 
            writer.newLine(); 
            writer.write("  Bounds: " + obj.getBounds().getLeft() + ", " + obj.getBounds().getTop() + ", " 
                  + obj.getBounds().getRight() + ", " + obj.getBounds().getBottom()); 
            writer.newLine(); 
            writeTextProperties(writer, obj.getTextProperties()); 
            writer.write("  Code: " + obj.getCode()); 
            writer.newLine(); 
            writer.write("------"); 
            writer.newLine(); 
         } 
 
         writer.write("---------------------"); 
         writer.newLine(); 
 
         List<PDFHyperlink> hyperlinks = page.getHyperlinks(); 
         writer.write("Hyperlinks: " + hyperlinks.size()); 
         writer.newLine(); 
 
         for (PDFHyperlink hyperlink : hyperlinks) { 
            writer.write("  Hyperlink: " + hyperlink.getHyperlink()); 
            writer.newLine(); 
            writer.write("  Bounds: " + hyperlink.getBounds().getLeft() + ", " + hyperlink.getBounds().getTop() 
                  + ", " + hyperlink.getBounds().getRight() + ", " + hyperlink.getBounds().getBottom()); 
            writer.newLine(); 
            writeTextProperties(writer, hyperlink.getTextProperties()); 
         } 
 
         writer.write("---------------------"); 
         writer.newLine(); 
      } 
   } catch (Exception e) { 
      System.out.println(e.toString()); 
   } 
 
   assertTrue(new File(txtFileName).exists()); 
} 
 
public static void writeTextProperties(BufferedWriter writer, PDFTextProperties textProperties) throws IOException { 
   writer.write("  TextProperties.FontHeight: " + textProperties.getFontHeight()); 
   writer.newLine(); 
   writer.write("  TextProperties.FontWidth: " + textProperties.getFontWidth()); 
   writer.newLine(); 
   writer.write("  TextProperties.FontIndex: " + textProperties.getFontIndex()); 
   writer.newLine(); 
   writer.write("  TextProperties.IsEndOfWord: " + textProperties.isEndOfWord()); 
   writer.newLine(); 
   writer.write("  TextProperties.IsEndOfLine: " + textProperties.isEndOfLine()); 
   writer.newLine(); 
   writer.write("  TextProperties.Color: " + textProperties.getColor()); 
   writer.newLine(); 
} 
Requirements

Target Platforms

Help Version 23.0.2024.2.29
Products | Support | Contact Us | Intellectual Property Notices
© 1991-2024 LEAD Technologies, Inc. All Rights Reserved.

Leadtools.Pdf Assembly