This tutorial shows three different techniques to individually save each page of a multipage PDF in a Java application using the LEADTOOLS SDK.
Overview | |
---|---|
Summary | This tutorial covers how to split multipage PDF files in a Java Console application. |
Completion Time | 30 minutes |
Eclipse Project | Download tutorial project (3 KB) |
Platform | Java Application |
IDE | Eclipse |
Development License | Download LEADTOOLS |
Get familiar with the basic steps of creating a project by reviewing the Add References and Set a License tutorial, before working on the Split a PDF File into Multiple Files - Java tutorial.
Start with a copy of the project created in the Add References and Set a License tutorial. If that project is unavailable, follow the steps in that tutorial to create it.
The references needed depend upon the purpose of the project. References can be added by local .jar files located at <INSTALL_DIR>\LEADTOOLS23\Bin\Java.
For this project, the following references are needed:
leadtools.annotations.engine.jar
leadtools.caching.jar
leadtools.codecs.jar
leadtools.document.converter.jar
leadtools.document.jar
leadtools.document.pdf.jar
leadtools.document.writer.jar
leadtools.imageprocessing.core.jar
leadtools.jar
leadtools.ocr.jar
leadtools.pdf.jar
leadtools.svg.jar
This tutorial uses LEADTOOLS Codec library support. For a complete list of which JAR files are required for your application, refer to Files to be Included with your Java Application.
The License unlocks the features needed for the project. It must be set before any toolkit function is called. For details including tutorials for different platforms, refer to Setting a Runtime License.
There are two types of runtime licenses: an evaluation license and a deployment license.
Note: Adding LEADTOOLS references and setting a license are covered in more detail in the Add References and Set a License tutorial.
With the project created, the references added, and the license set, coding can begin.
Open the Main.java class in the Project Explorer. Add the following statements to the import block at the top.
import java.io.IOException;
import java.nio.file.*;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import leadtools.*;
import leadtools.codecs.*;
import leadtools.pdf.*;
import leadtools.document.*;
import leadtools.document.converter.*;
import leadtools.document.writer.*;
import leadtools.ocr.*;
Add the code below to the run() method to create the split files directory and call the methods created in the sections below.
private void run(String[] args) {
    try {
        Platform.setLibPath("C:\\LEADTOOLS23\\Bin\\CDLL\\x64");
        Platform.loadLibrary(LTLibrary.LEADTOOLS);
        Platform.loadLibrary(LTLibrary.CODECS);
        Platform.loadLibrary(LTLibrary.DOCUMENT_WRITER);
        Platform.loadLibrary(LTLibrary.PDF);
        Platform.loadLibrary(LTLibrary.OCR);
        SetLicense();
        String multipageFile = "C:\\LEADTOOLS23\\Resources\\Images\\leadtools.pdf";
        String _splitDir = "C:\\LEADTOOLS23\\Resources\\Images\\Split PDFs";
        if(!Files.exists(Paths.get(_splitDir)))
            Files.createDirectory(Paths.get(_splitDir));
        splitUsingRasterCodecs(multipageFile, _splitDir);
        splitUsingPDFFile(multipageFile, _splitDir);
        splitUsingLEADDocument(multipageFile, _splitDir);
    } catch (Exception ex) {
        System.err.println(ex.getMessage());
        ex.printStackTrace();
    }
}
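The exists-then-create check in run() can also be written as a single call. The following is a minimal sketch that assumes only the standard java.nio.file API; the class name SplitDir and the method ensure are hypothetical, and a throwaway folder under the system temp directory stands in for the tutorial's Split PDFs folder.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of LEADTOOLS or the tutorial project).
final class SplitDir {
    // Creates dir and any missing parents; returns normally if the directory already exists.
    static Path ensure(Path dir) {
        try {
            return Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A throwaway location under the system temp directory stands in for the real output folder.
        Path splitDir = Paths.get(System.getProperty("java.io.tmpdir"),
                "split-demo-" + System.nanoTime(), "Split PDFs");
        ensure(splitDir);
        ensure(splitDir); // calling again is safe; no FileAlreadyExistsException is thrown
        System.out.println(Files.isDirectory(splitDir));
    }
}
```

Unlike Files.createDirectory, Files.createDirectories also creates missing parent folders, which may matter if the output location is changed to a deeper path.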
Three different techniques for splitting the pages of a PDF file are discussed below. Each has its own advantages.
In this approach, each page is loaded as a raster (bitmap) image, then saved as a raster PDF file. This is done using the RasterCodecs class.
The main advantage of this approach is code simplicity. It only takes a few lines of code, and the exact same code can be used for other multipage formats such as TIFF or GIF.
Create a new method in the Main class named splitUsingRasterCodecs(String inputFile, String _directory). This method will be called inside the run() method, as shown above.
void splitUsingRasterCodecs(String inputFile, String _directory) {
    RasterCodecs codecs = new RasterCodecs();
    codecs.getOptions().getPdf().setInitialPath("C:\\LEADTOOLS23\\Bin\\CDLL\\x64");
    int totalPages = codecs.getTotalPages(inputFile);
    System.out.println("SplitUsingRasterCodecs..\nTotal pages:" + totalPages + " Splitting pages:");
    for (int page = 1; page <= totalPages; page++) {
        System.out.println(page + "..");
        // Build the output name: <input name without extension>_codecs_page<N>.pdf
        String outputFilename = Paths.get(inputFile).toFile().getName();
        if(outputFilename.lastIndexOf('.') != -1)
            outputFilename = outputFilename.substring(0, outputFilename.lastIndexOf('.'));
        outputFilename = outputFilename + "_codecs_page" + page + ".pdf";
        String outputFile = _directory + "\\" + outputFilename;
        // Load one page as a raster image and save it as a single-page raster PDF
        RasterImage image = codecs.load(inputFile, page);
        codecs.save(image, outputFile, RasterImageFormat.RAS_PDF_LZW, 0);
        image.dispose();
    }
    codecs.dispose();
}
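The same output-name logic (strip the extension, append a tag and the page number, join with the directory) is repeated in each of the three split methods. It could be factored into one helper, sketched below using only the standard library. The class name SplitNames, the method pageOutputPath, and the tag parameter are hypothetical names introduced for illustration.

```java
import java.nio.file.Paths;

// Hypothetical helper (not part of LEADTOOLS or the tutorial project).
final class SplitNames {
    // Builds "<directory>/<base>_<tag>_page<N>.pdf" from the input file name.
    static String pageOutputPath(String inputFile, String directory, String tag, int page) {
        // Keep only the file name, e.g. "leadtools.pdf"
        String name = Paths.get(inputFile).getFileName().toString();
        // Strip the extension, if there is one
        int dot = name.lastIndexOf('.');
        String base = (dot != -1) ? name.substring(0, dot) : name;
        // Paths.get joins with the platform separator instead of a hard-coded "\\"
        return Paths.get(directory, base + "_" + tag + "_page" + page + ".pdf").toString();
    }

    public static void main(String[] args) {
        System.out.println(pageOutputPath("in/leadtools.pdf", "out", "codecs", 1));
    }
}
```

With such a helper, each split method would only differ in the tag it passes ("codecs", "pdffile", or "LeadDoc").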
In this approach, the PDFFile class is used, which is a dedicated class for the PDF format. This means the code cannot be used with other document or image formats.
The main advantage of this approach is that it preserves the contents of PDF pages since it does not convert searchable text to raster images. Additionally, in many cases it does not cause re-encoding of images that exist in the original PDF file, which improves performance and maintains image quality. The code is also very simple.
Create a new method in the Main class named splitUsingPDFFile(String inputFile, String _directory). This method will be called inside the run() method, as shown above.
void splitUsingPDFFile(String inputFile, String _directory) {
    PDFFile pdfFile = new PDFFile(inputFile);
    int totalPages = pdfFile.getPageCount();
    System.out.println("SplitUsingPDFFile..\nTotal pages:" + totalPages + " Splitting pages:");
    for (int page = 1; page <= totalPages; page++) {
        System.out.println(page + "..");
        // Build the output name: <input name without extension>_pdffile_page<N>.pdf
        String outputFilename = Paths.get(inputFile).toFile().getName();
        if(outputFilename.lastIndexOf('.') != -1)
            outputFilename = outputFilename.substring(0, outputFilename.lastIndexOf('.'));
        outputFilename = outputFilename + "_pdffile_page" + page + ".pdf";
        String outputFile = _directory + "\\" + outputFilename;
        // Extract the single page (first page == last page) into its own PDF
        pdfFile.extractPages(page, page, outputFile);
    }
}
This approach is the most advanced of the three and it utilizes the LEADDocument and DocumentConverter classes.
Since these classes are versatile for use with different formats, similar code can be used for splitting many types of document files and outputting to different document and raster formats. For example, in the code below, simply changing DocumentFormat.PDF to DocumentFormat.DOCX will split the file into Microsoft Word output pages instead of PDF pages. Additionally, these powerful classes produce optimized output files.
Because document conversion jobs are asynchronous, a Java ExecutorService must be configured and assigned to the RasterDefaults class.
Define an ExecutorService field in the Main class and add the code below to the run() method:
private ExecutorService service;
private void run(String[] args) {
    try {
        Platform.setLibPath("C:\\LEADTOOLS23\\Bin\\CDLL\\x64");
        Platform.loadLibrary(LTLibrary.LEADTOOLS);
        Platform.loadLibrary(LTLibrary.CODECS);
        Platform.loadLibrary(LTLibrary.DOCUMENT_WRITER);
        Platform.loadLibrary(LTLibrary.PDF);
        Platform.loadLibrary(LTLibrary.OCR);
        SetLicense();
        // Set ExecutorService in RasterDefaults for Document Converter Jobs
        service = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        RasterDefaults.setExecutorService(service);
        String multipageFile = "C:\\LEADTOOLS23\\Resources\\Images\\leadtools.pdf";
        String _splitDir = "C:\\LEADTOOLS23\\Resources\\Images\\Split PDFs";
        if(!Files.exists(Paths.get(_splitDir)))
            Files.createDirectory(Paths.get(_splitDir));
        splitUsingRasterCodecs(multipageFile, _splitDir);
        splitUsingPDFFile(multipageFile, _splitDir);
        splitUsingLEADDocument(multipageFile, _splitDir);
    }
    catch(Exception ex) {
        System.err.println(ex.getMessage());
        ex.printStackTrace();
    }
}
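A fixed thread pool created with Executors.newFixedThreadPool uses non-daemon threads by default, so the JVM may stay alive after run() returns unless the pool is shut down. The sketch below shows one common shutdown pattern using only java.util.concurrent. The class name PoolLifecycle, the method runAndShutdown, and the 30-second timeout are illustrative assumptions, and the counter increment stands in for a conversion job.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration (not part of LEADTOOLS or the tutorial project).
final class PoolLifecycle {
    // Runs `tasks` trivial jobs on a fixed pool, shuts the pool down, and returns how many completed.
    static int runAndShutdown(int tasks) {
        ExecutorService service =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        AtomicInteger done = new AtomicInteger();
        try {
            for (int i = 0; i < tasks; i++) {
                service.execute(() -> { done.incrementAndGet(); }); // stand-in for real work
            }
        } finally {
            service.shutdown(); // stop accepting new work; already queued tasks still run
        }
        try {
            // Wait for queued tasks; the timeout value is an arbitrary choice for this sketch.
            if (!service.awaitTermination(30, TimeUnit.SECONDS)) {
                service.shutdownNow();
            }
        } catch (InterruptedException e) {
            service.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println(runAndShutdown(5));
    }
}
```

In the tutorial project, the equivalent step would be calling service.shutdown() once all split methods have returned, for example in a finally block of run().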
Create a new method in the Main class named splitUsingLEADDocument(String inputFile, String _directory). This method will be called inside the run() method, as shown above.
void splitUsingLEADDocument(String inputFile, String _directory) {
    DocumentWriter documentWriter = new DocumentWriter();
    // Optional: use documentWriter.getOptions() and documentWriter.setOptions() to modify PDF options
    var createOptions = new CreateDocumentOptions();
    LEADDocument inputDocument = DocumentFactory.loadFromFile(inputFile, new LoadDocumentOptions());
    OcrEngine ocrEngine = OcrEngineManager.createEngine(OcrEngineType.LEAD);
    ocrEngine.startup(null, null, null, "C:\\LEADTOOLS23\\Bin\\Common\\OcrLEADRuntime");
    System.out.println("SplitUsingLEADDocument..\nTotal pages:" + inputDocument.getPages().getOriginalPageCount() + " Splitting pages:");
    for(var inputPage : inputDocument.getPages()) {
        // Wrap the current page in its own one-page virtual document
        LEADDocument pageDocument = DocumentFactory.create(createOptions);
        pageDocument.setAutoDisposeDocuments(true);
        pageDocument.setName("VirtualPage");
        pageDocument.getPages().add(inputPage);
        DocumentConverter docConverter = new DocumentConverter();
        docConverter.setDocumentWriterInstance(documentWriter);
        int page = inputDocument.getPages().indexOf(inputPage) + 1; // (+ 1) since index is zero-based
        System.out.println(page + "..");
        var jobData = new DocumentConverterJobData();
        // Convert the one-page document, not the page object itself
        jobData.setDocument(pageDocument);
        String outputFilename = Paths.get(inputFile).toFile().getName();
        if(outputFilename.lastIndexOf('.') != -1)
            outputFilename = outputFilename.substring(0, outputFilename.lastIndexOf('.'));
        outputFilename = outputFilename + "_LeadDoc_page" + page + ".pdf";
        String outputFile = _directory + "\\" + outputFilename;
        jobData.setOutputDocumentFileName(outputFile);
        jobData.setDocumentFormat(DocumentFormat.PDF);
        var job = docConverter.getJobs().createJob(jobData);
        docConverter.getJobs().runJob(job);
    }
    System.out.println();
    ocrEngine.shutdown();
}
To handle the files using I/O streams, add a statement to the import block at the top to import the java.io.InputStream class.
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.*;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import leadtools.*;
import leadtools.codecs.*;
import leadtools.pdf.*;
import leadtools.document.*;
import leadtools.document.converter.*;
import leadtools.document.writer.*;
import leadtools.ocr.*;
Replace the existing code in the run() method with the following:
private void run(String[] args) {
    try {
        Platform.setLibPath("C:\\LEADTOOLS23\\Bin\\CDLL\\x64");
        Platform.loadLibrary(LTLibrary.LEADTOOLS);
        Platform.loadLibrary(LTLibrary.CODECS);
        Platform.loadLibrary(LTLibrary.DOCUMENT_WRITER);
        Platform.loadLibrary(LTLibrary.PDF);
        Platform.loadLibrary(LTLibrary.OCR);
        SetLicense();
        // Set ExecutorService in RasterDefaults for Document Converter Jobs
        service = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        RasterDefaults.setExecutorService(service);
        String multipageFile = "C:\\LEADTOOLS23\\Resources\\Images\\leadtools.pdf";
        InputStream multipageInputStream = Files.newInputStream(Paths.get(multipageFile));
        LeadDynamicStream multipageLeadDynamicStream = new LeadDynamicStream(multipageInputStream, false);
        splitUsingRasterCodecs(multipageLeadDynamicStream);
        splitUsingLEADDocument(multipageLeadDynamicStream);
    } catch (Exception ex) {
        System.err.println(ex.getMessage());
        ex.printStackTrace();
    }
}
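The run() method above opens an InputStream but never closes it explicitly. One way to scope the stream is a try-with-resources block, sketched here with the standard library only. The class name StreamScope and its methods are hypothetical, and the byte-counting loop stands in for the split calls. Whether the LEADTOOLS stream wrapper closes the underlying stream itself is not stated in this tutorial, so treat this as a general Java pattern rather than a LEADTOOLS requirement.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration (not part of LEADTOOLS or the tutorial project).
final class StreamScope {
    // Opens the file, consumes it inside the try block, and closes it automatically on exit.
    static long countBytes(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            // In the tutorial, this is the scope in which the stream would be
            // wrapped and handed to the split methods.
            long total = 0;
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                total += read;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Self-contained demo: writes a 4-byte temporary file and reads it back.
    static long demo() {
        try {
            Path tmp = Files.createTempFile("stream-demo", ".bin");
            Files.write(tmp, new byte[] {1, 2, 3, 4});
            long size = countBytes(tmp);
            Files.delete(tmp);
            return size;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```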
Add the splitUsingRasterCodecs method overload which handles an ILeadStream object.
void splitUsingRasterCodecs(ILeadStream inputStream) {
    RasterCodecs codecs = new RasterCodecs();
    codecs.getOptions().getPdf().setInitialPath("C:\\LEADTOOLS23\\Bin\\CDLL\\x64");
    int totalPages = codecs.getTotalPages(inputStream);
    System.out.println("SplitUsingRasterCodecs..\nTotal pages:" + totalPages + " Splitting pages:");
    // Loop over the actual page count rather than a hard-coded number of pages
    for (int page = 1; page <= totalPages; page++) {
        System.out.println(page + "..");
        RasterImage image = codecs.load(inputStream, page);
        LeadDynamicStream leadDynamicStream = new LeadDynamicStream();
        codecs.save(image, leadDynamicStream, RasterImageFormat.RAS_PDF_LZW, 0);
        // Use output Stream containing the split file before it is closed and freed for the next page
        leadDynamicStream.close();
        leadDynamicStream.dispose();
        image.dispose();
    }
    System.out.println();
    codecs.dispose();
}
Add the splitUsingLEADDocument method overload which handles an ILeadStream object.
void splitUsingLEADDocument(ILeadStream inputStream) {
    DocumentWriter documentWriter = new DocumentWriter();
    // Optional: use documentWriter.getOptions() and documentWriter.setOptions() to modify PDF options
    var createOptions = new CreateDocumentOptions();
    LEADDocument inputDocument = DocumentFactory.loadFromStream(inputStream, new LoadDocumentOptions());
    OcrEngine ocrEngine = OcrEngineManager.createEngine(OcrEngineType.LEAD);
    ocrEngine.startup(null, null, null, "C:\\LEADTOOLS23\\Bin\\Common\\OcrLEADRuntime");
    System.out.println("SplitUsingLEADDocument..\nTotal pages:" + inputDocument.getPages().getOriginalPageCount()
            + " Splitting pages:");
    for (var inputPage : inputDocument.getPages()) {
        // Wrap the current page in its own one-page virtual document
        LEADDocument pageDocument = DocumentFactory.create(createOptions);
        pageDocument.setAutoDisposeDocuments(true);
        pageDocument.setName("VirtualPage");
        pageDocument.getPages().add(inputPage);
        DocumentConverter docConverter = new DocumentConverter();
        docConverter.setDocumentWriterInstance(documentWriter);
        int page = inputDocument.getPages().indexOf(inputPage) + 1; // (+ 1) since index is zero-based
        System.out.println(page + "..");
        var jobData = new DocumentConverterJobData();
        jobData.setDocument(pageDocument);
        jobData.setOutputDocumentStream(new LeadDynamicStream());
        jobData.setDocumentFormat(DocumentFormat.PDF);
        jobData.setJobName("LeadDoc_page" + page);
        var job = docConverter.getJobs().createJob(jobData);
        // The listener receives the output stream when the job completes
        Jobs_JobCompleted jobsCompleted = new Jobs_JobCompleted();
        docConverter.getJobs().addJobCompletedListener(jobsCompleted);
        docConverter.getJobs().runJob(job);
    }
    System.out.println();
    // Shut down the OCR engine, as in the file-based version of this method
    ocrEngine.shutdown();
}
Add the Jobs_JobCompleted event listener class that will handle the asynchronous jobs from the Document Converter above to access the output document stream.
class Jobs_JobCompleted implements DocumentConverterJobEventListener {
    public void onEvent(DocumentConverterJobEvent e) {
        if (e.getOperation() == DocumentConverterJobOperation.COMPLETED) {
            LeadDynamicStream outputDocumentStream = (LeadDynamicStream) e.getJob().getJobData().getOutputDocumentStream();
            // Use Output Document Stream containing the split file before it is closed and freed for the next page
            outputDocumentStream.close();
            outputDocumentStream.dispose();
        }
    }
}
Run the project by pressing Ctrl + F11, or by selecting Run -> Run.
If the steps were followed correctly, the application runs and creates new files. Each page of leadtools.pdf should be created as a separate PDF file in three different ways, with the page number appended to the name.
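A quick way to confirm the result of the file-based version is to count the PDF files in the output folder after the run. The sketch below uses only java.nio.file. The class name SplitCheck and its methods are hypothetical, and the demo creates empty placeholder files in a temporary directory rather than reading the tutorial's real output.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration (not part of LEADTOOLS or the tutorial project).
final class SplitCheck {
    // Counts the entries in dir whose names match the glob "*.pdf".
    static int countPdfs(Path dir) {
        int count = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, "*.pdf")) {
            for (Path ignored : entries) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    // Self-contained demo: three placeholder PDFs plus one unrelated file.
    static int demo() {
        try {
            Path dir = Files.createTempDirectory("split-check");
            Files.createFile(dir.resolve("leadtools_codecs_page1.pdf"));
            Files.createFile(dir.resolve("leadtools_pdffile_page1.pdf"));
            Files.createFile(dir.resolve("leadtools_LeadDoc_page1.pdf"));
            Files.createFile(dir.resolve("notes.txt"));
            return countPdfs(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Pointing countPdfs at the Split PDFs folder should report three files per page of the input document, one for each technique.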
This tutorial showed how to add the necessary references to load all the pages of a PDF file and split them into separate documents using various techniques.