This tutorial shows three different techniques to individually save each page of a multipage PDF in a C# Windows Console application using the LEADTOOLS SDK.
Overview | |
---|---|
Summary | This tutorial covers how to split multipage PDF files in a C# Windows Console application. |
Completion Time | 30 minutes |
Visual Studio Project | Download tutorial project (4 KB) |
Platform | C# Windows Console Application |
IDE | Visual Studio 2019, 2022 |
Development License | Download LEADTOOLS |
Get familiar with the basic steps of creating a project by reviewing the Add References and Set a License tutorial, before working on the Split a PDF File into Multiple Files - Console C# tutorial.
Start with a copy of the project created in the Add References and Set a License tutorial. If that project is unavailable, follow the steps in that tutorial to create it.
The references needed depend upon the purpose of the project. References can be added by one or the other of the following two methods (but not both).
If using NuGet references, this tutorial requires the following NuGet package:
Leadtools.Document.Sdk
If using local DLL references, the following DLLs are needed.
The DLLs are located at <INSTALL_DIR>\LEADTOOLS22\Bin\Dotnet4\x64:
Leadtools.dll
Leadtools.Codecs.dll
Leadtools.Codecs.Cmp.dll
Leadtools.Codecs.Tif.dll
Leadtools.Document.dll
Leadtools.Document.Converter.dll
Leadtools.Document.Pdf.dll
Leadtools.Document.Writer.dll
Leadtools.Pdf.dll
Leadtools.Ocr.dll
Leadtools.Ocr.LEADEngine.dll
For a complete list of which DLL files are required for your application, refer to Files to be Included in your Application.
The License unlocks the features needed for the project. It must be set before any toolkit function is called. For details, including tutorials for different platforms, refer to Setting a Runtime License.
There are two types of runtime licenses:
With the project created, the references added, and the license set, coding can begin.
In the Solution Explorer, open Program.cs. Add the following statements to the using block at the top of Program.cs:
// Using block at the top
using System;
using System.IO;
using Leadtools;
using Leadtools.Codecs;
using Leadtools.Pdf;
using Leadtools.Document;
using Leadtools.Document.Converter;
using Leadtools.Document.Writer;
using Leadtools.Ocr;
Add the code below to the Main() method to create the split files directory and call the methods created in the sections below.
static void Main(string[] args)
{
    try
    {
        SetLicense();
        string multipageFile = @"C:\LEADTOOLS22\Resources\Images\leadtools.pdf";
        string _splitDir = @"C:\LEADTOOLS22\Resources\Images\Split PDFs";
        // Create the output directory if it does not exist yet
        if (!Directory.Exists(_splitDir))
        {
            Directory.CreateDirectory(_splitDir);
        }
        // Split the same input file with each of the three techniques
        SplitUsingRasterCodecs(multipageFile, _splitDir);
        SplitUsingPDFFile(multipageFile, _splitDir);
        SplitUsingLEADDocument(multipageFile, _splitDir);
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.ToString());
    }
    Console.WriteLine("Press any key to exit.");
    Console.ReadKey(true);
}
Three different techniques for splitting the pages of a PDF file are discussed below. Each has its own advantages.
In this approach, each page is loaded as a raster (bitmap) image, then saved as a raster PDF file. This is done using the RasterCodecs class.
The main advantage of this approach is code simplicity. It only takes a few lines of code, and the exact same code can be used for other multipage formats such as TIFF or GIF.
Create a new method in the Program class named SplitUsingRasterCodecs(string inputFile, string _directory). This method will be called inside the Main() method, as shown above.
static void SplitUsingRasterCodecs(string inputFile, string _directory)
{
    using (RasterCodecs codecs = new RasterCodecs())
    {
        codecs.Options.Pdf.InitialPath = @"C:\LEADTOOLS22\Bin\Dotnet4\x64";
        int totalPages = codecs.GetTotalPages(inputFile);
        Console.Write($"SplitUsingRasterCodecs..\nTotal pages: {totalPages}, Splitting pages: ");
        // Page numbers are 1-based
        for (int page = 1; page <= totalPages; page++)
        {
            Console.Write($"{page}.. ");
            string outputFileName = $"{Path.GetFileNameWithoutExtension(inputFile)}_codecs_page{page}.pdf";
            string outputFile = Path.Combine(_directory, outputFileName);
            // Load one page as a raster image and save it as a single-page raster PDF
            using (RasterImage image = codecs.Load(inputFile, page))
                codecs.Save(image, outputFile, RasterImageFormat.RasPdfLzw, 0);
        }
        Console.WriteLine();
    }
}
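The output naming pattern used in the loop above could be factored into a small helper so that each technique only supplies its own tag. The sketch below uses only System.IO; the names SplitNaming, BuildPageFileName, and the tag parameter are hypothetical and are not part of the LEADTOOLS SDK or the tutorial project.

```csharp
using System;
using System.IO;

static class SplitNaming
{
    // Hypothetical helper: builds "<directory>/<name>_<tag>_page<N>.pdf",
    // mirroring the naming pattern used by the tutorial's split methods.
    public static string BuildPageFileName(string inputFile, string directory, string tag, int page)
    {
        if (page < 1)
            throw new ArgumentOutOfRangeException(nameof(page), "Page numbers are 1-based.");

        string baseName = Path.GetFileNameWithoutExtension(inputFile);
        return Path.Combine(directory, $"{baseName}_{tag}_page{page}.pdf");
    }

    static void Main()
    {
        // The directory separator in the result depends on the operating system.
        string outputFile = BuildPageFileName("leadtools.pdf", "Split PDFs", "codecs", 3);
        Console.WriteLine(outputFile);
    }
}
```

With a helper like this, the tag would be "codecs", "pdfFile", or "LeadDoc", depending on which split method is calling it.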
In this approach, the PDFFile class is used, which is a dedicated class for the PDF format. This means the code cannot be used with other document or image formats.
The main advantage of this approach is that it preserves the contents of PDF pages since it does not convert searchable text to raster images. Additionally, in many cases it does not cause re-encoding of images that exist in the original PDF file, which improves performance and maintains image quality. The code is also very simple.
Create a new method in the Program class named SplitUsingPDFFile(string inputFile, string _directory). This method will be called inside the Main() method, as shown above.
static void SplitUsingPDFFile(string inputFile, string _directory)
{
    PDFFile pdfFile = new PDFFile(inputFile);
    int totalPages = pdfFile.GetPageCount();
    Console.Write($"SplitUsingPDFFile..\nTotal pages: {totalPages}, Splitting pages: ");
    // Page numbers are 1-based
    for (int page = 1; page <= totalPages; page++)
    {
        Console.Write($"{page}.. ");
        string outputFileName = $"{Path.GetFileNameWithoutExtension(inputFile)}_pdfFile_page{page}.pdf";
        string outputFile = Path.Combine(_directory, outputFileName);
        // First and last page are the same, so each output file holds exactly one page
        pdfFile.ExtractPages(page, page, outputFile);
    }
    Console.WriteLine();
}
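Because ExtractPages takes a first and a last page number, the same loop could in principle write multi-page chunks instead of single pages. The sketch below only computes the 1-based page ranges in plain C#; PageRanges and Chunk are hypothetical names, and no LEADTOOLS call is executed here.

```csharp
using System;
using System.Collections.Generic;

static class PageRanges
{
    // Hypothetical helper: splits pages 1..totalPages into consecutive
    // 1-based ranges holding at most pagesPerFile pages each.
    public static List<(int First, int Last)> Chunk(int totalPages, int pagesPerFile)
    {
        if (totalPages < 0)
            throw new ArgumentOutOfRangeException(nameof(totalPages));
        if (pagesPerFile < 1)
            throw new ArgumentOutOfRangeException(nameof(pagesPerFile));

        var ranges = new List<(int First, int Last)>();
        // long arithmetic avoids overflow for very large pagesPerFile values
        for (long first = 1; first <= totalPages; first += pagesPerFile)
        {
            long last = Math.Min(first + pagesPerFile - 1, totalPages);
            ranges.Add(((int)first, (int)last));
        }
        return ranges;
    }

    static void Main()
    {
        // A 7-page file split into chunks of 3 pages
        foreach (var range in Chunk(7, 3))
        {
            // In the tutorial's method, each range would feed a call such as
            // pdfFile.ExtractPages(range.First, range.Last, outputFile);
            Console.WriteLine($"{range.First}-{range.Last}");
        }
        // prints:
        // 1-3
        // 4-6
        // 7-7
    }
}
```

Passing a chunk size of 1 reproduces the page-per-file behavior of the tutorial's loop.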
This approach is the most advanced of the three and it utilizes the LEADDocument and DocumentConverter classes.
Since these classes are versatile for use with different formats, similar code can be used for splitting many types of document files and outputting to different document and raster formats. For example, in the code below, simply changing DocumentFormat.Pdf to DocumentFormat.Docx will split the file into Microsoft Word output pages instead of PDF pages. Additionally, these powerful classes produce optimized output files.
Create a new method in the Program class named SplitUsingLEADDocument(string inputFile, string _directory). This method will be called inside the Main() method, as shown above.
static void SplitUsingLEADDocument(string inputFile, string _directory)
{
    DocumentWriter documentWriter = new DocumentWriter();
    // Optional: use documentWriter.GetOptions() and documentWriter.SetOptions() to modify PDF options
    var createOptions = new CreateDocumentOptions();
    LEADDocument inputDocument = DocumentFactory.LoadFromFile(inputFile, new LoadDocumentOptions());
    IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD);
    ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS22\Bin\Common\OcrLEADRuntime");
    Console.Write($"SplitUsingLEADDocument..\nTotal pages: {inputDocument.Pages.Count}, Splitting pages: ");
    foreach (var inputPage in inputDocument.Pages)
    {
        // Build a virtual document that holds only the current page
        LEADDocument pageDocument = DocumentFactory.Create(createOptions);
        pageDocument.AutoDisposeDocuments = true;
        pageDocument.Name = "VirtualPage";
        pageDocument.Pages.Add(inputPage);
        DocumentConverter docConverter = new DocumentConverter();
        docConverter.SetOcrEngineInstance(ocrEngine, false);
        docConverter.SetDocumentWriterInstance(documentWriter);
        int page = inputDocument.Pages.IndexOf(inputPage) + 1; // (+ 1) since index is zero-based
        Console.Write($"{page}.. ");
        var jobData = new DocumentConverterJobData
        {
            Document = pageDocument,
            OutputDocumentFileName = Path.Combine(_directory, $"{Path.GetFileNameWithoutExtension(inputFile)}_LeadDoc_page{page}.pdf"),
            DocumentFormat = DocumentFormat.Pdf
        };
        // Convert the single-page document to its own output file
        var job = docConverter.Jobs.CreateJob(jobData);
        docConverter.Jobs.RunJob(job);
    }
    Console.WriteLine("");
    ocrEngine.Shutdown();
}
To handle the files using MemoryStream, modify the two methods SplitUsingRasterCodecs and SplitUsingLEADDocument, and modify the code that calls them from the Main() method, as follows:
// The following code goes inside the Main method
// Note that the PDFFile class does not accept stream input
byte[] multipageData = File.ReadAllBytes(multipageFile);
using (MemoryStream multipageStream = new MemoryStream(multipageData))
{
    SplitUsingRasterCodecs(multipageStream);
    // Precaution: rewind the shared stream so the second method starts reading at the beginning
    multipageStream.Position = 0;
    SplitUsingLEADDocument(multipageStream);
}
static void SplitUsingRasterCodecs(Stream inputStream)
{
    using (RasterCodecs codecs = new RasterCodecs())
    {
        codecs.Options.Pdf.InitialPath = @"C:\LEADTOOLS22\Bin\Dotnet4\x64";
        int totalPages = codecs.GetTotalPages(inputStream);
        Console.Write($"SplitUsingRasterCodecs..\nTotal pages: {totalPages}, Splitting pages: ");
        for (int page = 1; page <= totalPages; page++)
        {
            Console.Write($"{page}.. ");
            using (RasterImage image = codecs.Load(inputStream, page))
            using (MemoryStream outputStream = new MemoryStream())
            {
                codecs.Save(image, outputStream, RasterImageFormat.RasPdfLzw, 0);
                // Use the output MemoryStream containing the split file here, before it is closed and freed for the next page
            }
        }
        Console.WriteLine();
    }
}
static void SplitUsingLEADDocument(Stream inputStream)
{
    DocumentWriter documentWriter = new DocumentWriter();
    // Optional: use documentWriter.GetOptions() and documentWriter.SetOptions() to modify PDF options
    var createOptions = new CreateDocumentOptions();
    LEADDocument inputDocument = DocumentFactory.LoadFromStream(inputStream, new LoadDocumentOptions());
    IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD);
    ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS22\Bin\Common\OcrLEADRuntime");
    Console.Write($"SplitUsingLEADDocument..\nTotal pages: {inputDocument.Pages.Count}, Splitting pages: ");
    foreach (var inputPage in inputDocument.Pages)
    {
        // Build a virtual document that holds only the current page
        LEADDocument pageDocument = DocumentFactory.Create(createOptions);
        pageDocument.AutoDisposeDocuments = true;
        pageDocument.Name = "VirtualPage";
        pageDocument.Pages.Add(inputPage);
        DocumentConverter docConverter = new DocumentConverter();
        docConverter.SetOcrEngineInstance(ocrEngine, false);
        docConverter.SetDocumentWriterInstance(documentWriter);
        int page = inputDocument.Pages.IndexOf(inputPage) + 1; // (+ 1) since index is zero-based
        Console.Write($"{page}.. ");
        var jobData = new DocumentConverterJobData
        {
            Document = pageDocument,
            OutputDocumentStream = new MemoryStream(),
            DocumentFormat = DocumentFormat.Pdf,
            JobName = "LeadDoc_page" + page
        };
        var job = docConverter.Jobs.CreateJob(jobData);
        // The handler receives the output stream once the job finishes
        docConverter.Jobs.JobCompleted += Jobs_JobCompleted;
        docConverter.Jobs.RunJob(job);
    }
    Console.WriteLine("");
    ocrEngine.Shutdown();
}
private static void Jobs_JobCompleted(object sender, DocumentConverterJobEventArgs e)
{
    MemoryStream outputStream = e.Job.JobData.OutputDocumentStream as MemoryStream;
    // Each output stream will contain a split page after the conversion job is complete
    // Use the stream here, before it is disposed
    // Dispose() also closes the stream, so a separate Close() call is not needed
    outputStream?.Dispose();
}
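One way to use each output stream before it is freed is to copy its bytes into an in-memory store keyed by job name. The sketch below shows this with plain .NET types only; SplitPageStore, Capture, and Get are hypothetical names. In the handler above, the key could presumably come from the JobName set on the job data.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class SplitPageStore
{
    private static readonly Dictionary<string, byte[]> pages = new Dictionary<string, byte[]>();

    // Hypothetical helper: copies the stream contents, then disposes the stream.
    public static void Capture(string jobName, MemoryStream outputStream)
    {
        if (outputStream == null)
            return;

        using (outputStream)
        {
            // ToArray copies the whole buffer regardless of the current Position
            pages[jobName] = outputStream.ToArray();
        }
    }

    // Returns the stored bytes, or null if nothing was captured under that name.
    public static byte[] Get(string jobName)
    {
        return pages.TryGetValue(jobName, out byte[] data) ? data : null;
    }

    static void Main()
    {
        // Stand-in for a converter output stream; the bytes spell "%PDF"
        MemoryStream stream = new MemoryStream();
        stream.Write(new byte[] { 0x25, 0x50, 0x44, 0x46 }, 0, 4);

        Capture("LeadDoc_page1", stream);
        Console.WriteLine(Get("LeadDoc_page1").Length); // prints 4
    }
}
```

Copying to a byte array trades extra memory for the freedom to dispose each stream as soon as its job completes.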
Run the project by pressing F5, or by selecting Debug -> Start Debugging.
If the steps were followed correctly, the application runs and creates new files. Each page of leadtools.pdf should be created as a separate PDF file in three different ways, with the page number appended to the name.
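A quick way to confirm the result is to count the files each technique produced and compare that with the page count. The sketch below uses only System.IO and runs against empty placeholder files in a temporary folder, not real PDFs; SplitOutputCheck and CountPages are hypothetical names, and the tags match the file names used by the three split methods.

```csharp
using System;
using System.IO;

static class SplitOutputCheck
{
    // Hypothetical helper: counts files named "<baseName>_<tag>_page<N>.pdf" in a directory.
    public static int CountPages(string directory, string baseName, string tag)
    {
        if (!Directory.Exists(directory))
            return 0;

        return Directory.GetFiles(directory, $"{baseName}_{tag}_page*.pdf").Length;
    }

    static void Main()
    {
        // Demo folder with placeholder files standing in for split output
        string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(dir);
        try
        {
            File.WriteAllBytes(Path.Combine(dir, "leadtools_codecs_page1.pdf"), new byte[0]);
            File.WriteAllBytes(Path.Combine(dir, "leadtools_codecs_page2.pdf"), new byte[0]);
            File.WriteAllBytes(Path.Combine(dir, "leadtools_pdfFile_page1.pdf"), new byte[0]);

            Console.WriteLine(CountPages(dir, "leadtools", "codecs"));  // prints 2
            Console.WriteLine(CountPages(dir, "leadtools", "LeadDoc")); // prints 0
        }
        finally
        {
            Directory.Delete(dir, true);
        }
    }
}
```

Against the real output, the directory argument would be the split folder created in Main(), and each of the three counts should equal the page count of the input file.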
This tutorial showed how to add the necessary references to load all the pages of a PDF file and split them into separate documents using various techniques.