Merging PDFs loses pages

This topic and its replies were posted before the current version of LEADTOOLS was released and may no longer be applicable.

#1 Posted : Monday, November 17, 2008 7:39:17 AM(UTC)

keithway

Groups: Registered
Posts: 29

Hello, I am using Leadtools for .NET (Document Imaging, PDF Read/Save) version 15.0.1.0

I am having a problem loading multiple PDF files and saving (merging) them into a single .tif file.

I have built a demo application to let you guys see what is going on and it is attached to this post.

Here is how to reproduce...

1) Unzip the attached .zip file. You will find the following files in the first directory...

40pages.pdf - this is the test file I have been using to reproduce this issue. Note: It is not a big file.. only 90K.

Bin Directory.JPG - This is just a picture of which Leadtools files I had in my Bin directory. Note: The exact version numbers are shown in this picture. You will need to have Leadtools.dll and Leadtools.Codecs.dll referenced in your project. The codecs and the PDF folder just need to be placed in your bin.

LT PDF Merge Demo.sln - VB.Net solution file for the demo application

LT PDF Merge Demo (Folder) - Project files for the demo application.

2) Re-add the LT unlock codes. You will find the sub that unlocks the licenses in Common.vb under the namespace 'Initialization' and the Sub 'Initialize_Licenses'

3) Re-add the missing LT references in your project. Take a look at Bin Directory.jpg as described above for exact version numbers.

4) Re-add your codec files as well as the PDF directory.

5) Build and run the demo application.

6) Use the 'source files' list box to select the 40pages.pdf file that is included in this demo. NOTE: Depending on how much RAM you have on your machine you may need to take additional steps as outlined below in 'Additional Info'

7) You may leave the destination file name alone or select a new name.

8) Click on the merge button. All controls will be disabled until the merge is complete. Once all controls are enabled again, you can then go check your destination file.

9) Check the number of pages in your destination file. You can do this either by 'right-clicking' on the image file, choosing properties, selecting the summary tab and then clicking the 'advanced' button, or you can open the image up in a viewer that supports multiple pages.

If you used the 40pages.pdf file included in this demo and the resulting file still has 40 pages, please read 'Additional Info' below to see how to reproduce this. If you have fewer than 40 pages, you have already reproduced the issue.

ADDITIONAL INFO:

While troubleshooting this problem, I have uncovered a few things...

1) The number of source files needed to reproduce this issue seems to vary depending on the amount of RAM on your machine. For example, on a machine with only half a gig of RAM I was able to reproduce this using only the 40-page PDF provided. However, on a machine with a full gig of RAM I had to make a copy of the 40-page PDF and then input '40pages.pdf' and 'Copy of 40pages.pdf' as my source files to reproduce it. If you are having trouble reproducing this, I suggest making 3 or 4 additional copies and inputting them all as the source files.

2) The problem here seems to be with the RasterCodecs object.

Note: This example assumes I am on a machine with 1 gig of RAM and I have two input files... 40pages.pdf and Copy of 40pages.pdf

On line 57 of Form1.vb in the MergeFiles function I load each individual source image like this....

' Load the image
RasImage = MergeCodecs.Load(LocalPath)

Well for the FIRST image, if I check the pagecount of the RasImage object the codecs returned, the pagecount is correct and is 40. However, when the loop returns to load the SECOND image, the merge codecs does load the file (or part of it) and returns a RasImage object, but the pagecount is off. In my testing it would repeatedly return a pagecount of 12 even though I knew the source image was 40 pages.

To me it seems the RasterCodecs object may be running out of memory.

3) If you do not close the demo application before running another test, it seems that the RasterCodecs object will eventually run out of memory no matter how many source files you started with.

Note: This may be due to me not calling the RasterCodecs.Shutdown method, however, I have been able to reproduce losing pages on the first run of the demo repeatedly.

4) This only seems to happen when using PDFs as my source files. I have been able to run successful tests using this same code with .tif files as my source files.

File Attachment(s):

LT PDF Merge Demo.zip (181kb) downloaded 23 time(s).


#2 Posted : Monday, November 17, 2008 12:42:24 PM(UTC)

jigar

Groups: Guests
Posts: 3,022

Was thanked: 2 time(s) in 2 post(s)

When dealing with large PDFs (in terms of DPI and page count), you should load a limited number of pages at a time. PDFs are rasterized when they are loaded. The pages in the PDF are 24" x 32.32" and you are loading them at 100 DPI and 24 bits/pixel, so each page is 2400 x 3232 pixels at 3 bytes per pixel and needs ~23.3 MB of memory. With 40 pages, that is ~930.8 MB. So the solution is to load maybe 5 pages at a time.
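For anyone who wants to check the numbers above, here is a rough sketch in plain Python. It is not LEADTOOLS code, and the helper names `page_bytes` and `page_batches` are made up for illustration. It assumes uncompressed pixel data with no per-row padding, and it only computes the page ranges you might load per pass; the actual load and save calls are left out.

```python
def page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate uncompressed size in bytes of one rasterized page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


def page_batches(total_pages, batch_size):
    """Return 1-based inclusive (first, last) page ranges covering all pages."""
    return [
        (first, min(first + batch_size - 1, total_pages))
        for first in range(1, total_pages + 1, batch_size)
    ]


per_page = page_bytes(24, 32.32, 100, 24)
total = per_page * 40
# prints "23.3 MB per page, 930.8 MB for 40 pages"
print(f"{per_page / 1e6:.1f} MB per page, {total / 1e6:.1f} MB for 40 pages")

# Page ranges for a 12-page file loaded 5 pages at a time:
# prints [(1, 5), (6, 10), (11, 12)]
print(page_batches(12, 5))
```

With 5 pages per batch, the peak for this file would be roughly 5 x 23.3 MB, about 116 MB, provided each batch is released before the next one is loaded.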

#3 Posted : Tuesday, November 18, 2008 9:38:53 AM(UTC)

keithway

Groups: Registered
Posts: 29

I'm sorry, but that solution just will not work.

It does not solve the fact that the RasterCodecs object is not freeing the memory that it is using. To prove this, I am attaching a 5-page PDF test file to this post. If you make 10-15 copies of the file and use all of them as input files in the demo that I originally posted, you will see that pages are still being lost. With 10 copies of the file, the resulting image should be 50 pages. While testing here on a machine that has 1 gig of RAM, I am only getting 37 pages.

Furthermore, if you use multiple 40-page PDF test files in the demo app, the RasterCodecs object is able to load the first one correctly. It isn't until the second (or, in some cases, third) file attempts to load that there are problems. This should further prove that the RasterCodecs object is not freeing up memory correctly. If it can load and save one 40-page file, it should be able to load and save a second copy of the same file.

File Attachment(s):

5 pages.zip (48kb) downloaded 22 time(s).

#4 Posted : Wednesday, November 19, 2008 12:18:55 PM(UTC)

jigar

Groups: Guests
Posts: 3,022

Was thanked: 2 time(s) in 2 post(s)

Our RasterCodecs object is releasing the handle to the memory it was using, but the garbage collector is not freeing it. At the end of the foreach loop, call GC.Collect(). This will force the garbage collector to free the memory. I tested it out and it seems to be working fine.

#5 Posted : Monday, January 5, 2009 11:58:43 AM(UTC)

keithway

Groups: Registered
Posts: 29

Forcing the garbage collector to run seems to slow the problem down but not fix it... I am still able to lose pages even with this solution.

Please test again on your end using more source pages and you should see similar results.

#6 Posted : Wednesday, January 7, 2009 8:38:35 AM(UTC)

jigar

Groups: Guests
Posts: 3,022

Was thanked: 2 time(s) in 2 post(s)

OK, I re-tested with the same 40-page PDF and I lost some pages when I merged 6 copies of it. I only got 198 pages instead of the expected 240. Now, if I call RasImage.Dispose() after I'm done with the RasImage object, then it doesn't lose any pages. I tested this on a 1 GB machine. The RasterImage object holds the image data in unmanaged memory, so you should call Dispose() on it to free the memory it uses. Try it out on your end and let me know how it goes.

#7 Posted : Monday, January 12, 2009 12:21:55 PM(UTC)

keithway

Groups: Registered
Posts: 29

That seems to work. Thank you for your help. I will let you know if I run into any more problems or discover anything new with this issue.
