LEADTOOLS Support
Document
Document SDK Questions
Image, OCR and hit-highlight information from a PDF
This topic and its replies were posted before the current version of LEADTOOLS was released and may no longer be applicable.
#1
Posted: Tuesday, September 12, 2006 7:00:23 AM (UTC)
Groups: Registered
Posts: 3
Is it possible with LEADTOOLS, given a PDF that contains a searchable image, to extract its separate parts (image, text, and hit-highlight information)?
Thanks in advance.
Best regards,
DEVENTit BV
Peter van Diermen
#2
Posted: Wednesday, September 13, 2006 11:18:39 PM (UTC)
Groups: Registered, Tech Support
Posts: 1,326
Was thanked: 1 time(s) in 1 post(s)
Hello,
Do you mean that you need to get information about the different parts of the PDF document (image, text, etc.)?
If yes, please provide me with the following information:
- What is the exact LEADTOOLS version that you use?
- What is the programming interface (COM, API, .NET) that you use?
- Please provide more details about what you are trying to do.
Thanks,
Maen Badwan
LEADTOOLS Technical Support
#3
Posted: Thursday, September 14, 2006 1:47:42 AM (UTC)
Groups: Registered
Posts: 3
Maen,
We are using LEADTOOLS v14 for .NET. Given a searchable image PDF, we want to decompose it and extract:
- the image that is contained in the PDF
- the text information which is available for searching in the PDF
- the position of every word, which the PDF uses to highlight searched terms on the image.
Thanks in advance.
#4
Posted: Sunday, September 17, 2006 8:25:59 PM (UTC)
Groups: Registered, Tech Support
Posts: 1,326
Was thanked: 1 time(s) in 1 post(s)
Hello,
We don't have functions that load the contents of the PDF file as text, drawing and image objects. What we have is the Raster PDF Plug-in, which enables you to load the full page as a raster image. Any images/text/drawing objects will be merged into one image when loading a page.
If you have our OCR module (part of LEADTOOLS Document Imaging Suite), you can take the resulting image and OCR it to obtain text information. However, this does not guarantee that you will get exactly the same text as the PDF contains, because there are various factors and special cases. For example, some PDF files contain 'hidden' text that can be searched in Acrobat but does not appear in the rasterized image loaded by LEADTOOLS.
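As a rough illustration of what hit-highlighting needs once an OCR pass has produced words with positions, here is a minimal Python sketch. It does not use LEADTOOLS at all: the `Word` structure, the `find_hits` helper and the sample coordinates are hypothetical, and it assumes the OCR step reports one bounding box per word in image pixels.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    """One recognized word and its bounding box in image pixels (hypothetical structure)."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def find_hits(words: List[Word], term: str) -> List[Tuple[int, int, int, int]]:
    """Return the boxes of all words equal to `term`, ignoring case and surrounding punctuation."""
    needle = term.strip().lower()
    hits = []
    for w in words:
        # Strip common punctuation so that "invoice," still matches "invoice".
        if w.text.strip(".,;:!?()\"'").lower() == needle:
            hits.append((w.left, w.top, w.right, w.bottom))
    return hits


# Made-up sample data standing in for the output of an OCR pass.
sample_words = [
    Word("Invoice", 40, 30, 130, 52),
    Word("number", 138, 30, 215, 52),
    Word("invoice,", 40, 60, 135, 82),
]

hits = find_hits(sample_words, "invoice")
```

Each rectangle in `hits` could then be drawn over the rasterized page to mark a search hit. Whether these positions line up with the highlight regions of the original PDF depends on how closely the OCR result matches the PDF's own text layer.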
Thanks,
Maen Badwan
LEADTOOLS Technical Support
#5
Posted: Monday, September 18, 2006 2:41:36 AM (UTC)
Groups: Registered
Posts: 3
Maen, thanks for this information.