Image, OCR and hit-highlight information from a PDF

This topic and its replies were posted before the current version of LEADTOOLS was released and may no longer be applicable.

#1 Posted : Tuesday, September 12, 2006 7:00:23 AM(UTC)

pdiermen

Groups: Registered
Posts: 3

Is it possible with LEADTOOLS, given a PDF that contains a searchable image, to extract its different parts (image, text, and hit-highlight information)?

Thanks in advance.

Best regards,
DEVENTit BV
Peter van Diermen


#2 Posted : Wednesday, September 13, 2006 11:18:39 PM(UTC)

Maen Hasan

Groups: Registered, Tech Support
Posts: 1,326

Was thanked: 1 time(s) in 1 post(s)

Hello,

Do you mean that you need to get information about the different parts of the PDF document (image, text, etc.)?
If so, please provide the following information:
- Which exact LEADTOOLS version are you using?
- Which programming interface (COM, API, .NET) are you using?
- What exactly are you trying to do? Please give as much detail as you can.

Thanks,
Maen Badwan
LEADTOOLS Technical Support

#3 Posted : Thursday, September 14, 2006 1:47:42 AM(UTC)

pdiermen

Groups: Registered
Posts: 3

Maen,

We are using LEADTOOLS v14 for .NET. Given a searchable image PDF, we want to decompose it and extract:

- the image that is contained in the PDF
- the text information that is available for searching in the PDF
- the position of every word, which the PDF uses to highlight the search terms on the image
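
To make the three requested pieces concrete, here is a minimal sketch of how they relate to each other. It is plain Python and does not use any LEADTOOLS API; the names `Word`, `SearchablePage`, and `highlight_boxes`, the file name, and the coordinates are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical box layout: (left, top, right, bottom) in image pixels.
Box = Tuple[int, int, int, int]


@dataclass
class Word:
    text: str
    box: Box  # where the word sits on the page image


@dataclass
class SearchablePage:
    image_path: str    # the page image contained in the PDF
    words: List[Word]  # the hidden text layer, one entry per word

    @property
    def text(self) -> str:
        # The searchable text is the words joined in reading order.
        return " ".join(w.text for w in self.words)

    def highlight_boxes(self, term: str) -> List[Box]:
        # Hit-highlighting: return the box of every word matching the term,
        # ignoring case and trailing or leading punctuation.
        needle = term.casefold()
        return [
            w.box
            for w in self.words
            if w.text.strip(".,;:!?").casefold() == needle
        ]


# Illustrative data only, not extracted from a real PDF.
page = SearchablePage(
    image_path="page1.png",
    words=[
        Word("Invoice", (40, 30, 160, 58)),
        Word("total:", (40, 70, 120, 98)),
        Word("invoice", (200, 70, 310, 98)),
    ],
)
```

Calling `page.highlight_boxes("invoice")` returns the rectangles of both matching words, which a viewer could then draw over the image stored at `image_path`.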

Thanks in advance.

#4 Posted : Sunday, September 17, 2006 8:25:59 PM(UTC)

Maen Hasan

Groups: Registered, Tech Support
Posts: 1,326

Was thanked: 1 time(s) in 1 post(s)

Hello,

We don't have functions that load the contents of a PDF file as separate text, drawing, and image objects. What we have is the Raster PDF Plug-in, which loads each full page as a raster image. Any image, text, and drawing objects on the page are merged into that one image when the page is loaded.

If you have our OCR module (part of the LEADTOOLS Document Imaging Suite), you can run OCR on the resulting image to obtain text information. However, this does not guarantee that you will get exactly the same text as in the PDF, because of various factors and special cases. For example, some PDF files contain 'hidden' text that you can search in Acrobat but that does not appear in the rasterized image loaded by LEADTOOLS.

Thanks,
Maen Badwan
LEADTOOLS Technical Support

#5 Posted : Monday, September 18, 2006 2:41:36 AM(UTC)

pdiermen

Groups: Registered
Posts: 3

Maen, thanks for this information.
