Search PDF output

Wilfred Springer-5
Hi all,

I've been working with FOP for several years now, but the thing that
I've never been able to solve is to make sure that people can search
through the contents of the generated output. If I do search through my
document in Acrobat Reader, then it will never return any hits, no
matter what kind of query I use.

I'm basically working like this:

DocBook ----(docbook-xsl)----> FO
FO ------(FOP)------> PDF

Has anybody ever experienced this? Is there a hidden option that I
need to turn on? (Is it relevant that I'm using a non-standard
font?)

Thanks,

Wilfred

--
_________________________________________________________________
Wilfred Springer                Phone  : +31 (0)3 3451 5736
Software Architect              Mobile : +31 (0)6 2295 7321
Client Solutions                Fax    : +31 (0)3 3451 5734
Enterprise Web Services         Mail   : [hidden email]
Sun Microsystems Netherlands    AIM    : wilfred springer



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Search PDF output

Chris Bowditch
Wilfred Springer wrote:

> Hi all,
>
> I've been working with FOP for several years now, but the thing that
> I've never been able to solve is to make sure that people can search
> through the contents of the generated output. If I do search through my
> document in Acrobat Reader, then it will never return any hits, no
> matter what kind of query I use.

This problem is caused by the fact that you are using custom fonts. It is
possible to work around this issue in some limited scenarios. If the custom
font you are using is TrueType and you only use a small set of characters
from it, you can specify the -enc ansi option when generating the font
metrics file. This should allow you to search the generated PDF.
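
For reference, here is a minimal sketch of the metrics-generation step
described above, written as a small Java helper that assembles the command
line. It assumes the TTFReader tool shipped with FOP of this era
(org.apache.fop.fonts.apps.TTFReader, invoked as "[options] fontfile.ttf
xmlfile.xml"). The classpath and the file names myfont.ttf and myfont.xml are
placeholders, not values from this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Assembles the command line for FOP's TrueType metrics generator.
// All paths passed in are placeholders chosen by the caller.
class TtfReaderCommand {

    static List<String> build(String classpath, String ttfFile, String metricsXml) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add("org.apache.fop.fonts.apps.TTFReader");
        // "-enc ansi" asks for a WinAnsi-encoded font rather than the
        // tool's default encoding; this is the option named in the reply.
        cmd.add("-enc");
        cmd.add("ansi");
        cmd.add(ttfFile);
        cmd.add(metricsXml);
        return cmd;
    }

    public static void main(String[] args) {
        // Placeholder classpath and file names, for illustration only.
        List<String> cmd = build("fop.jar", "myfont.ttf", "myfont.xml");
        System.out.println(String.join(" ", cmd));
        // To actually run it, something like:
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

The helper only prints the command. The classpath would also need whatever
jars your FOP installation requires, and the resulting metrics file is then
referenced from the FOP font configuration as before.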

<snip/>

Chris


Re: Search PDF output

Luke-2
In reply to this post by Wilfred Springer-5
I created an application to search PDF documents (including ones I had
generated with FOP) using PDFBox and Lucene.

Lucene is a high-performance, full-featured text search engine library
written entirely in Java (http://lucene.apache.org/java/docs/). PDFBox is
an open source Java library for working with PDF documents
(http://www.pdfbox.org/); it is used to extract the content of a PDF and get
it into a form that Lucene can search.
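
To illustrate the shape of that pipeline (extract text, index it, query the
index), here is a small self-contained sketch. It is not the Lucene or PDFBox
API: TinyTextIndex is a hypothetical in-memory stand-in for the index, and the
strings passed to add() stand in for text that a PDF extraction step would
produce.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, minimal inverted index: maps each lowercased term to the
// names of the documents that contain it. A stand-in for a real search
// library, for illustration only.
class TinyTextIndex {

    private final Map<String, Set<String>> postings = new HashMap<>();

    // Index text that has already been extracted from a document.
    void add(String docName, String extractedText) {
        String lower = extractedText.toLowerCase(Locale.ROOT);
        // Split on anything that is not a letter or a digit.
        for (String term : lower.split("[^\\p{L}\\p{Nd}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docName);
            }
        }
    }

    // Return the names of the documents containing the given single term.
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT),
                                     Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        TinyTextIndex index = new TinyTextIndex();
        // In a real setup these strings would come from the PDF extractor.
        index.add("guide.pdf", "Configuring custom fonts in FOP");
        index.add("notes.pdf", "Release notes for the search feature");
        System.out.println(index.search("fonts"));
        System.out.println(index.search("missing"));
    }
}
```

A real setup would replace add() with the extraction and indexing calls of
the libraries mentioned above. Note that this searches the PDFs outside the
viewer and does not change what Acrobat Reader's own search finds.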

The book Lucene in Action by Erik Hatcher (http://www.lucenebook.com/)
provides all the sample code you would need to index and search a PDF.

Hope that helps.

Luke

----- Original Message -----
From: "Wilfred Springer" <[hidden email]>
To: <[hidden email]>
Sent: Thursday, June 02, 2005 4:02 AM
Subject: Search PDF output


> Hi all,
>
> I've been working with FOP for several years now, but the thing that
> I've never been able to solve is to make sure that people can search
> through the contents of the generated output. If I do search through my
> document in Acrobat Reader, then it will never return any hits, no
> matter what kind of query I use.
>
> I'm basically working like this:
>
> DocBook ----(docbook-xsl)----> FO
> FO ------(FOP)------> PDF
>
> Has anybody ever experienced this? Is there a hidden option that I
> need to turn on? (Is it relevant that I'm using a non-standard
> font?)
>
> Thanks,
>
> Wilfred