Re: Indexing Attribute Values in a PDF

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Mon Dec 08 2003 - 21:18:36 GMT
On Mon, Dec 08, 2003 at 12:56:10PM -0800, Matt Torbin wrote:
> Hey all,
> 
> I'd like to be able to index the attribute values in a PDF document so 
> that instead of the title of the document coming up as "whatever.pdf" 
> it would come up as "This is my Document" (since that is what I have 
> filled in as my document attributes).  Is this possible?  Has anyone 
> done this?  Can anyone guide me in the right direction?

Is this a different question from the one on December 4th?

  http://swish-e.org/archive/6277.html

Here are the docs for that filter:

NAME
       SWISH::Filters::Pdf2HTML - Perl extension for filtering PDF documents
       with Swish-e

DESCRIPTION
       This is a plug-in module that uses the xpdf package to convert PDF
       documents to html for indexing by Swish-e.  Any info tags found in the
       PDF document are created as meta tags.

       This filter plug-in requires the xpdf package available at:

           http://www.foolabs.com/xpdf/

       You may pass into SWISH::Filter's new method a tag to use as the html
       <title> if found in the PDF info tags:

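           use SWISH::Filter;

           my $filter   = SWISH::Filter->new;
           my $filename = 'whatever.pdf';    # path of the PDF to filter

           my %user_data;
           $user_data{pdf}{title_tag} = 'title';

           my $was_filtered = $filter->filter(
               document  => $filename,
               user_data => \%user_data,
           );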

       Then, if a PDF info tag named "title" is found, its value will be used
       as the HTML <title>.
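To make the info-tag handling described above concrete, here is a minimal sketch in plain Perl. It is not the Pdf2HTML source. It assumes the info tags arrive as "Key: value" lines, like those printed by xpdf's pdfinfo tool, and the helper names parse_info, build_head, and html_escape are hypothetical. Every info tag becomes a <meta> tag, and the tag named by title_tag becomes the <title>, falling back to the file name when it is missing.

```perl
use strict;
use warnings;

# Hypothetical helper: escape text for HTML content and attribute values.
sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper: parse "Key: value" lines (the format assumed here,
# as printed by xpdf's pdfinfo) into a hash keyed by lower-cased tag name.
sub parse_info {
    my ($info_text) = @_;
    my %info;
    for my $line ( split /\n/, $info_text ) {
        # Key is everything before the first colon; value is the rest,
        # with surrounding whitespace trimmed.
        next unless $line =~ /^([^:]+):\s*(.*?)\s*$/;
        $info{ lc $1 } = $2;
    }
    return \%info;
}

# Hypothetical helper: build an HTML <head>.  The tag named by $title_tag
# (if present and non-empty) becomes the <title>, otherwise the file name
# is used; every info tag is also emitted as a <meta> tag.
sub build_head {
    my ( $info, $title_tag, $filename ) = @_;
    my $title =
        ( defined $title_tag && length( $info->{$title_tag} // '' ) )
        ? $info->{$title_tag}
        : $filename;

    my $head = "<head>\n<title>" . html_escape($title) . "</title>\n";
    for my $name ( sort keys %$info ) {
        $head .= sprintf( qq{<meta name="%s" content="%s">\n},
            html_escape($name), html_escape( $info->{$name} ) );
    }
    return $head . "</head>\n";
}
```

With xpdf installed, the input text could come from something like qx{pdfinfo whatever.pdf}. Because keys are lower-cased, a title_tag of 'title' matches a "Title:" line, which is what gets a result listed as "This is my Document" instead of "whatever.pdf".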




-- 
Bill Moseley
moseley@hank.org
Received on Mon Dec 8 21:18:44 2003