Re: [swish-e] PDF custom properties

From: Eric Jobidon <eric(at)>
Date: Wed Mar 14 2007 - 19:42:44 GMT

"pdfinfo" and "pdftotext -meta" work fine for the "standard" fields (author,
subject, title, keywords, etc). I have those fields indexed and searchable
for many PDF files already. 

My question is regarding custom PDF properties. Those are field:value pairs
that are stored inside the PDF files. Neither pdfinfo nor pdftotext is able
to extract those.

I understand how swish-e will be able to index that information; I just
don't know how to extract it from the PDF file.

Any pointers?

-----Original Message-----
[] On Behalf Of Peter Karman
Sent: Wednesday, March 14, 2007 12:22 PM
To: Swish-e Users Discussion List
Subject: Re: [swish-e] PDF custom properties

Bill Crawford scribbled on 3/14/07 1:11 PM:
> On Wednesday 14 Mar 2007, Eric Jobidon wrote:
>> I have successfully been using the PDFToText utility to extract text 
>> and "standard" metadata from PDF files. The tool does not, however, 
>> offer the capability to export PDF custom properties.
>> Does anyone know of an open source Linux CLI tool that allows the 
>> extraction of PDF custom properties?
> When you say "standard" do you mean the title, subject etc? There's an 
> option to pdfinfo "-meta" to extract additional metadata, but I don't 
> have any PDFs with such to test it.

yes, that's how SWISH::Filters::Pdf2HTML does it, with pdfinfo.
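
For illustration only, here is a minimal Python sketch of the general idea
of turning "Key: value" lines (the style of output pdfinfo prints) into HTML
meta tags that an indexer could read. It is not the code of
SWISH::Filters::Pdf2HTML; the function name keyvalue_to_meta and the sample
"Department" field are hypothetical, and the sample text is made up rather
than real tool output. Custom properties would still have to be obtained
from some other tool first, and the resulting meta names would likely need
to be declared in the swish-e configuration.

```python
import html


def keyvalue_to_meta(text):
    """Convert 'Key: value' lines into HTML <meta> tags (hypothetical helper)."""
    tags = []
    for line in text.splitlines():
        # Split on the first colon only, so values such as timestamps survive.
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            continue  # skip lines that are not key/value pairs
        # Assumption: lower-case names with underscores are acceptable meta names.
        name = key.strip().lower().replace(" ", "_")
        tags.append(
            '<meta name="%s" content="%s">'
            % (html.escape(name, quote=True), html.escape(value.strip(), quote=True))
        )
    return "\n".join(tags)


# Made-up sample; "Department" stands in for a custom property.
sample = (
    "Title:          Annual Report\n"
    "Author:         J. Doe\n"
    "Department:     R&D\n"
)

if __name__ == "__main__":
    print(keyvalue_to_meta(sample))
```

Values are HTML-escaped so that characters such as "&" or quotes in a
property do not break the generated markup.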

Peter Karman  .  .  peter(at)
Users mailing list
Received on Wed Mar 14 14:42:33 2007