
Re: [swish-e] PDF custom properties

From: Eric Jobidon <eric(at)not-real.NeoPaper.net>
Date: Wed Mar 14 2007 - 19:42:44 GMT
"pdfinfo" and "pdftotext -meta" work fine for the "standard" fields (author,
subject, title, keywords, etc.). I already have those fields indexed and
searchable for many PDF files.
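For reference, collecting the standard fields that pdfinfo reports might look like the Python sketch below. It assumes a `pdfinfo` binary (from xpdf or poppler) is on the PATH and that its report has one `Key: value` pair per line. `parse_pdfinfo` and `pdfinfo_fields` are hypothetical helper names, not part of swish-e or SWISH::Filters.

```python
import subprocess


def parse_pdfinfo(output: str) -> dict:
    """Turn pdfinfo's "Key:   value" report lines into a dict."""
    fields = {}
    for line in output.splitlines():
        # Split on the first colon only: values such as dates contain colons too.
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            continue  # skip blank lines and anything that is not "Key: value"
        fields[key.strip()] = value.strip()
    return fields


def pdfinfo_fields(path: str) -> dict:
    """Run the pdfinfo command-line tool on one file and parse its report."""
    result = subprocess.run(
        ["pdfinfo", path],
        capture_output=True,
        text=True,
        errors="replace",  # tolerate odd bytes in metadata values
        check=True,        # raise if pdfinfo exits with an error
    )
    return parse_pdfinfo(result.stdout)
```

Usage would be something like `pdfinfo_fields("report.pdf").get("Title")`. Keeping the parser separate from the subprocess call makes it easy to test against a saved pdfinfo report.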

My question is regarding custom PDF properties. Those are field:value pairs
that are stored inside the PDF files. Neither pdfinfo nor pdftotext is able
to extract those.

I understand how swish-e can index that information; I just don't know
how to extract it from the PDF file.

Any pointers?
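One possible starting point: custom properties are typically stored as extra keys in the PDF's document information (Info) dictionary, next to Title, Author and the other standard keys. The Python sketch below pulls those extra keys straight out of the raw file bytes. It is deliberately minimal and rests on several assumptions: the Info object is stored uncompressed (not inside an object stream, as PDF 1.5+ files may do), values are literal strings without nested parentheses, and octal escapes are not decoded. `custom_properties` is a hypothetical helper; a real PDF library would be more robust.

```python
import re

# Keys defined by the PDF spec for the document information dictionary;
# anything else found there is treated as a custom property.
STANDARD_KEYS = {"Title", "Author", "Subject", "Keywords", "Creator",
                 "Producer", "CreationDate", "ModDate", "Trapped"}

_ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t",
            b"(": b"(", b")": b")", b"\\": b"\\"}

# /Name (literal string): assumes no nested parentheses and no hex strings.
_PAIR = re.compile(rb"/([^\s/()<>\[\]{}%]+)\s*\(((?:\\.|[^\\()])*)\)", re.DOTALL)


def _decode(raw: bytes) -> str:
    """Undo simple backslash escapes, then decode the string bytes."""
    raw = re.sub(rb"\\([nrt()\\])", lambda m: _ESCAPES[m.group(1)], raw)
    if raw.startswith(b"\xfe\xff"):  # UTF-16BE with byte-order mark
        return raw[2:].decode("utf-16-be", errors="replace")
    return raw.decode("latin-1")  # rough stand-in for PDFDocEncoding


def custom_properties(data: bytes) -> dict:
    """Return the non-standard Info-dictionary entries from raw PDF bytes."""
    refs = re.findall(rb"/Info\s+(\d+)\s+(\d+)\s+R", data)
    if not refs:
        return {}
    # After incremental updates the last trailer is the current one.
    num, gen = (int(x) for x in refs[-1])
    pattern = rb"(?<!\d)%d\s+%d\s+obj(.*?)endobj" % (num, gen)
    bodies = re.findall(pattern, data, re.DOTALL)
    if not bodies:
        return {}  # e.g. the object lives in a compressed object stream
    props = {}
    for key, value in _PAIR.findall(bodies[-1]):
        name = key.decode("latin-1")
        if name not in STANDARD_KEYS:
            props[name] = _decode(value)
    return props
```

Called as `custom_properties(open("report.pdf", "rb").read())`, the resulting dict could then be emitted, for example, as HTML `<meta>` tags for swish-e to index alongside the standard fields.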

-----Original Message-----
From: users-bounces@lists.swish-e.org
[mailto:users-bounces@lists.swish-e.org] On Behalf Of Peter Karman
Sent: Wednesday, March 14, 2007 12:22 PM
To: Swish-e Users Discussion List
Subject: Re: [swish-e] PDF custom properties



Bill Crawford scribbled on 3/14/07 1:11 PM:
> On Wednesday 14 Mar 2007, Eric Jobidon wrote:
> 
>> I have successfully been using the PDFToText utility to extract text 
>> and "standard" metadata from PDF files. The tool does not, however, 
>> offer the capability to export PDF custom properties.
>>
>> Does anyone know of an open source Linux CLI tool that allows the 
>> extraction of PDF custom properties?
> 
> When you say "standard" do you mean the title, subject etc? There's an 
> option to pdfinfo "-meta" to extract additional metadata, but I don't 
> have any PDFs with such to test it.
> 

yes, that's how SWISH::Filters::Pdf2HTML does it, with pdfinfo.


--
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users

Received on Wed Mar 14 14:42:33 2007