On 12/03/2007 07:41 PM, Robinson Craig wrote:
> Nevertheless, my question still stands: is there a "standard" way of
> indexing PDF content and metadata?
I don't know about standard. I recommend SWISH::Filter with spider.pl/DirTree.pl and the
-S prog method over the FileFilter directive, simply because once you start using
DirTree.pl/spider.pl as your aggregators, you (1) gain a lot more flexibility with
respect to filtering, skipping files, etc., and (2) can add more filters transparently by
just dropping new .pm files into the @INC path.
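To make the -S prog idea concrete, here is a minimal sketch in Python. It is not DirTree.pl, spider.pl or SWISH::Filter. It is a hypothetical external program that walks directories, prunes hidden directories, skips non-PDF files, converts each PDF with pdftotext, and prints each document as headers (Path-Name, Content-Length, Last-Mtime, Document-Type), a blank line, and the body. The script and its function names are made up. It assumes pdftotext is on the PATH and that TXT* is the parser you want.

```python
import os
import subprocess
import sys


def prog_record(path_name, content, mtime=None, doc_type=None):
    """Serialize one document as headers, a blank line, then the body:
    the stream layout swish-e reads in -S prog mode."""
    headers = [f"Path-Name: {path_name}", f"Content-Length: {len(content)}"]
    if mtime is not None:
        headers.append(f"Last-Mtime: {int(mtime)}")
    if doc_type is not None:
        headers.append(f"Document-Type: {doc_type}")
    # Content-Length counts bytes of the body, so the body is kept as bytes.
    return ("\n".join(headers) + "\n\n").encode("utf-8") + content


def pdf_to_text(path):
    """Extract text with pdftotext; an output name of "-" means stdout."""
    result = subprocess.run(["pdftotext", path, "-"],
                            capture_output=True, check=True)
    return result.stdout


def iter_pdfs(root):
    """Walk root, pruning hidden directories and skipping non-PDF files."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in sorted(filenames):
            if name.lower().endswith(".pdf"):
                yield os.path.join(dirpath, name)


def main(roots):
    out = sys.stdout.buffer
    for root in roots:
        for path in iter_pdfs(root):
            try:
                text = pdf_to_text(path)
            except (OSError, subprocess.CalledProcessError):
                continue  # pdftotext missing or PDF unreadable: skip the file
            out.write(prog_record(path, text, os.path.getmtime(path), "TXT*"))
    out.flush()


if __name__ == "__main__":
    main(sys.argv[1:] or ["."])
```

Skip rules and converters live in ordinary code here rather than in config directives, which is the kind of flexibility the prog method buys over FileFilter.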
NOTE that SWISH::Filter still uses the xpdf tools under the hood, so for PDF
specifically it might be six of one, half a dozen of the other. But I prefer to start habits
that leave me more options in the long term.
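On the metadata half of the question, one plausible approach is to run xpdf's pdfinfo for the document info fields and pdftotext for the body, then wrap both in HTML so the info fields become <title> and <meta> tags. This is a hypothetical sketch, not necessarily how SWISH::Filter's PDF filter works internally, and the function names are made up. You would still need a MetaNames directive for swish-e to index those meta fields separately.

```python
import html
import subprocess


def parse_pdfinfo(output):
    """Turn pdfinfo's "Key:   value" lines into a dict with lowercase keys."""
    info = {}
    for line in output.splitlines():
        # Split on the first colon only, since values like dates contain colons.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            info[key.strip().lower()] = value.strip()
    return info


def pdf_as_html(info, text):
    """Wrap extracted text in HTML so metadata lands in <title> and <meta>."""
    parts = ["<html><head>"]
    parts.append(f"<title>{html.escape(info.get('title', ''))}</title>")
    for key in ("author", "subject", "keywords"):
        if key in info:
            parts.append(f'<meta name="{key}" content="{html.escape(info[key])}">')
    parts.append(f"</head><body><pre>{html.escape(text)}</pre></body></html>")
    return "\n".join(parts)


def _run(cmd):
    """Run an external tool and return its stdout as text."""
    return subprocess.run(cmd, capture_output=True, check=True, text=True).stdout


def read_pdf(path):
    """Combine pdfinfo metadata and pdftotext content for one PDF."""
    return pdf_as_html(parse_pdfinfo(_run(["pdfinfo", path])),
                       _run(["pdftotext", path, "-"]))
```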
NOTE too that Swish3 will likely not have FileFilter, but will instead use SWISH::Filter
from the start.
Peter Karman . peter(at)not-real.peknet.com . http://peknet.com/
Users mailing list
Received on Tue Dec 4 12:48:11 2007