Re: [swish-e] XML parsing not returning Title

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Tue Dec 04 2007 - 17:48:10 GMT

On 12/03/2007 07:41 PM, Robinson Craig wrote:

> Nevertheless, my question still stands: is there a "standard" way of
> indexing PDF content and metadata?
> 

I don't know about standard. I recommend SWISH::Filter with spider.pl/DirTree.pl and the
-S prog method over the FileFilter directive, because once you start using
DirTree.pl/spider.pl as your aggregators, you (1) gain a lot more flexibility with
respect to filtering, skipping files, etc., and (2) can add more filters transparently by
just dropping new .pm files into the @INC path.

See http://swish-e.org/docs/swish-config.html#filtering_with_swish_filter
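
For illustration only, here is a rough sketch of how the two approaches might look in a
config file. All paths are hypothetical placeholders, and it assumes pdftotext (from xpdf)
is on the PATH and that SWISH::Filter is installed where DirTree.pl can load it. Check the
docs linked above for the details that apply to your version.

```
# --- Option A: FileFilter directive, run with:  swish-e -c swish.conf -S fs
# (paths are hypothetical; pdftotext is assumed to be installed)
IndexDir /path/to/docs
FileFilter .pdf pdftotext "'%p' -"
IndexContents TXT* .pdf

# --- Option B: DirTree.pl as the aggregator, run with:  swish-e -c swish.conf -S prog
# (use instead of Option A, not together; paths are hypothetical)
IndexDir /path/to/prog-bin/DirTree.pl
SwishProgParameters /path/to/docs
```

With Option B the filtering decisions live in the Perl program rather than in the config
file, which is where the extra flexibility comes from.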

NOTE that SWISH::Filter still uses xpdf tools under the hood, so in the case of PDF
specifically it might be six of one, half a dozen of the other. But I prefer to start
habits that leave me more options in the longer term.

NOTE too that Swish3 will likely not have FileFilter, but instead will use SWISH::Filter
from the start.

-- 
Peter Karman  .  peter(at)not-real.peknet.com  .  http://peknet.com/

Received on Tue Dec 4 12:48:11 2007