I tried adding the FileFilter line to the swish.conf file, and
unfortunately that only made things worse.
The problem that I am seeing is that while the indexing appears to be
working fine, when I do a search for the files via swish.cgi, the only
results that are listed are the index.swish files. I have not made any
changes to the SWISH::Filter file, which I assume is
/usr/local/lib/swish-e/perl/SWISH/Filter.pm. What is going on, and what
do I need to do to correct this?
-----Original Message-----
From: swish-e@sunsite3.berkeley.edu
[mailto:swish-e@sunsite3.berkeley.edu]On Behalf Of Bill Moseley
Sent: Wednesday, June 09, 2004 8:02 PM
To: Multiple recipients of list
Subject: [SWISH-E] Re: Error Message: Index file error: Could not open
On Wed, Jun 09, 2004 at 02:15:56PM -0700, Peter Karman wrote:
> I believe a simple FileFilter config line will work, though it is slower
> than the SWISH::Filter module (Bill, correct me on this):
>
> FileFilter .pdf pdftotext "'%p' -"
Only if not using spider.pl's default config. The default config in
spider.pl automatically filters pdf files (if xpdf programs are found in
the path).
By default I mean passing "default <url>" to spider.pl -- the "default"
tells the spider to use a built-in config. Look at spider.pl in an
editor to see that config -- and how it uses SWISH::Filter.
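As a rough sketch of that setup (not taken from your config), a swish.conf
that runs the spider with its built-in config might look something like
this. The URL is only a placeholder, and it assumes swish-e can find
spider.pl on its own; otherwise give IndexDir the full path:

```
# swish.conf -- minimal sketch, assuming spider.pl's built-in "default" config
# http://localhost/ is a placeholder; use your own start URL
IndexDir spider.pl
SwishProgParameters default http://localhost/
# No FileFilter line for .pdf here: the default spider config already
# filters PDFs if the xpdf programs are found in the path.
```

You would then index with something like "swish-e -S prog -c swish.conf".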
Otherwise, if you don't pass a parameter to spider.pl it will look for
SwishSpiderConfig.pl (IIRC). The example SwishSpiderConfig.pl file also
has examples of how to use SWISH::Filter.
Basically, you define a content filter in spider.pl that passes the
content and the content-type to SWISH::Filter.
That make sense?
--
Bill Moseley
moseley@hank.org
Received on Thu Jun 10 13:50:57 2004