Re: [swish-e] PDF indexing won't work

From: Christoph Lechner <cl0074(at)>
Date: Sat Sep 05 2009 - 12:22:40 GMT
Christoph Lechner wrote:
> swish-filter-test still dumps the textual contents of the file.

When I run
/usr/local/bin/swish-filter-test --content --verbose --depreciated
I get exactly the same error message as from
** /usr/local/bin/swish-filter-test:
  Can't locate object method "filter" via package "SWISH::Filter" at
/usr/local/bin/swish-filter-test line 178.

Without the --depreciated option, the contents are printed to stdout.

So swish_filter uses the old, deprecated interface. I think this should
be considered a bug.

Therefore I modified the swish_filter program to fit the new interface.
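
For reference, a call through the new interface looks roughly like the
following. This is only a minimal sketch based on the convert() method
described in the SWISH::Filter documentation; it is not the modified
program itself, and taking the file name from the command line is my own
assumption for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch only: run one file through SWISH::Filter via convert().
# The script takes the file to filter as its single argument.
my $file = shift or die "usage: $0 file\n";

# Load the module at run time so a missing install gives a clear message.
eval { require SWISH::Filter; 1 }
    or die "SWISH::Filter is not installed: $@";

my $filter = SWISH::Filter->new;

# With a file name, the content type is guessed from the extension.
my $doc = $filter->convert( document => $file );

# convert() returns nothing for empty input or when no filters are available.
die "nothing returned for $file\n" unless $doc;
die "$file was not handled by any filter\n" unless $doc->was_filtered;
die "$file is still binary after filtering\n" if $doc->is_binary;

# fetch_doc returns a reference to a scalar holding the filtered text.
my $text_ref = $doc->fetch_doc;
print $$text_ref;
```

If the PDF filter and pdftotext are installed, running this on a PDF
should print the extracted text, much like swish-filter-test does.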

From the docs of I see that the spider tries to filter PDF
files if the filter tools are installed. That's no good, as pdftotext
then gets some trash as input rather than a PDF file.

My swish.conf is:
--> swish.conf <--
SwishProgParameters spider.conf

FileFilter      .pdf    ./ "%p %P"
IndexContents HTML .pdf

StoreDescription HTML* <body>

IndexReport 2

# Allow extra searching by title, path
Metanames swishtitle swishdocpath
--> end <--

My spider.conf is:
--> spider.conf <--
my %kb_site = (
        base_url                => 'http://kb/kb/tb/',
        max_size                => 100000000,
        ignore_robots_file      => 1,
);

@servers = ( \%kb_site );
--> end <--

Please find the modified and working tool in the

- cl

