Re: [swish-e] PDF indexing won't work

From: Christoph Lechner <cl0074(at)not-real.l-mx.de>
Date: Sat Sep 05 2009 - 12:22:40 GMT
Christoph Lechner wrote:
> swish-filter-test still dumps the textual contents of the file.

When I run
/usr/local/bin/swish-filter-test --content --verbose --depreciated
ATmega16.pdf
I get exactly the same error message as from swish_filter.pl:
** /usr/local/bin/swish-filter-test:
  Can't locate object method "filter" via package "SWISH::Filter" at
/usr/local/bin/swish-filter-test line 178.

Without the --depreciated option, the contents are printed to stdout.

So swish_filter uses the old, deprecated interface. I think this should
be considered a bug.

Therefore I modified the swish_filter program to fit the new interface.
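
For readers without the attachment, here is a minimal sketch of what a
call through the newer interface might look like. It is not the attached
script. It assumes the convert(), was_filtered() and
fetch_doc_reference() methods as described in the SWISH::Filter
documentation; the script name and the hard-coded content type are
placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use SWISH::Filter;

# Sketch only, not the attached swish_filter.pl.
# Hypothetical usage: ./filter_sketch.pl ATmega16.pdf
my $file = shift @ARGV or die "usage: $0 <file>\n";

my $filter = SWISH::Filter->new;

# convert() is used here in place of the deprecated filter() method.
# The content type is an assumption; adjust it for other file types.
my $doc = $filter->convert(
    document     => $file,
    content_type => 'application/pdf',
);

die "$0: no document returned for $file\n" unless $doc;

# Print the converted text only if a filter actually handled the file.
if ( $doc->was_filtered ) {
    print ${ $doc->fetch_doc_reference };
}
else {
    warn "$0: no filter handled $file\n";
}
```

If no filter applies, this only warns instead of printing the raw PDF
bytes, so nothing but text reaches the indexer.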

From the docs of spider.pl I see that the spider tries to filter PDF
files if the filter tools are installed. That's no good, as pdftotext
then gets some trash as input instead of a PDF file.

My swish.conf is:
--> swish.conf <--
IndexDir spider.pl
SwishProgParameters spider.conf

FileFilter      .pdf    ./swish_filter.pl "%p %P"
IndexContents HTML .pdf

StoreDescription HTML* <body>

IndexReport 2

# Allow extra searching by title, path
Metanames swishtitle swishdocpat
--> end <--

My spider.conf is:
--> spider.conf <--
my %kb_site = (
        base_url                => 'http://kb/kb/tb/',
        max_size                => 100000000,
        ignore_robots_file      => 1
);

@servers = ( \%kb_site );
1;
--> end <--

Please find the modified and working swish_filter.pl tool in the
attachments.

- cl

Received on Sat Sep 5 08:22:44 2009