Re: [swish-e] PDF indexing won't work

From: Christoph Lechner <cl0074(at)not-real.l-mx.de>
Date: Sat Sep 05 2009 - 10:24:16 GMT
Peter Karman wrote:
> Christoph Lechner wrote on 9/4/09 5:55 PM:
>> Hi all!
>>
>> [swish-e version 2.4.5 (Debian 5.0 stable)]
>>
>> I'm new to swish-e and trying to index PDF files.
>> swish_filter.pl drops some error messages that I don't understand. The
>> error messages are the same for any PDF file it tries to index.
>>
>> spider@web-int:~/swish-e$ swish-e -S prog -c swish.conf
>> Indexing Data Source: "External-Program"
>> Indexing "spider.pl"
>> External Program found: /usr/lib/swish-e/spider.pl
>> /usr/lib/swish-e/spider.pl: Reading parameters from 'default'
>> Processing http://kb/kb/tb/...
>> Processing http://kb/kb/tb/ATmega16.pdf...
>> Failed to set content type for document './swtmpfltraEpRfM'
>> Can't return outside a subroutine at
>> /usr/share/doc/swish-e/examples/filter-bin/swish_filter.pl line 55.
>>
>> Warning: filter
>> '/usr/share/doc/swish-e/examples/filter-bin/swish_filter.pl' exited with
>> non-zero status: [255]
>>
>> If I wget one of the PDF files and run the test program, the contents of
>> the PDF are dumped to the console without any error messages:
>>
>> /usr/share/doc/swish-e/examples/swish-filter-test --content --verbose
>> d002X02.pdf
>>
>> What's wrong?
> 
> send along your swish.conf and spider.pl config files.
My swish.conf (My crazy user agent breaks the FileFilter line, sorry)
--> swish.conf <--
IndexDir spider.pl

SwishProgParameters default http://kb/kb/tb/

FileFilter  .pdf
/usr/share/doc/swish-e/examples/filter-bin/swish_filter.pl %p
IndexContents HTML .pdf

StoreDescription HTML* <body>

IndexReport 2

# Allow extra searching by title, path
Metanames swishtitle swishdocpath
--> end swish.conf <--

I'm not sure if I created a spider.pl config file. At least I can't find
it. Maybe that's the problem.

> 
> I'm suspicious that spider.pl is calling swish_filter.pl at all. That seems
> wrong. spider.pl should be using SWISH::Filter internally, not delegating to
> swish_filter.pl.
> 
> also, you might try getting the latest version (2.4.7). 2.4.5 is now a few years
> old.

OK, I'll try to upgrade. But even in the Debian timescale, a few years
is a long time ... So I wonder why they ship with such an old release.
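
For reference, a minimal sketch of what the swish.conf above would look like
if, as Peter suggests, spider.pl does the PDF conversion internally through
SWISH::Filter. It is the same file with only the FileFilter line dropped. It
is untested and assumes that the spider's built-in 'default' settings invoke
SWISH::Filter and that a PDF converter is installed where SWISH::Filter can
find it.

--> swish.conf (sketch, untested) <--
IndexDir spider.pl

SwishProgParameters default http://kb/kb/tb/

# No FileFilter line here. Assumption: spider.pl has already passed
# each PDF through SWISH::Filter before swish-e receives the document,
# so swish-e does not need to call swish_filter.pl a second time.
IndexContents HTML .pdf

StoreDescription HTML* <body>

IndexReport 2

# Allow extra searching by title, path
Metanames swishtitle swishdocpath
--> end swish.conf <--

If PDFs then show up unconverted in the index, that would suggest the spider
is not filtering them, and the missing spider.pl config file would be the
next thing to check.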

- cl

Received on Sat Sep 5 06:24:20 2009