RE: Using the SwishSpiderConfig.pl file

From: Peter Karman <karman(at)not-real.cray.com>
Date: Thu Jun 17 2004 - 13:30:16 GMT

Kaplan, Andrew H. wrote on 06/17/2004 07:43 AM:

> Here is the text of the swish.conf file without spider.pl:
> 
> IndexDir /www
> StoreDescription HTML* <body> 200000
> MetaNames swishdocpath swishtitle
> ReplaceRules replace "/www/" "http://192.168.1.156/"
> 

>>The command syntax that is used here is
>>/usr/local/bin/swish-e -c swish.conf -v 3
>>
>>This approach does appear to index the pdf and doc files, but error messages
>>appear saying the program is substituting embedded null characters in the pdf
>>and doc files that I am indexing. I did a check of the discussion lists and
>>the issue has to do with the fact the files being indexed are binary. I tried
>>adding several lines to the swish.conf file including IndexOnly,
>>IndexContents and NoContents. That did not make a difference. Does anyone
>>have suggestions on where to go from here?


So: indexing with spider.pl does NOT work.
Indexing without spider.pl DOES work.

Test the index created without the spider. Does searching your PDFs work?

If you drop the -v 3 option, you don't get the warnings.

If search works, and you get no warnings, then you don't have a problem. 
Right? :)
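
A rough sketch of that check, assuming the same binary path and swish.conf as in the quoted message. The search word is a made-up placeholder; substitute a word you know appears in one of your PDFs. Since the quoted swish.conf does not set an index file, the search should pick up the default index.swish-e in the current directory.

```shell
# Placeholder values: adjust to your setup.
SWISH=/usr/local/bin/swish-e
CONF=swish.conf
WORD=report   # hypothetical search term; use a word known to be in one of the PDFs

if [ -x "$SWISH" ]; then
    # Rebuild the index at the default verbosity (no -v 3).
    "$SWISH" -c "$CONF"
    # Search the index just built for the test word.
    "$SWISH" -w "$WORD"
else
    echo "swish-e not found at $SWISH"
fi
```

If the search returns hits from the PDF files, the index is usable despite the messages that -v 3 printed.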



-- 
Peter Karman - Software Publications Programmer - Cray Inc
phone: 651-605-9009 - mailto:karman@cray.com
Received on Thu Jun 17 13:30:18 2004