Index PDF files with spider.pl

From: Jean Mao <maoj(at)not-real.mail.nih.gov>
Date: Wed May 07 2003 - 18:31:14 GMT
Hello, I was trying to index PDF files on our web server with spider.pl, but it failed. Here is what I used:

swish-e -c biowulf.conf -S prog -v 0 -f biowulf.index

The biowulf.conf I used looks like this:

IndexDir ./prog-bin/spider.pl
ReplaceRules remove "http://"
# Tell the spider what to index.
SwishProgParameters default http://biowulf.nih.gov
IndexContents HTML .html .htm .pdf
FileFilter .pdf /info/WWW/search-bin/filter-bin/_pdf2html.pl "'%p' -"
DefaultContents HTML
StoreDescription HTML <body> 200000
MetaNames swishdocpath swishtitle
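For context: with -S prog, swish-e runs the program named in IndexDir (here spider.pl) and reads documents from that program's standard output. Each document arrives as a few header lines, a blank line, and then the document content. The Python sketch below formats one such record. It is illustrative only: the header names (Path-Name, Content-Length, Last-Mtime, Document-Type) are recalled from the swish-e documentation and should be checked against your installed version, and the function name prog_record is hypothetical, not part of swish-e or spider.pl.

```python
import sys


def prog_record(path_name, content, last_mtime=None, doc_type=None):
    """Format one document as a -S prog program would hand it to swish-e:
    header lines, a blank line, then the raw document bytes.
    Header names are assumed from the swish-e docs (verify locally)."""
    # Content-Length must count bytes, not characters, so encode first.
    body = content.encode("utf-8") if isinstance(content, str) else content
    headers = [f"Path-Name: {path_name}", f"Content-Length: {len(body)}"]
    if last_mtime is not None:
        # Modification time as seconds since the epoch.
        headers.append(f"Last-Mtime: {int(last_mtime)}")
    if doc_type is not None:
        # Optional parser selection; the accepted values are not shown here.
        headers.append(f"Document-Type: {doc_type}")
    # Each header ends with a newline, followed by one empty line.
    return ("\n".join(headers) + "\n\n").encode("utf-8") + body


if __name__ == "__main__":
    # Emit a single hypothetical document on stdout for swish-e to read.
    sys.stdout.buffer.write(prog_record("http://biowulf.nih.gov/", "<html></html>"))
```

Content-Length is computed from the encoded bytes because swish-e reads exactly that many bytes of content before expecting the next record's headers.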

Thank you very much!

Jean

Received on Wed May 7 18:35:22 2003