indexing pdf on win 2000

From: Kelli Coggins <kelli(at)>
Date: Mon Jun 09 2003 - 14:42:19 GMT
With the help of this discussion group, Swish-e is working beautifully on
our intranet. I am now attempting to get the pdf files indexed, and after
reading the documentation and previous pdf discussions, I am thoroughly
confused about how to go about it.

I am using a Windows 2000 machine, and I think I need to use xpdf's
pdftotext filter. We have not installed Perl on our machine.

My swish-e config file is:
# Configuration file for LowCoNet Intranet Procedures

# This is the name of the index file
IndexFile c:/Swish-e/procedures.idx

#Index the files in this folder
IndexDir "d:/LowCoNet Intranet Files/Procedures"

#Remove this part of the path. It will be replaced with
#the URL by the php interface config file
ReplaceRules remove "d:/LowCoNet Intranet Files/"

#Only index files ending in .htm .html .pdf .txt
IndexOnly .htm .html .pdf .txt
IndexContents TXT2 .pdf .txt .doc

MetaNames swishdocpath swishtitle
PropertyNames description author keywords

#Don't Index files with ~
FileRules pathname contains ~

#Assign the pdftotext filter to .pdf files
FileFilter .pdf c:/xpdf/pdftotext.exe '"%p"-'

I have installed xpdf and edited the sample.xpdfrc.

I run the following from my command line:
swish-e -c procedures.cfg -s prog

I get a good index of the html files, but for each pdf file swish finds, I
get an "error:couldn't open file myfilename.pdf" message. As I am not in
any way a programmer, I am getting more lost the more I try to troubleshoot
this issue. Can anyone set me on a better path?

Received on Mon Jun 9 14:42:32 2003