Hello,
I was trying to index PDFs with Swish-e and ran into a bucket load of
problems. It took a lot of tinkering to get it right. Along the way I made
a few changes, mostly related to paths and directory names containing spaces.
If you can't get file-system based PDF indexing working, try the following
changes. Platform: Win2K
------------------------------------
Set the correct filter path in the configuration file, e.g.
FileFilter .pdf ./lib/swish-e/_pdf2html.pl '"%p"'
Set the correct path to the PDF converters (pdfinfo, pdftotext) in
_pdf2html.pl, e.g.
$ENV{PATH} = 'D:/Program Files/SWISH-E4/lib/swish-e;' . $ENV{PATH};
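If you would rather not hard-code the ';' separator, here is a minimal Perl
sketch of the same PATH change. It assumes the converters live in the
directory shown above (adjust to your install) and uses the standard Config
module's path_sep value, which is ';' on Win32 and ':' on Unix. prepend_path
is a hypothetical helper name, not part of _pdf2html.pl.

```perl
use strict;
use warnings;
use Config;    # %Config exposes Perl's build settings, including path_sep

# Prepend a directory to PATH using the platform's separator
# (';' on Win32, ':' on Unix), skipping an empty or unset PATH.
sub prepend_path {
    my ($dir) = @_;
    my @parts = grep { defined($_) && length($_) } ($dir, $ENV{PATH});
    $ENV{PATH} = join $Config{path_sep}, @parts;
    return $ENV{PATH};
}

# Assumed location of pdfinfo/pdftotext; adjust to your install.
prepend_path('D:/Program Files/SWISH-E4/lib/swish-e');
print "PATH is now: $ENV{PATH}\n";
```

Putting the converter directory first means it wins over any other copy of
pdftotext that happens to be on the system PATH.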
Fix the IndexDir paths: for directories with embedded spaces, enclose the
path in double quotes and use /\ instead of \ in the configuration file, e.g.
Correct:
IndexDir "D:/\Documents and Settings/\cmp026/\My Documents"
Wrong:
IndexDir D:\Documents and Settings\cmp026\My Documents
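Pass the file path to the PDF converters in quotes, so paths with spaces
arrive as a single argument. Change $file to \"$file\" in _pdf2html.pl, and
use "or" rather than "||" after open ("||" binds to the command string, so
the die never runs), e.g.
open F, "pdfinfo \"$file\" |" or die "$0: failed to run pdfinfo: $!";
open F, "pdftotext \"$file\" - |" or die "$0: failed to run pdftotext: $!";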
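A minimal Perl sketch of the quoting fix, pulled out into helpers.
quote_path and pdf_to_text are hypothetical names, not part of _pdf2html.pl.
It assumes pdftotext is on PATH and relies on the fact that Windows file
names cannot contain double quotes, so wrapping the name in quotes is enough
for cmd.exe to treat a path with spaces as one argument.

```perl
use strict;
use warnings;

# Wrap a file name in double quotes so the shell passes a path containing
# spaces to the converter as one argument. Windows file names cannot
# contain '"', so a name that does is rejected rather than escaped.
sub quote_path {
    my ($file) = @_;
    die "unexpected double quote in '$file'\n" if $file =~ /"/;
    return qq{"$file"};
}

# Run pdftotext on one file and return its text (assumes pdftotext is on PATH).
sub pdf_to_text {
    my ($file) = @_;
    my $cmd = 'pdftotext ' . quote_path($file) . ' - |';
    # 'or', not '||': '||' would bind to $cmd and the die could never fire.
    open my $fh, $cmd or die "$0: failed to run pdftotext: $!";
    my $text = do { local $/; <$fh> };    # slurp all of pdftotext's output
    close $fh or warn "$0: pdftotext exited with status $?\n";
    return $text;
}

print quote_path('D:/My Documents/report.pdf'), "\n";
# prints "D:/My Documents/report.pdf" (quotes included)
```

Calling pdf_to_text('D:/My Documents/report.pdf') would then run pdftotext
on that (hypothetical) file and return its text.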
Set the correct stopwords file path in your configuration file (fix it in
example4.config if you are using *any* of the supplied .config files), e.g.
IgnoreWords file: conf/stopwords/english.txt
Set the correct site-wide include reference in your configuration file (fix
it if you are using *any* of the supplied .config files), e.g. in your
.config file:
IncludeConfigFile conf/example4.config
------------------------------
Zeeshan Ahmad,
FMC Computing Services
Received on Fri Nov 7 04:34:57 2003