I have to use IIS (arggg), so I installed SWISH-E from the Windows installer, and I followed all the defaults to install in C:\Program Files\SWISH-E\ . I then had a lot of trouble configuring swish.cgi, until I got these two variables right:
swish_binary => 'C:\"Program Files"\SWISH-E\swish-e.exe', # Location of swish-e binary
swish_index => 'C:\Program Files\SWISH-E\index.swish-e', # Location of your index file
The space in "Program Files" needs to be quoted in the first, but not in the second, because of the way DOS and Perl interact.
My debugging efforts were complicated by the fact that I was using Cygwin, which behaves slightly differently: I could run the CGI fine from the Cygwin/bash command line, but it wouldn't work under IIS (which uses DOS to run swish-e.exe). One thing that helped a lot was using CGI::Carp's carpout function to write my own error log, since I couldn't find an Apache-style error log anywhere. Remember to make the log file writeable by the webserver, which runs as IUSR_MACHINENAME, I think. Here's the snippet I put at the top of swish.cgi:
use CGI::Carp qw(fatalsToBrowser warningsToBrowser carpout);
#use CGI::Carp qw(carpout);
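my $log = 'C:\path\to\errlog-web.txt';
# ">" truncates the log on every request; use ">>" to append instead
open(LOG, ">$log") or
croak("Unable to open $log for writing: $!");
carpout(LOG); # redirect STDERR (warn/die messages) to the log file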
ALSO, I had to modify doc2txt.pm and pdf2xml.pm (at least; maybe more, I don't remember) to quote the filename when they call system() or use backticks (``), since some directory names have spaces. E.g. in doc2txt.pm I now have:
my $content = `$catdoc "$file"`;
instead of:
my $content = `$catdoc $file`;
Windows doesn't allow double quotes in file or directory names, so that should be an OK way to do it even though it doesn't escape anything. It might be better to add a more robust argument-escaping method to prevent filenames with special characters from doing unexpected things (or, better still, to not use backticks at all and so avoid the shell completely). In some cases the current code could be a security hole on Unix, because a user whose documents are indexed could execute arbitrary code as the webserver user by naming a document 'haha; rm -rf /; youlose.doc' or whatever. If the webserver user has sufficiently few rights, it shouldn't be possible to cause a lot of damage that way, though.
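As a rough illustration of the "avoid the shell completely" idea, here is a minimal sketch using Perl's list-form pipe open. The helper name capture_output is made up for this example and is not part of SWISH-E; $catdoc and $file stand for the variables already used in doc2txt.pm. On Unix this runs the program directly, so spaces and characters like ';' in the filename are passed along as plain data. On Windows it needs a reasonably recent Perl (I believe list-form pipe open was only added there around Perl 5.22), so treat it as an assumption to verify rather than a drop-in patch.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SWISH-E): run a program and capture
# its standard output without handing a command string to a shell.
sub capture_output {
    my ($program, @args) = @_;

    # List-form pipe open: Perl executes $program with @args as separate
    # arguments, so nothing in @args is interpreted by a shell.
    open(my $fh, '-|', $program, @args)
        or die "Unable to run $program: $!";

    local $/;                 # slurp mode: read all output in one go
    my $content = <$fh>;

    close($fh) or warn "$program exited with status $?";
    return $content;
}

# Possible use in doc2txt.pm, replacing the backticks:
# my $content = capture_output($catdoc, $file);
```

With this approach the filename no longer needs quoting at all, so the 'haha; rm -rf /; youlose.doc' trick should just produce a "file not found" style error from the converter instead of running anything.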
I just wanted to share my hard-won experience with the archive, and maybe these hints/changes could make it into at least the Windows release. Thanks to all you who have developed swish!
nathan vonnahme : programmer/analyst : email@example.com
fairbanks memorial hospital/denali center : 1650 cowles st, fairbanks alaska 99701
Received on Tue Jun 10 17:36:17 2003