Re: swish-e 2.4.3 windows 2003 iis success!

From: Revillini, James <JRevillini(at)not-real.txcc.commnet.edu>
Date: Tue Jun 21 2005 - 21:09:36 GMT
> -----Original Message-----
> From: swish-e@sunsite3.berkeley.edu [mailto:swish-e@sunsite3.berkeley.edu]
> On Behalf Of David L Norris
> Sent: Tuesday, June 21, 2005 2:57 PM
> To: Multiple recipients of list
> Subject: [SWISH-E] Re: swish-e 2.4.3 windows 2003 iis success!
> 
> On Tue, 2005-06-21 at 10:16 -0700, Revillini, James wrote:
> > I hope this helps someone out there get started.  It's great once you
> > get it working, but I ran into some very weird windows-related stumbling
> > blocks along the way and had to read through many, many email archive
> > issues that did not even appear to be on the same topic in order to
> > configure this correctly on Windows 2003.  I hope to write another
> > message with some of the errors I ran into and how to correct them.
> 
> It would be immensely helpful if you can send the list of problems and
> solutions directly to me as well as to the list.  Bugs can be fixed and
> I could at least add any quirky behavior to the documentation.
> 
> Most of the bug reports I've received for the Windows builds are in the
> form "Swish not work windows help urgent fix now." instead of an
> explanation of what's wrong.  And I suspect many Windows users don't
> even bother to report problems on the assumption it wouldn't be fixed
> anyway.
> 
> --
>  David Norris
>   http://www.webaugur.com/dave/
>   ICQ - 412039
> 

Here is a list of issues I ran into while working on a Windows 2003 box with swish-e, using a filesystem index of a network folder containing many PDFs and Word docs.  I'm not saying that these items aren't covered in the documentation somewhere - just that they got me, and maybe this will help if they got you too.  The list is not ordered in any way.

1. did not accept a network path to index
Index config file contained only the IndexDir directive
Tried: \\fileservername\folder
Error: Warning: Invalid path '//fileservername/folder': No such file or directory
Removing very common words...
no words removed.
Writing main index...
err: No unique words indexed!
.

Tried: \\fileservername\folder\
Error: Warning: Invalid path '//fileservername/folder': No such file or directory
Removing very common words...
no words removed.
Writing main index...
err: No unique words indexed!
.

*at this point I decided to try a subdirectory with only about 9 files

Tried: IndexDir \\fileservername\folder\subdirectory
NO ERRORS

Tried: IndexDir \\fileservername\folder\subdirectory\
NO ERRORS

*I then looked into the original directory for anything weird and noticed a file called "ICON" - notice the "N" is not really an "N" - it is some weird character.  My suspicion is that this was screwing it up.  This is a hidden file of 0kb, by the way.  I have no idea what the hell it does or who put it there.  I'm at a college so I'm not going to touch it.

*I also mapped drive z: to \\fileservername\folder\ and then proceeded to try the following:

Tried: IndexDir z:\
Error: Warning: Failed to open: 'z://ICON?': Invalid argument
*but OK other than that - it indexed 2505 files.

Tried: IndexDir z:/
Error: Warning: Failed to open: 'z://ICON?': Invalid argument
*but OK other than that - it indexed 2505 files.

2. could not have backslashes in the IndexDir argument.  Even though it looks like everything is going to be ok given the above tests, having backslashes in the index dir caused problems when there was more than 1 line in the indexer config file.  Things like this would happen:

E:\swish-e>swish-e -c swish-e.config -fs
Indexing Data Source: "File-System"
Indexing "z:IndexFile"

Warning: Invalid path 'z:IndexFile': No such file or directory
Indexing "e:/search/index.swish-e"
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 108 words alphabetically
Writing header ...
Writing index entries ...
  Writing word text: Complete
  Writing word hash: Complete
  Writing word data: Complete
108 unique words indexed.
4 properties sorted.
1 file indexed.  410,898 total bytes.  194 total words.
Elapsed time: 00:00:02 CPU time: 00:00:01
Indexing done!

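At first glance, it looks OK, but it's not - there should have been 2505 files indexed, but it says 1 file was indexed.  Judging by the 'z:IndexFile' path in the output, the trailing backslash in "IndexDir z:\" appears to have swallowed the line break, so the IndexFile line was read as more IndexDir arguments and the one file that got indexed was the index file itself.  This is understandable, as the docs advise against using backslashes in most situations.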
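
A quick way to catch this before running the indexer is to scan the config file for backslashes. The following is only a minimal Python sketch, not part of swish-e: the function name find_backslash_lines is made up for illustration, and it simply flags every non-comment line containing a backslash, so it will also flag deliberate escapes.

```python
def find_backslash_lines(config_text):
    """Return (line_number, line) pairs for non-comment lines that contain a backslash."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and comment lines.
        if not stripped or stripped.startswith("#"):
            continue
        if "\\" in stripped:
            hits.append((number, stripped))
    return hits


# Sample config text mirroring the two-line config discussed above.
sample = "IndexDir z:\\\nIndexFile e:/search/index.swish-e\n"
for number, line in find_backslash_lines(sample):
    print(f"line {number}: backslash in {line!r}")
```

Any line it reports is a candidate for rewriting with forward slashes, e.g. z:/ instead of z:\.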

3. 404 errors - assume you have a Windows 2003 box without ActivePerl installed.  Assume you go and install it (to e:\perl\), then you install swish-e (to e:\swish-e\).  Assume you copy swish.cgi, index.swish-e and index.swish-e.prop from e:\swish-e\ to your search directory (I use e:\search\).  Do all the testing (issues with testing will be brought up later).  Assume everything works.  Great.  Now you're ready to try the real browser test.  (Assume you've created a website in IIS called "search" which uses port 85.)

Point your browser to http://webservername:85/swish.cgi

You probably got a 404 error.

First: you need to ALLOW the Perl CGI Extension and the Perl ISAPI Extension in Web Service Extensions, a new and not so well known addition in IIS6.
 - open IIS
 - expand the server node
 - click on Web Service Extensions
 - select Perl CGI Extension and click Allow if it is not already "allowed"
 - select Perl ISAPI Extension and click Allow if it is not already "allowed"

Restart IIS (Start>Run>iisreset -restart)

Point your browser to http://webservername:85/swish.cgi

You probably got a 404 error. (again!)

Second: you probably (as of ActivePerl 5.8.7.813) need to add the .cgi extension to your website; ActivePerl only installs .pl and .plx.
 - open IIS
 - expand the server node
 - expand the Websites node
 - right click the "search" website>properties
 - click the home directory tab
 - click configuration...
 - click the "add" button
 - path to executable is e:\perl\bin\perl.exe "%s" %s
 - extension is .cgi
 - select "limit verbs to" and enter POST,HEAD,GET

*NOTE: some savvy users are wondering why I didn't point to e:\perl\bin\perlis.dll to run perl as an ISAPI extension for performance reasons.  my answer: didn't work for me.  I'll get into that later.

Point your browser to http://webservername:85/swish.cgi

SUCCESS!  (but not for long)

4. Assume you leave off at the end of the issue noted above.  You type a word.  You hit Search.  You get: "service currently unavailable."  No matter what you type, you get "service currently unavailable."

I then noticed that swish.cgi points to the binary in the local folder (I'm assuming) on approximately line 144:

        swish_binary    => 'swish-e.exe',  # Location of swish-e binary

TRIED: copying swish-e.exe into the e:\search\ folder.  
RESULT: This got rid of the "service currently unavailable" error, but no search results show up either.
ALSO: if you inspect e:\perl\bin\perlis-err.log, you will find entries like:
*** 'e:\search\swish.pl' error message at: 2005/06/21 09:33:23
Use of uninitialized value in pattern match (m//) at E:/perl/lib/IO/Handle.pm line 348.
Use of uninitialized value in concatenation (.) or string at E:/perl/lib/IO/Handle.pm line 358.
e:\example\swish.pl aborted: open2: Can't call method "close" on an undefined value at E:/perl/lib/IPC/Open3.pm line 341.

TRIED: changing the swish_binary path to the full path, thus:

        swish_binary    => 'e:\swish-e\swish-e.exe',  # Location of swish-e binary
RESULT: it worked

I finally figured that one out because when I issued the command swish-e -V at the e:\search> prompt, it told me that xmlparse.dll was missing, and I started putting things together.  I wonder if the swish-e directory should go into the PATH environment variable on Windows machines.  I don't know yet whether that would make the default swish.cgi work out of the box.
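
To make the failure mode concrete: a copied swish-e.exe is only usable if the DLLs it depends on sit in the same folder (or are reachable some other way). Below is a rough Python sketch of that check. The helper name missing_companions and its default list are my own illustration; xmlparse.dll is the only DLL name taken from the error above, and the real binary may need others.

```python
from pathlib import Path


def missing_companions(binary_path, companions=("xmlparse.dll",)):
    """List companion files that are absent from the folder holding the binary.

    The default companion list is an assumption based on the one missing-DLL
    error reported above; extend it as needed.
    """
    folder = Path(binary_path).parent
    return [name for name in companions if not (folder / name).is_file()]


# Hypothetical check of the copied binary in the search folder.
for name in missing_companions("e:/search/swish-e.exe"):
    print(f"missing next to swish-e.exe: {name}")
```

Pointing swish_binary at the full path of the installed binary, as in the fix above, avoids the problem because the DLLs are already in that folder.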

5. links to files don't work.  This is why I ended up using the line

ReplaceRules replace "z\:/" "file\://fileservername/folder/"

in my index config.  I was indexing a network drive.  Not everyone has the same drive mappings, so I couldn't just leave it as z:\...
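
To show the effect I was after, here is a small Python sketch of the mapping from an indexed path to a link. It is not how swish-e or swish.cgi implement it: the function to_file_url is hypothetical, it only swaps a leading drive prefix for the share URL, and the percent-encoding step just imitates the %20 that shows up in the generated links.

```python
from urllib.parse import quote


def to_file_url(indexed_path, drive_prefix="z:/",
                share_url="file://fileservername/folder/"):
    """Swap the mapped-drive prefix for the share URL and percent-encode the rest."""
    if not indexed_path.startswith(drive_prefix):
        # Not under the mapped drive; leave the path untouched.
        return indexed_path
    rest = indexed_path[len(drive_prefix):]
    # Keep "/" as a separator; spaces become %20.
    return share_url + quote(rest, safe="/")


print(to_file_url("z:/subdirectory/Word doc 1.doc"))
# prints file://fileservername/folder/subdirectory/Word%20doc%201.doc
```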

But just so you know:
TRYING: 
IndexDir //fileservername/folder/
IndexFile e:/search/index.swish-e

The search works fine, but the links point to http://fileservername/folder/filename.ext
(we don't need http:// unless we actually configure the server to handle requests through http - let's assume we have a plain-jane file share)

TRYING: 
IndexDir z:/   #mapped to //fileservername/folder/
IndexFile e:/search/index.swish-e

The search works fine, but the links point to file://z:/fileservername/folder/filename.ext
(this doesn't seem to work at all, but then again it may just not work because the directories and files have spaces in them, which turn into %20 in the code that swish.cgi outputs)

TRYING: 
IndexDir z:/   #mapped to //fileservername/folder/
IndexFile e:/search/index.swish-e
ReplaceRules replace "z:/" "file://fileservername/folder/"

*you were right, Bill, no need to escape the colons - my mistake

The search works fine, and Internet Explorer can follow the links - Firefox 1.0.3 cannot.  I still can't figure out the Firefox bug.  I can tell you that in Firefox, the link shown in the status bar is:

6. MS Word docs aren't indexing properly.  Unfortunately, I just noticed this and have not researched it at all.  I just ran the index again on a subdirectory and noticed that every Word doc shows only 1 word being indexed.  Here's the config file:

IndexReport 3
ParserWarnLevel 1
IndexDir z:/subdirectory/
IndexFile e:/search/index.swish-e
ReplaceRules replace "z:/" "file://fileservername/folder/"

And here is an example I'm seeing at the command prompt:

Checking dir "z:/subdirectory"...

In dir "z:/subdirectory/subsub":
  Word doc 1.doc - Using DEFAULT (HTML2) parser -  (1 words)
  Word doc 2.doc - Using DEFAULT (HTML2) parser -  (1 words)

*any help would be appreciated on this one

OK, that's all for now. I'll write in again if I think of anything else (and can reproduce it.  Some of the issues I remember running into I can't reproduce anymore).

- Jim
Received on Tue Jun 21 14:09:38 2005