swish-e search CGI script

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Dec 06 2001 - 14:48:13 GMT
I added to CVS last night a first edit of the rewrite of the swish.cgi
script.  It should be in swish-daily now.  For now, it's called swish3.cgi,
but it will replace swish.cgi and swish2.cgi.

I probably won't have any more time this week to work on it, but I'd like to
get feedback, and maybe some help.  If you are using the existing swish.cgi
script it would be great if you could offer some time and test.

I've only tested the script under CGI, but it's supposed to work under
mod_perl, too.

I added new highlighting code, so it now highlights phrases, and (I think)
deals correctly with stopwords and stemming.

I added a few features that people asked for.  One is a way to select from
a list of index files, and the other is a way to limit searches to some set
of values of a metaname.  That works well with something like ExtractPath
for limiting searches to part of the document tree.
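
For illustration only, limiting a search to a set of metaname values could
amount to wrapping the user's query like this.  This is a sketch, not the
code in swish3.cgi: the limit_query helper, the "site" metaname and its
values are hypothetical, and it assumes swish-e's metaname=(...) query
syntax with single-word values that need no quoting.

```perl
use strict;
use warnings;

# Restrict a query to documents whose metaname holds one of the given
# values.  The metaname and values passed in below are made up.
sub limit_query {
    my ( $query, $metaname, @values ) = @_;

    # Nothing selected: leave the query alone.
    return $query unless @values;

    my $allowed = join ' or ', @values;
    return "($query) and $metaname=($allowed)";
}

print limit_query( 'apache', 'site', 'docs', 'faq' ), "\n";
# prints: (apache) and site=(docs or faq)
```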

I added the DateRanges module to the distribution, although I really wanted
to rewrite it first.  Time, time, and less time.

It still uses the fork method for running the script.  I plan on making it
work on Windows at some point, and maybe trying to make it use the C
library, too.  But that will have to wait for now.

My web design skills are poor, so if anyone has suggestions on making it
look more, what?, professional, then please share them!

I made the script modular.  I added a new directory called example/modules.
As I said before, people didn't like the original script where one had to
install modules before use.  I went to a single script, but that was a pain
to maintain.  

So this time I just placed the modules in a directory, and it's just a
matter of setting a "use lib" path correctly in the script.  That shouldn't
be too hard or too restrictive.
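
As a rough sketch, the "use lib" line near the top of the script might look
like the following.  The directory shown is a hypothetical install location,
not one taken from the distribution; point it at wherever example/modules
ends up on your system.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical path to the unpacked example/modules directory.
use lib '/usr/local/swish-e/example/modules';

# "use lib" prepends the directory to @INC at compile time, so later
# "use" and "require" statements search it before the standard paths.
print "module search path now includes: $_\n"
    for grep { m{example/modules} } @INC;
```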

The modules handle the highlighting and output generation functions.  The
modules can be selected at run time -- so, for example, a mod_perl script
could use the same swish.cgi "module" loaded into the server, but use
config settings to select what generates the output.  Really, the modules
are more for me, so I don't have more than one script to maintain, yet I
can use it in my projects (which use HTML::Template and Template-Toolkit)
unmodified.
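
To make "selected at run time" concrete, here is a minimal sketch of loading
a module whose name comes from a config setting.  The %config hash, the
output_module key and the load_module helper are hypothetical and are not
the script's actual settings; File::Spec stands in for an output module only
because it ships with Perl.

```perl
use strict;
use warnings;

# Hypothetical config: the package that should generate the output.
my %config = ( output_module => 'File::Spec' );

# Load a module whose name is only known at run time; return the name.
sub load_module {
    my ($package) = @_;

    # Accept only plain package names such as Foo or Foo::Bar.
    die "suspicious module name: $package\n"
        unless $package =~ /\A\w+(?:::\w+)*\z/;

    # require with a string wants a file path, e.g. "File/Spec.pm".
    ( my $file = "$package.pm" ) =~ s{::}{/}g;
    require $file;
    return $package;
}

my $class = load_module( $config{output_module} );
print "loaded $class\n";
```

Under mod_perl the same idea lets one loaded copy of the code serve several
configurations, since only the config value changes.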

The hope, of course, is that now that it's a bit more modular, swish
users will contribute and make it better.

There are currently three highlighting modules.  It's a speed vs. accuracy
trade-off.  The fastest one does a simple regular expression (similar to
what was posted by Mark Kennedy on Nov 11th.)  But it doesn't show the
matches in context, and actually slows down on searches that display a lot
of hits.
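
For readers who haven't seen the simple approach, a regular-expression
highlighter can be sketched as below.  This is not the module in CVS: the
highlight_words name and the <b> markup are assumptions, the text is assumed
to be HTML-escaped already, and nothing here handles phrases, stopwords or
stemming.

```perl
use strict;
use warnings;

# Wrap every whole-word occurrence of the query words in <b> tags.
sub highlight_words {
    my ( $text, @words ) = @_;
    return $text unless @words;

    # Escape regex metacharacters; try longer words first.
    my $alternation = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) } @words;

    # Case-insensitive match on word boundaries.
    $text =~ s/\b($alternation)\b/<b>$1<\/b>/gi;
    return $text;
}

print highlight_words( 'Swish-e is a fast search engine.', 'search', 'fast' ), "\n";
# prints: Swish-e is a <b>fast</b> <b>search</b> engine.
```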

The next one is basically the previous highlighting swish.cgi code that did
context output, but didn't highlight phrases.

The third is the new one, that is reasonably accurate, but the speed
difference is noticeable on my machine.  It's not bad, but it's not as fast
as I'd like.

My hope is that someone smart will optimize the code and make it both
accurate and fast.  It might need a Perl extension written in XS or Inline.

Output generation is handled by modules.  I haven't got anything working
except the basic text output.  I'll build modules for HTML::Template and
Template-Toolkit soon.  It would be fantastic if someone took the existing
DefaultTemplate.pm module and made it look real fancy ;)

Thanks,


Bill Moseley
mailto:moseley@hank.org
Received on Thu Dec 6 14:53:39 2001