swish-e binary vs. the swish-e library

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Sat Mar 02 2002 - 16:10:44 GMT
[I'm cc'ing the list]

At 11:25 PM 3/1/2002 -0500, Fred Toth wrote:

>At 07:18 PM 3/1/02 -0800, Bill Moseley wrote:
>>I always ask this:  Why are you using the SWISHE.pm library instead of
>>running the binary?
>
>It seemed like the shortest path to getting the job done, I guess.
>Performance should be better, yes? Since there's no need to
>exec another process.

Yes, it should have better performance.  From what I've heard, forking used
to be very expensive, but on most operating systems today it is not too
bad.  I think you would see a much bigger performance gain from using a
technology like mod_perl.

When the swish-e library first became available I ran some benchmarks (that
were of questionable design).  I wrote a simple mod_perl application that
displayed search results from swish, and then I used the Apache Benchmark
test program "ab" and loaded the server with requests.  (This was on Linux
with a 555MHz PIII, 128MB of RAM, and IDE drives.)  I compared running swish as a
binary (forking) and using the swish library.  I also compared to a CGI
program not running under mod_perl, but still using the swish library.
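
As a rough illustration of the forking-versus-library comparison described
above (this is not the original benchmark, which used "ab" against a
mod_perl application), the sketch below times starting a child process
against doing a small piece of work inside the running process.  The helper
names are hypothetical, and the child here is a Python interpreter that
does nothing, which is an assumption standing in for the swish-e binary and
probably costs more to start than a small C program would.

```python
import subprocess
import sys
import time


def seconds_per_call(fn, repeats):
    """Return the average wall-clock seconds per call of fn."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def spawn_child():
    # Stand-in for fork + exec of a search binary: start a child process
    # that exits immediately and wait for it to finish.
    subprocess.run([sys.executable, "-c", "pass"], check=True)


def in_process():
    # Stand-in for a library call made inside the already-running process.
    sum(range(1000))


if __name__ == "__main__":
    spawn = seconds_per_call(spawn_child, 20)
    local = seconds_per_call(in_process, 20)
    print(f"spawn: {spawn * 1000:.2f} ms/call")
    print(f"in-process: {local * 1000:.4f} ms/call")
```

The numbers only show the fixed cost of starting a process.  They say
nothing about search time itself, so a real comparison would still need to
run actual queries under load.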

mod_perl made a huge difference in requests per second over plain CGI, but
using the library didn't help that much.  What using the library did do
is make the Apache processes a lot bigger, since memory allocated by perl
in those processes is not released back to the system.  I just didn't think
the increase in Apache process size was worth the small gain in speed
from using the library.

Now, this was on an unloaded system, and I was making the same swish query
over and over, so what was happening when using the swish-e binary was that
Linux was probably caching both the swish-e binary and the index file in
RAM, so it was basically like running the library.

On a busy system you might not get that caching.  But I thought the OS
was more likely to cache the binary and index if my Apache processes were
smaller and left more memory to the OS (after all, there are 30+ of those,
so a little unshared memory in each one adds up).
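
To make the arithmetic behind that concrete, here is a small sketch.  The
30 processes and the 128MB of RAM come from the description above; the
per-process growth figure is purely hypothetical, since no measured number
is given here.

```python
TOTAL_RAM_MB = 128        # RAM on the test machine described above
APACHE_PROCESSES = 30     # "30+" Apache processes
EXTRA_MB_PER_PROCESS = 1.5  # hypothetical growth per process, not measured


def extra_unshared_mb(processes, extra_mb_per_process):
    """Total additional unshared memory across all server processes."""
    return processes * extra_mb_per_process


def share_of_ram(extra_mb, total_ram_mb):
    """Fraction of physical RAM taken by that additional memory."""
    return extra_mb / total_ram_mb


if __name__ == "__main__":
    extra = extra_unshared_mb(APACHE_PROCESSES, EXTRA_MB_PER_PROCESS)
    share = share_of_ram(extra, TOTAL_RAM_MB)
    print(f"{extra:.0f} MB extra, {share:.0%} of RAM")
```

Under that assumed figure, the extra memory is no longer available to the
OS for caching the binary and the index file.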

I wish others would benchmark this stuff, as I'm sure my tests were not
perfect (benchmarks rarely are).  I also discussed this on the mod_perl
list, because the common thinking there is *avoid forking at all costs*.

Here's the comment I made about this on the mod_perl list:

http://msgs.securepoint.com/cgi-bin/get/apache0010/57/1/1/1/2/1.html

>The search system I'm building replaces an existing one (Verity), and has
>to match it as closely as possible. For that reason, I couldn't use
>swish.cgi.

That's ok.  swish.cgi can be template based so you can make the output look
like anything you want, but it's also bulky due to feature bloat.

>SWISHE.pm looked like the quickest way to get there.
>The other options were hand-rolling my own perl wrapper, and the
>CPAN module that I haven't even looked at.

You mean like Template-Toolkit?  Or the SWISH modules?  If
Template-Toolkit, I'd recommend it -- especially if building a new site.
It's very nice.
If you mean the SWISH modules, then hold off.  I haven't touched those in a
while, and swish has been in development (for a year now ;).  That SWISH
module (set of modules) was meant to provide an abstraction layer for
accessing swish.  The idea is that the same program could use the swish-e
binary, the swish-e library, or (the yet to be invented) swish-e server.

>Is there any reason why I shouldn't be using SWISHE.pm?

No.  It will be good to get more testing on it, and to help bring it up to
date with what the binary can do.  But if you run it under something like
mod_perl you will use more memory.  That's only a problem if you use up all
your RAM.

My conclusion was that using the library didn't add much speed, but adding
mod_perl did, yet I didn't want the memory usage the library added.

Hope this helps.  And if you do any benchmarking please post the results.



Bill Moseley
mailto:moseley@hank.org
Received on Sat Mar 2 16:11:09 2002