
Re: cgi question / core

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed Jul 23 2003 - 00:35:42 GMT
On Tue, Jul 22, 2003 at 02:29:39PM -0700, Aaron Bazar wrote:
> Hi Everybody!
> 
> One of the sites that I set Swish-e up on is a success! The only problem is
> that it is killing my poor server. I am also finding a large core file in my
> web directory, basically daily. Does anybody have any experience with core
> files and the cgi script? Is there any way to lessen the load on the server
> of the cgi script, without using mod_perl (I could not get it to work
> properly)?

Well, you are going to get a longer answer than you probably wanted....

Yes.  A few things.

SpeedyCGI (http://daemoninc.com/SpeedyCGI/).  You don't need to be root
to install it.  Many OSes have it as a package, so you may be able to
install it easily (it was an apt-get away on my Debian machine).
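
For the record, on Debian it was something like the following (the
package and CPAN module names here are from memory, so double-check
them):

  apt-get install speedy-cgi-perl

or, if there's no package for your platform, build it from CPAN:

  perl -MCPAN -e 'install CGI::SpeedyCGI'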

Yesterday I updated the swish.cgi script in CVS to work with SpeedyCGI
-- that only meant changing the shebang line at the top of the program,
plus I set it up to cache the configuration so it's only read at script
startup.

  http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/swishe/swish-e/example/swish.cgi.in.diff?r1=1.7&r2=1.8
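
The caching part is nothing fancy.  A minimal sketch of the idea (not
the actual diff -- see the URL above for that) is to read the config
file once per persistent process instead of once per request:

  # Read .swishcgi.conf the first time through, then reuse the hashref
  # for every request this persistent backend serves.
  my $config;
  sub get_config {
      $config ||= do './.swishcgi.conf';   # the file returns a hashref
      return $config;
  }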


Really, all you need to do is install SpeedyCGI and change the top line 
to:

  #!/usr/bin/speedy -w -- -t60 -M3

Let's see, where did I put that envelope.... oh yeah, I ran Apache
Benchmark on localhost, and for a simple search without highlighting a
description I went from 3.7 requests/second to about 30/sec.  mod_perl
was about 47/sec, so it's a bit faster but not by much.

Now that was also using the SWISH::API module instead of running the 
swish-e binary.  Here's the swish.cgi configuration I was using:

moseley@bumby:~/apache$ cat .swishcgi.conf
return {
    title => "This is my title $$ --",
    swish_index => '/home/moseley/apache/index.swish-e',
    use_library => 1,
};

So the numbers, in requests per second, looked like this.  I was using
Apache Benchmark as:

        ab -n 2000 -c 10 http://localhost/swish.cgi?query=install

and that was returning about 200 hits.

                use_library=0             use_library=1
              --------------------  ----------------------
  mod_cgi           3.7                      3.7
  mod_perl          8.9                     30.0
  SpeedyCGI         8.6                     26.0

Just running the search form page (i.e. without a query), mod_perl was 
doing 76 requests per second. ;)

Again, this was without any StoreDescription setting in the config.  In
previous tests the phrase highlighting has been the limiting factor.
Also, this is not using a templating system, just the default Perl
output generation.

Oh, heck, let me try with a templating system:

With the default output setup:   Requests per second:    25.95 [#/sec] (mean)
             Template-Toolkit:   Requests per second:    21.58 [#/sec] (mean)
               HTML::Template:   Requests per second:    13.37 [#/sec] (mean)

I'm sure that HTML::Template can be better tuned.  I'm not caching the 
template object or using JIT.
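
For example (this isn't what swish.cgi does today, just an
illustration), HTML::Template's cache option keeps the parsed template
in memory between requests, which only pays off under a persistent
interpreter like SpeedyCGI or mod_perl:

  use HTML::Template;

  # cache => 1 keeps the parsed template structure in memory, so later
  # requests in the same process skip the parsing step entirely.
  my $template = HTML::Template->new(
      filename => 'results.tmpl',   # hypothetical template file name
      cache    => 1,
  );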

Now, going back to the default output, but enabling StoreDescription, 
you can see how the term highlighting kills things:

   SWISH::PhraseHighlight:          Requests per second:     1.14 [#/sec] (mean)
   SWISH::DefaultHighlight:         Requests per second:     1.56 [#/sec] (mean)
   SWISH::SimpleHighlight:          Requests per second:     7.29 [#/sec] (mean)
   NONE (showing first 100 chars):  Requests per second:    15.53 [#/sec] (mean)

Need to do some caching there.  Anyone want to write highlighting code 
in C?  Enough benchmarking.

With speedy you can specify how many "back end" processes to run, which
can help keep spiders from hammering your script so hard that the load
average goes through the roof.  It's probably not as effective as a
tuned mod_perl server, since the requests are still being processed by
Apache, but it should help.
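
If I remember the speedy options correctly, -M caps the number of
backend processes and -t is the idle timeout in seconds, so a shebang
like this (the numbers are just illustrative) keeps at most two
persistent backends around no matter how hard the script gets hit:

  #!/usr/bin/speedy -w -- -t60 -M2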

The other thing you might want to try is changing the highlighting
module.  The "PhraseHighlight" module is way slow -- it has to parse the
entire description into words and then use nested loops to look for
phrases to highlight, word by word.
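
I don't have the exact option name in front of me, but roughly speaking
you would point the highlighting package at one of the other modules in
.swishcgi.conf, something along these lines (check the comments in
swish.cgi for the real spelling):

  return {
      swish_index => '/home/moseley/apache/index.swish-e',
      use_library => 1,
      # Assumes a 'highlight' option that takes a 'package' key --
      # treat this as a sketch, not the documented interface.
      highlight   => { package => 'SWISH::SimpleHighlight' },
  };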

You will want to do your own benchmarking -- I'm sure mine is flawed in
some way.  Try various settings of the -M SpeedyCGI parameter.

As for the core files, I often don't have much luck with them.  I think
you run gdb with something like (assuming it's swish-e that is core
dumping):

   gdb /usr/local/bin/swish-e /path/to/core

then use "bt" or "where" to show a backtrace.  You might modify
swish.cgi to write pid and the request and then another log entry at the
end of the request to a log file to try and get an idea of what request 
is causing the core.
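
Something along these lines would do -- the log path and sub name here
are made up, so adjust to taste:

  use POSIX qw(strftime);

  # Append one line per event: timestamp, pid, marker, query string.
  sub log_request {
      my ($msg) = @_;
      open my $fh, '>>', '/tmp/swish-cgi.log' or return;
      printf $fh "%s [%d] %s %s\n",
          strftime('%Y-%m-%d %H:%M:%S', localtime),
          $$, $msg, $ENV{QUERY_STRING} || '';
      close $fh;
  }

  log_request('start');   # at the top of the request
  # ... run the search and print the results ...
  log_request('end');     # a 'start' with no 'end' points at the bad request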

Would hitting a resource limit (set with ulimit, for example) cause a 
core? 
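
For what it's worth, blowing past a CPU or file size limit raises
SIGXCPU or SIGXFSZ, and those do dump core by default, so it seems
plausible.  It's easy enough to see what limits are in effect in the
shell the script runs under:

  ulimit -a    # show all limits for this shell
  ulimit -c    # max core file size; 0 would suppress cores entirely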



-- 
Bill Moseley
moseley@hank.org
Received on Wed Jul 23 00:35:51 2003