Re: Severe performance problems

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Sep 12 2002 - 03:29:20 GMT
At 04:23 PM 09/11/02 -0700, Paul Borghese wrote:
>Ok, I think I have archives for a mailing list I run working except for
>one problem.  I have indexed two lists we will call list A and list B.
>List A is significantly bigger than list B.  The indexes were created
>with almost identical configurations.
>
>When I search list A specifying a date of "Last Month" (or any of the
>options) it returns the results relatively quickly.  Doing the same search
>on List B causes the CPU utilization to skyrocket and eventually the
>script will time-out.

Is it a problem with the script or with Swish-e?  Can you test from the
command line?

>They are both mailing list archives running MHonArc and almost identical
>configurations when creating the archives.

It's that "almost identical" thing that always catches me.

Looking at it from the Swish-e side:

Is it possible that list B does not have a presorted index for the last
modified date?   Are you using PreSortedIndex in B's config file?

For example, here's an index with 24735 files.

$ ./swish-e -w not dkdkd -m 3 -L swishlastmodified '>=' 930000000 -f index1
# SWISH format: 2.2rc1-dev
# Search words: not dkdkd
# Number of hits: 16803
# Search time: 0.055 seconds
# Run time: 0.074 seconds

Reasonably fast for 16,000 results.

Now, if I index the same data but without the pre-sorted indexes:

$ ./swish-e -w not dkdkd -m 3 -L swishlastmodified '>=' 930000000 -f index2
# SWISH format: 2.2rc1-dev
# Search words: not dkdkd
# Number of hits: 16803
# Search time: 0.135 seconds
# Run time: 0.155 seconds

Longer (ok, but not by much).

From the script side:

If it's a Perl script you can install a $SIG{HUP} handler, and when the
script goes crazy you can send it kill -HUP and have the script print a
backtrace.  That will show where it's hanging.  The other debugging tool is
strace, which will show you if the script is making system calls (such as
I/O).
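
If the search script turns out not to be Perl, the same trick carries over.
Here is a minimal sketch in Python, offered only as an analogue of the
$SIG{HUP} idea above; the function name install_hup_backtrace is made up for
this example, and it assumes a Unix system where SIGHUP exists.

```python
import signal
import sys
import traceback


def install_hup_backtrace(stream=sys.stderr):
    """Write the current call stack to `stream` each time SIGHUP arrives.

    Hypothetical helper for illustration; not part of any search script.
    """
    def handler(signum, frame):
        # `frame` is the frame that was executing when the signal arrived,
        # so the printed stack shows where the script currently is.
        stream.write("Backtrace on SIGHUP:\n")
        traceback.print_stack(frame, file=stream)
        stream.flush()

    signal.signal(signal.SIGHUP, handler)
```

Call install_hup_backtrace() once at startup, then run kill -HUP <pid> against
the script while it is spinning.  The handler returns normally, so the script
keeps running and you can send the signal several times to see whether it is
stuck in one place.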

>Any ideas?

Not really.  It's helpful to see your config files, of course.  You might
have found a bug in swish that I can't reproduce.  I may need your A and B
indexes to try on my machines.  With CPU usage jumping I'd suspect an odd bug.

Is it possible you are running out of memory?

If you do not have a presorted index for the property you are limiting on,
then for each result swish has to fetch the property from the .prop file
instead of using a fast lookup table to see if the result is in the range
selected.
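
To make the difference concrete, here is a simplified model in Python.  It is
not Swish-e's actual code or data layout; the function names and the
dictionary of properties are assumptions made for the illustration.  Without
a presorted table every hit costs a property fetch; with one, the range
boundary is found once and each hit needs only an integer comparison.

```python
import bisect


def filter_with_fetch(hits, fetch_property, low):
    """No presorted index: fetch each hit's property, then compare it."""
    return [doc for doc in hits if fetch_property(doc) >= low]


def build_rank_table(all_props):
    """Index time: sort once and record each document's sorted position."""
    order = sorted(all_props, key=all_props.get)
    rank = {doc: position for position, doc in enumerate(order)}
    sorted_values = [all_props[doc] for doc in order]
    return rank, sorted_values


def filter_with_table(hits, rank, sorted_values, low):
    """Presorted: one binary search, then an integer compare per hit."""
    first = bisect.bisect_left(sorted_values, low)
    return [doc for doc in hits if rank[doc] >= first]
```

In this model fetch_property stands in for the read from the .prop file, which
is the part that gets expensive when a search returns many thousands of hits.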


-- 
Bill Moseley
mailto:moseley@hank.org
Received on Thu Sep 12 03:32:51 2002