At 04:23 PM 09/11/02 -0700, Paul Borghese wrote:
>Ok, I think I have archives for a mailing list I run working except for
>one problem. I have indexed two lists we will call list A and list B.
>List A is significantly bigger than list B. The indexes were created
>with almost identical configurations.
>
>When I search list A specifying a date of "Last Month" (or any of the
>options) it returns the results relatively quickly. Doing the same search
>on List B causes the CPU utilization to skyrocket and eventually the
>script will time-out.
Is it a problem with the script or with Swish-e? Can you test from the
command line?
>They are both mailing list archives running MHonArc and almost identical
>configurations when creating the archives.
It's that "almost identical" thing that always catches me.
Looking at it from the Swish-e side:
Is it possible that list B does not have a presorted index for the last
modified date? Are you using PreSortedIndex in B's config file?
For example, here's an index with 24,735 files.
$ ./swish-e -w not dkdkd -m 3 -L swishlastmodified '>=' 930000000 -f index1
# SWISH format: 2.2rc1-dev
# Search words: not dkdkd
# Number of hits: 16803
# Search time: 0.055 seconds
# Run time: 0.074 seconds
Reasonably fast for 16,000 results.
Now, if I index the same data but without the pre-sorted indexes:
$ ./swish-e -w not dkdkd -m 3 -L swishlastmodified '>=' 930000000 -f index2
# SWISH format: 2.2rc1-dev
# Search words: not dkdkd
# Number of hits: 16803
# Search time: 0.135 seconds
# Run time: 0.155 seconds
Longer (ok, but not by much).
From the script side:
If it's a Perl script you can install a $SIG{HUP} handler, and when the
script goes crazy you can send it kill -HUP and have the script print a
backtrace. That will show where it's hanging. The other debugging tool is
strace, which will show you whether the script is making system calls (such
as I/O).
>Any ideas?
Not really. It's helpful to see your config files, of course. You might
have found a bug in swish that I can't reproduce. I may need your A and B
indexes to try on my machines. With CPU usage jumping I'd suspect an odd bug.
Is it possible you are running out of memory?
If you do not have a presorted index for the property you are limiting on
then for each result swish has to fetch the property from the .prop file
instead of using a fast lookup-table to see if the result is in the range
selected.
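
As a sketch of what to look for in B's config file: the PreSortedIndex directive takes a list of property names to presort at index time, and by default (no directive at all) every property should be presorted. The fragment below is an illustration only, not either list's real config, and swishtitle is just a placeholder for some other property.

```
# Only swishtitle is presorted here, so a limit on swishlastmodified
# has to read each result's property from the .prop file.
PreSortedIndex swishtitle

# Naming the date property as well lets date limits use the presorted table.
PreSortedIndex swishtitle swishlastmodified
```

If B's config has a PreSortedIndex line that A's does not, or one that leaves out swishlastmodified, that would be the first difference to check.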
--
Bill Moseley
mailto:moseley@hank.org
Received on Thu Sep 12 03:32:51 2002