
Re: more out of memory fun - incremental failed

From: Brad Miele <bmiele(at)not-real.ipnstock.com>
Date: Fri Oct 20 2006 - 10:31:50 GMT
Hi,

So, when we last left our intrepid (albeit somewhat clueless) hero, he was 
running incremental + economy mode on his 900K++ records. Here is where we 
ended up:

Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 2,442,762 words alphabetically
Writing header ...
Writing index entries ...
   Writing word data:  79%err: Ran out of memory (could not allocate 4296816 more bytes)!
.

soooo cloooooose...:(

This is good, in a way, from my perspective: unlike 2.4.3, it's not a 
segfault. But it still doesn't leave me with much else helpful to offer. I 
will note that smaller builds of the index seemed OK, but we're talking way 
smaller. I could conceivably test on, say, 50% of the records and, if that 
passes, move up (or down if it fails) until I hit a magic number. I can 
also run gdb, but I would need some guidance there (happy to try). Finally, 
I can bump the memory allotment in the kernel (as David mentioned, that 
might be a bottleneck) to try to eke past this hump. Of course, that would 
leave me having to do it again the next time memory hindered the build.
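
Based on Bill's notes below, I gather the basic gdb recipe is something 
roughly like this (swish.conf is just a stand-in for my actual config, and 
the run arguments are whatever I normally index with):

   $ gdb swish-e
   (gdb) run -e -c swish.conf    <- the usual indexing invocation, under gdb
   (gdb) bt                      <- after the crash: backtrace of active calls
   (gdb) frame 3                 <- jump to a frame that looks interesting
   (gdb) info locals             <- inspect the variables in that frame

Since this run died with an "out of memory" error rather than a segfault, I 
suppose I'd also need a breakpoint on the exit path (e.g. "break exit") so 
gdb stops before the process goes away.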

I can also, of course, be happy with non-incremental and get on with my 
life.
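
And if I do go that route, I can at least time a run with and without -e to 
see whether economy mode really wins at this size, along the lines Bill 
describes below (index file names here are just placeholders):

   $ time swish-e -c swish.conf -f index-noneco.idx
   $ time swish-e -c swish.conf -e -f index-eco.idx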

Brad
---------------------
Brad Miele
VP Technology
IPNStock.com
866 476 7862 x902
bmiele@ipnstock.com

On Thu, 19 Oct 2006, Bill Moseley wrote:

> On Thu, Oct 19, 2006 at 10:19:28PM -0400, Brad Miele wrote:
>> one question about debugging with gdb: what do I do :)? Sorry, I know how
>> to do gdb swish-e, and then run <switches and whatnot>, but what do I do
>> after the crash to get more info?
>
> I'm way rusty.
>
> Depends on how hard it crashes.  Basically, you get a backtrace (bt)
> where it segfaults.  Then you look back through it and try to see what
> was happening where, and whether it makes sense.  Normally it doesn't.
> If it crashes hard, you may not even get a backtrace that makes any
> sense.  From there you set breakpoints and watch variables to try to
> track down the problem.  At one point I knew most of the indexing
> code, but I would need to completely relearn it to be able to make
> quick work of tracking down a segfault.  The bummer is that in your case
> it takes so long to happen.
>
>> finally, why do i need to use -e when i have so many resources? when
>> swish-e gave that out of memory error, i still had over 2G totally free
>> via top.
>
> 32-bit limit in swish?  I doubt there's correct integer overflow
> detection.
>
> Swish uses hash tables and the larger they get the slower access to
> the table is.  Remember, swish was designed for indexing thousands or
> tens of thousands of files.  It's very fast at that.  The trade-off is
> it's not that scalable.
>
> I generated a million random docs once.  With -e it was much slower at
> first but kept running at a reasonably steady pace; without -e it was
> way faster for the first 100K files or so and then started slowing
> down as the hash tables filled.  In the end, -e was faster.
>
>
>
> -- 
> Bill Moseley
> moseley@hank.org
>
> Unsubscribe from or help with the swish-e list:
>   http://swish-e.org/Discussion/
>
> Help with Swish-e:
>   http://swish-e.org/current/docs
>   swish-e@sunsite.berkeley.edu
>
>
>