So, when we last left our intrepid (albeit somewhat clueless) hero, he was
running incremental + economy mode on his 900K++ records. Here is where we
ended up:
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 2,442,762 words alphabetically
Writing header ...
Writing index entries ...
Writing word data: 79%
err: Ran out of memory (could not allocate 4296816 more bytes)!
This is good, in a way, from my perspective: unlike 2.4.3, it is not a
segfault. But it still leaves me with not much else helpful to offer. I
will note that smaller builds of the index seemed OK, but we are talking
way smaller. I could conceivably test on, say, 50% of the records and, if
that passed, move up (or down if it failed) until I hit a magic number. I
can also run gdb, but would need some guidance there (happy to try).
Finally, I can bump the memory allotment in the kernel (as David
mentioned, that might be a bottleneck) to try and eke past this hump
(rough commands sketched below). Of course, that would leave me having to
do the same again the next time memory hindered the build.
I can also, of course, be happy with non-incremental and get on with my
life.
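
For the kernel/limits option above, this is roughly what I would be poking
at (assuming Linux; the exact knobs vary by OS, and whether any of them is
the real bottleneck is a guess on my part):

    # current per-process limits for the shell that launches swish-e
    ulimit -a

    # raise the data-segment and address-space caps (bash builtins; "unlimited"
    # only works up to whatever hard limit is configured for the account)
    ulimit -d unlimited
    ulimit -v unlimited

    # on Linux, the kernel's overcommit policy can also make large allocations fail
    sysctl vm.overcommit_memory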
On Thu, 19 Oct 2006, Bill Moseley wrote:
> On Thu, Oct 19, 2006 at 10:19:28PM -0400, Brad Miele wrote:
>> one question about debugging with gdb, what do I do :)? Sorry, I know how
>> to do gdb swish-e, and then run <switches and whatnot>, but what do I do
>> after the crash to get more info?
> I'm way rusty.
> Depends on how hard it crashes. Basically, you get a backtrace (bt)
> where it segfaults. Then you look back through and try and see what
> was happening where and if it makes sense. Normally it doesn't. If
> it crashes hard then you may not even get a backtrace that makes any
> sense. From there you set breakpoints and watch variables to try and
> track down the problem. At one point I knew most of the indexing
> code, but I would need to completely relearn it to be able to make
> quick work of tracking down a segfault. The bummer is in your case it
> takes so long to happen.
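
If I am reading that right, a bare-bones session would look something like
the below (the switches are just whatever the failing run uses, and the
file and variable names are placeholders, not real swish-e symbols):

    $ gdb swish-e
    (gdb) run -c swish.conf -e       # same switches as the failing run
    ...crash...
    (gdb) bt                         # backtrace: where it died and how it got there
    (gdb) frame 2                    # jump to a frame that looks interesting
    (gdb) print some_var             # inspect variables in that frame
    (gdb) break index.c:1234         # on a later run, stop just before the bad spot
    (gdb) watch some_var             # or watch a variable for unexpected changes

That assumes swish-e was built with debugging symbols (CFLAGS=-g and no
optimization); otherwise the backtrace will not say much.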
>> Finally, why do I need to use -e when I have so many resources? When
>> swish-e gave that out of memory error, I still had over 2G totally free
>> via top.
> 32bit limit in swish? I doubt there's correct integer overflow checking.
> Swish uses hash tables and the larger they get the slower access to
> the table is. Remember, swish was designed for indexing thousands or
> tens of thousands of files. It's very fast at that. The trade-off is
> it's not that scalable.
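
To make that concrete, the general shape of the problem with a
fixed-bucket, chained hash table looks roughly like this (a toy sketch,
not swish-e's actual code):

    #include <string.h>

    #define NUM_BUCKETS 1024            /* fixed at compile time */

    struct entry {
        char *word;
        struct entry *next;             /* collisions chained onto a list */
    };

    static struct entry *table[NUM_BUCKETS];

    static unsigned hash_word(const char *s)
    {
        unsigned h = 0;
        while (*s)
            h = h * 31 + (unsigned char)*s++;
        return h % NUM_BUCKETS;
    }

    /* With n words stored, the average chain is n / NUM_BUCKETS entries long,
     * so each lookup degrades toward O(n) as the index grows into the millions
     * of unique words -- the slowdown described above. */
    struct entry *lookup(const char *word)
    {
        struct entry *e;
        for (e = table[hash_word(word)]; e != NULL; e = e->next)
            if (strcmp(e->word, word) == 0)
                return e;
        return NULL;
    }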
> I generated a million random docs once and -e was much slower at first
> but kept running at a reasonably steady pace, and without -e it was
> way faster for the first 100K files or so and then started slowing
> down as the hash tables filled. -e ended up being faster.
> Bill Moseley