
Re: Segmentation fault when processing large file

From: Bill Moseley <moseley(at)hank.org>
Date: Tue Mar 16 2004 - 19:20:45 GMT
On Tue, Mar 16, 2004 at 10:03:44AM -0800, Steve Harris wrote:
> All I get from a backtrace is:
> 
> #0  0x400e04a9 in compress3 (num=2139062143, 

Yep, buffer overflow.
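
For the record, 2139062143 is 0x7F7F7F7F, and the bogus return frame in
#1 is the same value: that's the pattern you get when the stack around
the call has been overwritten. The line quoted from compress.c:140 is a
classic "7 bits per byte" integer encoder, and a loop like that walks
straight past the end of its output buffer if the buffer was sized on
an assumption the input violates. A minimal sketch of that failure mode
(hypothetical buffer size and names, not the actual compress.c code):

    /* Sketch only: encode num as 7-bit chunks, low bits first, into a
     * fixed-size buffer with no bounds check, the same pattern as the
     * line quoted from compress.c:140. */
    #include <stdio.h>

    #define BUF_SIZE 4     /* hypothetical size assumption */

    static int encode7(unsigned long num, unsigned char *s)
    {
        int i = 0;
        unsigned long r = num;
        do {
            s[i++] = r & 127;     /* never checks i against BUF_SIZE */
            r >>= 7;
        } while (r);
        return i;
    }

    int main(void)
    {
        unsigned char buf[BUF_SIZE];
        /* 0x7F7F7F7F needs 5 seven-bit chunks, so this write runs one
         * byte past the end of buf; on a real stack that clobbers
         * whatever sits next to the buffer. */
        int n = encode7(0x7f7f7f7fUL, buf);
        printf("wrote %d bytes into a %d-byte buffer\n", n, BUF_SIZE);
        return 0;
    }

Either a bigger buffer or a length check would close that hole; the
sketch only shows why an oversized input ends in a smashed stack rather
than a clean error.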

I'll have to defer to Jose for this problem.

What's the point of indexing such a large file?


>     buffer=0x487ab00f
> "\202�p\001��|\203�+\201\234�dBC��k\201\216�c��(\237�s�\005\235�x\002\201\221-\004\203B\002\212h\004�W\002�w\002�z\002\204:\004\205g\005\234+\003�N\002\216p\002\210\032\002�\006\003\201h\001�r\003�")
> at compress.c:140
> 140             _s[_i++] = _r & 127;
> #1  0x7f7f7f7f in ?? ()
> Cannot access memory at address 0x7f7f7f7f
> 
> The file it's processing is quite large:
> $ wc /raid/swh/lit_index/segv.lit 
> 5065943 9424230 50550321 /raid/swh/lit_index/segv.lit
> 
> and contains some 8-bit characters, but if I run it through sort | uniq it
> doesn't cause problems. It's a fairly simple file, with one phrase per line;
> the longest line is 255 characters.
> 
> There are a few thousand similar files in the directory tree that parse
> fine, but this is by far the largest. It doesn't seem to matter at what
> position this one appears in the parse order.
> 
> I've made the file available at http://triplestore.aktors.org/~swh/segv.lit
> in case anyone wants to test it.
> 
> - Steve
> 

-- 
Bill Moseley
moseley@hank.org