
Re: Segmentation Fault w/ Very Long Words

From: Rodney Barnett <rbarnett(at)not-real.neuromics.com>
Date: Tue Sep 03 2002 - 11:48:32 GMT
For the list archives, this problem was limited to the included expat
library and has now been fixed in CVS.  Thanks, Bill!

Rodney

-----Original Message-----
From: swish-e@sunsite.berkeley.edu
[mailto:swish-e@sunsite.berkeley.edu]On Behalf Of Bill Moseley
Sent: Monday, September 02, 2002 12:00 PM
To: Multiple recipients of list
Subject: [SWISH-E] Re: Segmentation Fault w/ Very Long Words


At 11:22 AM 09/01/02 -0700, Rodney Barnett wrote:
>I just ran into a segmentation fault while using the prog method.  I
>tracked the trigger down to a very long "word" (in this case, it was
>roughly 429,000 characters long).  I certainly don't want that "word"
>to be indexed, but the program shouldn't crash either.

I haven't been able to duplicate the problem.  I added a printf statement
and wrote a -S prog program to generate a word 429,000 chars long and
another 2,000,000 chars long and I see this:

> perl prog.pl | ./swish-e -S prog -i stdin
Indexing Data Source: "External-Program"
Indexing "stdin"
word is too long [2000000 bytes].  Skipping
word is too long [429000 bytes].  Skipping
Removing very common words...
no words removed.
Writing main index...
err: No unique words indexed!
.
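
A minimal sketch of such a generator, assuming the usual -S prog
Path-Name and Content-Length headers (the file names here are made up):

  #!/usr/bin/perl
  # prog.pl - feed swish-e two documents, each containing one huge "word"
  use strict;
  use warnings;

  for my $len (2_000_000, 429_000) {
      # one run of $len identical characters, ending in a newline
      my $doc = ('x' x $len) . "\n";
      print "Path-Name: longword-$len.txt\n";
      print "Content-Length: ", length($doc), "\n\n";
      print $doc;
  }

Piped into swish-e as above, it should produce the same "word is too
long" messages.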

Perhaps you have some characters that are causing a problem and it's not
just the length of the data.

>I was first using swish-e from a snapshot from a week or two ago, but
>switched to today's CVS and the problem's still there.

Can you send me a test case off-list?  Something like the example above?

>I'm not using libxml2 and I have not changed the MaxWordLimit parameter
>from its default.
>
>Are there any other details that are important?

A backtrace from gdb might give some clues.
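For example, a rough recipe, assuming swish-e was built with debugging
symbols (-g) and the shell allows core dumps (the core file name can
vary by system):

  > ulimit -c unlimited
  > perl prog.pl | ./swish-e -S prog -i stdin
  > gdb ./swish-e core
  (gdb) bt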


--
Bill Moseley
mailto:moseley@hank.org
Received on Tue Sep 3 11:52:07 2002