

Re: Segmentation Fault w/ Very Long Words

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Mon Sep 02 2002 - 17:00:06 GMT
At 11:22 AM 09/01/02 -0700, Rodney Barnett wrote:
>I just ran into a segmentation fault while using the prog method.  I tracked
>the trigger down to a very long "word" (in this case, it was roughly 429,000
>characters long).  I certainly don't want that "word" to be indexed, but the
>program shouldn't crash either.

I haven't been able to duplicate the problem.  I added a printf statement
and wrote a -S prog program to generate a word 429,000 chars long and
another 2,000,000 chars long and I see this:

> perl prog.pl | ./swish-e -S prog -i stdin 
Indexing Data Source: "External-Program"
Indexing "stdin"
word is too long [2000000 bytes].  Skipping
word is too long [429000 bytes].  Skipping
Removing very common words...
no words removed.
Writing main index...
err: No unique words indexed!
.
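
For reference, my generator was something like this (just a rough sketch, not the exact script; Path-Name and Content-Length are the usual -S prog document headers, and the file names are made up):

#!/usr/bin/perl -w
use strict;

# Hypothetical prog.pl: feed swish-e two documents, each a single
# very long "word", via the -S prog interface.
for my $len ( 429_000, 2_000_000 ) {
    my $content = ( 'x' x $len ) . "\n";

    print "Path-Name: longword_$len.txt\n";
    print "Content-Length: ", length($content), "\n";
    print "\n";
    print $content;
}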

Perhaps you have some characters that are causing a problem, and it's not
just the length of the data.

>I was first using swish-e from a snapshot from a week or two ago, but
>switched to today's CVS and the problem's still there.

Can you send me a test case off-list?  Something like the example above?

>I'm not using libxml2 and I have not changed the MaxWordLimit parameter from
>its default.
>
>Are there any other details that are important?

A backtrace from gdb might give some clues.


-- 
Bill Moseley
mailto:moseley@hank.org
Received on Mon Sep 2 17:03:46 2002