Re: Segmentation fault while indexing with "StoreDescription"

From: SANCHEZ Juanjo <jjsanchez(at)not-real.datadiar.com>
Date: Wed Apr 21 2004 - 19:26:34 GMT
Hi, everyone

I hope this helps some of you find the cause of the fault.
This is my experience indexing a large number of documents with the
Windows version of Swish-e 2.4.2.

With that version I get an error that seems to be related to the faults
you mention. (It has happened since version 2.4.0.)

On Windows, I get a small error window stating:
    "Invalid operation. The program will be terminated"
with only an "Accept" button.
Swish-e terminates without giving any more information about the fault.

I build indexes from thousands of documents.
I feed swish-e plain-text documents from an external program I wrote
myself, by piping its standard output into swish-e.

It does not seem to be a problem linked to a particular document. When the
fault appears near a document, I index only that document and everything
works fine. If I remove that document from the original list and index
again, the fault occurs at another time and at another place in the list.
I can find no reason why the error sometimes appears earlier and sometimes
later.

It looks more like a "random" error related to memory handling.

Whether or not I use swish-e's "save memory" option (-e), the fault appears
after a while (sometimes after 5 minutes, sometimes after 45 minutes of
indexing).
With -e the error seems to be delayed, but it still comes eventually.
I have seen a maximum of 376 temporary files in the temp directory (TMPDIR,
TMP or TEMP) used by swish-e.
Maybe allowing swish-e to use more temp files would help?

I can index without problems a set of about 23500 documents with a total of
254300 words, which generates an 80 MB index file.

When I try to index a set of "big" documents (i.e. average size > 250000
characters), I get the unexplained swish-e error.

I suspect the error occurs when swish-e reads a LARGE AMOUNT of data through
standard input, that is, when it receives data from an external program
whose standard output is piped to swish-e.

An example of this use of swish-e:

    MyProgram.exe | swish-e -e -S prog -i stdin -c index.conf  -f  Index.idx

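The kind of feed this command relies on can be sketched as follows. This is
not MyProgram itself, only a minimal Python illustration, assuming the -S
prog framing as I understand it from the swish-e documentation: a Path-Name
header, a Content-Length header giving the size of the body in bytes, a blank
line, then the document content. The names format_document and emit and the
UTF-8 encoding are assumptions made for the example. Output goes to the
binary standard output so that Windows text-mode newline translation cannot
make a body longer than its Content-Length declares; whether such a mismatch
plays any part in the fault described here is not established.

```python
import sys


def format_document(path_name, text, encoding="utf-8"):
    """Frame one document for swish-e's -S prog input and return it as bytes."""
    body = text.encode(encoding)
    # Content-Length must count the bytes of the encoded body, not characters.
    header = f"Path-Name: {path_name}\nContent-Length: {len(body)}\n\n"
    return header.encode(encoding) + body


def emit(documents, out=None):
    """Write (path_name, text) pairs to out, by default binary standard output."""
    if out is None:
        # The binary buffer bypasses "\n" -> "\r\n" translation on Windows.
        out = sys.stdout.buffer
    for path_name, text in documents:
        out.write(format_document(path_name, text))
    out.flush()
```

A script that calls emit() with its (path, text) pairs can then be piped
into swish-e in the same way as MyProgram.exe above.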

Suspecting a problem in the pipe when feeding large amounts of data, or
something linked to the speed of the redirection, I tried another approach.
First, I wrote one big file from "MyProgram" like this:

    MyProgram.exe >  WholeData.txt

and then I fed swish-e the content of the file:

    type WholeData.txt | swish-e -e -S prog -i stdin -c index.conf  -f Index.idx

(Either with -e or without it, the fault occurs, at different points.)
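
With the whole feed saved to a file, one check that might help narrow things
down is to confirm that the file is well-formed -S prog input before
suspecting the pipe. The sketch below is a hypothetical helper, not part of
swish-e: check_stream walks the saved bytes, reads each header block, skips
the number of bytes given by Content-Length, and reports the first place
where the framing stops making sense. It assumes the same header layout as
the feed (Path-Name and Content-Length, a blank line, then the body).

```python
def check_stream(data):
    """Return (documents_ok, problems) for a saved -S prog feed given as bytes."""
    problems = []
    pos = 0
    count = 0
    while pos < len(data):
        headers = {}
        # Read header lines up to the blank line that ends the header block.
        while True:
            eol = data.find(b"\n", pos)
            if eol == -1:
                problems.append(f"document {count + 1}: header block not terminated")
                return count, problems
            line = data[pos:eol].rstrip(b"\r")
            pos = eol + 1
            if not line:
                break
            name, sep, value = line.partition(b":")
            if not sep:
                problems.append(
                    f"document {count + 1}: expected a header, found {line[:40]!r}"
                )
                return count, problems
            headers[name.strip().lower()] = value.strip()
        path = headers.get(b"path-name", b"?").decode("latin-1")
        length_field = headers.get(b"content-length")
        if length_field is None or not length_field.isdigit():
            problems.append(f"{path}: missing or invalid Content-Length")
            return count, problems
        length = int(length_field)
        if pos + length > len(data):
            problems.append(
                f"{path}: declares {length} bytes, only {len(data) - pos} remain"
            )
            return count, problems
        pos += length  # skip the body; the next header block should start here
        count += 1
    return count, problems
```

For example, check_stream(open("WholeData.txt", "rb").read()) returns the
number of documents that parsed cleanly and a list of problems. The file is
read into memory in one piece, which keeps the sketch short but may not suit
a very large feed.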

I understand that indexing only in memory can be faster, but it will always
have a limit (the installed RAM), whereas with temporary files the limit
should be higher. I still believe the 376 temp files I have seen is the
current maximum number of temp files allowed, and that this could produce
the "ghost error" we have.

Remember, everything I describe relates to the Windows version of swish-e,
but I am afraid the error on other platforms is of the same nature.

Thanks to everyone.
Juan-Jose Sanchez




----- Original Message ----- 
From: "Bill Moseley" <moseley@hank.org>
To: "Multiple recipients of list" <swish-e@sunsite.berkeley.edu>
Sent: Tuesday, April 20, 2004 3:10 PM
Subject: [SWISH-E] Re: Segmentation fault while indexing with "StoreDescription"


> Would it be possible to index under gdb and then try and get a
> backtrace?  If we are lucky that might show the problem.
>
> The other standard suggestion is try and see if there is a small set of
> documents that will demonstrate the problem.
>
>
> -- 
> Bill Moseley
> moseley@hank.org
>
>
Received on Wed Apr 21 12:26:34 2004