I hope this helps some of you find the cause of the fault.
This is my experience indexing a large number of documents with the
Windows version of Swish-e 2.4.2.
I get an error that seems to be related to the faults you mention. (It has
happened since version 2.4.0.)
On Windows, I get a small error window stating:
"Invalid operation. The program will be terminated" + (Accept only
Swish-e terminates without giving any more info about the fault.
I build indexes from thousands of documents.
I feed swish-e plain-text documents from an external program I wrote
myself, by piping its standard output into swish-e.
It does not seem to be a problem linked to a particular document. When the
fault appears near a document, I index only that document and everything
works fine. I then remove that document from the original list and index
again, and the fault occurs at a different time and a different place in
the list. There is no apparent reason why the error appears sometimes
earlier, sometimes later.
It looks more like a "random" error related to memory handling.
Whether or not I use the "save memory" parameter of swish-e (-e), the fault
occurs after a while (sometimes after 5 minutes, other times after 45
minutes of indexing).
With the -e parameter the error seems to be delayed, but in the end it
still occurs.
I have seen a maximum of 376 temporary files in the temp directory (TMPDIR,
TMP or TEMP) used by swish-e.
Maybe allowing swish-e to use more temp files would help?
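
To check whether the temp file count really tops out at the same value on each run, the directory can be sampled while indexing is in progress. This is only a sketch: the lookup order TMPDIR, TMP, TEMP is an assumption taken from the order listed above, the function names are made up for illustration, and it counts every file in the directory because the names swish-e gives its temp files are not stated here.

```python
import os
import tempfile
import time
from pathlib import Path


def swish_temp_dir(env=os.environ):
    """Return the first existing directory named by TMPDIR, TMP or TEMP."""
    # Assumption: the variables are consulted in this order.
    for name in ("TMPDIR", "TMP", "TEMP"):
        value = env.get(name)
        if value and Path(value).is_dir():
            return Path(value)
    # Assumption: fall back to Python's own notion of the temp directory.
    return Path(tempfile.gettempdir())


def count_files(directory):
    """Count regular files directly inside `directory` (not recursive)."""
    return sum(1 for entry in Path(directory).iterdir() if entry.is_file())


def watch_peak(directory, samples, interval=5.0):
    """Sample the file count `samples` times and return the highest value seen."""
    peak = 0
    for _ in range(samples):
        peak = max(peak, count_files(directory))
        time.sleep(interval)
    return peak
```

Running something like watch_peak(swish_temp_dir(), samples=600) in a second console while swish-e is indexing would show whether the peak is always 376 or varies from run to run.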
I have successfully indexed a set of about 23500 documents with a total of
254300 words, which generates an 80 MB index file.
When I try to index a set of "big" documents (i.e. average size > 250000
characters) I get the unexplained swish-e error.
I suspect the error occurs when swish-e reads a LARGE AMOUNT of data
through standard input, that is, when it receives data generated by an
external program whose standard output is redirected to swish-e.
A sample of this use of swish-e looks like this:
MyProgram.exe | swish-e -e -S prog -i stdin -c index.conf -f Index.idx
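
For reference, the data piped into swish-e with -S prog is a sequence of records: a small header block (Path-Name and Content-Length), a blank line, then exactly Content-Length bytes of document text. The sketch below is not the actual MyProgram, whose source is not shown here; format_record and emit are hypothetical names. It writes bytes rather than text because a text-mode stdout on Windows turns "\n" into "\r\n", which would make a body longer than its declared length. Whether that plays any part in the fault described here is not established.

```python
import sys


def format_record(path_name, text, encoding="utf-8"):
    """Build one -S prog record: headers, blank line, then the body bytes."""
    body = text.encode(encoding)
    # Content-Length is the length of the encoded body in bytes, not characters.
    header = f"Path-Name: {path_name}\nContent-Length: {len(body)}\n\n"
    return header.encode(encoding) + body


def emit(documents, out=None):
    """Write (path_name, text) pairs as records to a binary stream."""
    if out is None:
        # The underlying byte stream, so no newline translation takes place.
        out = sys.stdout.buffer
    for path_name, text in documents:
        out.write(format_record(path_name, text))
    out.flush()
```

A program built this way would call emit() once with all its documents and be piped into swish-e exactly as in the command above.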
Suspecting a problem in the redirection pipe when feeding large amounts of
data, or one linked to "redirection speed", I tried another approach.
First, I wrote everything "MyProgram" produces to one big file, like this:
MyProgram.exe > WholeData.txt
and then I fed swish-e the content of that file:
Type WholeData.txt | swish-e -e -S prog -i stdin -c index.conf -f
(Either with -e or without it, the fault occurs, at different points.)
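
Once the whole stream is in a file, it can also be checked offline before blaming the pipe. The following is a rough sketch, assuming the file holds -S prog records with Path-Name and Content-Length headers; read_headers and first_bad_record are illustrative names, not part of swish-e. It walks the file record by record and reports the first place where the headers and the declared length stop lining up. Note that a body longer than its Content-Length is only noticed at the following record, so the record reported may be the one after the real mismatch.

```python
def read_headers(stream):
    """Read header lines up to the blank line; return None at end of file."""
    headers = {}
    while True:
        line = stream.readline()
        if not line:                      # end of file
            return headers if headers else None
        if not line.strip():              # blank line closes the header block
            return headers
        name, _, value = line.decode("latin-1").partition(":")
        headers[name.strip().lower()] = value.strip()


def first_bad_record(stream):
    """Return (index, reason) for the first inconsistent record, or None."""
    index = 0
    while True:
        headers = read_headers(stream)
        if headers is None:
            return None                   # reached the end cleanly
        index += 1
        length = headers.get("content-length", "")
        if "path-name" not in headers or not length.isdigit():
            return index, "missing or malformed headers"
        body = stream.read(int(length))
        if len(body) != int(length):
            return index, "body shorter than Content-Length"
```

The file has to be opened in binary mode for the byte counts to mean anything, for example: with open("WholeData.txt", "rb") as f: print(first_bad_record(f)).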
I understand that indexing entirely in memory can be faster, but it will
always have a limit (the installed RAM), whereas with temporary files the
limit should be higher. I still believe the 376 temp files I have seen is
the current maximum allowed, so this could produce the "ghost error".
Remember, everything I describe relates to the Windows version of swish-e,
but I suspect the error on other platforms has the same nature.
Thanks to everyone.
----- Original Message -----
From: "Bill Moseley" <email@example.com>
To: "Multiple recipients of list" <firstname.lastname@example.org>
Sent: Tuesday, April 20, 2004 3:10 PM
Subject: [SWISH-E] Re: Segmentation fault while indexing
> Would it be possible to index under gdb and then try and get a
> backtrace? If we are lucky that might show the problem.
> The other standard suggestion is try and see if there is a small set of
> documents that will demonstrate the problem.
> Bill Moseley
Received on Wed Apr 21 12:26:34 2004