swish-e may use a lot of space for temporary files (which is not documented,
and is done insecurely). It also fails to check many, if not all, reads and
writes, and is therefore prone to blunder on in spite of problems such as
large temporary files filling the target partition. It may loop consuming
CPU time (e.g. when attempting to search a truncated index, though I saw
that with swish and haven't re-tested with swish-e). Additionally, it does
nothing to ensure that the temporary files are deleted if it is interrupted.
In more detail:
After using swish for several years, I've just started looking at swish-e
and tried something which I'd not done with swish (which would quite
probably behave the same) - creating multiple indexes and merging them,
so I could offer the choice of searching everything or searching distinct
areas of the data specifically. This amounted to around 90 separate indexes
since our server hosts a large number of University clubs and societies and I
wanted them to be individually searchable...
Indexing the same data (in its entirety, single overall index) with swish
results in a 6MB index file. The separate indexes from swish-e total around
60 MB. That in itself was not a problem, but when I attempted to use swish-e
-M to combine them all into a single index, I ran into trouble:
* swish-e filled the /var partition (because it wrote large temporary
files in /var/tmp, swish-e's use of which is not documented anywhere
that I could see).
* swish-e failed to notice that writes were failing, and carried on
trying to merge the indexes.
* 2-3 hours later I logged on and found that swish-e was still running,
still trying to merge indexes using the full partition for temporary files.
* [I should have got paged soon after the partition filled up, but the
pager system's network interface wasn't working that evening... Can't
blame swish-e for that!]
The documentation mentions that memory use should be around half the total
size of the indexes to be merged, but gives no hint that it uses temporary
files as well. I was also surprised, when I checked with du after the problem,
to find that the many separate indexes totalled around 60MB, whereas indexing
everything together with swish produces a 6MB index. Memory wouldn't have
been a problem, but the temporary files were, since /var is a relatively
small partition and had nothing like enough space.
There are a number of issues here:
* it looks like tmpnam() is used to invent names for the temporary files.
On Solaris 2, at least, that always uses /var/tmp (there is no way to redirect
it to somewhere with more space, as tempnam() allows). [As a
separate issue, creating temporary files in world-writable directories
(in particular, with more-or-less predictable names) is a security issue
unless care is taken when opening them, which swish-e does not do.]
Using tempnam() and documenting (a) the likely temporary file space
requirements, and (b) that sites might need to redirect the temporary
files by defining the TMPDIR environment variable, would help with
the file size/location issue; it wouldn't help with the security aspect
except to the extent that the files could be placed in a non-world-
writable directory. Safe use of /tmp is tricky...
* swish-e does not check whether writing to a file succeeded; it failed
to notice the partition was full. If it had noticed and terminated after
deleting the files, the system wouldn't have been running with /var
full (as far as non-root users were concerned) for several hours.
* swish-e does not (always) check whether reading from files succeeds; if
it had done so, it would have noticed the temporary index files were
truncated (and wouldn't have run for 2-3 hours with no sign of
stopping - see next point).
* A related point, noted previously with swish (no bug report as it was
not being maintained...), is that swish search processes do not notice
if the index file is truncated, and instead loop consuming CPU time as
fast as they can get it. I presume swish-e would behave similarly.
[The overnight index build - single index - failed due to the target
partition being filled by something else. There was no error report for
that; the first hint was a load average of 50+ due to the search processes,
with users repeating the failing searches because of the lack of response
and making matters worse. It was that incident that prompted a cron job to
page me if partitions got overfull - but the pager system let me down.]
* When interrupted (control-C while running interactively, or kill PID
when running in the background), swish-e does not take any action to
ensure the temporary files are deleted, and since it's not documented
that they are even created, they might be left behind, unnoticed, consuming
space.
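The first, second and last points above might be tackled along the following lines. This is a minimal sketch in C, not swish-e's code: the function names and the `swish-merge-XXXXXX` file-name template are hypothetical, and it swaps in the POSIX `mkstemp()`, which (unlike `tmpnam()` or `tempnam()`) chooses the name and creates the file with `O_EXCL` and mode 0600 in one step, so there is no gap between picking a name and opening it. It honours `TMPDIR` (falling back to /tmp), checks every write including the final `fclose()`, and installs handlers so that SIGINT, SIGTERM and SIGHUP remove the file.

```c
/* Hypothetical sketch, not swish-e code: safer temporary file handling. */
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Current temporary file, recorded so the signal handler can remove it. */
static char tmp_path[4096];
static volatile sig_atomic_t tmp_path_set = 0;

/* Remove the file, then die from the same signal so the exit status is
 * the usual one. unlink(), signal() and raise() are async-signal-safe. */
static void cleanup_and_exit(int sig)
{
    if (tmp_path_set)
        unlink(tmp_path);
    signal(sig, SIG_DFL);
    raise(sig);
}

static int install_cleanup_handlers(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = cleanup_and_exit;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGINT, &sa, NULL) != 0 ||
        sigaction(SIGTERM, &sa, NULL) != 0 ||
        sigaction(SIGHUP, &sa, NULL) != 0)
        return -1;
    return 0;
}

/* Create and open a new file in $TMPDIR (default /tmp). mkstemp() never
 * opens an existing file or symlink; it creates a fresh one, mode 0600. */
static FILE *open_temp_file(void)
{
    const char *dir = getenv("TMPDIR");
    char path[sizeof tmp_path];
    int n, fd;
    FILE *fp;

    if (dir == NULL || *dir == '\0')
        dir = "/tmp";
    n = snprintf(path, sizeof path, "%s/swish-merge-XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof path) {
        errno = ENAMETOOLONG;
        return NULL;
    }
    fd = mkstemp(path);
    if (fd < 0)
        return NULL;

    tmp_path_set = 0;            /* keep the handler off a half-copied path */
    memcpy(tmp_path, path, sizeof path);
    tmp_path_set = 1;

    fp = fdopen(fd, "w+");
    if (fp == NULL) {
        int saved = errno;
        close(fd);
        unlink(tmp_path);
        tmp_path_set = 0;
        errno = saved;
    }
    return fp;
}

/* A short fwrite() means the write failed (e.g. ENOSPC on a full disk). */
static int checked_write(FILE *fp, const void *buf, size_t len)
{
    return fwrite(buf, 1, len, fp) == len ? 0 : -1;
}

/* Buffered data is only flushed here, so a full disk is often first
 * reported by fclose(); ferror() catches any earlier failure. */
static int close_checked(FILE *fp)
{
    int failed = ferror(fp) != 0;
    if (fclose(fp) != 0)
        failed = 1;
    return failed ? -1 : 0;
}

/* Normal cleanup: unlink before clearing the flag, so a signal arriving
 * in between at worst repeats the unlink harmlessly. */
static void remove_temp_file(void)
{
    if (tmp_path_set) {
        unlink(tmp_path);
        tmp_path_set = 0;
    }
}
```

A caller would treat any -1 as fatal: report `strerror(errno)`, call `remove_temp_file()` and exit non-zero, rather than carrying on with a half-written merge file. The sketch still leaves a small window between `mkstemp()` and recording the path in which a signal would leak the file (blocking signals around that step would close it), and nothing can clean up after `kill -9`, since SIGKILL cannot be caught.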
I suspect fixing the code to deal with all of those points may be quite a lot
of work, unfortunately!
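For the read-side points in the list above (unchecked reads, and searches looping on a truncated index), here is a similarly hedged C sketch. It does not use swish-e's real index layout; it assumes a hypothetical format in which each record is a 4-byte big-endian length followed by that many bytes. Every `fread()` result is checked, and a short read counts as a clean end of file only when it happens exactly at a record boundary, so a truncated file yields an error that ends the loop instead of leaving the reader spinning.

```c
/* Hypothetical record format, not swish-e's: a 4-byte big-endian length
 * followed by that many bytes of data. */
#include <stdint.h>
#include <stdio.h>

enum read_status { READ_OK, READ_EOF, READ_TRUNCATED, READ_IO_ERROR, READ_TOO_BIG };

/* Read exactly len bytes. A short read is a clean EOF only if nothing was
 * read at the start of a record; otherwise the file is truncated. */
static enum read_status read_exact(FILE *fp, void *buf, size_t len,
                                   int at_record_start)
{
    size_t got = fread(buf, 1, len, fp);
    if (got == len)
        return READ_OK;
    if (ferror(fp))
        return READ_IO_ERROR;
    if (got == 0 && at_record_start)
        return READ_EOF;
    return READ_TRUNCATED;
}

static enum read_status read_record(FILE *fp, unsigned char *buf,
                                    size_t bufsize, size_t *out_len)
{
    unsigned char hdr[4];
    uint32_t len;
    enum read_status st = read_exact(fp, hdr, sizeof hdr, 1);
    if (st != READ_OK)
        return st;
    len = ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16) |
          ((uint32_t)hdr[2] << 8) | (uint32_t)hdr[3];
    if (len > bufsize)              /* corrupt length: refuse, don't overrun */
        return READ_TOO_BIG;
    st = read_exact(fp, buf, len, 0);
    if (st != READ_OK)
        return st;
    *out_len = len;
    return READ_OK;
}

/* Count records. Any loop exit other than a clean EOF is an error, so a
 * damaged file cannot keep the loop spinning. */
static int count_records(FILE *fp, size_t *count)
{
    unsigned char buf[256];
    size_t len;
    enum read_status st;

    *count = 0;
    while ((st = read_record(fp, buf, sizeof buf, &len)) == READ_OK)
        (*count)++;
    return st == READ_EOF ? 0 : -1;
}
```

With checks like these, a search process reading a damaged index would report an error and exit rather than consuming CPU, and the same checks would catch truncated temporary files during a merge.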
University of Cambridge WWW manager account (usually John Line)
Send general WWW-related enquiries to firstname.lastname@example.org
Received on Sun Nov 23 15:55:35 1997