> In our company a severe problem with swish-e (1.3.2-filter) occurred.
> What happened:
> Someone executed a search for "r*". Because of our large index
> (16000 documents), it takes swish-e "a little" time to get the results.
> Because of the long response time, the user submitted the search
> request several times. But this was not the main problem.
> Swish-e uses a vast(!) amount of memory, in our case 2 GB.
> This caused our main server (a large Sun server with 0.8 TB)
> to be rebooted manually, because other processes failed due
> to a lack of swap space/memory, etc.
> Remark: I used the swish-e option "-m 500".
There is an even worse search. Try "a* or b* or c* ..."
and you get an effective DoS attack.
Swish-e needs to keep all the results in memory, which is why
it uses such a vast amount of memory. The -m option does not solve
the problem because it is applied only after all the results have been collected.
But there is something even worse. Swish-e 1.3.2 pre-computes
the wildcard search into a list of "or" terms, so "r*" becomes
"r1 or r2 or r3 ..." (where r1, r2 and r3 are indexed words starting with "r"). This makes
swish-e 1.3.2 very slow with this type of search. Each "or" wastes
even more memory when computing intermediate results, because
memory that is no longer used is never freed!
For this reason I completely rewrote the wildcard search
for the PHRASE version and added several calls to efree.
> Possible solutions to this problem could be:
> - use a separate machine for the search engine (e.g. a cheap
> Linux box)
This does not solve the problem. Memory is cheaper, but the problem remains.
> - reject short query requests in the CGI script executing the
> swish-e program.
This could be the best one.
> - restrict swish-e (via a swish-e option or a compile-time switch) to
> a maximum number of internal result entries (this differs from the
> "-m" option). I know there may be implications for the
> search results (quality, sorting, returned results, etc.),
> but this could prevent a worst-case scenario.
As you say, this is hard to implement.
Received on Mon Jun 19 10:04:05 2000