I have a question on multiple swish-e processes accessing the same index files.
We're running swish-e against a fileset of about 2 million XML articles. An indexing process checks for newly arrived articles about once a minute, indexes them into a separate file, and then merges that into the master. (The indexer first merges to a temporary master and then moves the temp file over the existing master.) A fixed set of Reader processes is started up; a search Scheduler forwards user search requests to these Readers, which search through the master files. Requests build up in the Scheduler if all the Reader processes are busy.
I've assumed that I should be able to scale up concurrent searches by starting up more Readers. A search takes about 1 to 2 seconds (roughly 1.5 on average), so 20 concurrent user requests should be served in about 30 seconds with one Reader, 15 seconds with two Readers and 1 to 2 seconds with 20 Readers.
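To make that expectation concrete, here is a small sketch in Python (not our actual Tcl code). It assumes every search costs the same 1.5 seconds, a midpoint figure chosen to match the 30 seconds quoted above, and that Readers never slow each other down. The function name and parameters are purely illustrative.

```python
import math

def ideal_batch_time(requests, readers, search_seconds=1.5):
    """Time to drain `requests` queued searches with `readers` Readers.

    Assumes each search takes `search_seconds` and that Readers run
    fully independently (no contention), which is the ideal case only.
    """
    if readers < 1:
        raise ValueError("need at least one Reader")
    # Each Reader handles one search at a time, so the queue drains in rounds.
    rounds = math.ceil(requests / readers)
    return rounds * search_seconds

if __name__ == "__main__":
    for n in (1, 2, 4, 20):
        print(f"{n:2d} Readers: {ideal_batch_time(20, n):.1f} s")
```

Under these assumptions 20 requests take 30 s with one Reader, 15 s with two, 7.5 s with four and 1.5 s with 20. This is the curve I expected to see.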
Indeed, everything scales nicely, but only up to a point: beyond four Readers the swish searches start to take longer, and the problem worsens the more Readers I start up.
The Indexer and Readers are written in Tcl and invoke swish-e on the command line. It seems as if the swish processes are locking each other out of the master files, but as they're all only doing read access that shouldn't be the case, should it? It could also be the Indexer process, but it only locks the master index for the very short, intermittent time of the file move, so I wouldn't expect our very consistent deterioration profile to be the result of that.
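One way to separate the two suspects would be to stop the Indexer and time the same search at increasing concurrency outside the Scheduler. The sketch below is a rough Python harness for that, not our production code. The command it runs is a placeholder (a no-op Python process) and would need to be replaced with the real search invocation; the function names are hypothetical.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor

def timed_run(cmd):
    """Run one command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=False)
    return time.perf_counter() - start

def mean_latency(cmd, concurrency):
    """Launch `concurrency` copies of cmd at once and return the mean duration."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        durations = list(pool.map(timed_run, [cmd] * concurrency))
    return sum(durations) / len(durations)

if __name__ == "__main__":
    # Placeholder command: substitute the real search command line here.
    cmd = [sys.executable, "-c", "pass"]
    for n in (1, 2, 4):
        print(f"{n:2d} concurrent: {mean_latency(cmd, n):.3f} s mean")
```

If the mean time per search still climbs with concurrency while the Indexer is stopped, the file move can presumably be ruled out and the slowdown lies among the searches themselves.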
Does anyone have an idea of what could be causing the deteriorating performance?
Users mailing list
Received on Thu Apr 30 02:00:17 2009