We have a similar problem; we have about 900,000+ documents at over
Fortunately for me the documents are grouped into directories and I
only reindex the groups that change into an "intermediary" index (I
use a Makefile to detect which directories were updated). Then I merge
all the intermediary indexes into the final index. It still takes a
while (~1 hour on a sparc V210) but it's faster than doing it all from
scratch.
On average it's faster to merge; however, if everything changes then it
actually takes longer... fortunately, that does not happen very often.
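
For anyone who would rather not use a Makefile, here is a rough Python sketch of the same "only reindex what changed" check. It is only an illustration, not my actual setup: the directory layout, the ".index" naming, the "swish.conf" file name and the helper names are all assumptions, and the -c/-i/-f options are the usual swish-e config/input/output switches as far as I know, so check them against your version. It only builds the command lines; running them is left to you.

```python
import os
from pathlib import Path


def newest_mtime(directory: Path) -> float:
    """Latest modification time of any file under directory (0.0 if it has none)."""
    latest = 0.0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            latest = max(latest, (Path(root) / name).stat().st_mtime)
    return latest


def stale_groups(docs_root: Path, index_dir: Path) -> list[Path]:
    """Group directories whose intermediary index is missing or older than their files."""
    stale = []
    for group in sorted(p for p in docs_root.iterdir() if p.is_dir()):
        # Hypothetical naming scheme: one "<group>.index" file per directory.
        index_file = index_dir / (group.name + ".index")
        if not index_file.exists() or index_file.stat().st_mtime < newest_mtime(group):
            stale.append(group)
    return stale


def index_command(group: Path, index_dir: Path, config: str = "swish.conf") -> list[str]:
    """Command line to rebuild one intermediary index (config name is an assumption)."""
    out = index_dir / (group.name + ".index")
    return ["swish-e", "-c", config, "-i", str(group), "-f", str(out)]
```

Each entry returned by stale_groups() can be passed to index_command() and the result handed to subprocess.run(). Note that comparing file times like this will not notice a file that was deleted from a group.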
Also, be careful with the number of "intermediary" indexes as Swish can
only merge a few dozen at once.
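
One way around the merge limit is to merge in rounds: merge small batches into temporary indexes, then merge those. The sketch below only plans the commands. The batch size of 20 and the temporary file names are assumptions I picked for illustration (I have not pinned down the exact limit), and it relies on the documented "swish-e -M index1 index2 ... outputfile" form of the merge switch.

```python
def merge_plan(indexes, final, batch_size=20, tmp_prefix="merge_tmp"):
    """Return swish-e merge commands that never take more than batch_size inputs."""
    if len(indexes) < 2:
        raise ValueError("need at least two indexes to merge")
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2")

    commands = []
    current = list(indexes)
    round_no = 0
    # Keep merging batches into temporary indexes until one merge can finish the job.
    while len(current) > batch_size:
        next_round = []
        for n, start in enumerate(range(0, len(current), batch_size)):
            batch = current[start:start + batch_size]
            if len(batch) == 1:
                # A lone leftover index is carried into the next round unchanged.
                next_round.append(batch[0])
                continue
            out = f"{tmp_prefix}.{round_no}.{n}.index"  # hypothetical temp name
            commands.append(["swish-e", "-M", *batch, out])
            next_round.append(out)
        current = next_round
        round_no += 1

    commands.append(["swish-e", "-M", *current, final])
    return commands
```

With 45 intermediary indexes and the default batch size this plans three batch merges followed by one final merge. The extra rounds cost time, so it is worth keeping the number of groups small if you can.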
I hope this helps.
[mailto:email@example.com] On Behalf Of Patrick May
Sent: Saturday, 12 July 2008 12:26 AM
Subject: [swish-e] indexing performance expectations
How should I expect indexing to perform when indexing 900,000+ very
small documents (256 MB)? Thus far, my observation is that it takes a
while. Could it be helpful to move to an incremental format?
Received on Sun Jul 13 18:43:07 2008