Hi all,
While indexing a large document collection (more than 11,000 files, 1.2 GB, and
nearly 4.5 million words), I noticed that indexing time drops sharply when I
increase the three "#define" values HASHSIZE, BIGHASHSIZE, and SEARCHHASHSIZE.
I don't know which of the three matters most, but indexing time dropped
from 6 hours to less than 2 hours, and those 2 hours are mostly CPU-bound.
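To illustrate why larger table sizes could have this effect, here is a minimal sketch. It assumes the three constants set the number of buckets in chained hash tables, which the names suggest but which I have not verified. The hash function (djb2), the synthetic words, and the name avg_comparisons are all stand-ins chosen for illustration, not the indexer's own code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Simple string hash (djb2). A stand-in, NOT the indexer's own function. */
static unsigned long hash_string(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/*
 * Average number of key comparisons for a successful lookup after
 * inserting n_words distinct synthetic words ("w0", "w1", ...) into a
 * chained hash table with n_buckets buckets.
 * Returns -1.0 on bad input or allocation failure.
 */
double avg_comparisons(unsigned long n_words, unsigned long n_buckets)
{
    unsigned long *count;
    unsigned long i;
    double total = 0.0;
    char word[32];

    if (n_words == 0 || n_buckets == 0)
        return -1.0;

    count = calloc(n_buckets, sizeof *count);
    if (count == NULL)
        return -1.0;

    /* Only the chain lengths matter here, so count entries per bucket. */
    for (i = 0; i < n_words; i++) {
        snprintf(word, sizeof word, "w%lu", i);
        count[hash_string(word) % n_buckets]++;
    }

    /* Finding each entry of a chain of length c once costs 1 + 2 + ... + c. */
    for (i = 0; i < n_buckets; i++)
        total += (double)count[i] * (count[i] + 1) / 2.0;

    free(count);
    return total / (double)n_words;
}
```

Comparing avg_comparisons(4500000, 1009) with avg_comparisons(4500000, 1000003) shows the trend: with a fixed bucket count, the chains grow in proportion to the number of distinct words, so every lookup during parsing gets slower as the collection grows. The bucket counts here are arbitrary examples, not the actual defaults.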
Maybe these parameters could be made dynamic (configuration parameters) or be
given larger defaults.
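As a rough sketch of what a configurable size could look like, the function below parses a table size from text (for example a configuration value) and falls back to a compiled-in default when the text is missing or invalid. The names hash_size_from_text and DEFAULT_HASHSIZE, and the value 1009, are hypothetical; the tables would also have to be allocated at run time instead of being sized by the "#define".

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical compiled-in default; not the real value from the source. */
#define DEFAULT_HASHSIZE 1009UL

/*
 * Parse a hash table size from text. Returns fallback when the text is
 * NULL, empty, not a plain decimal number, zero, or out of range.
 */
unsigned long hash_size_from_text(const char *text, unsigned long fallback)
{
    char *end;
    unsigned long value;

    /* Require a leading digit: strtoul would otherwise accept "-5". */
    if (text == NULL || !isdigit((unsigned char)text[0]))
        return fallback;

    errno = 0;
    value = strtoul(text, &end, 10);
    if (errno != 0 || *end != '\0' || value == 0)
        return fallback;

    return value;
}
```

A caller could then use something like hash_size_from_text(getenv("INDEX_HASHSIZE"), DEFAULT_HASHSIZE), where INDEX_HASHSIZE is a made-up environment variable name. A real version would probably also enforce an upper bound so that a typo cannot request an enormous table.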
After this change, most of the time (80-90%) is spent in the "writing word
data" phase, which uses a lot of CPU and performs millions of reads on the
temporary file built during the parsing/collecting pass.
I haven't isolated which routines are costly.
Jean-François
Received on Wed Dec 26 10:54:39 2001