We're indexing about 2.5 million files at the moment, and we're probably going to end up with about 6 million files eventually.
We categorise our files by certain criteria and index them into separate index files by category. A user then selects the categories he's interested in, and we only have to search those index files. This cuts down on search and merge time.
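To make the per-category setup concrete, here is a minimal Python sketch of the select-then-merge idea. It is not how swish-e merges results internally. It assumes each per-category index returns hits already sorted by descending rank, and that ranks are comparable across indexes. The category names, file names and `(rank, path)` result tuples are hypothetical.

```python
import heapq
import itertools
from typing import Dict, Iterable, List, Tuple

# Hypothetical hit format: (rank, file path); a higher rank means a better match.
Result = Tuple[int, str]

def select_indexes(category_to_index: Dict[str, str], chosen: Iterable[str]) -> List[str]:
    """Return the index files for the categories the user picked, ignoring unknown ones."""
    return [category_to_index[c] for c in chosen if c in category_to_index]

def merge_results(per_index: List[List[Result]], limit: int) -> List[Result]:
    """Merge per-index hit lists, each already sorted by descending rank,
    into a single descending list and keep only the top `limit` hits."""
    merged = heapq.merge(*per_index, key=lambda r: r[0], reverse=True)
    return list(itertools.islice(merged, limit))

if __name__ == "__main__":
    # Hypothetical mapping of categories to index files.
    categories = {"reports": "reports.idx", "manuals": "manuals.idx", "email": "email.idx"}
    print(select_indexes(categories, ["manuals", "email"]))
    print(merge_results([[(900, "a.pdf"), (400, "b.pdf")],
                         [(700, "c.txt"), (100, "d.txt")]], 3))
```

The merge is a k-way heap merge. It only looks at the head of each list, so it stays cheap even when many small indexes are selected. The per-index search step before it is where the "many small indexes" cost would show up.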
Now we want to optimise our categorisation criteria to find the granularity that works best for swish, trading off:
- searching/merging against a few large index files vs
- searching/merging against many small index files
The more indexes we have, the more specific the user can get with his query, and the more irrelevant indexes we can skip. But if the user happens to be interested in everything, we have to search through a lot of small indexes.
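One way to frame the trade-off is a toy cost model for a single query. This is a hypothetical sketch, not measured swish-e behaviour. `per_index_overhead` stands in for fixed per-index costs such as opening a file. `search_cost` is the assumed cost of searching one index of a given size. The merge term assumes a k-way heap merge over `hits` results. All three would have to be measured by timing real queries against real indexes.

```python
import math
from typing import Callable, Sequence

def estimated_cost(index_sizes: Sequence[int],
                   per_index_overhead: float,
                   search_cost: Callable[[int], float],
                   hits: int) -> float:
    """Toy cost (arbitrary units) of one query over the selected indexes.

    index_sizes        -- number of files in each selected index (hypothetical)
    per_index_overhead -- assumed fixed cost paid once per index searched
    search_cost        -- assumed cost of searching one index of size n
    hits               -- number of results merged across indexes
    """
    k = len(index_sizes)
    search = sum(per_index_overhead + search_cost(n) for n in index_sizes)
    # A k-way heap merge costs roughly log2(k) per result; nothing to merge if k <= 1.
    merge = hits * math.log2(k) if k > 1 else 0.0
    return search + merge

if __name__ == "__main__":
    # Hypothetical comparison: 6 million files in 1 index vs 60 indexes of 100,000,
    # assuming lookup cost grows with log2 of the index size.
    whole = estimated_cost([6_000_000], 5.0, math.log2, 100)
    everything_split = estimated_cost([100_000] * 60, 5.0, math.log2, 100)
    one_category = estimated_cost([100_000], 5.0, math.log2, 100)
    print(whole, everything_split, one_category)
```

Under these assumptions, a user who picks one category wins with small indexes. A user who wants everything pays the per-index overhead 60 times, plus the merge. If the real per-index search cost grows closer to linearly with index size, splitting mostly just adds overhead for the "everything" case. That is the linear vs. worse-than-linear question below.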
Does anyone know how swish's search/merge time scales? Does it degrade linearly or exponentially as the individual index files get larger, and how does that compare with the degradation as the number of index files grows?
Users mailing list
Received on Wed Feb 20 00:34:28 2008