Thanks for the advice. I have a pool of barely 10,000 different documents, which may contain all the available vocabulary in their domain.
I thought that the number of different words would influence performance, rather than the number of docs; the number of docs would only influence the size of the index file. The question is: are the words and their index entries all stored in the same area of the index file, or are they spread throughout the file? That is, when searching for a particular word, will Swish read most of the index file, or just a small part of it?
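To make the question concrete, here is a toy inverted index in Python. It only illustrates the general technique and is not Swish-e's actual on-disk format; the names build_index and lookup and the three sample documents are made up for the example. In this kind of structure a query for one word touches only that word's entry, so the cost of finding the entry depends on the size of the vocabulary, while the number of documents mainly affects how long each entry is.

```python
from collections import defaultdict


def build_index(docs):
    """Map each word to the sorted list of ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


def lookup(index, word):
    """Return the document ids for one word; only that word's entry is read."""
    return index.get(word.lower(), [])


# Hypothetical sample data, for illustration only.
docs = {
    1: "swish indexes words",
    2: "apache serves documents",
    3: "swish searches documents",
}
index = build_index(docs)
print(lookup(index, "swish"))
```

Whether a real engine reads a small part of its index file or most of it depends on how it lays these entries out on disk, which is exactly what I am asking about.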
Is there a project to keep the Swish engine and index files in memory (to reduce the time spent loading the executable and index file, launching the process, etc.)? I know of a project by a student in South America who wants to integrate Swish into Apache as a module.
When I have a shareable benchmark, I will make it available to everyone.
Tel: +33 1 43 21 16 66
Fax : +33 1 56 54 02 18
From: Ron Samuel Klatchko [SMTP:email@example.com]
Date: Thursday, 30 September 1999 20:47
To: Multiple recipients of list
Subject: [SWISH-E] Re: Benchmarking
Nicolas Huillard wrote:
> Is there somewhere some benchmarks of Swish-e :
> * query time vs. number of documents
> * indexing time vs. number of documents
> * query time vs. number of simultaneous queries
> * etc.
I don't know of any benchmarks, but if you're designing your own, pay
attention not only to the number of documents, but to the number of
unique words indexed. From my understanding of how the underlying
algorithm works, the number of documents is practically irrelevant at
Ron Samuel Klatchko - Software Jester
Brightmail Inc - firstname.lastname@example.org
Received on Fri Oct 1 02:36:39 1999