
Re: Searching with a great number of index files

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Sat Dec 14 2002 - 14:55:30 GMT
On Sat, 14 Dec 2002, Yann wrote:

>   I wanted some information about search performance with a large
> number of index files, say 50. I want to do this so that a user can
> search only the RFCs, only the man pages, or only the MySQL docs, and
> so on (I'm working on a large documentation site).
> 
>  Do you think creating that many index files is a good way to achieve
> this, keeping in mind that most searches will query all of the
> indexes? Is there another way to do what I want?

It all depends.  You need to test and see if it's acceptable.  If you
write a small Perl program to generate documents (I have used one that
builds documents from random words in /usr/share/dict), it's quite easy
to generate some test indexes.
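
Something along these lines (untested, and the word-list path and
counts are just examples for your system) is enough to crank out test
documents:

    #!/usr/bin/perl
    # Generate N small test documents from random words in a word list.
    use strict;
    use warnings;

    open my $dict, '<', '/usr/share/dict/words' or die "words: $!";
    chomp( my @words = <$dict> );
    close $dict;

    my $num_docs = shift || 1000;
    mkdir 'testdocs' unless -d 'testdocs';

    for my $n ( 1 .. $num_docs ) {
        open my $fh, '>', "testdocs/doc$n.txt" or die "doc$n: $!";
        # 500 random words per document -- adjust to taste.
        print $fh join ' ', map { $words[ rand @words ] } 1 .. 500;
        close $fh;
    }

Point swish-e at the testdocs directory and you have a test index.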

When I tried 100 indexes it was a little slow.  But opening the indexes
is the slow part, so you could overcome that by using the Swish-e
library to keep the indexes open between queries.  You will want plenty
of RAM, too.  But, again, that's something you need to test.
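
For example, with the SWISH::API Perl bindings (a sketch from memory --
check the module's docs for the exact method names on your version),
you open the indexes once in a long-running process (mod_perl, a search
daemon, etc.) and reuse the handle for every query:

    #!/usr/bin/perl
    # Open all index files once and reuse the handle for each search.
    use strict;
    use warnings;
    use SWISH::API;

    # Space-separated list of index files, opened a single time.
    my $swish = SWISH::API->new( 'rfc.index man.index mysql.index' );

    sub run_query {
        my ($query) = @_;
        my $results = $swish->query( $query );
        while ( my $result = $results->next_result ) {
            print $result->property( 'swishdocpath' ), "\n";
        }
    }

    run_query( 'tcp AND congestion' );

That avoids paying the index-open cost on every request, which is where
most of the time went in my 100-index test.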

If you are not indexing a huge number of documents overall, I'd also
try creating a single index and using a metaname to limit searches to
the various sections.  If some of your files require filtering before
indexing, consider creating a compressed cache of the filtered
documents that can be updated incrementally, and then have swish-e
index that cache to speed up indexing time.
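
Something like this (the "section" name is just an example; see the
MetaNames docs for the details):

    # In your swish-e config file:
    MetaNames section

    # In each document, or added by your filter:
    <meta name="section" content="rfc">

    # Limit a search to one section:
    swish-e -f doc.index -w 'section=(rfc) AND congestion'

    # Or leave the metaname off to search everything:
    swish-e -f doc.index -w 'congestion'

One index with metanames keeps the open cost down and still gives the
user the per-section searches you want.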

Whatever you do, please report back your findings.


-- 
Bill Moseley moseley@hank.org
Received on Sat Dec 14 14:55:42 2002