At 11:06 AM 07/01/02 -0700, CBol wrote:
>
>> How about for now when searching you do:
>>
>> ./swish-e -w $query -f index1 index2 index3 ...
>
> It's what I'm doing. My concern is what may happen over time, when I feed
> more and more files to the search engine, one more index each month.
> OK, I may reindex everything from time to time, but it will be a more elegant
> solution if I can sum the indexes.
Incremental indexing is a problem. There's work under way on it, but it
will be a while before it's available. Swish is very fast at indexing, so
unless you are indexing hundreds of thousands of files, reindexing
typically isn't a huge issue.
You might want to look at htdig, as I think it does incremental indexing.
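
In the meantime, if the number of monthly indexes keeps growing, the search
command from the quoted text can be built from whatever index files are on
disk rather than typed by hand. A minimal sketch, assuming the indexes live
in one directory and share a ".index" suffix (the directory layout, the
suffix, and the function name are all hypothetical; only the -w and -f
switches come from the command quoted above):

```python
import glob
import os


def build_search_command(query, index_dir, swish_bin="./swish-e", pattern="*.index"):
    """Return an argv list that searches every index file found in index_dir."""
    # Sort so the index order is stable from run to run.
    indexes = sorted(glob.glob(os.path.join(index_dir, pattern)))
    if not indexes:
        raise FileNotFoundError(
            "no index files matching %r in %r" % (pattern, index_dir)
        )
    # -w (query) and -f (index files) are the switches from the quoted command.
    return [swish_bin, "-w", query, "-f"] + indexes
```

The returned list can be handed to subprocess.run() as is, so the query never
passes through a shell and needs no extra quoting.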
>> Merge does not work well in the 2.1-dev version, and is a current topic among
>> the developers. It uses way too much memory.
>
> What then? Memory is very inexpensive today. My index is 15 MBytes in size,
> well below the memory I have available in my machine (256M).
You would have to try and see how it goes.
>> 17 hours is a long time for indexing. How many files were you indexing?
>
> Circa 10000, and I'm not fluent enough in Perl to edit the scripts and use
> the prog feature. ;-(
Do you have a delay set in the swish-e config? I think I can index 10,000
files in less time: ;)
10000 files indexed. 19646749 total bytes. 2031037 total words.
Elapsed time: 00:00:20 CPU time: 00:00:13
Indexing done!
My advice would be to see if you can fetch the files faster, or better yet,
cache them compressed locally. Of course, what's 17 hours -- just start it
and let it run.
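
The local cache idea could look roughly like the sketch below: fetch each
document once, store it gzip-compressed, and read the compressed copy on
every later indexing run. This is only an illustration under assumptions.
The function names, the cache directory, and the choice of a SHA-1 hash of
the URL as the file name are all hypothetical, and nothing here is part of
swish-e itself.

```python
import gzip
import hashlib
import os
import urllib.request


def cache_path(url, cache_dir):
    """Map a URL to a file name inside cache_dir."""
    # Hash the URL so that any URL becomes a safe, fixed-length file name.
    name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".gz"
    return os.path.join(cache_dir, name)


def fetch_url(url):
    """Download a document over the network and return its bytes."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def get_document(url, cache_dir, fetch=fetch_url):
    """Return the document bytes, fetching only when no cached copy exists."""
    path = cache_path(url, cache_dir)
    if os.path.exists(path):
        # Cache hit: decompress the local copy, no network access needed.
        with gzip.open(path, "rb") as fh:
            return fh.read()
    # Cache miss: fetch once, then store the compressed copy for next time.
    data = fetch(url)
    os.makedirs(cache_dir, exist_ok=True)
    with gzip.open(path, "wb") as fh:
        fh.write(data)
    return data
```

With something like this in front of the indexer, only the first run pays
the fetch cost. The sketch never expires cached copies, so documents that
change on the server would need their cache files removed by hand.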
--
Bill Moseley
mailto:moseley@hank.org
Received on Mon Jul 1 21:45:30 2002