Hi Paul, David,
On 22 Aug 2000, at 12:46, Paul Thomas wrote:
> On Tue, 22 Aug 2000, David Norris wrote:
> > You might want to merge the update indices with the old indices after
> > you have indexed. (swish-e -M index1 updates1 -f index1_new) That
> > would allow you to incrementally update a master index. The ranking
> > isn't great this way, but it's better than nothing.
> Is it faster and less demanding of the CPU to merge a new index with a
> master than it is to just index all the files at once? How would this
> adversely affect ranking?
No, the CPU and memory demands are about the same:
Scenario 1 (no merge)
- The indexing process parses all the files, loads all the data into
memory and writes the one index file.
Scenario 2 (merge)
- Build the first file (less CPU and memory)
- Build the second file (less CPU and memory)
[ At this point the only advantage is that you have used less
memory; the combined CPU usage of both processes is almost
identical to Scenario 1. ]
- Build the merged file. Once again you are consuming CPU,
and the memory usage is almost identical to Scenario 1 because
all the data from the two index files is loaded into memory before
the merged index file is written.
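To make the memory point concrete, here is a minimal sketch using a hypothetical in-memory inverted index (a plain dict of word -> {file: hit count}). This is NOT swish-e's on-disk format or its merge code; it only illustrates that the merged result holds every posting from both indexes, i.e. the same data a single pass over all the files would hold.

```python
from typing import Dict

# Hypothetical index layout: word -> {file name: hit count}.
Index = Dict[str, Dict[str, int]]


def build_index(files: Dict[str, str]) -> Index:
    """Scenario 1 style: parse every file and keep all postings in memory."""
    index: Index = {}
    for name, text in files.items():
        for word in text.lower().split():
            postings = index.setdefault(word, {})
            postings[name] = postings.get(name, 0) + 1
    return index


def merge_indexes(a: Index, b: Index) -> Index:
    """Scenario 2 merge step: both indexes are loaded and combined, so the
    result holds every posting from a and b before anything is written."""
    merged: Index = {word: dict(postings) for word, postings in a.items()}
    for word, postings in b.items():
        target = merged.setdefault(word, {})
        for name, count in postings.items():
            target[name] = target.get(name, 0) + count
    return merged


def posting_count(index: Index) -> int:
    """Number of (word, file) entries kept in memory."""
    return sum(len(p) for p in index.values())


part1 = {"a.html": "swish indexes files", "b.html": "merge the files"}
part2 = {"c.html": "swish merge"}

one_pass = build_index({**part1, **part2})
merged = merge_indexes(build_index(part1), build_index(part2))

# Same contents, same amount of data in memory, as the single pass.
assert merged == one_pass
print(posting_count(merged))  # 8
```

The two partial builds are each smaller, but the merge step ends up holding exactly what Scenario 1 holds.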
IMO, it is always better to build the index file using Scenario 1.
Swish-e 2.0 outperforms 1.3.2 in the indexing process.
There is also some information that is lost in the merged file:
- Header info
- Dynamic stopwords (based on the IgnoreLimit option) have the same
problem as the rank.
- The rank is not accurate
- If the config files have different options (e.g. Stemming), the results
may not be what you expect. E.g. if only one of the files has
Stemming enabled, the merged file is also assumed to be stemmed,
but it contains data that was not stemmed!!
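A small sketch of the stemming problem, under loudly stated assumptions: toy_stem is a hypothetical suffix stripper standing in for a real stemmer, and the dict-of-sets index is illustrative only, not swish-e's format. One half is indexed with stemming, the other without; the merged index is treated as stemmed, so the query is stemmed and the unstemmed file is missed.

```python
from typing import Dict, Set

# Hypothetical index layout: word -> set of file names.
Index = Dict[str, Set[str]]


def toy_stem(word: str) -> str:
    """Hypothetical suffix stripper, NOT swish-e's stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def build(docs: Dict[str, str], stemming: bool) -> Index:
    index: Index = {}
    for name, text in docs.items():
        for word in text.lower().split():
            key = toy_stem(word) if stemming else word
            index.setdefault(key, set()).add(name)
    return index


def merge(a: Index, b: Index) -> Index:
    merged: Index = {w: set(d) for w, d in a.items()}
    for w, d in b.items():
        merged.setdefault(w, set()).update(d)
    return merged


def search(index: Index, query: str, stemming: bool) -> Set[str]:
    key = toy_stem(query.lower()) if stemming else query.lower()
    return index.get(key, set())


stemmed_part = build({"a.txt": "indexing tips"}, stemming=True)   # stores "index"
plain_part = build({"b.txt": "indexing guide"}, stemming=False)   # stores "indexing"
merged = merge(stemmed_part, plain_part)

# The merged index is assumed stemmed, so "indexing" is looked up as "index"
# and b.txt is never found, even though it contains the word.
print(sorted(search(merged, "indexing", stemming=True)))  # ['a.txt']
```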
Received on Wed Aug 23 07:46:21 2000