Hi,
On 19 Feb 2001, at 8:02, Bill Moseley wrote:
> And after about 15 minutes I killed it. Jose, would it be possible to
> make all the adjustments in one pass or must they be made one word at
> a time?
>
I rewrote this routine (removestops) in the first days of 2.0 just to
comply with phrase search. The problem is really hard. When you
remove a word from the list of words you also have to adjust the
position counter of all the remaining words, for each occurrence that
comes after the automatic stopword. Perhaps there is a faster
approach...
E.g.:
With just one file containing the following text:
This is a word in a phrase in this file
More or less the info is like this:
this: file 1 positions 1,9
is: file 1 position 2
in: file 1 positions 5,8
After removing "a":
this: file 1 positions 1,7
is: file 1 position 2
in: file 1 positions 4,6
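
As a rough sketch of the adjustment shown above (Python, not the actual
swish-e code; the function name remove_stopwords and the dictionary
layout are made up for illustration), one possible way to handle every
stopword in a single pass is to collect all removed positions per file
first, then shift each remaining position down by the number of removed
positions before it. It assumes uncompressed position lists held in
memory, which is not how the real index stores them.

```python
from bisect import bisect_left

def remove_stopwords(index, stopwords):
    """Drop stopwords and renumber the remaining word positions.

    index: {word: {file_id: [sorted 1-based positions]}} (hypothetical layout)
    stopwords: set of words to remove
    """
    # Gather, per file, the sorted positions of every stopword occurrence.
    removed = {}
    for word in stopwords:
        for file_id, positions in index.get(word, {}).items():
            removed.setdefault(file_id, []).extend(positions)
    for positions in removed.values():
        positions.sort()

    adjusted = {}
    for word, postings in index.items():
        if word in stopwords:
            continue
        adjusted[word] = {}
        for file_id, positions in postings.items():
            gone = removed.get(file_id, [])
            # Each position moves down by the count of removed positions before it.
            adjusted[word][file_id] = [p - bisect_left(gone, p) for p in positions]
    return adjusted

# The example text: "This is a word in a phrase in this file"
index = {
    "this": {1: [1, 9]},
    "is": {1: [2]},
    "a": {1: [3, 6]},
    "word": {1: [4]},
    "in": {1: [5, 8]},
    "phrase": {1: [7]},
    "file": {1: [10]},
}
result = remove_stopwords(index, {"a"})
print(result["this"][1], result["is"][1], result["in"][1])
```

The binary search keeps the cost per position at O(log k) for k removed
positions in a file, but the sketch does not address the compression
issue.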
An "automatic" stopword like "a" appears in almost every file, several
times. So adjusting positions is a heavy CPU/RAM process. Also,
some of the info is compressed to save RAM and needs to be
decompressed/recompressed on the fly to recompute the positions.
cu
Jose
Received on Mon Feb 19 16:57:35 2001