Re: question about merging indexes when there is file overlap

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Tue Apr 13 2004 - 18:52:25 GMT
On Tue, Apr 13, 2004 at 11:25:14AM -0700, Thomas M. Parris wrote:
> Dear SWISHers:
> 
> I have a quick question about the merge feature of swish.
> 
> 1. Assume I have a directory with five files in it (1.html .... 5.html), and
> that I index these files with swish (index1.index).
> 
> 2. Now assume I modify 5.html and add 6.html and index these files
> (index2.index). (In my case, 5.html is modified to point to 6.html).
> 
> 3. Will I run into problems if I merge index1.index and index2.index?  (this
> process will repeat hundreds of times).

Not exactly sure.  I think that if you merge two indexes containing the same
files, every file ends up being replaced: during a merge, when a file comes
along with the same or a newer date, it replaces the existing entry.  Maybe it
should only replace when the date is newer.
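
To make the difference between the two rules concrete, here is a minimal sketch in shell. The `should_replace` function is hypothetical and is not part of swish-e; it only models the comparison described above, using timestamps expressed as epoch seconds.

```shell
# Hypothetical helper, NOT part of swish-e: models the replacement rule
# described above, with dates given as integer epoch seconds.
# Usage: should_replace EXISTING_DATE INCOMING_DATE [strict]
should_replace() {
    existing=$1
    incoming=$2
    mode=${3:-same-or-newer}
    if [ "$mode" = "strict" ]; then
        # Suggested alternative: replace only when the incoming file is newer.
        [ "$incoming" -gt "$existing" ]
    else
        # Behavior as described: the same or a newer date replaces.
        [ "$incoming" -ge "$existing" ]
    fi
}
```

With equal dates, `should_replace 100 100` succeeds while `should_replace 100 100 strict` fails, which is the case that matters when an index is merged with itself.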

moseley@bumby:~/merge$ swish-e -M index1 index1 same
Input index 'index1' has 5 files and 1 words
Input index 'index1' has 5 files and 1 words
Replaced file '1.html 2004-04-13 11:40:10 PDT' with '1.html 2004-04-13 11:40:10 PDT'
Replaced file '2.html 2004-04-13 11:40:10 PDT' with '2.html 2004-04-13 11:40:10 PDT'
Replaced file '3.html 2004-04-13 11:40:10 PDT' with '3.html 2004-04-13 11:40:10 PDT'
Replaced file '4.html 2004-04-13 11:40:10 PDT' with '4.html 2004-04-13 11:40:10 PDT'
Replaced file '5.html 2004-04-13 11:40:10 PDT' with '5.html 2004-04-13 11:40:10 PDT'

But if the index only has the modified files in it:

moseley@bumby:~/merge$ for num in 1 2 3 4 5; do echo "hi" > $num.html; done
moseley@bumby:~/merge$ swish-e -f index1 -i *.html -v0

moseley@bumby:~/merge$ for num in 4 5; do echo "foo" > $num.html; done
moseley@bumby:~/merge$ swish-e -f index2 -i 4.html 5.html -v0

moseley@bumby:~/merge$ swish-e -M index1 index2 indexout
Input index 'index1' has 5 files and 1 words
Input index 'index2' has 2 files and 1 words
Replaced file '4.html 2004-04-13 11:40:10 PDT' with '4.html 2004-04-13 11:40:38 PDT'
Replaced file '5.html 2004-04-13 11:40:10 PDT' with '5.html 2004-04-13 11:40:38 PDT'
Getting words in index 'index1':      1 words
Getting words in index 'index2':      1 words
Processing words in index 'indexout':      2 words
Removed      0 words no longer present in docs for index 'indexout'
Writing main index...
Sorting words ...
Sorting 2 words alphabetically
Writing header ...
Writing index entries ...
  Writing word text: Complete
  Writing word hash: Complete
  Writing word data: Complete
2 unique words indexed.
4 properties sorted.
5 files indexed.  0 total bytes.  5 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!

moseley@bumby:~/merge$ swish-e -f indexout -T index_words_full

-----> WORD INFO in index indexout <-----

foo
 Meta:1 4.html Freq:1 Pos/Struct:2/9
 Meta:1 5.html Freq:1 Pos/Struct:2/9

hi
 Meta:1 1.html Freq:1 Pos/Struct:2/9
 Meta:1 2.html Freq:1 Pos/Struct:2/9
 Meta:1 3.html Freq:1 Pos/Struct:2/9
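
For a process that repeats many times, the steps in the transcript could be wrapped up roughly as follows. This is only a sketch: `list_changed` and `incremental_merge` are made-up names, the use of the old index file as a timestamp reference is an assumption, and only the swish-e options already shown above (`-f`, `-i`, `-v0`, `-M`) are used. It has not been tested over hundreds of merge rounds.

```shell
# list_changed STAMP_FILE
# Print the *.html files in the current directory that were modified more
# recently than STAMP_FILE (assumed here to be the previous index file).
list_changed() {
    find . -maxdepth 1 -name '*.html' -newer "$1" | sed 's|^\./||' | sort
}

# incremental_merge OLD_INDEX DELTA_INDEX OUT_INDEX
# Index only the changed files into DELTA_INDEX, then merge OLD_INDEX and
# DELTA_INDEX into OUT_INDEX, as in the transcript above.
incremental_merge() {
    changed=$(list_changed "$1")
    [ -n "$changed" ] || return 0    # nothing changed, nothing to merge
    # Word splitting on $changed is intentional; this assumes file names
    # contain no whitespace.
    swish-e -f "$2" -i $changed -v0 &&
    swish-e -M "$1" "$2" "$3"
}
```

Called as `incremental_merge index1 index2 indexout`, this would index just the modified files and merge them, so that only those files are reported as replaced.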


> 
> Many thanks in advance for your help.  I love swish.  I've successfully used
> it for a number of varied applications.
> 
> Cheers,
> Tom
> -------------------------------------------------------
> Thomas M. Parris
> Research Scientist and Executive Director Boston Office
> ISciences, LLC
> 685 Centre Street, Suite 207
> Jamaica Plain, MA  02130  USA
> 
> Tel:   +617-524-8041        http://www.isciences.com/
> Fax:   +617-344-2580        http://www.terraviva.net/
> Email: parris@isciences.com
> ------------------------------------------------------
> 
> 

-- 
Bill Moseley
moseley@hank.org
Received on Tue Apr 13 11:52:26 2004