Re: diff'ing indexes

From: Bill Moseley <moseley(at)>
Date: Thu Oct 14 2004 - 17:11:58 GMT
Am I reproducing this correctly?

moseley@bumby:/tmp/peter$ cat file?.html
<meta name='metaA' content='foo'>
<meta name='metaB' content='bar'>

some1 content1

<meta name='metaA' content='foo'>
<meta name='metaB' content='bar'>

some2 content2

moseley@bumby:/tmp/peter$ cat c
Metanames metaA metaB
PropertyNames metaA metaB
IgnoreTotalWordCountWhenRanking 0

moseley@bumby:/tmp/peter$ cat c2
Metanames metaB metaA
PropertyNames metaB metaA
IgnoreTotalWordCountWhenRanking 0

moseley@bumby:/tmp/peter$ rm out.index

moseley@bumby:/tmp/peter$ swish-e -c c -i file?.html -v0

moseley@bumby:/tmp/peter$ swish-e -c c2 -i file1.html -f fileone.index -v0

moseley@bumby:/tmp/peter$ swish-e -M index.swish-e fileone.index out.index
Input index 'index.swish-e' has 2 files and 8 words
Input index 'fileone.index' has 1 files and 5 words
Replaced file 'file1.html 2004-10-14 09:55:57 PDT' with 'file1.html 2004-10-14 09:55:57 PDT'
Getting words in index 'index.swish-e':      8 words
Getting words in index 'fileone.index':      5 words
Processing words in index 'out.index':      8 words
Removed      0 words no longer present in docs for index 'out.index'
Writing main index...
Sorting words ...
Sorting 8 words alphabetically
Writing header ...
Writing index entries ...
  Writing word text: Complete
  Writing word hash: Complete
  Writing word data: Complete
8 unique words indexed.
6 properties sorted.
2 files indexed.  0 total bytes.  10 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!

moseley@bumby:/tmp/peter$ swish-e -w metaa=foo -f out.index
# SWISH format: 2.5.1
# Search words: metaa=foo
# Removed stopwords: 
# Number of hits: 2
# Search time: 0.005 seconds
# Run time: 0.025 seconds
1000 file2.html "file2" 151
1000 file1.html "file1" 151
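
In case it helps anyone repeat this, the steps above could be scripted roughly as follows. This is only a sketch: the temporary directory and the write_config helper are additions for illustration and were not part of the session, and it assumes swish-e is on the PATH (if it is not, only the input files are created). The swish-e commands and flags are the ones shown in the session above.

```shell
#!/bin/sh
# Sketch of the reproduction above. The swish-e invocations are copied from
# the session; the scratch directory and write_config are hypothetical helpers.
set -eu

dir=$(mktemp -d)
cd "$dir"

# Two documents with identical meta tags and different body text.
for n in 1 2; do
    cat > "file$n.html" <<EOF
<meta name='metaA' content='foo'>
<meta name='metaB' content='bar'>

some$n content$n
EOF
done

# Two configs that differ only in the order of the meta/property names.
write_config() {  # usage: write_config FILE NAME1 NAME2
    printf 'Metanames %s %s\nPropertyNames %s %s\nIgnoreTotalWordCountWhenRanking 0\n' \
        "$2" "$3" "$2" "$3" > "$1"
}
write_config c  metaA metaB
write_config c2 metaB metaA

if command -v swish-e >/dev/null 2>&1; then
    swish-e -c c -i file?.html -v0                    # full index (index.swish-e)
    swish-e -c c2 -i file1.html -f fileone.index -v0  # one file, names reversed
    swish-e -M index.swish-e fileone.index out.index  # merge into out.index
    swish-e -w metaa=foo -f out.index                 # search the merged index
else
    echo "swish-e not found; created input files only in $dir" >&2
fi
```

Running it in a fresh directory also avoids the need for the "rm out.index" step.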

Bill Moseley

Received on Thu Oct 14 10:12:24 2004