Hi, all. During testing of Swish-E (CVS version, checked out
2001-11-25) I encountered this oddity:
$ echo DefaultContents TXT > test.config
$ /home/argggh/src/ping/swish-e/src/swish-e -S fs -i 2.4.15-pre6/CREDITS -v 5 -c test.config
Indexing Data Source: "File-System"
Indexing "2.4.15-pre6/CREDITS"
Checking file "2.4.15-pre6/CREDITS"...
CREDITS - Using TXT parser - (11689 words)
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 5199 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
5199 unique words indexed.
4 properties sorted.
1 file indexed. 77693 total bytes.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!
$ /home/argggh/src/ping/swish-e/src/swish-search -w 'Henderson'
# SWISH format: 2.1-dev-24
# Search words: Henderson
err: no results
.
$ /home/argggh/src/ping/swish-e/src/swish-search -w 'Henderson*'
# SWISH format: 2.1-dev-24
# Search words: Henderson*
# Number of hits: 1
# Search time: 0.001 seconds
# Run time: 0.006 seconds
1000 2.4.15-pre6/CREDITS "CREDITS" 77693
.
$ /home/argggh/src/ping/swish-e/src/swish-search -w 'HendersonE'
# SWISH format: 2.1-dev-24
# Search words: HendersonE
# Number of hits: 1
# Search time: 0.000 seconds
# Run time: 0.007 seconds
1000 2.4.15-pre6/CREDITS "CREDITS" 77693
.
$ grep -A2 Henderson 2.4.15-pre6/CREDITS
N: Richard Henderson
E: rth@twiddle.net
E: rth@cygnus.com
$
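So the exact word "Henderson" is not found, while "HendersonE" is. It
looks as if the line break after "Henderson" is not treated as a word
separator, so the "E" from the next line gets glued onto the word.
The file indexed is CREDITS from the Linux 2.4.15-pre6 source code
distribution. I suppose any Linux version in the 2.4 series will be
similar enough to exhibit this as well. If that turns out not to be the
case, I'll mail the exact file used here to anyone who wants to test this.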
I also have a small wish for the Swish-E developers: I'd love to be
able to feed swish-e the file contents to index on stdin. Much like
"-S prog", except driven by the program that gathers the file
contents rather than by swish-e. For now I am kludging it by starting
swish-e like this from my gatherer:
swish-e -S prog -i /bin/cat [..]
and then feeding stdin of this process with the stuff to be indexed.
This works, but it's a bit gross.
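For anyone wanting to try the same kludge, here is a minimal sketch of the gatherer side in Python. It assumes that "-S prog" expects each document as a "Path-Name:" header and a "Content-Length:" header (byte count), then a blank line, then exactly that many bytes of content; check this against the documentation for your swish-e version. The function names format_prog_record and feed_indexer are made up for illustration and are not part of swish-e.

```python
import subprocess
from typing import Iterable, Sequence, Tuple


def format_prog_record(path: str, content: bytes) -> bytes:
    """Build one document record: headers, a blank line, then the raw bytes.

    Assumption: the indexer reads "Path-Name" and "Content-Length" headers,
    with Content-Length counted in bytes, not characters.
    """
    headers = (
        f"Path-Name: {path}\n"
        f"Content-Length: {len(content)}\n"
        "\n"
    )
    return headers.encode("utf-8") + content


def feed_indexer(docs: Iterable[Tuple[str, bytes]], command: Sequence[str]) -> int:
    """Start the indexer command and write every document to its stdin.

    `command` is the full argument list for the indexer process.
    Returns the exit status of the process.
    """
    proc = subprocess.Popen(list(command), stdin=subprocess.PIPE)
    assert proc.stdin is not None
    for path, content in docs:
        proc.stdin.write(format_prog_record(path, content))
    proc.stdin.close()  # end of input tells the indexer there are no more documents
    return proc.wait()
```

The gatherer would call feed_indexer with an iterable of (path, bytes) pairs and the same command line as above, i.e. swish-e -S prog -i /bin/cat plus whatever other options are in use. Since /bin/cat just copies its stdin to its stdout, swish-e ends up reading the records the gatherer writes.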
Arne.
Received on Sun Nov 25 13:14:12 2001