# uname -a
Linux grape.solid.net 2.6.9-1.667 #1 Tue Nov 2 14:41:25 EST 2004 i686 athlon i386 GNU/Linux
(This is Fedora Core 3 with some RPM updates.)
Swish-e version 2.4.3.
I let ./configure pick all the defaults.
Everything seems to work (i.e. swish.cgi finds the search terms and creates
links to the files correctly), except that StoreDescription is not storing
the description, so swish.cgi cannot display the body text.
This is my configuration file:
IgnoreWords file: /home/swish/stuff/stopwords.txt
MetaNames swishtitle
MetaNames swishdocpath
StoreDescription HTML* <body> 20000
PropCompressionLevel 9
A different issue? (Or a non-issue, because everything is too short to
compress?) Changing PropCompressionLevel from 0 to 9 does not change the
size of the created index files. zlib is installed on the machine and
configure found it.
The metanames dump suggests swish-e knows it is supposed to save the
descriptions:
#/home/swish/swish-e-2.4.3/src/swish-e -T index_metanames
-----> METANAMES for index.swish-e <-----
swishdefault : id= 1 type= 1 META_INDEX Rank Bias= 0
swishreccount : id= 2 type=42 META_INTERNAL META_PROP:NUMBER
swishrank : id= 3 type=42 META_INTERNAL META_PROP:NUMBER
swishfilenum : id= 4 type=42 META_INTERNAL META_PROP:NUMBER
swishdbfile : id= 5 type=38 META_INTERNAL META_PROP:STRING(case:compare) SortKeyLen: 100
swishdocpath : id= 6 type= 6 META_PROP:STRING(case:compare) SortKeyLen: 100 *presorted*
swishtitle : id= 7 type=70 META_PROP:STRING(case:ignore) SortKeyLen: 100 *presorted*
swishdocsize : id= 8 type=10 META_PROP:NUMBER *presorted*
swishlastmodified : id= 9 type=18 META_PROP:DATE *presorted*
swishtitle : id=10 type= 1 META_INDEX Rank Bias= 0
swishdocpath : id=11 type= 1 META_INDEX Rank Bias= 0
swishdescription : id=12 type= 6 META_PROP:STRING(case:compare) SortKeyLen: 100 *presorted*
I build the index with the following script:
#!/bin/bash
/home/swish/swish-e-2.4.3/prog-bin/HerTree.pl \
/mirror \
| /home/swish/swish-e-2.4.3/src/swish-e \
-c swish.conf \
-v9 -S prog -i stdin
HerTree.pl produces the following for each file (the type is always HTML;
it filters out everything except HTML before swish-e sees it):
Path-Name: file_1.html
Content-Length: 7948
Last-Mtime: 1108515537
Document-Type: HTML
<blank line>
<file body>
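For reference, the header-plus-body layout shown above can be reproduced with a few lines of shell. This is only a sketch under assumptions: emit_doc is a hypothetical helper (it is not part of swish-e and not how HerTree.pl is written), and it relies on GNU stat for the modification time.

```sh
#!/bin/bash
# Sketch: emit one document in the header-plus-body layout shown above.
# emit_doc is a hypothetical helper, not part of swish-e or HerTree.pl.
emit_doc() (
    file=$1
    size=$(( $(wc -c < "$file") ))   # byte count; arithmetic strips any padding
    mtime=$(stat -c %Y "$file")      # GNU stat: mtime in epoch seconds
    printf 'Path-Name: %s\n' "$file"
    printf 'Content-Length: %s\n' "$size"
    printf 'Last-Mtime: %s\n' "$mtime"
    printf 'Document-Type: HTML\n'
    printf '\n'                      # blank line ends the headers
    cat "$file"                      # body, exactly Content-Length bytes
)
```

Running something like `emit_doc file_1.html | head -n 5` makes it easy to compare the headers by eye with what the real script sends to swish-e.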
swish-e prints many lines like:
file_1.html - Using HTML parser - (100 words)
file_2.html - Using HTML parser - (98 words)
and ends with:
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 56,664 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
56,664 unique words indexed.
5 properties sorted.
1,302 files indexed. 19,636,528 total bytes. 1,200,429 total words.
Elapsed time: 00:00:38 CPU time: 00:00:06
Indexing done!
If I run a command-line search such as
/home/swish/swish-e-2.4.3/src/swish-e \
-x '<swishrank>:<swishdescription>:<swishtitle>\n' -w "frog"
I get:
# SWISH format: 2.4.3
# Search words: frog
# Removed stopwords:
# Number of hits: 3
# Search time: 0.003 seconds
# Run time: 0.021 seconds
1000::Title 1
526::Title 2
526::Title 3
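To check this across many hits rather than by eye, the result lines can be filtered for an empty description field. A minimal sketch, assuming the rank:description:title output format used in the search above; count_empty_desc is a hypothetical helper name.

```sh
# Sketch: count result lines whose description field is empty.
# count_empty_desc is a hypothetical helper; it assumes the -x format
# '<swishrank>:<swishdescription>:<swishtitle>\n' used in the search above.
count_empty_desc() {
    # An empty description shows up as "::" right after the numeric rank.
    # Header lines starting with "#" never match the pattern.
    grep -c -E '^[0-9]+::'
}
```

Piping the search command into count_empty_desc reports how many hits came back without a description; for the three hits listed above it would report 3.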
Now the real question: where is the <body> text?
Received on Mon Mar 7 23:24:50 2005