I am merging 600-700 (fairly small) indexes into one large index,
and when I search the new index using:
./swish-e -w adas -f /some/path/bigindex
I get results like this:
# Swish-e format 2.0
#
# Name: Polopoly Index
# Saved as: polopolyindex
# Counts: -5988 words, 14 files
# Indexed on: 01/08/2001 17:08:33 MEST
# Description: (no description)
# Pointer: (no pointer)
# Maintained by: (no maintainer)
# DocumentProperties: Enabled
# Stemming Applied: 0
# Soundex Applied: 0
# WordCharacters: #&+0123456789;?@\_abcdefghijklmnopqrstuvwxyzÄÅÖäåö
# MinWordLimit: 4
# MaxWordLimit: 40
# BeginCharacters: &0123456789_abcdefghijklmnopqrstuvwxyzÄÅÖäåö
# EndCharacters: +0123456789;?_abcdefghijklmnopqrstuvwxyzÄÅÖäåö
# IgnoreFirstChar: "'(
# IgnoreLastChar: "'),.;
# SWISH format 2.0
# Search words: adas
# Number of hits: 2
1000 /indexer.jsp?d=223&a=19805& "indexer.jsp?d=223&a=19805&" 904
1000 /indexer.jsp?d=223&a=19805& "indexer.jsp?d=223&a=19805&" 904
.
The returned file exists, but the string "adas" does not occur
anywhere in it. Before I re-indexed the site, I got one correct
and one incorrect filename as the result. After re-indexing, I get
the output above instead.
Why the wrong filenames? Why two identical lines?
Any ideas of what is happening?
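Both symptoms (the duplicated line and the missing word) can be checked mechanically. The following is only a rough sketch: the variable names and the contains_word helper are made up for illustration, and the helper does a plain substring match rather than applying swish-e's own word rules.

```shell
# Result lines copied verbatim from the search output in this post.
results='1000 /indexer.jsp?d=223&a=19805& "indexer.jsp?d=223&a=19805&" 904
1000 /indexer.jsp?d=223&a=19805& "indexer.jsp?d=223&a=19805&" 904'

# Count how often each result path (the second whitespace-separated field) occurs.
dup_report=$(printf '%s\n' "$results" |
    awk '{ count[$2]++ } END { for (p in count) print count[p], p }')
printf '%s\n' "$dup_report"

# Hypothetical helper: succeeds if the word occurs in the file
# (case-insensitive fixed-string match, not swish-e's WordCharacters rules).
contains_word() {
    grep -qiF -e "$1" -- "$2"
}

# Usage (assumes the prefix removed by ReplaceRules is put back; the single
# quotes keep ? and & literal):
# contains_word adas '/path/to/tmp/html/dir/indexer.jsp?d=223&a=19805&' || echo "adas not found"
```

For the two result lines above, the count comes out as 2 for the single path, which confirms the hits are exact duplicates rather than two similar-looking names.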
I use swish-e 2.0.5 (and I don't have time to try out 2.1 since the
site is to be released on Friday (!)).
The command line used when indexing looks like this, for example:
./swish-e -S fs -c ./config/swish.conf -f /path/to/index -i /path/to/tmp/html/dir/indexer.jsp?d=293&a=23667&
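One thing that may be worth ruling out: if this command is run through a shell (an assumption; it may just as well be launched directly from a program), an unquoted & ends the command and puts it in the background, so swish-e would only see the name up to "d=293". Whether this is related to the wrong filenames is not established. The sketch below uses a made-up last_arg helper to show that single quotes pass the whole name through unchanged.

```shell
# Hypothetical helper: print the last argument exactly as a program would receive it.
last_arg() {
    for arg in "$@"; do last=$arg; done
    printf '%s\n' "$last"
}

# Single quotes keep ? (a glob character) and & (the background operator) literal.
received=$(last_arg -i '/path/to/tmp/html/dir/indexer.jsp?d=293&a=23667&')
printf '%s\n' "$received"

# The indexing command from this post with only the quoting added (not run here):
# ./swish-e -S fs -c ./config/swish.conf -f /path/to/index -i '/path/to/tmp/html/dir/indexer.jsp?d=293&a=23667&'
```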
My swish.conf has the following parameters set:
IndexFile polopolyindex
IndexName "Polopoly Index"
MetaNames foo
IndexReport 3
FollowSymLinks yes
IgnoreTotalWordCountWhenRanking yes
ReplaceRules remove "/path/to/tmp/html/dir"
MinWordLimit 4
WordCharacters abcdefghijklmnopqrstuvwxyz1234567890_ÅÄÖåäö\&#@+?;
BeginCharacters abcdefghijklmnopqrstuvwxyz1234567890åäöÅÄÖ_&
EndCharacters abcdefghijklmnopqrstuvwxyz1234568790åäöÅÄÖ?+_;
IgnoreWords som att dem dom och men det ska
IndexComments 0
(Strange characters and words, since this is a Swedish site...)
/Stefan Bergstrand
Received on Wed Aug 1 16:39:48 2001