[swish-e] searching anomaly on small word.

From: brad miele <bmiele(at)not-real.ipnstock.com>
Date: Fri Apr 11 2008 - 19:12:21 GMT
Hi,

It was reported to me that one of our clients couldn't search for the word 
diy in their index today. I noticed that the index had been built on a 
new server and started checking versions and whatnot.

server1: SWISH-E 2.4.4
server2: SWISH-E 2.4.4

Both machines have identical config files (I forklifted the directories). 
For testing, I copied one XML test file and indexed it on each machine.

indexing results for server1:

bwayipn02# ./bin/test.sh
Indexing Data Source: "File-System"
Indexing "/usr/local/indexing/test"

Checking dir "/usr/local/indexing/test"...

Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 205 words alphabetically
Writing header ...
Writing index entries ...
   Writing word text: Complete
   Writing word hash: Complete
   Writing word data: Complete
205 unique words indexed.
19 properties sorted.
1 file indexed.  15,657 total bytes.  594 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!
bwayipn02#

results for server2:

bwayipn04# ./bin/test.sh
Indexing Data Source: "File-System"
Indexing "/usr/local/indexing/test"

Checking dir "/usr/local/indexing/test"...

Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 205 words alphabetically
Writing header ...
Writing index entries ...
   Writing word text: Complete
   Writing word hash: Complete
   Writing word data: Complete
205 unique words indexed.
19 properties sorted.
1 file indexed.  15,657 total bytes.  594 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!

server1:
bwayipn02# swish-e -T INDEX_WORDS -f test.index | grep diy
diy [1 1 1 (819/1)] [18 1 1 (328/1)] [29 1 1 (819/1)]

server2:
bwayipn04# swish-e -T INDEX_WORDS -f test.index | grep diy
diy [1 1 1 (819/1)] [18 1 1 (328/1)] [29 1 1 (819/1)]

server1:
bwayipn02# swish-e -w diy -f test.index
# SWISH format: 2.4.4
# Search words: diy
# Removed stopwords:
# Number of hits: 1
# Search time: 0.000 seconds
# Run time: 0.010 seconds
1000 /usr/local/indexing/test/10000200178.xml "10000200178.xml" 15657
.

server2:
bwayipn04# swish-e -w diy -f test.index
# SWISH format: 2.4.4
# Search words: diy
# Removed stopwords:
err: no results
.
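
The two checks above (grep the word dump, then run the search) can be wrapped in one small helper so the same test is run identically on each server. This is only a sketch: it uses the -T INDEX_WORDS, -w and -f options exactly as shown in the transcripts, and it decides success by looking for the "# Number of hits:" line seen in the server1 output. SWISH_BIN and the function names are hypothetical, introduced here for illustration.

```shell
#!/bin/sh
# Sketch: report whether a word that appears in the index word dump
# is also returned by a search against the same index.
# SWISH_BIN is a hypothetical override for the swish-e binary path.
SWISH_BIN="${SWISH_BIN:-swish-e}"

# word_in_dump INDEX WORD: exit 0 if WORD shows up in the word dump.
# Uses the same plain grep as the transcript, so substrings also match.
word_in_dump() {
    "$SWISH_BIN" -T INDEX_WORDS -f "$1" | grep -q -- "$2"
}

# word_is_searchable INDEX WORD: exit 0 if the search prints a hit count.
# Assumes a successful search prints "# Number of hits:" as on server1.
word_is_searchable() {
    "$SWISH_BIN" -w "$2" -f "$1" | grep -q '^# Number of hits:'
}

# check_word INDEX WORD: print one summary line.
# Returns 0 if consistent, 1 on mismatch, 2 if the word is not in the dump.
check_word() {
    if word_in_dump "$1" "$2"; then
        if word_is_searchable "$1" "$2"; then
            echo "ok: '$2' is indexed and searchable"
        else
            echo "mismatch: '$2' is in the word dump but the search found nothing"
            return 1
        fi
    else
        echo "absent: '$2' is not in the word dump"
        return 2
    fi
}
```

Running `check_word test.index diy` on both machines should then reproduce the result above in one line per server.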

If I copy the index built on server1 to server2, the word can be found there. 
Likewise, if I copy the index built on server2 to server1, the search fails, 
so the problem follows the index file rather than the machine doing the search.
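
Since the grep only compared a single line of each word dump, a possible next step is to diff the complete dumps from both servers. The sketch below assumes the dump from each machine has been saved to a file and copied to one place; SWISH_BIN, the function names and the file names in the usage note are hypothetical.

```shell
#!/bin/sh
# Sketch: save the full INDEX_WORDS dump of an index and compare two dumps.
# SWISH_BIN is a hypothetical override for the swish-e binary path.
SWISH_BIN="${SWISH_BIN:-swish-e}"

# dump_words INDEX OUTFILE: write the word dump of INDEX to OUTFILE.
dump_words() {
    "$SWISH_BIN" -T INDEX_WORDS -f "$1" > "$2"
}

# compare_dumps FILE_A FILE_B: report whether two saved dumps are identical.
# On a difference, show the first 20 lines of diff output and return 1.
compare_dumps() {
    if diff "$1" "$2" > /dev/null; then
        echo "dumps match"
    else
        echo "dumps differ"
        diff "$1" "$2" | head -n 20
        return 1
    fi
}
```

For example, run `dump_words test.index words.server1.txt` on server1 and `dump_words test.index words.server2.txt` on server2, copy both files to one machine, then run `compare_dumps words.server1.txt words.server2.txt`. If the dumps match in full, the difference between the two indexes would have to lie somewhere other than the word entries.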

test.conf contents:
IncludeConfigFile /usr/local/indexing/conf/test.config

FuzzyIndexingMode Stemming_en2
IndexFile /usr/local/indexing/test.index
IndexDir /usr/local/indexing/test
ParserWarnLevel 3

test.config contents:
IncludeConfigFile /usr/local/indexing/conf/properties.inc

WordCharacters abcdefghijklmnopqrstuvwxyz0123456789.-_
IgnoreFirstChar .-
IgnoreLastChar  .-
BeginCharacters abcdefghijklmnopqrstuvwxyz0123456789
EndCharacters   abcdefghijklmnopqrstuvwxyz0123456789
IndexReport 2
TmpDir /usr/tmp

IgnoreTotalWordCountWhenRanking no
FuzzyIndexingMode Stemming_en2

IndexComments 0
BumpPositionCounterCharacters |.
DefaultContents XML*

IgnoreWords File: /usr/local/indexing/stopwords.txt
MetaNameAlias swishdefault searchable
MetaNames solokeys sphotogs rmrftype photographer sort_date 
ipn_ignore_keys siteowner date_shot released crop profile keywords shor$
UndefinedMetaTags index

PropertyNamesDate sort_date
PropertyNamesNumeric weight
PropertyNames id photographer qphotographer subject released orig_id 
date_shot image_restrictions siteowner short_caption altkeys f$
PreSortedIndex id weight adweight sportsweight newsweight travelweight 
celebrityweight scienceweight orig_id date_shot sort_date pr$

MetaNamesRank 10 subject
MetaNamesRank 10 ipn_keys
MetaNamesRank 9 short_caption
MetaNamesRank 8 short_keys
MetaNamesRank 5 keywords


anyway. i await guidance from the mothership...


Brad
---------------------
Brad Miele
VP Technology
IPNstock.com
866 476 7862 x902
bmiele@ipnstock.com
Received on Fri Apr 11 15:12:19 2008