Failing to find a word

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed Oct 06 1999 - 23:37:56 GMT
I'm using stemming and I noticed I couldn't pull up a word from the index
using a wild card.  Anyone understand why this is happening?

I've got a rather simple index: just one file with three words indexed.
Here's the file, the swish config file, a -D listing, and some search
results.  The problem is searching for the word 'database' using a wild
card as in 'data*'.

% cat a.htm
database
other
words

% cat swish.cfg
IndexDir .
IndexOnly .htm
IndexReport 1
UseStemming yes

% swish -c swish.cfg
Indexing Data Source: "File-System"
Removing very common words... no words removed.
Writing main index... 3 unique words indexed.
Writing file index... 1 file indexed.
Running time: Less than a second.
Indexing done!

% swish -D index.swish-e > x
(here are the words from 'x')

# SWISH format 1.3
# Swish-e format 1.3
# 
# Name: (no name)
# Saved as: index.swish-e
# Counts: 3 words, 1 files
# Indexed on: 06/10/99 16:15:36 Pacific Daylight Time
# Description: (no description)
# Pointer: (no pointer)
# Maintained by: (no maintainer)
# DocumentProperties: Enabled
# Stemming Applied: 1
databas: 1 38698 1 1      <- here's database, stemmed
other: 1 38698 1 1
word: 1 38698 1 1         <- here's words, stemmed

I can search for 'wo*' and 'ot*' but not 'da*', 'd*', 'data*' and so on.
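Here is a minimal sketch of what a trailing-`*` wildcard would be expected to do with this index. It assumes the wildcard is a plain prefix match over the stored (already stemmed) words from the -D listing. This is not swish-e's actual lookup code, and `prefix_search` is a hypothetical name. Under that assumption, stemming alone should not block 'data*': 'databas' still begins with 'data', 'da' and 'd'.

```python
import bisect

# Words exactly as they appear in the -D listing (stemmed at index time).
# Assumption: a trailing-* wildcard is a simple prefix match over these.
INDEX = sorted(["databas", "other", "word"])

def prefix_search(prefix: str) -> list[str]:
    """Return every indexed word that starts with `prefix` (hypothetical helper)."""
    start = bisect.bisect_left(INDEX, prefix)  # first word >= prefix
    hits = []
    for word in INDEX[start:]:
        if not word.startswith(prefix):
            break  # sorted order: no later word can match
        hits.append(word)
    return hits

for query in ["oth", "wo", "ot", "da", "d", "data"]:
    print(f"{query}*", prefix_search(query))
```

In this model every one of those patterns finds a word, including the three that fail in the real searches.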


% swish -w oth*      <--- this wild card search works
# SWISH format 1.3
# Swish-e format 1.3
#
# Name: (no name)
# Saved as: index.swish-e
# Counts: 3 words, 1 files
# Indexed on: 06/10/99 16:15:36 Pacific Daylight Time
# Description: (no description)
# Pointer: (no pointer)
# Maintained by: (no maintainer)
# DocumentProperties: Enabled
# Stemming Applied: 1
# Search words: oth*
# Number of hits: 1
1000 ./a.htm "a.htm" 32
..

Here's the problem.  Why isn't swish finding the word 'database'?

% swish -w data*
# SWISH format 1.3
# Swish-e format 1.3
#
# Name: (no name)
# Saved as: index.swish-e
# Counts: 3 words, 1 files
# Indexed on: 06/10/99 16:15:36 Pacific Daylight Time
# Description: (no description)
# Pointer: (no pointer)
# Maintained by: (no maintainer)
# DocumentProperties: Enabled
# Stemming Applied: 1
# Search words: data*
err: no results
..

% swish -w d*
# SWISH format 1.3
# Swish-e format 1.3
#
# Name: (no name)
# Saved as: index.swish-e
# Counts: 3 words, 1 files
# Indexed on: 06/10/99 16:15:36 Pacific Daylight Time
# Description: (no description)
# Pointer: (no pointer)
# Maintained by: (no maintainer)
# DocumentProperties: Enabled
# Stemming Applied: 1
# Search words: d*
err: no results
..

% swish -w database  <-- this works
# SWISH format 1.3
# Swish-e format 1.3
#
# Name: (no name)
# Saved as: index.swish-e
# Counts: 3 words, 1 files
# Indexed on: 06/10/99 16:15:36 Pacific Daylight Time
# Description: (no description)
# Pointer: (no pointer)
# Maintained by: (no maintainer)
# DocumentProperties: Enabled
# Stemming Applied: 1
# Search words: database
# Number of hits: 1
1000 ./a.htm "a.htm" 32
..
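The plain search for 'database' presumably works because the query word is stemmed the same way as the indexed text, so it becomes 'databas' before the lookup. Here is a small sketch of that assumption. A hard-coded table copied from the -D listing stands in for the real stemmer, and `stem` and `exact_search` are hypothetical names, not swish-e functions.

```python
# Stemmed forms copied from the -D listing; a stand-in for the real stemmer.
STEMS = {"database": "databas", "words": "word", "other": "other"}
INDEX = {"databas", "other", "word"}

def stem(word: str) -> str:
    """Hypothetical helper: map a word to its stem; unknown words pass through."""
    return STEMS.get(word, word)

def exact_search(word: str) -> bool:
    """Stem the query like the indexer did, then look for an exact entry."""
    return stem(word) in INDEX

print(exact_search("database"))  # True: 'database' -> 'databas'
print("database" in INDEX)       # False: the unstemmed word is not stored
```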



Bill Moseley
mailto:moseley@hank.org
Received on Wed Oct 6 16:47:16 1999