This is rather odd:
I just downloaded and installed swish-e on Linux. The install went without a
hitch.
I tested it by indexing one directory containing nine small files. The
indexer reports all nine files as indexed, yet no words from them end up in
the index, even though the files' contents are distinct and contain varied,
unique words. The output is as follows:
$ swish-e -c search.conf
Indexing Data Source: "File-System"
Indexing /path/to/test-dir..
Checking dir "/path/to/test-dir"...
In dir "/path/to/test-dir":
000002 (42 words)
000003 (42 words)
000004 (42 words)
000005 (8 words)
000006 (6 words)
000007 (26 words)
000008 (188 words)
ecommerce.html (246 words)
index.html (188 words)
Removing very common words...
343 words removed.
7 words removed not in common words array:
i, 1, 5, -1, e, 7, 14,
Writing main index...
Computing hash table ...
Writing header ...
Writing index entries ...
Writing stopwords ...
no unique words indexed.
Writing file index...
Writing file list ...
Writing file offsets ...
Writing MetaNames ...
Writing offsets (2)...
9 files indexed.
No words were indexed, as you can see, and any search returns the following:
$ swish-e -w "famous" -f /path/to/search.index
# Swish-e format 2.0
#
# Name: (no name)
# Saved as: search.index
# Counts: 20 words
# Indexed on: 28/01/2002 20:03:02 EST
# Description: (no description)
# Pointer: (no pointer)
# Maintained by: (no maintainer)
# DocumentProperties: Enabled
# Stemming Applied: 0
# Soundex Applied: 0
# WordCharacters: &'-0123456789@\_abcdefghijklmnopqrstuvwxyz
# MinWordLimit: 3
# MaxWordLimit: 15
# BeginCharacters: &'-0123456789@\_abcdefghijklmnopqrstuvwxyz
# EndCharacters: &'-0123456789@\_abcdefghijklmnopqrstuvwxyz
# IgnoreFirstChar: '(
# IgnoreLastChar: '),.;
# SWISH format 2.0
err: the index file(s) is empty
THE REALLY ODD PART:
After repeated testing (permissions, file extensions, and the
Min/MaxWordLimit parameters in config.h) I went back to the original conf
file that produced the above results. This time I simply added more files to
the test directory. Lo and behold, words get indexed:
$ swish-e -c search.conf
Indexing Data Source: "File-System"
Indexing /path/to/test-dir..
Checking dir "/path/to/test-dir"...
In dir "/path/to/test-dir":
000002 (42 words)
000003 (42 words)
000004 (42 words)
000005 (8 words)
000006 (6 words)
000007 (26 words)
000008 (188 words)
foo.html (188 words)
bar.html (143 words)
baz.html (151 words)
bling.html (162 words)
blang.html (143 words)
blong.html (249 words)
ting.html (245 words)
tang.html (144 words)
tong.html (246 words)
all.html (246 words)
your.html (281 words)
base.html (188 words)
are.html (216 words)
belong.html (214 words)
to.html (217 words)
us.html (28 words)
Removing very common words...
351 words removed.
15 words removed not in common words array:
i, 1, 5, -1, e, 7, 14, ad, 4, &, 3, s, r, o, 76,
Writing main index...
Computing hash table ...
Writing header ...
Writing index entries ...
Writing stopwords ...
93 unique words indexed.
Writing file index...
Writing file list ...
Writing file offsets ...
Writing MetaNames ...
Writing offsets (2)...
23 files indexed.
Running time: Less than a second.
Indexing done!
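For anyone who wants to compare the two runs side by side, here is a minimal sketch that pulls the summary counters out of the indexer output. It only assumes the line formats shown in the logs above; the function name `summarize_index_log` is made up for illustration and is not part of swish-e.

```python
import re


def summarize_index_log(log: str) -> dict:
    """Extract summary counters from swish-e indexing output (hypothetical helper).

    Only the line formats seen in the pasted logs are recognised.
    """
    summary = {
        "files_indexed": None,
        "common_words_removed": None,
        "unique_words_indexed": None,
    }
    for raw in log.splitlines():
        line = raw.strip()
        if m := re.fullmatch(r"(\d+) files indexed\.", line):
            summary["files_indexed"] = int(m.group(1))
        elif m := re.fullmatch(r"(\d+) words removed\.", line):
            # Does not match "N words removed not in common words array:".
            summary["common_words_removed"] = int(m.group(1))
        elif m := re.fullmatch(r"(\d+|no) unique words indexed\.", line):
            # The indexer prints "no" instead of 0 when nothing is left.
            count = m.group(1)
            summary["unique_words_indexed"] = 0 if count == "no" else int(count)
    return summary


if __name__ == "__main__":
    first_run = """\
Removing very common words...
343 words removed.
7 words removed not in common words array:
no unique words indexed.
9 files indexed.
"""
    print(summarize_index_log(first_run))
```

With the two logs in this message, that gives 9 files / 343 words removed / 0 unique words for the first run and 23 files / 351 words removed / 93 unique words for the second.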
And the original files whose words were not indexed the first time seem to
be indexed this time, as seen in a search for a word that appears in only
one of the original files:
$ swish-e -w "famous" -f /path/to/search.index
# Swish-e format 2.0
#
# Name: (no name)
# Saved as: search.index
# Counts: 93 words, 23 files
# Indexed on: 28/01/2002 20:25:29 EST
# Description: (no description)
# Pointer: (no pointer)
# Maintained by: (no maintainer)
# DocumentProperties: Enabled
# Stemming Applied: 0
# Soundex Applied: 0
# WordCharacters: &'-0123456789@\_abcdefghijklmnopqrstuvwxyz
# MinWordLimit: 3
# MaxWordLimit: 15
# BeginCharacters: &'-0123456789@\_abcdefghijklmnopqrstuvwxyz
# EndCharacters: &'-0123456789@\_abcdefghijklmnopqrstuvwxyz
# IgnoreFirstChar: '(
# IgnoreLastChar: '),.;
# SWISH format 2.0
# Search words: famous
# Number of hits: 1
1000 /path/to/test-dir/000005 "000005" 228
Any ideas? Is there some sort of "minimum number of files" setting? Thanks.
--
Advansis: http://www.advansis.com/
Received on Tue Jan 29 03:31:13 2002