Here's a bit more information -- it appears that the logfiles for the "good"
indexings and the logfiles for the "bad" indexings differ in one key respect.
The number of files they index is the same: 21,000 files. But on the bad
ones, the indexer is finding 26,041 unique words and 535,411 total words.
On the good ones, the indexer is finding 108,563 unique words and
5,971,632 total words.
So it's seeing the files, but not indexing them completely. I've looked at
the source code, and the SwishCommand noindex and SwishCommand index tags
are in the proper spots. And we've not made any edits to our stopwords file.
Any ideas what would cause spider.pl to look at the files but not index
them completely?
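
For what it's worth, the tag placement mentioned above could also be checked mechanically rather than by eye. The following is only a rough sketch, not anything from the swish-e distribution: it assumes the tags are written as HTML comments of the form `<!-- SwishCommand noindex -->` and `<!-- SwishCommand index -->`, and the function name `unclosed_noindex` is made up for illustration. It reports whether a page ends with indexing still switched off, which would explain a page being fetched but contributing few words.

```python
import re

# Assumption: tags appear as HTML comments such as
#   <!-- SwishCommand noindex -->  and  <!-- SwishCommand index -->
TAG_RE = re.compile(r"<!--\s*SwishCommand\s+(noindex|index)\s*-->", re.IGNORECASE)


def unclosed_noindex(html):
    """Return True if the page ends while indexing is still switched off.

    Hypothetical helper: walks the tags in document order and keeps
    track of the most recent state (indexing on or off).
    """
    indexing = True  # indexing is assumed to be on at the start of a page
    for match in TAG_RE.finditer(html):
        indexing = match.group(1).lower() == "index"
    return not indexing


if __name__ == "__main__":
    sample = "header <!-- SwishCommand noindex --> nav menu, never re-enabled"
    print(unclosed_noindex(sample))
```

Running something like this over a sample of the fetched pages from a good run and a bad run would show whether the pages themselves differ between the two.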
Received on Tue Apr 29 13:51:44 2003