Indexing cut off - more info

From: David VanHook <dvanhook(at)not-real.mshanken.com>
Date: Tue Apr 29 2003 - 13:47:49 GMT

Here's a bit more information: the logfiles for the "good" indexings and
the logfiles for the "bad" indexings differ in one key respect.

The number of files they index is the same: 21,000 files.  But on the bad
ones, the indexer is finding 26,041 unique words and 535,411 total words.
On the good ones, the indexer is finding 108,563 unique words and
5,971,632 total words.
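
To put those figures side by side, here is a quick Python sketch of the
arithmetic. The numbers are the ones quoted above; the variable names are
only illustrative, and nothing here is produced by swish-e itself.

```python
# Figures copied from the two logfiles described above.
FILES = 21_000
bad = {"unique": 26_041, "total": 535_411}
good = {"unique": 108_563, "total": 5_971_632}

# Share of the "good" run's words that the "bad" run managed to index.
total_ratio = bad["total"] / good["total"]
unique_ratio = bad["unique"] / good["unique"]

# Average number of words indexed per file in each run.
bad_per_file = bad["total"] / FILES
good_per_file = good["total"] / FILES

print(f"total words:  {total_ratio:.1%} of the good run")
print(f"unique words: {unique_ratio:.1%} of the good run")
print(f"words per file: {bad_per_file:.0f} (bad) vs {good_per_file:.0f} (good)")
```

In other words, the bad runs pick up roughly a tenth of the total words
from the same set of files, which fits "seeing the files but not indexing
them completely" better than it fits files being skipped outright.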

So it's seeing the files, but not indexing them completely.  I've looked at
the source code, and the SwishCommand noindex and SwishCommand index tags
are in the proper spots.  And we've not made any edits to our stopwords file
since January.
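
One way to double-check the tag placement across many pages, rather than
by eye, is a small script along these lines. This is only a sketch: it
assumes the tags appear as HTML comments of the form
"<!-- SwishCommand noindex -->" and "<!-- SwishCommand index -->", and the
function name is made up for illustration. It reports whether indexing is
still switched off when a page ends.

```python
import re

# Assumed comment syntax, inferred from the tag names mentioned above.
DIRECTIVE = re.compile(
    r"<!--\s*SwishCommand\s+(noindex|index)\s*-->", re.IGNORECASE
)

def unclosed_noindex(html: str) -> bool:
    """Return True if the last SwishCommand tag in the page is 'noindex'."""
    indexing = True  # pages start out being indexed
    for match in DIRECTIVE.finditer(html):
        # Each tag simply switches indexing on or off from that point on.
        indexing = match.group(1).lower() == "index"
    return not indexing

if __name__ == "__main__":
    page = "<!-- SwishCommand noindex -->nav<!-- SwishCommand index -->body"
    print(unclosed_noindex(page))
```

A page that turns indexing off and never turns it back on would still be
counted as a file while contributing very few words.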

Any ideas what would cause spider.pl to look at the files but not index
them fully like this?

Dave VanHook
Received on Tue Apr 29 13:51:44 2003