I'm just starting to implement swish-e 2.4.2 as a web search engine and am
having a few problems... Situation:
-Installation on a Windows 2000 box went OK
-configured for crawling an external website, crawling seems OK
-installed the swish.pl script file for searching in the IIS script
Now, running the crawling/indexing gives this output:
Summary for: http://www.mysite/index.htm
Connection: Close: 1 (0.0/sec)
Connection: Keep-Alive: 465 (3.4/sec)
Duplicates: 9,545 (69.2/sec)
Off-site links: 2,777 (20.1/sec)
Skipped: 1 (0.0/sec)
Total Bytes: 29,367,422 (212807.4/sec)
Total Docs: 464 (3.4/sec)
Unique URLs: 467 (3.4/sec)
Skipping Server Config: http://swish-e.org/current/docs/
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 7,941 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
7,941 unique words indexed.
4 properties sorted.
151 files indexed. 3,323,845 total bytes. 72,344 total words.
Elapsed time: 00:02:20 CPU time: 00:02:19
Problem 1: Why is "Unique URLs" 467, while "files indexed" is 151? Are some
(most!) documents being skipped? It seems so, because searching for specific
words known to exist in a specific document gives no results (other word
searches seem OK, though). Or could something else be causing this search
failure? I was thinking of date filtering, but this isn't implemented as far
as I can tell.
Problem 2 is cosmetic: when I click a search result, I want the link to open
in a new browser window (that is, a link whose anchor tag has
target="_blank"). I've been looking through the swish.pl file, but it's far
from obvious where to modify the result link...
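
To make clear what I'm after, here is a rough sketch of the markup I'd like
each result link to end up as. It is written in Python only for illustration
(swish.pl itself is Perl), and the function name result_link and the sample
title are made up, not anything taken from swish-e:

```python
from html import escape


def result_link(url: str, title: str) -> str:
    """Build an anchor tag that opens `url` in a new browser window.

    Hypothetical helper for illustration only; it is not part of swish-e.
    """
    # Escape both values so characters such as & or " cannot break the markup.
    safe_url = escape(url, quote=True)
    safe_title = escape(title)
    return '<a href="{}" target="_blank">{}</a>'.format(safe_url, safe_title)


if __name__ == "__main__":
    print(result_link("http://www.mysite/index.htm", "Home page"))
```

So presumably the change in swish.pl would amount to adding the
target="_blank" attribute wherever the result anchor is generated.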
Any help appreciated!
Received on Wed Oct 6 16:23:00 2004