SV: Strange Conversion error

From: Harald Heggelund <harald(at)not-real.norddata.no>
Date: Wed Oct 06 2004 - 23:22:45 GMT

Hello all,

I'm just starting to implement swish-e 2.4.2 as a web search engine and am
having a few problems. Situation:

-Installation on a Windows 2000 box went OK
-Configured for crawling an external website; crawling seems OK
-Installed the swish.pl script file for searching in the IIS script
directory

Now, running the crawling/indexing gives this output:

  Summary for: http://www.mysite/index.htm
       Connection: Close:          1  (0.0/sec)
  Connection: Keep-Alive:        465  (3.4/sec)
              Duplicates:      9,545  (69.2/sec)
          Off-site links:      2,777  (20.1/sec)
                 Skipped:          1  (0.0/sec)
             Total Bytes: 29,367,422  (212807.4/sec)
              Total Docs:        464  (3.4/sec)
             Unique URLs:        467  (3.4/sec)
  Skipping Server Config: http://swish-e.org/current/docs/
  Removing very common words...
  no words removed.
  Writing main index...
  Sorting words ...
  Sorting 7,941 words alphabetically
  Writing header ...
  Writing index entries ...
    Writing word text: Complete
    Writing word hash: Complete
    Writing word data: Complete
  7,941 unique words indexed.
  4 properties sorted.
  151 files indexed.  3,323,845 total bytes.  72,344 total words.
  Elapsed time: 00:02:20 CPU time: 00:02:19
  Indexing done!

Problem 1): Why is "Unique URLs" 467 while "files indexed" is 151? Are some
(most!) documents skipped? It seems so, because searching for specific words
known to exist in a specific document gives no results (other word searches
seem OK, though). Or could something else be causing this search failure?
I was thinking of date filtering, but as far as I can tell that isn't
implemented.
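
To put a number on the gap, here is a minimal Python sketch that pulls the
counts out of the summary lines quoted above and subtracts them. It assumes
"Total Docs" is the number of documents the spider fetched, which I have not
confirmed; the helper names (to_int, parse_counts) are made up for this
illustration and are not part of swish-e.

```python
import re

# Figures copied verbatim from the indexing output quoted above.
SUMMARY = """\
Total Docs:        464  (3.4/sec)
Unique URLs:        467  (3.4/sec)
151 files indexed.  3,323,845 total bytes.  72,344 total words.
"""

def to_int(text):
    """Convert a number with thousands separators, e.g. '3,323,845'."""
    return int(text.replace(",", ""))

def parse_counts(log):
    """Extract the three counts being compared from the summary text."""
    docs = to_int(re.search(r"Total Docs:\s+([\d,]+)", log).group(1))
    urls = to_int(re.search(r"Unique URLs:\s+([\d,]+)", log).group(1))
    indexed = to_int(re.search(r"([\d,]+) files indexed", log).group(1))
    return {"total_docs": docs, "unique_urls": urls, "files_indexed": indexed}

counts = parse_counts(SUMMARY)
# Assumption: every "Total Docs" entry was a candidate for indexing.
missing = counts["total_docs"] - counts["files_indexed"]
print(f"{missing} of {counts['total_docs']} fetched documents are not in the index")
```

If that assumption holds, roughly two thirds of the fetched documents never
reach the index, which would be consistent with the missing search hits.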

Problem 2) is cosmetic: when I click a search result I want the link to open
in a new browser window (that is, an <a href> tag with target="_blank").
I've been looking around the swish.pl file, but it's far from obvious where
to modify the result link...
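
For clarity, this is the markup I am after for each result, shown as a small
Python sketch rather than as a change to swish.pl (whose template code I have
not located). The function name result_link and the sample title are
hypothetical.

```python
from html import escape

def result_link(url, title):
    """Build an anchor that opens the result in a new browser window."""
    # Escape both values so characters like & or " cannot break the markup.
    return '<a href="{}" target="_blank">{}</a>'.format(
        escape(url, quote=True), escape(title)
    )

print(result_link("http://www.mysite/index.htm", "Home & News"))
```

Whatever part of the script prints the result title link would need to emit
the same target="_blank" attribute.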

Any help appreciated!
Received on Wed Oct 6 16:23:00 2004