Re: Re: indexing javawebserver-hosted sites

From: Ron Samuel Klatchko <rsk(at)>
Date: Thu Sep 23 1999 - 00:37:21 GMT

> >That's odd.  I just tried running swishspider manually on that site and
> >saw that it had no problem extracting the links.  What version of SWISH
> >are you running?
> 1.3

1.3 exactly or 1.3.x?

> Well, this is bizarre:
> root@lsminfo:/usr/local/etc# perl ../bin/swishspider ./test
> root@lsminfo:/usr/local/etc# ls -la test.*
> -rw-r--r--   1 root     root         9593 Sep 22 19:45 test.contents
> -rw-r--r--   1 root     root           45 Sep 22 19:45 test.response

Okay, that makes a little more sense (at least it explains why swish
doesn't have any further links to crawl).  There is a known bug in the
distributed version of swish where files that have charsets in their
mime types are not properly spidered.  You could try applying the
following patches:

I'm not sure that would fix it, because when I fetch the URL you provided
I see a mime type of "text/html", but the size of your response file
differs from mine, so perhaps your server is doing some conditional
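
To illustrate the charset issue mentioned above (this is not the patch
itself, only a rough Python sketch): a Content-Type value such as
"text/html; charset=iso-8859-1" does not equal the plain string
"text/html", so a spider that compares the two exactly would skip link
extraction. That an exact comparison is the cause is an assumption here,
and the helper names media_type and is_html are hypothetical, not taken
from swishspider.

```python
def media_type(content_type):
    """Return the bare media type from a Content-Type header value."""
    # Everything after the first ';' is a parameter (e.g. charset=...),
    # so keep only the part before it, trimmed and lower-cased.
    return content_type.split(";", 1)[0].strip().lower()


def is_html(content_type):
    """True if the response should be parsed for links as HTML."""
    return media_type(content_type) == "text/html"


if __name__ == "__main__":
    for value in ("text/html", "text/html; charset=iso-8859-1", "text/plain"):
        print(value, "->", is_html(value))
```

With a check like this, responses with and without a charset parameter
are both treated as HTML.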

