Re: Re: indexing javawebserver-hosted sites

From: Ron Samuel Klatchko <rsk(at)not-real.brightmail.com>
Date: Thu Sep 23 1999 - 00:37:21 GMT
Michael-

Please respond to this list instead of to me directly.

> >That's odd.  I just tried running swishspider manually on that site and
> >saw that it had no problem extracting the links.  What version of SWISH
> >are you running?
> 
> 1.3

1.3 exactly or 1.3.x?

> Well, this is bizarre:
> 
> root@lsminfo:/usr/local/etc# perl ../bin/swishspider ./test
> http://nbl.rutgers.edu/
> root@lsminfo:/usr/local/etc# ls -la test.*
> -rw-r--r--   1 root     root         9593 Sep 22 19:45 test.contents
> -rw-r--r--   1 root     root           45 Sep 22 19:45 test.response

Okay, that makes a little more sense (at least it explains why swish
doesn't have any further links to crawl).  There is a known bug in the
distributed version of swish where files that have charsets in their
mime types are not properly spidered.  You could try applying the
following patches:

http://sunsite.berkeley.edu/SWISH-E/Patches/spider
http://sunsite.berkeley.edu/SWISH-E/Patches/spider2

I'm not sure if that would fix it, because when I fetch the URL you
provided I see a mime type of "text/html", but the size of your response
file differs from mine, so perhaps your server is doing some conditional
serving.
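
To illustrate the kind of mismatch that charset bug involves, here is a
minimal sketch in Python.  It is not the swishspider code (which is Perl)
and not what the patches do line for line; the function names media_type
and is_html are made up for this example.  The assumption is that an exact
string comparison against "text/html" fails once the server appends a
charset parameter, while stripping the parameters first does not.

```python
def media_type(content_type):
    """Return the bare media type, lowercased, without parameters
    such as charset (hypothetical helper for illustration)."""
    return content_type.split(";", 1)[0].strip().lower()


def is_html(content_type):
    """True when the header value names text/html, with or without parameters."""
    return media_type(content_type) == "text/html"


# Compare a naive exact match with the parameter-aware check.
for header in ("text/html", "text/html; charset=ISO-8859-1"):
    print(header, "| exact match:", header == "text/html",
          "| parsed match:", is_html(header))
```

If your server sends the second form of the header, a spider doing the
exact comparison would treat the page as non-HTML and extract no links.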

moo
------------------------------------------------------------
           Ron Samuel Klatchko - Software Jester
            Brightmail Inc - rsk@brightmail.com
Received on Wed Sep 22 17:39:32 1999