Swish-e wandering off on its own

From: Darryl Friesen <Darryl.Friesen(at)not-real.usask.ca>
Date: Fri Feb 22 2002 - 18:33:05 GMT
I've run across an interesting problem.  I'm using spider.pl (with "-S
prog", of course) to index our Intranet, which seems to work fine, except
that swish-e happily wanders off and indexes our main library web pages as
well.  Our Intranet runs on the SSL port of the same machine (i.e. Intranet
URLs are all https://library.usask.ca/ and our public pages are
http://library.usask.ca).

Is there a quick and dirty way to stop this?  I have a common set of
callback functions for test_url and filter_content that I use for both the
Intranet and our main server (and a few others) so I can't just "return 0"
if the URL does not start with "https".

I actually thought spider.pl would treat the two URLs as different, but it
looks as if it's comparing only the host, not the scheme/port (although I
haven't really looked at the code; maybe I should).
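
To illustrate the distinction I mean, here is a minimal sketch using the
URI module (which LWP already depends on).  It is not taken from spider.pl;
same_site() is a hypothetical helper of my own naming, and whether it could
be wired into a shared test_url callback depends on what the spider passes
to that callback, which I am only assuming here.

```perl
use strict;
use warnings;
use URI;

# The two URLs from above: same host, different scheme (and so different port).
my $intranet = URI->new('https://library.usask.ca/');
my $public   = URI->new('http://library.usask.ca/');

# A host-only comparison cannot tell the two sites apart.
print "hosts match\n" if $intranet->host eq $public->host;

# The default ports differ: 443 for https, 80 for http.
printf "ports: %d vs %d\n", $intranet->port, $public->port;

# Hypothetical helper (NOT part of spider.pl): true only when the scheme,
# host, and port of both URLs agree.
sub same_site {
    my ($x, $y) = map { URI->new($_)->canonical } @_;
    return $x->scheme eq $y->scheme
        && $x->host_port eq $y->host_port;
}

print "different sites\n" unless same_site($intranet, $public);
```

Because the helper compares against whatever URL it is handed, a callback
shared between several servers would not need "https" hard-coded; it would
only need to know the starting URL of the server currently being spidered.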

I'd appreciate any help or suggestions.

- Darryl

 ----------------------------------------------------------------------
  Darryl Friesen, B.Sc., Programmer/Analyst    Darryl.Friesen@usask.ca
  Education & Research Technology Services,     http://gollum.usask.ca/
  Information Technology Services Division,
  University of Saskatchewan
 ----------------------------------------------------------------------
  "Go not to the Elves for counsel, for they will say both no and yes"
Received on Fri Feb 22 18:34:00 2002