Hi,
Can anyone tell me how to configure spider.pl so that it doesn't follow broken links? In my test_response subroutine I'm
calling what I believe is the HTTP::Response 'code' method, but it doesn't seem to work. Here is my config code:
my %server = (
    base_url           => 'http://www.lib.sfu.ca/',
    email              => 'mjordan@sfu.ca',
    test_url           => sub { $_[0]->path !~ /(\.gif|\.jpg|\.jpeg)/i },
    test_response      => sub { $_[2]->code !~ /404/ }, # <- This doesn't seem to work
    credential_timeout => '0',
    keep_alive         => 1
);
Am I on the right track, or is there a better way to make spider.pl not follow broken links, or at least not index 404 Not
Found response pages?
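To make the intent concrete, here is a minimal standalone sketch of the check I'm after, run outside spider.pl. It assumes
the callback receives ($uri, $server, $response), with the HTTP::Response object as the third argument; that is my reading,
not something I've confirmed. The $test_response name and the hand-built responses are only for illustration.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical stand-in for the test_response callback. Assumes the
# arguments arrive as ($uri, $server, $response); not confirmed.
my $test_response = sub {
    my ( $uri, $server, $response ) = @_;
    return $response->code != 404;    # numeric comparison instead of a regex
};

# Simulated responses, built by hand rather than fetched from a server.
for my $status ( 200, 404 ) {
    my $response = HTTP::Response->new($status);
    my $verdict  = $test_response->( undef, undef, $response ) ? 'keep' : 'skip';
    print "$status: $verdict\n";
}
```

Run on its own, this prints "200: keep" and "404: skip", so the comparison itself behaves as I expect. What I can't tell is
whether spider.pl ever hands a 404 response to the callback.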
Thanks,
Mark
Mark Jordan
Acting Coordinator of Library Systems
W.A.C. Bennett Library, Simon Fraser University
Burnaby, British Columbia, V5A 1S6, Canada
Phone (604) 291 5753 / Fax (604) 291 3023
mjordan(at)not-real.sfu.ca / http://www.sfu.ca/~mjordan/
Received on Thu May 6 11:20:08 2004