Hi all:
I think I've figured out what is happening, but I don't know how to solve
it. It looks like the spider is put to sleep when it can't connect to the
site (it seems to be asking me for a user name and password, even though I
already set credential_timeout to undef), and since I forked the spider off
as a background process, the process gets killed when it sleeps. Is there
any way to keep the spider from being put to sleep? Here is a copy of the
settings in my config file.
my %server1 = (
    base_url           => 'http://xxx.xxx.xxx/',
    agent              => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)',
    email              => 'blank@blank.com',
    link_tags          => [qw/ a /],
    debug              => DEBUG_ERRORS | DEBUG_FAILED | DEBUG_SKIPPED |
                          DEBUG_HEADERS | DEBUG_INFO | DEBUG_URL,
    delay_sec          => '0',
    #max_wait_time     => '1',
    keep_alive         => 'true',
    max_time           => '10',
    max_size           => '1000000',
    max_files          => '100',
    max_depth          => '10',
    use_md5            => 'true',
    credentials        => 'username:password',
    credential_timeout => undef,
    use_cookies        => 'true',
    use_head_requests  => 'true',
    test_url           => \&checkURL,    # checks for spider traps
    test_response      => sub {
        my ( $uri, $server, $response ) = @_;

        print "Checking response...\n";
        print "Was the page successfully retrieved? ",
            ( $response->is_success ? 'yes' : 'no' ), "\n";

        # Don't follow links from a page that failed to fetch
        $server->{no_spider}++ unless $response->is_success;
        print "Page fetched correctly\n" if $response->is_success;

        print "Checking header for $uri\n";
        my $safeSpider   = SpiderTraps->new;
        my $headerResult = $safeSpider->headerCheck(
            $response->content_type, $response->code,
            "/var/log/linkverification/linkcommand/592.spider", $uri );
        print "The result from header check is --> $headerResult\n";
        #$server->{no_spider}++ if $headerResult == 0;

        return 1;    # true: let spider.pl keep processing this page
    },
);
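One way to check whether the forked spider is really being stopped ("put to sleep") or killed outright is to have the parent reap it with waitpid and decode the wait status. This is a minimal sketch, assuming the spider is launched from your own Perl wrapper via fork; run_and_report is a hypothetical helper, not part of spider.pl:

```perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h);

# Hypothetical helper (not part of spider.pl): run $code in a forked child
# and report whether the child exited, was killed, or was stopped.
sub run_and_report {
    my ($code) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ( $pid == 0 ) {
        # Child: in a real wrapper $code would exec spider.pl.
        # POSIX::_exit skips END blocks and avoids re-flushing
        # output buffers inherited from the parent.
        POSIX::_exit( $code->() );
    }

    # Parent: WUNTRACED makes waitpid also return when the child is
    # stopped (e.g. by SIGTTIN or SIGTSTP), not only when it terminates.
    my $got = waitpid( $pid, WUNTRACED );
    die "waitpid failed: $!" if $got == -1;
    my $status = $?;

    return 'exited with status ' . WEXITSTATUS($status) if WIFEXITED($status);
    return 'killed by signal ' . WTERMSIG($status)      if WIFSIGNALED($status);

    if ( WIFSTOPPED($status) ) {
        my $sig = WSTOPSIG($status);
        # This sketch just kills and reaps the stopped child so it does
        # not linger; a real wrapper might log it and send SIGCONT instead.
        kill 'KILL', $pid;
        waitpid( $pid, 0 );
        return "stopped by signal $sig";
    }
    return "unknown status $status";
}

print run_and_report( sub { 3 } ), "\n";    # prints "exited with status 3"
```

If the spider turns out to be stopped by SIGTTIN, that would be consistent with a background process trying to read from the terminal, for example to answer a user name/password prompt.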
I've been stuck on this for so long... If anyone can help me out, I
would be very grateful...
-Justin
-----Original Message-----
From: swish-e@sunsite3.berkeley.edu
[mailto:swish-e@sunsite3.berkeley.edu] On Behalf Of Bill Moseley
Sent: Friday, January 14, 2005 10:34 PM
To: Multiple recipients of list
Subject: [SWISH-E] Re: random crashing of spider.pl!?
On Fri, Jan 14, 2005 at 04:50:31PM -0800, Justin Tang wrote:
> Hi all:
>
> I'm trying to use spider.pl for a verification tool, and it seems to be
> crashing randomly on me!!! As far as I can tell, it seems to die somewhere
> between the test_url and the test_response callback functions. Does
> anyone know what kind of response could kill the spider completely?
> Thanks...
Are you on shared hosting? I had that exact problem once, and it
turned out that the hosting provider had a script that killed any user
process that ran for more than a few minutes.
Otherwise, what kind of crash? I think the program traps
$SIG{__DIE__}, so it should report those kinds of errors. It doesn't
trap any other signals, except that it catches SIGHUP as a way to cleanly
abort the spider.
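To make that kind of signal handling concrete, here is a minimal sketch of how a Perl program can trap $SIG{__DIE__}, treat SIGHUP as a clean-abort request, and log SIGTERM so you can tell whether something outside (such as a hosting provider's watchdog) is ending the job. The handlers, log_msg, and the crawl loop are hypothetical illustrations, not spider.pl's actual code:

```perl
use strict;
use warnings;

# Hypothetical handlers for a Perl wrapper around a long crawl; they only
# illustrate the idea and are NOT copied from spider.pl.
my @log;          # in-memory copy of messages (handy for testing)
my $abort = 0;    # set by SIGHUP to request a clean stop

sub log_msg {
    my ($msg) = @_;
    push @log, $msg;
    print STDERR "$msg\n";    # a real script might append to a log file
}

# Record fatal errors; $^S is true inside eval, so exceptions that an
# eval block catches are not logged as fatal.
$SIG{__DIE__} = sub {
    return if $^S;
    log_msg("fatal: $_[0]");
};

# SIGHUP: finish the current page, then stop cleanly.
$SIG{HUP} = sub { $abort = 1; log_msg('SIGHUP: aborting after current page') };

# SIGTERM: note that something outside asked us to stop, then exit.
$SIG{TERM} = sub { log_msg('SIGTERM received'); exit 1 };

sub crawl {
    my @urls = @_;
    my $done = 0;
    for my $url (@urls) {
        last if $abort;    # clean abort requested via SIGHUP
        # ... fetch and index $url here ...
        $done++;
    }
    return $done;
}

print crawl(qw(a b c)), "\n";    # prints 3 (no signal received yet)
```

SIGKILL (kill -9) cannot be trapped at all, so if a watchdog uses it nothing will be logged; only the parent process can see, through the wait status, which signal ended the job.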
--
Bill Moseley
moseley@hank.org
Received on Mon Jan 17 08:20:20 2005