Re: Getting "status: 500" while indexing some pages

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed Feb 02 2005 - 21:18:52 GMT
On Wed, Feb 02, 2005 at 04:04:36PM -0500, Juan Carlos Avila / MTBASE wrote:
> Hi Bill,
> 
> Yes, I'm monitoring my web server's log and while swish-e shows status 
> 500, my web server shows 200.
> 
> I do not understand what you mean by "running the spider with:  
> SPIDER_DEBUG=...." -- I'm quite new to swish-e... sorry.

Sorry, I'm used to setting environment vars at the command line.
I spent last weekend working on Windows XP and aged about five years.
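
For reference, "setting an environment var at the command line" in a
Unix shell just means putting VAR=value in front of the command. A
small sketch of the mechanism, using a throwaway `sh -c` child in place
of the spider so it runs anywhere; the flag list is only an example:

```shell
# VAR=value placed before a command sets VAR for that one command only.
# The child process here simply echoes what it received.
SPIDER_DEBUG=headers,url,skipped sh -c 'echo "debug flags: $SPIDER_DEBUG"'
# prints: debug flags: headers,url,skipped

# The same prefix in front of the spider would look like this
# (path assumed, adjust to your install):
#   SPIDER_DEBUG=headers,url,skipped perl /usr/local/lib/swish-e/spider.pl config.txt > output
```

That prefix form does not work in the Windows command prompt, where
the variable has to be set with a separate `set` command first. The
config file approach avoids the difference.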

Create config.txt:

@servers = ( {
    base_url => 'http://your_server/casos/VerCasoIdx?caso_numero=6896',
    debug    => 'headers, url, skipped',
    max_files => 1,
    email     => 'you@yourmail.whatever',
} );

Then run the spider directly -- I don't know where it's installed on
your machine, but this is what I would do:

    perl /usr/local/lib/swish-e/spider.pl config.txt > output

which generates this:



/usr/local/lib/swish-e/spider.pl: Reading parameters from 'config.txt'

 -- Starting to spider: http://localhost/index.html --
Request for 'http://localhost/index.html' aborted because: 'dead at /usr/local/lib/swish-e/spider.pl line 688.'

Summary for: http://localhost/index.html
Connection: Close: 1  (1.0/sec)
          Skipped: 1  (1.0/sec)
      Unique URLs: 1  (1.0/sec)
moseley@bumby:~$ perl /usr/local/lib/swish-e/spider.pl config.txt > output
/usr/local/lib/swish-e/spider.pl: Reading parameters from 'config.txt'

 -- Starting to spider: http://localhost/index.html --

vvvvvvvvvvvvvvvv HEADERS for http://localhost/index.html vvvvvvvvvvvvvvvvvvvvv

---- Request ------
GET http://localhost/index.html
Accept-Encoding: gzip; deflate
From: me@inalid.com
User-Agent: swish-e spider 2.2 http://swish-e.org/


---- Response ---
Status: 200 OK
Connection: close
Date: Wed, 02 Feb 2005 21:16:49 GMT
Accept-Ranges: bytes
ETag: "1c0140c-100f-3ffc3496"
Server: Apache/1.3.33 (Debian GNU/Linux) PHP/4.3.9-2 mod_ssl/2.8.22 OpenSSL/0.9.7d mod_perl/1.29
Content-Length: 4111
Content-Type: text/html; charset=iso-8859-1
Content-Type: text/html; charset=iso-8859-1
Last-Modified: Wed, 07 Jan 2004 16:32:22 GMT
Client-Date: Wed, 02 Feb 2005 21:16:49 GMT
Client-Peer: 127.0.0.1:80
Client-Response-Num: 1
Title: Welcome to Your New Home Page!
X-Meta-Author: johnie@debian.org (Johnie Ingram)
X-Meta-Description: The initial installation of Debian/GNU Apache.
X-Meta-GENERATOR: Mozilla/4.05 [en] (X11; I; Linux 2.3.99-pre3 i686) [Netscape]

^^^^^^^^^^^^^^^ END HEADERS ^^^^^^^^^^^^^^^^^^^^^^^^^^

>> +Fetched 0 Cnt: 1 GET  http://localhost/index.html  200 OK text/html 4111 parent: depth:0
sleeping 5 seconds
/usr/local/lib/swish-e/spider.pl: Max files Reached

Summary for: http://localhost/index.html
Connection: Close:     2  (0.4/sec)
   Off-site links:    10  (2.0/sec)
      Total Bytes: 4,111  (822.2/sec)
       Total Docs:     1  (0.2/sec)
      Unique URLs:     2  (0.4/sec)


So that shows you exactly what the server is sending back.  If that
says 500 and your logs say 200 then maybe:

1) you are looking at the wrong logs
2) your web server is telling you a lie
3) spider.pl or LWP::UserAgent/LWP::RobotUA is generating a 500,
but I can't think of why it would do that....
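
To tell case 3 apart from the other two, it may help to save the debug
output to a file and pull out the status line together with the headers
LWP adds when it builds a response itself. A minimal sketch, assuming
the debug output goes to stderr (the transcript above suggests so, since
it appeared on the terminal despite the `> output` redirect);
`debug.log` and `show_status` are names made up for this example:

```shell
# Assumed capture step (not run here):
#   perl /usr/local/lib/swish-e/spider.pl config.txt > output 2> debug.log

# Print the status line and any LWP-internal marker headers from a
# saved debug log. $1 is the path to that log.
show_status() {
    grep -E '^(Status|Client-Warning|X-Died):' "$1"
}

# Usage:
#   show_status debug.log
```

LWP::UserAgent marks responses it generated internally (for example
when it could not connect) with "Client-Warning: Internal response".
If that header shows up next to the 500, the status did not come from
the web server, which would explain a 200 in the server's log.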



-- 
Bill Moseley
moseley@hank.org

Received on Wed Feb 2 13:18:53 2005