Re: Getting "status: 500" while indexing some pages

From: Juan Carlos Avila / MTBASE <javila(at)not-real.mtbase.com>
Date: Wed Feb 02 2005 - 21:33:35 GMT
Ok! I finally got it...

When I run:

perl spider.pl default http://192.9.202.1/casos/VerCasoIdx?caso_numero=6896

I get the following output. Note the message "500 Chunked must be last
Transfer-Encoding 'chunked '" at the end. Also note that if I change only
the number at the end of the URL (i.e. caso_numero=6897), the spider
works fine!

spider.pl: Reading parameters from 'default'
 -- Starting to spider: http://192.9.202.1/casos/VerCasoIdx?caso_numero=6896 --
vvvvvvvvvvvvvvvv HEADERS for http://192.9.202.1/casos/VerCasoIdx?caso_numero=6896 vvvvvvvvvvvvvvvvvvvvv
---- Request ------
HEAD http://192.9.202.1/casos/VerCasoIdx?caso_numero=6896
Accept-Encoding: gzip; deflate
From: swish@user.failed.to.set.email.invalid
User-Agent: swish-e spider 2.2 http://swish-e.org/
---- Response ---
Status: 200 OK
Connection: Close
Server: Jaguar Server Version 4.2
Content-Length: 8192
Content-Type: text/html
Client-Date: Wed, 02 Feb 2005 21:28:50 GMT
Client-Peer: 192.9.202.1:80
Client-Response-Num: 2
^^^^^^^^^^^^^^^ END HEADERS ^^^^^^^^^^^^^^^^^^^^^^^^^^
vvvvvvvvvvvvvvvv HEADERS for http://192.9.202.1/casos/VerCasoIdx?caso_numero=6896 vvvvvvvvvvvvvvvvvvvvv
---- Request ------
GET http://192.9.202.1/casos/VerCasoIdx?caso_numero=6896
Accept-Encoding: gzip; deflate
From: swish@user.failed.to.set.email.invalid
User-Agent: swish-e spider 2.2 http://swish-e.org/
---- Response ---
Status: 500 Chunked must be last Transfer-Encoding 'chunked '
Content-Type: text/plain
Client-Date: Wed, 02 Feb 2005 21:28:50 GMT
Client-Warning: Internal response
^^^^^^^^^^^^^^^ END HEADERS ^^^^^^^^^^^^^^^^^^^^^^^^^^
Summary for: http://192.9.202.1/casos/VerCasoIdx?caso_numero=6896
Connection: Close: 1  (1.0/sec)
      Unique URLs: 1  (1.0/sec)
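The value quoted in the error, 'chunked ', ends with a space. One plausible reading (an assumption, not confirmed anywhere in this thread) is that the client compares the last Transfer-Encoding coding against the exact string "chunked" without trimming trailing whitespace, so a reply carrying "Transfer-Encoding: chunked " would be rejected before its body is read. The Perl sketch below uses two hypothetical helpers, strict_te_ok and tolerant_te_ok, which are illustrations and not LWP's actual code, to show how such a value behaves under a strict versus a whitespace-trimming comparison:

```perl
use strict;
use warnings;

# Hypothetical strict check (an assumption, not LWP's source): split the
# Transfer-Encoding value on commas and require the last coding to be
# exactly "chunked". Only whitespace around commas is removed, so a
# trailing space survives and "chunked " is rejected.
sub strict_te_ok {
    my ($te) = @_;
    my @codings = split /\s*,\s*/, lc $te;
    return @codings && $codings[-1] eq 'chunked';
}

# Hypothetical tolerant check: trim each coding before comparing.
sub tolerant_te_ok {
    my ($te) = @_;
    my @codings = split /,/, lc $te;
    s/^\s+|\s+$//g for @codings;    # strip leading/trailing whitespace
    return @codings && $codings[-1] eq 'chunked';
}

# Compare both checks on a few sample header values.
for my $te ('chunked', 'chunked ', 'gzip, chunked') {
    printf "%-17s strict: %-8s tolerant: %s\n", "'$te'",
        (strict_te_ok($te)   ? 'ok' : 'rejected'),
        (tolerant_te_ok($te) ? 'ok' : 'rejected');
}
```

Only the 'chunked ' sample is treated differently by the two checks: the strict one rejects it, the tolerant one accepts it.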

Bill Moseley wrote:

>On Wed, Feb 02, 2005 at 04:04:36PM -0500, Juan Carlos Avila / MTBASE wrote:
>  
>
>>Hi Bill,
>>
>>Yes, I'm monitoring my web server's log and while swish-e shows status 
>>500, my web server shows 200.
>>
>>I do not understand what you mean by "running the spider with:  
>>SPIDER_DEBUG=...." -- I'm quite new to swish-e... sorry.
>>    
>>
>
>Sorry, I'm used to setting environment vars at the command line.
>I spent last weekend working on Windows XP and aged about five years.
>
>Create config.txt:
>
>@servers = ( {
>    base_url => 'http://your_server/casos/VerCasoIdx?caso_numero=6896',
>    debug    => 'headers, url, skipped',
>    max_files => 1,
>    email     => 'you@yourmail.whatever',
>} );
>
>Then run the spider directly -- I don't know where it's installed on
>your machine, but this is what I would do:
>
>    perl /usr/local/lib/swish-e/spider.pl config.txt > output
>
>which generates this:
>
>
>
>/usr/local/lib/swish-e/spider.pl: Reading parameters from 'config.txt'
>
> -- Starting to spider: http://localhost/index.html --
>Request for 'http://localhost/index.html' aborted because: 'dead at /usr/local/lib/swish-e/spider.pl line 688.'
>
>Summary for: http://localhost/index.html
>Connection: Close: 1  (1.0/sec)
>          Skipped: 1  (1.0/sec)
>      Unique URLs: 1  (1.0/sec)
>moseley@bumby:~$ perl /usr/local/lib/swish-e/spider.pl config.txt > output
>/usr/local/lib/swish-e/spider.pl: Reading parameters from 'config.txt'
>
> -- Starting to spider: http://localhost/index.html --
>
>vvvvvvvvvvvvvvvv HEADERS for http://localhost/index.html vvvvvvvvvvvvvvvvvvvvv
>
>---- Request ------
>GET http://localhost/index.html
>Accept-Encoding: gzip; deflate
>From: me@inalid.com
>User-Agent: swish-e spider 2.2 http://swish-e.org/
>
>
>---- Response ---
>Status: 200 OK
>Connection: close
>Date: Wed, 02 Feb 2005 21:16:49 GMT
>Accept-Ranges: bytes
>ETag: "1c0140c-100f-3ffc3496"
>Server: Apache/1.3.33 (Debian GNU/Linux) PHP/4.3.9-2 mod_ssl/2.8.22 OpenSSL/0.9.7d mod_perl/1.29
>Content-Length: 4111
>Content-Type: text/html; charset=iso-8859-1
>Content-Type: text/html; charset=iso-8859-1
>Last-Modified: Wed, 07 Jan 2004 16:32:22 GMT
>Client-Date: Wed, 02 Feb 2005 21:16:49 GMT
>Client-Peer: 127.0.0.1:80
>Client-Response-Num: 1
>Title: Welcome to Your New Home Page!
>X-Meta-Author: johnie@debian.org (Johnie Ingram)
>X-Meta-Description: The initial installation of Debian/GNU Apache.
>X-Meta-GENERATOR: Mozilla/4.05 [en] (X11; I; Linux 2.3.99-pre3 i686) [Netscape]
>
>^^^^^^^^^^^^^^^ END HEADERS ^^^^^^^^^^^^^^^^^^^^^^^^^^
>
>  
>
>>>+Fetched 0 Cnt: 1 GET  http://localhost/index.html  200 OK text/html 4111 parent: depth:0
>>>      
>>>
>sleeping 5 seconds
>/usr/local/lib/swish-e/spider.pl: Max files Reached
>
>Summary for: http://localhost/index.html
>Connection: Close:     2  (0.4/sec)
>   Off-site links:    10  (2.0/sec)
>      Total Bytes: 4,111  (822.2/sec)
>       Total Docs:     1  (0.2/sec)
>      Unique URLs:     2  (0.4/sec)
>
>
>So that shows you exactly what the server is sending back.  If that
>says 500 and your logs say 200 then maybe:
>
>1) you are looking at the wrong logs
>2) your web server is telling you a lie
>3) spider.pl or LWP::UserAgent/LWP::RobotUA is generating a 500
>but I can't think of why it would do that....
>
>
>
>  
>
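The failing GET in the output above carries "Client-Warning: Internal response". LWP::UserAgent attaches that header to responses it builds itself rather than receives from the server (for example, when an error is raised while reading the reply), which would fit Bill's third possibility and would also explain why the web server log records 200 for the same request. A minimal Perl sketch, using hypothetical helper names parse_headers and is_internal_response, that classifies a header block copied from the spider's 'headers' debug output:

```perl
use strict;
use warnings;

# Parse "Name: value" lines (as printed by the 'headers' debug output)
# into a hash keyed by lower-cased header name. Lines without a colon
# are skipped; only the first colon separates name from value.
sub parse_headers {
    my ($text) = @_;
    my %h;
    for my $line (split /\n/, $text) {
        next unless $line =~ /^([^:]+):\s*(.*?)\s*$/;
        $h{ lc $1 } = $2;
    }
    return \%h;
}

# True when the response was tagged by LWP as generated on the client
# side ("Client-Warning: Internal response") instead of sent by the server.
sub is_internal_response {
    my ($h) = @_;
    my $warning = $h->{'client-warning'};
    return defined $warning && $warning eq 'Internal response';
}

# Header block copied from the failing GET response above.
my $block = <<'END';
Status: 500 Chunked must be last Transfer-Encoding 'chunked '
Content-Type: text/plain
Client-Date: Wed, 02 Feb 2005 21:28:50 GMT
Client-Warning: Internal response
END

my $msg = is_internal_response(parse_headers($block))
    ? "500 was generated on the client side, not sent by the server\n"
    : "status line came from the server\n";
print $msg;
```

Run against the HEAD response instead (Status: 200 OK, Server: Jaguar Server Version 4.2), the same check reports that the status came from the server.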

Received on Wed Feb 2 13:33:39 2005