Hi Bill
Thanks for your reply. It has been quite useful, although I don't quite
follow what you are suggesting.
We are running Perl 5.8.5, which is from late last year, so I don't think
that is the problem.
I have set up a test.cgi like yours, and it appears to confirm my
theory. A standard 302 redirect with a fully qualified URI, as served up
by Apache, works fine, but if I use a CGI that attaches an unqualified URI
to the 302 header, I get the same error again:
Skipping /index.html: Wrong method or server.
I looked at the swish-e source code to narrow it down to this.
Inside the add_url function in http.c is the relevant error:
    if (!equivalentserver(sw, url, baseurl))
    {
        if (sw->verbose >= 3)
            printf("Skipping %s: %s\n", url, "Wrong method or server.");
    }
and the output from the spider I am trying to run is:
Skipping /Default.asp?c=1298: Wrong method or server.
As you can deduce from the error message, the variable 'url' must contain
only '/Default.asp?c=1298'.
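To make the difference concrete, here is a rough sketch of how a root-relative redirect target could be turned into an absolute URL using the URL that issued the redirect. This is only my illustration: resolve_relative is a hypothetical helper, not something in swish-e, and it handles only targets that start with '/'.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, NOT part of swish-e: builds an absolute URL from a
 * root-relative redirect target ("/path") and the URL that issued it.
 * Returns 0 on success, -1 if the input is not handled or out is too small. */
int resolve_relative(const char *base, const char *location,
                     char *out, size_t outlen)
{
    const char *sep, *host, *path;
    size_t prefixlen;
    int n;

    /* Already absolute: copy it through unchanged. */
    if (strstr(location, "://") != NULL) {
        n = snprintf(out, outlen, "%s", location);
        return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
    }

    /* Only root-relative targets are handled in this sketch. */
    if (location[0] != '/')
        return -1;

    /* The base must be absolute so we can borrow its method and server. */
    sep = strstr(base, "://");
    if (sep == NULL)
        return -1;

    /* Keep everything up to the first '/' after "method://server[:port]". */
    host = sep + 3;
    path = strchr(host, '/');
    prefixlen = path ? (size_t)(path - base) : strlen(base);

    n = snprintf(out, outlen, "%.*s%s", (int)prefixlen, base, location);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Given a base of 'http://www.site.com/index.asp' and a location of '/Default.asp?c=1298', this should yield 'http://www.site.com/Default.asp?c=1298', which is the form the server check expects.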
If you look at the equivalentserver function in httpserver.c, it really
needs url to be 'http://something.com/Default.asp?c=1298'. If it can't
extract a method from url, it returns 0,
i.e. (line 386 of httpserver.c in v2.5.2):
int equivalentserver(SWISH *sw, char *url, char *baseurl)
{
    char *method;
    int methodlen;
    char *serverport;
    int serverportlen;
    char *basemethod;
    int basemethodlen;
    char *baseserverport;
    int baseserverportlen;
    struct multiswline *walk=NULL;
    struct MOD_HTTP *http = sw->HTTP;

    method = url_method(url, &methodlen);
    serverport = url_serverport(url, &serverportlen);
    basemethod = url_method(baseurl, &basemethodlen);
    baseserverport = url_serverport(baseurl, &baseserverportlen);

    if (!method || !serverport || !basemethod || !baseserverport) {
        return 0;
    }
    ...
[url_method, which is in http.c, just gets whatever is before '://'
in url, i.e. it should yield 'http', 'ftp', 'https', etc.]
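As a rough illustration of why the relative URI fails that check, here is my own sketch of a method extractor. It is not the actual swish-e code: sketch_url_method is a made-up name, and I am assuming the real function behaves along these lines.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative only; NOT the swish-e implementation of url_method.
 * Returns a pointer to the start of the method and stores its length,
 * or returns NULL when the string has no "://" to split on. */
const char *sketch_url_method(const char *url, int *methodlen)
{
    const char *sep = strstr(url, "://");

    /* No separator, or nothing in front of it: no method to extract. */
    if (sep == NULL || sep == url)
        return NULL;

    *methodlen = (int)(sep - url);
    return url;
}
```

With 'http://something.com/Default.asp?c=1298' this finds a 4-character method, but with '/Default.asp?c=1298' it returns NULL, which is the case where equivalentserver gives up and returns 0.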
Sorry there is so much code here; I just hope I am not missing something.
Thanks in advance for any more advice.
Francis
Bill Moseley wrote:
> On Sun, Jul 31, 2005 at 07:47:38PM -0700, Francis Vierboom wrote:
>
>>The problem seems to be that the 302 response only contains the relative
>>URI. A correctly formed 302 response has an absolute URI - ie it should be
>>
>>302. http://www.site.com/Default.asp?c=1298
>>instead of just
>>302. /Default.asp?c=1298
>
>
> That's lame.
>
>
>>Can anyone confirm that swish-e does in fact have this problem, and if
>>so, maybe swish-e should consider honouring these sloppy redirects?
>
>
> Maybe you need to update your Perl modules?
>
> Here's a broken redirect:
>
> moseley@bumby:~/apache$ cat test.cgi
> #!/usr/bin/perl
> print "Status: 302 Moved\nLocation: /apache/index.html\n\n";
>
> Here's the file it redirects to:
>
> moseley@bumby:~/apache$ cat index.html
> <html>
> <head><title>index.html</title>
> </head>
> <body>
> This is index.html
> </body>
> </html>
>
> Here's LWP's GET method following the redirect:
>
> moseley@bumby:~/apache$ GET -S http://localhost/apache/test.cgi
> GET http://localhost/apache/test.cgi --> 302 Moved
> GET http://localhost/apache/index.html --> 200 OK
> <html>
> <head><title>index.html</title>
> </head>
> <body>
> This is index.html
> </body>
> </html>
>
> The spider uses LWP, so also follows the link:
>
> moseley@bumby:~/apache$ /usr/local/lib/swish-e/spider.pl default http://localhost/apache/test.cgi
> /usr/local/lib/swish-e/spider.pl: Reading parameters from 'default'
> Path-Name: http://localhost/apache/index.html
> Content-Length: 89
> Last-Mtime: 1122875111
> Document-Type: html*
>
> <html>
> <head><title>index.html</title>
> </head>
> <body>
> This is index.html
> </body>
> </html>
>
>
>
--
-----------------------------------
Francis Vierboom
francis@galexia.com
Research Consultant
Galexia Consulting Pty Ltd
Suite 95, Jones Bay Wharf,
(Lower Deck, East Side)
26-32 Pirrama Road,
Pyrmont NSW 2009,
Australia
tel: +61 (0)2 9660 1111
fax: +61 (0)2 9660 7611
Received on Mon Aug 1 00:19:26 2005