Thanks for your response.
Just using the scheme in test_url didn't do the job.
What I noticed already is that the file: URLs never show up in the debugging
output. Using your pointer toward the scheme I looked in spider.pl and finally
got to subroutine check_link. In this subroutine there is a block that's
checking the url->scheme with server->scheme. If these don't match, the URL will
be validated, causing the script to check the existence of the file (in the case
of the file scheme). This is exactly what I don't want.
After crudely changing this part of the script to skip validate_link if the
scheme is "file" spider.pl is functioning exactly as I wanted it to (at least
for my test environment).
$ diff spider.pl_org spider.pl
74a75
> skip_scheme
1307c1308,1310
< validate_link( $server, $u, $base ) if $server->{validate_links};
---
> unless ( $u->scheme eq $server->{skip_scheme} ) {
> validate_link( $server, $u, $base ) if $server->{validate_links};
> }
And added to the config file:
skip_scheme => 'file',
Using an array for skip_scheme would be even better, but this works for me.
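A rough sketch of what the array variant could look like, in case it is useful to anyone. The helper name scheme_is_skipped is made up for illustration and is not part of spider.pl; it assumes skip_scheme may be either a single string or an array reference, and it takes the scheme as a plain string so the snippet runs without the URI module:

```perl
use strict;
use warnings;

# Hypothetical helper (not in spider.pl): true when $scheme is listed
# in the server config's skip_scheme setting.
sub scheme_is_skipped {
    my ( $server, $scheme ) = @_;
    my $skip = $server->{skip_scheme};

    # Nothing configured, or no scheme on the URL: do not skip.
    return 0 unless defined $skip && defined $scheme;

    # Accept both skip_scheme => 'file' and skip_scheme => [ 'file', ... ].
    my @skip = ref($skip) eq 'ARRAY' ? @$skip : ($skip);

    # Compare as strings, ignoring case.
    return ( grep { lc($_) eq lc($scheme) } @skip ) ? 1 : 0;
}

# Example config entry; 'mailto' is only here to show a second value.
my $server = { skip_scheme => [ 'file', 'mailto' ], validate_links => 1 };

for my $scheme (qw( file http )) {
    my $action = scheme_is_skipped( $server, $scheme ) ? 'skipped' : 'validated';
    print "$scheme: $action\n";
}
```

In check_link the condition would then presumably become "unless ( scheme_is_skipped( $server, $u->scheme ) )" around the existing validate_link call. I have not tested that inside spider.pl itself. A side benefit is that the helper does not warn when skip_scheme is missing from the config.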
Thanks for the pointer,
Erik.
Quoting Bill Moseley <moseley@hank.org>:
> On Fri, Nov 02, 2007 at 11:35:11AM +0100, Erik van Duren wrote:
> > test_url => sub { $_[0]->canonical !~ /file:\/\//i },
> > Without result however:
>
> I didn't test this but I'd probably use:
>
> sub { $_[0]->scheme ne 'file' }
>
> But test to make sure that $_[0]->scheme returns "file".
>
> --
> Bill Moseley
> moseley@hank.org
>
> Unsubscribe from or help with the swish-e list:
> http://swish-e.org/Discussion/
>
> Help with Swish-e:
> http://swish-e.org/current/docs
>
> _______________________________________________
> Users mailing list
> Users@lists.swish-e.org
> http://lists.swish-e.org/listinfo/users
>
Received on Fri Nov 2 12:36:37 2007