Re: Change queue URL in test_url

From: Bill Moseley <moseley(at)>
Date: Tue May 25 2004 - 19:18:47 GMT
On Tue, May 25, 2004 at 12:06:26PM -0700, Justin Tang wrote:
> Hi:
>   I was wondering if there is any way to change the URL that is about to be
> queued using a call back function in test_url.  Specifically, say if I have
> to be placed in the queue, and I want to change it to
> how can I change the URL that is being passed back?  Thanks!

I think so.  Try something like:

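# test_url callback: called for each URL before it is queued
sub remove_query {
    my ( $uri ) = @_;

    # Drop the query string, but only for this one page
    $uri->query( undef )
        if $uri->path eq '/page.html';

    return 1;   # true = carry on with the (possibly modified) URL
}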

Then, in your spider config:

    test_url => \&remove_query,

I think you can also specify more than one function, if you need to,
like this:

    test_url => [ \&remove_query, \&other_subroutine ],

$uri is a URI object.  Run "perldoc URI" to see how you can manipulate it.

Note that after test_url is checked, the spider then checks whether
$uri->canonical has been visited before.  So if you do the above, the
page will only be visited once, no matter how many different query
strings the links to it carry.
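
To see the effect outside the spider, here is a minimal standalone sketch. It assumes only the URI module; the localhost links, the %visited hash and the @queued list are made up for illustration and only stand in for the spider's own bookkeeping, which may well differ in detail.

```perl
use strict;
use warnings;
use URI;

# Same callback as above: strip the query string from /page.html only.
sub remove_query {
    my ( $uri ) = @_;
    $uri->query( undef )
        if $uri->path eq '/page.html';
    return 1;
}

my %visited;   # hypothetical stand-in for the spider's visited list
my @queued;    # hypothetical stand-in for the spider's queue

for my $link (
    'http://localhost/page.html?id=1',
    'http://localhost/page.html?id=2',
    'http://localhost/other.html?id=1',
) {
    my $uri = URI->new( $link );

    # A false return from the callback would mean "skip this URL".
    next unless remove_query( $uri );

    # Duplicate check on the canonical form, after the callback has run.
    next if $visited{ $uri->canonical }++;

    push @queued, "$uri";
}

print "$_\n" for @queued;
```

Both /page.html links collapse to the same URL once the query string is gone, so only the first is queued; /other.html keeps its query string because the callback leaves it alone.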

Bill Moseley
