Re: no_index and querystrings

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Mon Oct 11 2004 - 20:03:46 GMT
On Mon, Oct 11, 2004 at 12:53:27PM +1000, Tim Hartley wrote:
> The querystrings that I wish to run through the 'no_index' callback
> function all start with
> "www.planetpdf.com/tools.asp?webpageid=615&SearchType=Product".

>         test_response => sub {
>         my $server = $_[1];
> 	#try and not index tools.asp?webpageid=615&SearchType=Product..	
>       $server->{no_index}++ if $_[0]->path =~ /tools\.asp\?webpageid\=615\&SearchType\=Product$/;
> 	$server->{no_index}++ if $_[0]->path =~ /\?webpageid\=615\&SearchType\=Product$/;
> 	$server->{no_index}++ if $_[0]->query_form =~ /\?webpageid\=615\&SearchType\=Product$/;
> 	$server->{no_index}++ if $_[0]->query_form =~ /tools\.asp\?webpageid\=615\&SearchType\=Product$/;
> 	$server->{no_index}++ if $_[0]->path_query =~ /\?webpageid\=615\&SearchType\=Product$/;
> 	$server->{no_index}++ if $_[0]->path_query =~ /tools\.asp\?webpageid\=615\&SearchType\=Product$/;
> 	$server->{no_index}++ if $_[0]->query =~ /\?webpageid\=615\&SearchType\=Product$/;
> 	$server->{no_index}++ if $_[0]->query =~ /tools\.asp\?webpageid\=615\&SearchType\=Product$/;
> 	$server->{no_index}++ if $_[0]->query_keywords=~ /\?webpageid\=615\&SearchType\=Product$/;
> 	$server->{no_index}++ if $_[0]->query_keywords=~ /tools\.asp\?webpageid\=615\&SearchType\=Product$/;
> 	$server->{no_index}++ if $_[0]->path_segments=~ /\?webpageid\=615\&SearchType\=Product$/;
> 	$server->{no_index}++ if $_[0]->path_segments=~ /tools\.asp\?webpageid\=615\&SearchType\=Product$/;
> 	$server->{no_index}++ if $_[0]->path_query=~ /\?webpageid\=615\&SearchType\=Product$/;
> 	$server->{no_index}++ if $_[0]->path_query=~ /tools\.asp\?webpageid\=615\&SearchType\=Product$/;

Looks like you are confused about what the different methods do.

man URI will help.  But you can also do this:

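    test_response => sub {
        my ( $uri, $server ) = @_;
        my $query = $uri->query;
        $query = '' unless defined $query;   # query is undef when the URL has none
        print STDERR "processing URI: [$uri]\n",
                     "Path: [", $uri->path, "] ",
                     "Query: [$query]\n";
        return 1;   # a true value lets the spider keep processing
    },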

Then you can see what you *can* compare against.

For example, you can't use $uri->path to test the query string.
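To make the difference concrete, here is a small standalone sketch of what three of the URI accessors return for the kind of URL under discussion. The "http://" scheme is an assumption added so that URI can parse the string; the original message only quotes the host, path and query.

```perl
use strict;
use warnings;
use URI;

# Assumed full URL: the thread quotes it without a scheme.
my $uri = URI->new(
    'http://www.planetpdf.com/tools.asp?webpageid=615&SearchType=Product'
);

# Each accessor returns a different slice of the URI.
my %parts = (
    path       => $uri->path,          # path only, never the query string
    query      => $uri->query,         # raw query, without the leading "?"
    path_query => $uri->path_query,    # path, "?", and query together
);

printf STDERR "%s: [%s]\n", $_, $parts{$_} for sort keys %parts;
```

Only query and path_query contain "webpageid=615", so a pattern for the query string has to be matched against one of those two.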

Now, $uri->query probably isn't the best method to use, as it's not
unescaped for you.  That may be fine in many cases, but be aware that
it's just the raw query part of the URI.  In the general case it's
better to work with the decoded data.
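As a rough illustration of raw versus decoded, the sketch below compares $uri->query with $uri->query_form on a URL containing an escaped space. The URL and its "Acrobat%20Plug-in" value are made up for the example and are not taken from the site in question.

```perl
use strict;
use warnings;
use URI;

# Hypothetical URL with a percent-encoded space in one value.
my $uri = URI->new(
    'http://www.example.com/tools.asp?webpageid=615&SearchType=Acrobat%20Plug-in'
);

# query returns the string exactly as it appears in the URL.
my $raw = $uri->query;

# query_form returns decoded key/value pairs, which fit into a hash
# as long as no parameter name is repeated.
my %form = $uri->query_form;

print STDERR "raw:     [$raw]\n";
print STDERR "decoded: [$form{SearchType}]\n";
```

A regular expression written against the raw string would have to match "%20", while one written against the decoded value matches a plain space.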

So, you can actually get more helpful methods by doing this:


use URI::QueryParam;

[...]

    test_response => sub {
        my ( $uri, $server ) = @_;

        my $id = $uri->query_param('webpageid');
        $server->{no_index}++ if $id && $id == 615;
        [... and so on ...]
    },

That might be more detail than you want, but then you don't have to
worry about the order of the parameters in the actual query string.
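Here is a minimal sketch of that order-independent check, pulled out into a helper so it can be tried outside the spider. The name is_product_search is hypothetical and not part of spider.pl or URI. The "http://" scheme in the usage example is likewise an assumption.

```perl
use strict;
use warnings;
use URI;
use URI::QueryParam;

# Hypothetical helper: true when both parameters carry the values
# from the original question, in whatever order they appear.
sub is_product_search {
    my ($uri) = @_;

    my $id   = $uri->query_param('webpageid');
    my $type = $uri->query_param('SearchType');

    # Either parameter may be missing from the URL entirely.
    return 0 unless defined $id && defined $type;

    return ( $id eq '615' && $type eq 'Product' ) ? 1 : 0;
}

# Usage: the parameters are deliberately reversed here.
my $uri = URI->new(
    'http://www.planetpdf.com/tools.asp?SearchType=Product&webpageid=615'
);
print STDERR "skip indexing: $uri\n" if is_product_search($uri);
```

Inside test_response this would reduce to something like "$server->{no_index}++ if is_product_search($uri);". Extra parameters after these two do not affect the result.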

Next, if you use Windows you likely have worn out that backslash key,
so here's a trick for your regular expressions.  Instead of:

    $uri->path_query =~ /\?webpageid\=615\&SearchType\=Product$/

try:

    $uri->path_query =~ /\Qwebpageid=615&SearchType=Product/
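A quick way to convince yourself the two forms are equivalent is to run both against a plain string. The sketch below uses a hard-coded target string as a stand-in for what a URI method would return. It also shows closing the quoted span with \E before an anchor, so that the trailing $ keeps its meaning.

```perl
use strict;
use warnings;

# Stand-in for the value a URI method would return.
my $target = '/tools.asp?webpageid=615&SearchType=Product';

# Hand-escaped pattern: every metacharacter backslashed individually.
my $escaped = $target =~ /tools\.asp\?webpageid=615&SearchType=Product/ ? 1 : 0;

# \Q quotes everything after it, so "." and "?" are taken literally.
my $quoted = $target =~ /\Qtools.asp?webpageid=615&SearchType=Product/ ? 1 : 0;

# End the quoting with \E before an anchor you want to stay active.
my $anchored = $target =~ /\Q?webpageid=615&SearchType=Product\E$/ ? 1 : 0;

# With \Q the "." no longer matches an arbitrary character.
my $literal_dot = 'toolsXasp?webpageid=615' =~ /\Qtools.asp?/ ? 1 : 0;

print STDERR "escaped=$escaped quoted=$quoted anchored=$anchored ",
             "literal_dot=$literal_dot\n";
```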

But the key suggestion above is to use the print statement to see what
you are comparing against.


-- 
Bill Moseley
moseley@hank.org

Received on Mon Oct 11 13:03:55 2004