[swish-e] Limiting content from spider.pl

From: <mitch-swish(at)not-real.claborn.net>
Date: Thu Mar 29 2007 - 19:18:28 GMT
I want to exclude some portions of the pages on our site from indexing.
I've marked them in the HTML with specially formatted HTML comments.

I made it work by adding this code at the very top of output_content in
spider.pl (v1.26):

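# Call the optional alter_content callback from the server config, if set.
if ( my $fn = $server->{alter_content} ) {
    eval {
        $fn->($server, $content, $uri, $response);
    };
    # Re-raise any error from the callback, naming the URL being processed.
    die "alter_content died for $uri: $@\n" if $@;
}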

Is this a good way to accomplish it?  My actual filtering logic lives in the
config file, of course.
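
For illustration only, here is a minimal sketch of what the config-side
callback could look like. It is not the actual code: the marker comments
(<!-- noindex --> and <!-- /noindex -->) and the name strip_marked_sections
are hypothetical, and it assumes $content arrives as a reference to the page
text so it can be edited in place. In a real config the alter_content key
would sit in the server's hash alongside the other settings.

```perl
use strict;
use warnings;

# Hypothetical markers; the actual comment format is site-specific.
my $START = qr{<!--\s*noindex\s*-->}i;
my $END   = qr{<!--\s*/noindex\s*-->}i;

# Callback using the argument order from the patch above.
# Assumption: $content is a scalar reference to the page text.
sub strip_marked_sections {
    my ( $server, $content, $uri, $response ) = @_;
    return unless ref $content eq 'SCALAR';

    # Remove every marked section, markers included (non-greedy, across lines).
    $$content =~ s/$START.*?$END//gs;
    return;
}

# Stand-in for one server entry; only the alter_content key matters here.
my $server = { alter_content => \&strip_marked_sections };

# Simulate the call added to output_content.
my $html = '<p>keep</p><!-- noindex --><p>menu</p><!-- /noindex --><p>also keep</p>';
if ( my $fn = $server->{alter_content} ) {
    $fn->( $server, \$html, 'http://www.example.com/', undef );
}
print "$html\n";
```

The non-greedy match keeps two separate marked sections from swallowing the
text between them; an unmatched opening marker is simply left alone.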

I could also have used the existing output_function callback, but a lot of
miscellaneous processing happens between that call and the actual output,
and I would have had to replicate all of it in my own code.




Mitch Claborn

_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Thu Mar 29 15:18:13 2007