Re: [swish-e] Limiting content from spider.pl

From: <mitch-swish(at)not-real.claborn.net>
Date: Thu Mar 29 2007 - 22:44:57 GMT
    <!-- noindex -->
    <!-- index -->

Were what I was looking for.  Don't know how many times I looked through the
doc and didn't see those!  Thanks.

  _____  

Mitch Claborn



-----Original Message-----
From: users-bounces@lists.swish-e.org
[mailto:users-bounces@lists.swish-e.org] On Behalf Of Bill Moseley
Sent: Thursday, March 29, 2007 4:55 PM
To: Swish-e Users Discussion List
Subject: Re: [swish-e] Limiting content from spider.pl


On Thu, Mar 29, 2007 at 02:18:28PM -0500, mitch-swish@claborn.net wrote:
> I want to eliminate some portions of the pages on our site from
> indexing - I've marked them in the HTML with specially formatted HTML
> comments.

Ignore parts of pages or the entire pages?  To ignore parts of pages (e.g.
menus, headers, footers) you can use these comments:

    <!-- noindex -->
    <!-- index -->


> The way I made it work was to add this code at the very top of 
> output_content in spider.pl (V 1.26):
> 
> if ( my $fn = $server->{alter_content} ) {
>     eval {
>         $fn->($server, $content, $uri, $response); 
>     };
>     die "alter_content died for $uri: $@\n" if $@;
> }
> 
> Is this a good way to accomplish it?  I put my actual logic in the 
> config file of course.

Sure -- you can hack it to fit your needs.
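
For anyone trying the same hack, here is a minimal sketch of what the config-file side might look like. It is not part of the stock spider.pl: the alter_content key exists only with the patch quoted above, the marker comments are made-up placeholders, and it assumes $content arrives as a reference to the page text (check output_content in your copy of spider.pl before relying on that).

```perl
use strict;
use warnings;

# Hypothetical markers; substitute whatever comments your pages really use.
my $start = '<!-- begin-skip -->';
my $end   = '<!-- end-skip -->';

# Callback for the patched-in alter_content hook. Argument order follows
# the patch quoted above. ASSUMPTION: $content is a scalar reference to
# the fetched page. $server, $uri and $response are unused in this sketch.
sub strip_marked_sections {
    my ( $server, $content, $uri, $response ) = @_;

    # Remove everything between each start/end pair, markers included.
    # /s lets "." match newlines; .*? keeps each match as short as possible.
    $$content =~ s/\Q$start\E.*?\Q$end\E//gs;
    return;
}

# Shape of a spider config entry carrying the extra key.
# base_url and email are placeholder values.
our @servers = (
    {
        base_url      => 'http://localhost/',
        email         => 'admin@localhost',
        alter_content => \&strip_marked_sections,
    },
);

1;
```

Because the callback edits the text through the reference, nothing has to be returned to the spider. An unclosed start marker is left untouched by this regex, so pages with a missing end marker get indexed in full.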

> I could have also used the existing output_function callback, but 
> there is a lot of miscellaneous stuff that happens after that call 
> before the output that I would have to replicate in my code if I did 
> so.

That's rather late in the process -- what happens after that makes that not
useful?

-- 
Bill Moseley
moseley@hank.org

Unsubscribe from or help with the swish-e list: 
   http://swish-e.org/Discussion/

Help with Swish-e:
   http://swish-e.org/current/docs

_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users

Received on Thu Mar 29 18:44:40 2007