Re: test & rewrite?

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Fri Nov 11 2005 - 19:54:46 GMT
On Fri, Nov 11, 2005 at 11:33:22AM -0800, François Tissandier wrote:
> Hello
> 
> I have a tricky problem I can't solve:
> 
> I have pages like this:
> www.blabla.com/tutu/index.html
> www.blabla.com/index.html
> 
> Basically, it's the same page, except for some content that I don't want
> to index. So I want those two pages to be indexed as one.

You can tell swish that they are the same file name, but search
results will still return both as separate files (although named the
same).

> Sadly, the content is slightly different, so the MD5 function is not
> working. So I was wondering if it would be possible to give the spider
> some rules to tell it:
> 
> "link "/tutu/index.html" is OK, but follow "/index.html" instead, please,
> will you?"

Not really clear what you want. The spider grabs links and stuffs
them in an array to be processed (unless they are rejected by test_url).
Then, when those links are processed, I think you could modify the URI
object in test_response or filter_content, and that would change the
file name passed to swish. You'll just have to try it, or better, try it
and look at the code to see what happens.
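To make the idea concrete, here is a minimal, language-neutral sketch in Python of the rewrite described above. It is not the swish-e spider's API (the spider is a Perl program, and the real hooks are test_url, test_response and filter_content). The names REWRITES, canonical_name and index_names, and the http:// scheme on the example URLs, are all hypothetical. The sketch shows that rewriting the name alone still leaves two entries carrying the same name, as noted earlier in this reply. Only skipping a name that has already been seen collapses them into one.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical rewrite table: original path -> path to report to the indexer.
# Mirrors the "/tutu/index.html" -> "/index.html" example from the question.
REWRITES = {"/tutu/index.html": "/index.html"}


def canonical_name(url: str) -> str:
    """Return the file name to hand to the indexer for this URL.

    Paths listed in REWRITES are replaced; all other URLs pass through
    unchanged (the fragment is dropped).
    """
    parts = urlsplit(url)
    path = REWRITES.get(parts.path, parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


def index_names(queue, skip_duplicates=False):
    """Walk a queue of fetched URLs and return the names that would be indexed.

    Without skip_duplicates, both pages are kept, each under the same
    rewritten name. With it, only the first page fetched under a given
    name is kept, so crawl order decides which copy's content wins.
    """
    seen = set()
    names = []
    for url in queue:
        name = canonical_name(url)
        if skip_duplicates and name in seen:
            continue  # a page with this name was already indexed
        seen.add(name)
        names.append(name)
    return names


if __name__ == "__main__":
    queue = [
        "http://www.blabla.com/tutu/index.html",
        "http://www.blabla.com/index.html",
    ]
    print(index_names(queue))
    print(index_names(queue, skip_duplicates=True))
```

Because the two pages differ slightly, the skip-duplicates variant indexes whichever one the spider reaches first. Putting the wanted page earlier in the queue, or rejecting the other one outright, makes that choice explicit.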

-- 
Bill Moseley
moseley@hank.org

Unsubscribe from or help with the swish-e list: 
   http://swish-e.org/Discussion/

Help with Swish-e:
   http://swish-e.org/current/docs
   swish-e@sunsite.berkeley.edu
Received on Fri Nov 11 11:54:47 2005