Re: Does the <!-- Swishcommand noindex --> work whe

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Sat Jun 21 2003 - 14:02:26 GMT
On Wed, Jun 18, 2003 at 01:27:58PM -0700, Cleveland@mail.winnefox.org wrote:
> > When in doubt... test!
> > 
> > moseley(at)not-real.bumby:~/apache$ GET http://localhost/apache/noindex.html
> 
> > http://localhost/apache/noindex.html -T indexed_words -v0
> 
> It all works right. Until you add in a <a href> tag pointing to another
> file. What it does is, it'll skip the word, but still follow that link
> and also index the words in the file it's linked to. I remove the link,
> and it ignores that other file. Is there something I'm missing? 

Yes, you are confusing the function of swish-e with that of the spider.
The spider decides which files to send to swish-e (and thus which links
to follow).  The "noindex" comment only tells swish-e not to *index* the
content between the tags; there's no way for swish-e to tell the spider
to ignore links found in that noindex section.
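
To make the split concrete, here is a rough Python sketch of the two
roles.  It is not swish-e or spider code: the function names, the sample
page, the file name other.html, and the exact marker spelling
(SwishCommand noindex / SwishCommand index) are assumptions for
illustration only.

```python
import re
from html.parser import HTMLParser

# Assumed marker syntax, modeled on the comment named in the subject line.
NOINDEX_REGION = re.compile(
    r"<!--\s*SwishCommand\s+noindex\s*-->.*?<!--\s*SwishCommand\s+index\s*-->",
    re.IGNORECASE | re.DOTALL,
)


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags, the way a spider would."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


class WordCollector(HTMLParser):
    """Collects the words found in text nodes, the way an indexer would."""

    def __init__(self):
        super().__init__()
        self.words = []

    def handle_data(self, data):
        self.words.extend(data.split())


def spider_links(html):
    # The spider role looks at the whole page; noindex markers mean nothing here.
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def indexed_words(html):
    # The indexer role drops the marked region before collecting words.
    collector = WordCollector()
    collector.feed(NOINDEX_REGION.sub(" ", html))
    return collector.words


PAGE = """<html><body>
<p>public words</p>
<!-- SwishCommand noindex -->
<p>secret <a href="other.html">hidden link</a></p>
<!-- SwishCommand index -->
</body></html>"""

print(spider_links(PAGE))   # ['other.html']
print(indexed_words(PAGE))  # ['public', 'words']
```

The link inside the marked region still turns up in the spider's list,
while the words around it are left out of the index, which matches the
behavior described in the question.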

If you don't want a page indexed, exclude it with robots.txt, or use a
meta robots tag to tell the spider not to follow links.
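
As a rough illustration of the robots.txt route, the sketch below uses
Python's standard urllib.robotparser to show how a Disallow rule keeps
one page away from a crawler.  The rule, the agent name, and the file
name other.html are made up for the example; how the spider itself
reads robots.txt is not shown here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks only the linked-to page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /apache/other.html
"""


def allowed(url, agent="example-spider"):
    """Return True if the sample robots.txt lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


print(allowed("http://localhost/apache/noindex.html"))  # True
print(allowed("http://localhost/apache/other.html"))    # False
```

The meta tag route is a line such as <meta name="robots"
content="nofollow"> in the head of the page that holds the links.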

The spider extracts links *before* calling the filter content function, 
so you can't use that to remove links.  Perhaps the order of processing 
could be changed so that you could simply modify the content (e.g. 
remove all content between two comments).  I'll have to look when I get 
back.  My wireless connection isn't working well here:
  http://www.forwolves.org/ralph/wpages/graphics/little-redfish-lk2.jpg
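
For what the reordering would mean, here is a minimal Python sketch.
It is not the spider's code: the function names, the marker spelling,
and the sample page are assumptions, and the regex link matching is
only good enough for a demonstration.

```python
import re

# Assumed marker syntax; the region between the two comments is removed.
MARKED_REGION = re.compile(
    r"<!--\s*SwishCommand\s+noindex\s*-->.*?<!--\s*SwishCommand\s+index\s*-->",
    re.IGNORECASE | re.DOTALL,
)
# Crude href matcher for quoted attribute values in <a> tags.
HREF = re.compile(r'<a\s[^>]*href=["\']([^"\']+)["\']', re.IGNORECASE)


def strip_marked(html):
    """Stand-in for a content filter that removes the marked region."""
    return MARKED_REGION.sub("", html)


def links_current_order(html):
    # Links are pulled from the raw page, before any content filter runs.
    return HREF.findall(html)


def links_reordered(html):
    # Filter first, then extract links from what is left.
    return HREF.findall(strip_marked(html))


PAGE = """<a href="keep.html">keep</a>
<!-- SwishCommand noindex -->
<a href="skip.html">skip</a>
<!-- SwishCommand index -->"""

print(links_current_order(PAGE))  # ['keep.html', 'skip.html']
print(links_reordered(PAGE))      # ['keep.html']
```

With the filter applied first, the link inside the marked region never
reaches the link extractor, so it would not be followed.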



-- 
Bill Moseley
moseley@hank.org
Received on Sat Jun 21 14:02:31 2003