
Re: Does the <!-- Swishcommand noindex --> work whe

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed Jun 25 2003 - 14:06:47 GMT
On Wed, Jun 25, 2003 at 08:29:17AM -0500, Cleveland@mail.winnefox.org wrote:
> > If you don't want to index a page then use robots.txt or a 
> > meta robots tag to say don't follow links.
> 
> What we have is a directory of pdf files. There are about 10 of them we
> don't want indexed, but they are linked on a browse.html page that has
> links to all the files. I don't know much about pdf files. Is there a
> way to put meta tags in them?

As in <meta> noindex tags?  No, that's HTML.  (PDFs can have metadata
associated with them, though, so I suppose there might be a way to
check the PDF after converting it to text/html.)
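Untested, but here's a rough sketch of that idea, assuming the pdfinfo
tool from Xpdf is installed and using a made-up "noindex" keyword in
the PDF's document-info fields as the marker:

    my $file = '/path/to/doc.pdf';
    # pdfinfo prints the document-info fields, one per line
    my $info = `pdfinfo $file`;
    print "skip this one\n" if $info =~ /^Keywords:.*noindex/mi;

You'd still have to find a sensible place to hook that check into the
spider, though, so one of the options below is probably less work.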

If what you want is to avoid indexing the PDF files in a directory,
then I'd probably use a robots.txt file.  Unfortunately, you cannot use
regular expressions in that file.

<web_root>/robots.txt:
----------------------
User-agent: *

# don't allow spidering in the /pdfdocs directory
Disallow: /pdfdocs/

# don't allow spidering of these specific files:
Disallow: /otherpdfdir/pdfdoc.pdf
Disallow: /otherpdfdir/pdfdoc2.pdf

Or if you want more control, just add tests in the "test_url" callback
function, where you can use regular expressions.  For example:

    return if $uri->path =~ m[^/pdfdocs/\w+\.pdf$];
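
As a minimal sketch, a spider config entry using that callback might
look something like this (the base_url and email values are just
placeholders; adjust the regex to match your own paths):

SwishSpiderConfig.pl:
---------------------

    @servers = (
        {
            base_url => 'http://localhost/browse.html',   # placeholder
            email    => 'you@example.com',                 # placeholder

            # test_url is called for each link the spider finds;
            # returning false tells it to skip that URL entirely.
            test_url => sub {
                my $uri = shift;

                # skip every .pdf directly under /pdfdocs/
                return 0 if $uri->path =~ m[^/pdfdocs/\w+\.pdf$];

                return 1;   # spider everything else
            },
        },
    );
    1;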

Will either of those work for you?


-- 
Bill Moseley
moseley@hank.org