RE: Getting the right files indexed the right way

From: Rob de Santos AFANA <rdesantos(at)>
Date: Thu Jan 29 2004 - 01:39:41 GMT

Thanks, Bill... that worked like a charm.

One other problem has cropped up.  We have a picture gallery application
written in PHP ("Gallery") which lets users view sets of pictures.
The gallery directory itself is off limits to robots via robots.txt, but
the pics are in subdirectories of a directory called /album/.  That is
open to spidering, but for some reason swish-e doesn't index the
subdirectories of /album/.  I tried this in the swish-e config file, but
it didn't work:

obeyRobotsNoIndex yes

So, I'm guessing that if you have to set the SwishSpiderConfig location
by an absolute path, then the URLs are ignored?

Aside: this is one reason why it would be so much nicer if the
robots.txt standard had an "Allow" directive, but it doesn't...
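
To illustrate the setup described above, a robots.txt along these lines
would keep robots out of the application directory while leaving /album/
open.  This is only a sketch: the /gallery/ path is an assumed name for
illustration, since the real path isn't given here.

```
# Hypothetical robots.txt; /gallery/ is an assumed path for illustration
User-agent: *
Disallow: /gallery/
# /album/ is not listed, so it stays open to robots by default
```

Without an "Allow" line, the only way to open up a directory is to leave
it out of the Disallow list, as /album/ is here.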


> No, the is_binary() method simply checks if the content type 
> is not text/*.  If you just want images then do something like
>     if ( $content_type =~ /^image/ ) {
>         $$content_ref = $uri;
>         return 1;
>     }
