At 12:03 PM 06/09/01 -0700, Paul Thomas wrote:
>I have some web browseable archives on my web server. I use
>a robots.txt file to designate off-limits web directories and
>also tried protecting private archive directories with .htaccess
>files. However, there are still some spiders that just come
>through and gobble everything up anyway.
You have .htaccess configured incorrectly if people are able to spider what
you think are protected directories and files.
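A minimal .htaccess for password-protecting a directory looks something like this (the paths and realm name here are placeholders, not your actual setup):

```apache
# Hypothetical .htaccess for a private archive directory.
# AuthUserFile must point at a real htpasswd file created
# with the htpasswd utility; the path below is an example.
AuthType Basic
AuthName "Private Archives"
AuthUserFile /home/user/.htpasswd
Require valid-user
```

Note that this only takes effect if the server config allows it: the enclosing <Directory> section in httpd.conf needs "AllowOverride AuthConfig" (or All). A server-wide "AllowOverride None" silently ignores .htaccess files, which is a common reason "protected" directories stay wide open.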
You can block specific IPs or blocks of IPs, or you can contact their upstream
provider (if they aren't hiding behind a proxy). There are also various throttling
modules for Apache that try to detect high load from spiders or
misbehaved programs such as Internet Explorer. Or you can install more
memory and faster disks.
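Blocking by IP can be done right in the .htaccess or <Directory> section; the addresses below are examples only:

```apache
# Deny a single misbehaving host and a whole class C block.
# (Apache 1.3 access-control syntax; addresses are examples.)
Order allow,deny
Allow from all
Deny from 192.0.2.45
Deny from 198.51.100.
```

Keep in mind robots.txt is purely advisory: well-behaved crawlers honor it, but a rude spider ignores it entirely, so real protection has to come from the server-side access controls above.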
There's currently a discussion on the mod_perl list about this topic.
Received on Sat Jun 9 22:28:54 2001