Re: LWP,HTTP and HTML modules

From: Yann Stettler <stettler(at)not-real.cohprog.com>
Date: Tue Jan 19 1999 - 20:42:08 GMT
Mark Gaulin wrote:

> I think there may be a middle ground where both methods can be
> used... mime to include urls and file extension to exclude the
> common extensions.

Yes, that's exactly what I mean:

1) an option to configure SWISH to immediately ignore some files
   based on their extension (so that in most cases it doesn't do an
   unneeded fork and connection to the server. After all, just
   listing ".jpg" and ".gif" will remove the majority of unwanted
   documents).

   Another reason for this is to avoid a "hang" if some file really
   causes a problem for whatever reason. I simply couldn't index
   my site because of a large ".mov" file until I made the change
   to let it skip over it...

2) Naturally, the spider itself has to keep checking the MIME type
   in the document header (and abort without transferring the
   whole document if the type is not correct).
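Point 1 can be sketched roughly as follows. The thread is about Perl's LWP modules, so this Python version only illustrates the idea, assuming the spider works from a list of URLs: the extension is checked before any fork or connection happens. The `skip_by_extension` name, the `SKIP_EXTENSIONS` set and the example.com URLs are hypothetical, not part of SWISH.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical skip list: ".jpg" and ".gif" cover most unwanted documents,
# ".mov" is the kind of large file that stalled indexing.
SKIP_EXTENSIONS = {".jpg", ".gif", ".mov"}


def skip_by_extension(url, skip=SKIP_EXTENSIONS):
    """Return True if the URL's path ends in an extension we never index.

    Only the path is inspected (the query string is ignored), and the
    comparison is case-insensitive, so "logo.GIF" is skipped as well.
    No network access happens here.
    """
    path = urlsplit(url).path
    ext = posixpath.splitext(path)[1].lower()
    return ext in skip


urls = [
    "http://www.example.com/index.html",
    "http://www.example.com/img/logo.GIF",
    "http://www.example.com/video/trailer.mov?x=1",
]
# Only URLs that survive the cheap extension test go on to be fetched.
to_fetch = [u for u in urls if not skip_by_extension(u)]
print(to_fetch)  # ['http://www.example.com/index.html']
```

The extension test is only a shortcut for the common cases; anything it lets through still has to pass the header check from point 2.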

Don't forget that SWISH does its own check after the spider.
So since it discards anything that is not of type "text/*", it's
perfectly all right to discard such documents directly in the
spider... (It already does, but only after having transferred the
whole file.)
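A rough Python sketch of discarding in the spider before the body is transferred, assuming the same "text/*" rule SWISH applies afterwards. `urllib.request.urlopen` returns once the status line and headers have been read, so the Content-Type can be checked and the connection closed without reading the body (a few bytes may already be in flight). The `is_text_type` and `fetch_if_text` names are hypothetical.

```python
import urllib.request
from typing import Optional


def is_text_type(content_type: Optional[str]) -> bool:
    """Accept only "text/*" types, ignoring parameters such as charset."""
    if not content_type:
        return False
    main_type = content_type.split(";", 1)[0].strip().lower()
    return main_type.startswith("text/")


def fetch_if_text(url: str, timeout: float = 30.0) -> Optional[bytes]:
    """Return the body for text/* documents, or None without reading it."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # At this point only the headers have been read from the socket.
        if not is_text_type(resp.headers.get("Content-Type")):
            # Leaving the "with" block closes the connection, so the rest
            # of a large image or movie is never read.
            return None
        return resp.read()
```

For example, `is_text_type("text/html; charset=iso-8859-1")` is true, while `is_text_type("video/quicktime")` is false and that document would be dropped before its body is read.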

Cheers,
Yann Stettler

-- 
-------------------------------------------------------------------
TheNet - Internet Services AG              CohProg SaRL
stettler@thenet.ch                         stettler@cohprog.com
http://www.thenet.ch/                      http://www.cohprog.com/
                              ---**---
Anime and Manga Services                   http://www.animanga.com/
Received on Tue Jan 19 12:33:19 1999