Mark Gaulin wrote:
> I think there may be a middle ground where both methods can be
> used... mime to include urls and file extension to exclude the
> common extensions.
Yes, that's exactly what I mean:
1) An option to configure SWISH to immediately ignore some files
based on their extension, so that in most cases it doesn't do an
unneeded fork and connection to the server. After all, just
listing ".jpg" and ".gif" would eliminate the majority of unwanted
documents.
Another reason for this is to avoid hangs when some file really
causes a problem, for whatever reason. I simply couldn't index
my site because of a large ".mov" file until I made the change
that lets it skip over such files...
2) Naturally, the spider itself still has to check the MIME type
in the document header, and abort without transferring the
whole document if the type is not acceptable.
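The extension filter from point 1 needs no network access at all,
which is why it can skip the fork and the connection. A minimal
Python sketch follows. It is not SWISH code: the name should_skip
and the SKIP_EXTENSIONS list are hypothetical, chosen only to
illustrate the check.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical skip list for illustration; not an actual SWISH setting.
SKIP_EXTENSIONS = {".jpg", ".jpeg", ".gif", ".png", ".mov", ".zip"}


def should_skip(url: str, skip_extensions=SKIP_EXTENSIONS) -> bool:
    """Return True if the URL path ends in an extension we never index.

    Only the URL string is inspected, so no process is forked and no
    HTTP connection is opened for files that are rejected here.
    """
    path = urlparse(url).path          # drop scheme, host, query and fragment
    _, ext = posixpath.splitext(path)  # "/img/logo.GIF" -> ".GIF"
    return ext.lower() in skip_extensions  # case-insensitive comparison
```

Because the query string is stripped first, a URL such as
"/cgi-bin/page?img=a.gif" is not skipped: its path has no extension,
so it still goes to the spider and its MIME check.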
Don't forget that SWISH does its own check after the spider.
So if SWISH discards anything not of type "text/*", it's perfectly
all right to discard it directly in the spider... (The spider
already does this, but only after having transferred the whole
file.)
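Point 2 amounts to reading only the response headers and closing
the connection before the body is read when the type is not
"text/*". Below is a rough Python sketch using the standard
library's urllib.request, not the actual SWISH spider. The names
is_indexable and fetch_if_text and the User-Agent string are
hypothetical. Treating a missing Content-Type header as "not
indexable" is also an assumption.

```python
from typing import Optional
from urllib.request import Request, urlopen


def is_indexable(content_type: str) -> bool:
    """Accept only text/* media types, mirroring SWISH's own later check."""
    # Strip parameters such as "; charset=iso-8859-1" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.startswith("text/")


def fetch_if_text(url: str, timeout: float = 30.0) -> Optional[bytes]:
    """Return the body only if the headers announce a text/* type.

    urlopen() returns once the status line and headers are parsed; the
    body is only read when read() is called.  Closing the response
    early therefore avoids pulling a large file down in full.
    """
    req = Request(url, headers={"User-Agent": "example-spider/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        content_type = resp.headers.get("Content-Type", "")
        if not is_indexable(content_type):
            return None  # leaving the with-block closes the connection
        return resp.read()
```

A few bytes of the body may already sit in network buffers when the
connection is closed. Still, the multi-megabyte ".mov" case
described above would no longer be downloaded in full.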
Cheers,
Yann Stettler
--
-------------------------------------------------------------------
TheNet - Internet Services AG CohProg SaRL
stettler@thenet.ch stettler@cohprog.com
http://www.thenet.ch/ http://www.cohprog.com/
---**---
Anime and Manga Services http://www.animanga.com/
Received on Tue Jan 19 12:33:19 1999