Hello,
Sorry if I am saying something totally stupid, but it's the
first time I have used Swish...
Isn't "swishspider" totally inefficient? I mean, if there
is an HTML link in a page that points directly to a binary
file, swishspider will still try to download the whole thing,
while swish will discard it anyway because it's not of
a "text/" type... The result is that we may well end up
trying to download MB and MB of video, picture, and archive
files to no purpose at all... or did I miss something?
Not to mention that, as the response to the request is
stored in the response object, it may quickly start to use
quite a lot of memory.
Instead of doing a "GET" request, we could first do
a "HEAD", check whether the document is of a "text/" type
(as swish will only process those anyway), and then
do the GET if needed. But that means doing 2 requests
for each document of type "text/" instead of only
one...
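
To make the HEAD-then-GET idea concrete, here is a rough sketch
in Python. It is not the swishspider code (which is Perl), and
the names is_text_type and fetch_if_text are made up for this
illustration. It also assumes the server answers HEAD requests
with a usable Content-Type header, which may not always hold.

```python
import urllib.request


def is_text_type(content_type):
    """Return True if a Content-Type header value names a text/* type."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=iso-8859-1" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.startswith("text/")


def fetch_if_text(url, timeout=10):
    """HEAD the URL first; GET the body only when the type is text/*.

    Hypothetical helper: returns the body as bytes, or None when the
    document is skipped without being downloaded.
    """
    head = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(head, timeout=timeout) as response:
        content_type = response.headers.get("Content-Type")
    if not is_text_type(content_type):
        return None  # binary document: the body is never requested
    # Second request, only for documents swish would actually index.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

The cost mentioned above is visible here: every "text/"
document goes through urlopen twice, once for the HEAD and
once for the GET.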
Wouldn't it make a lot of sense to implement
the "NoContents" directive for the HTTP method
in exactly the same way as it's implemented for the
filesystem method? So that no request would
be needed at all to discard "jpg", "gif", "gz",
"mp3" and other such documents? (and without
exec'ing the swishspider perl script)
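
As a sketch of what I mean by discarding on the suffix alone,
something like the following Python would do. The suffix list
and the names SKIP_SUFFIXES, should_skip and filter_links are
assumptions for the example, not anything taken from the swish
sources or its configuration format.

```python
from urllib.parse import urlsplit

# Hypothetical suffix list, in the spirit of a "NoContents" setting.
SKIP_SUFFIXES = (".jpg", ".gif", ".gz", ".mp3")


def should_skip(url, suffixes=SKIP_SUFFIXES):
    """Return True if the URL's path ends with one of the suffixes.

    Only the path is examined, so a query string or fragment does not
    hide or fake a suffix. No network request is made.
    """
    path = urlsplit(url).path.lower()
    return path.endswith(tuple(suffixes))


def filter_links(links, suffixes=SKIP_SUFFIXES):
    """Keep only the links that are still worth requesting."""
    return [link for link in links if not should_skip(link, suffixes)]
```

The obvious limit is that a suffix is only a guess at the
content type, so a "text/" check on the response would still
be needed for the links that get through.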
I am pretty sure that the gain in performance,
especially when using the HTTP method to index
the localhost, would be pretty impressive...
Cheers,
Yann Stettler
--
-------------------------------------------------------------------
TheNet - Internet Services AG CohProg SaRL
stettler@thenet.ch stettler@cohprog.com
http://www.thenet.ch/ http://www.cohprog.com/
---**---
Anime and Manga Services http://www.animanga.com/
Received on Tue Dec 8 12:34:43 1998