

Swishspider totally inefficient?

From: Yann Stettler <stettler(at)>
Date: Tue Dec 08 1998 - 20:37:01 GMT
Sorry if I am saying something totally stupid, but this is the
first time I have used Swish...

Isn't "swishspider" totally inefficient? I mean, if a page
contains an HTML link that points directly to a binary file,
swishspider will still download the whole thing, even though
swish will discard it anyway because it isn't of a "text/"
type... The result is that we may well end up downloading
megabytes and megabytes of video, picture, and archive files
to no purpose at all... or did I miss something?

Not to mention that, since the response to the request is
stored in the response object, it can start to use quite
a lot of memory rather quickly.

Instead of doing a "GET" request, we could first do a
"HEAD", check whether the document is of a "text/" type
(as swish will only process those anyway), and then
do the GET if needed. But that means doing 2 requests
for each document of type "text/" instead of only one.
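The HEAD-before-GET idea above could be sketched like this in Python (swishspider itself is Perl; the function names here are invented for illustration, not swish-e's API):

```python
# Hypothetical sketch of HEAD-before-GET: ask the server for headers
# only, and issue the full GET only when the reported Content-Type is
# something swish will actually process.
from urllib.request import Request, urlopen

def is_indexable(content_type):
    # swish only processes "text/" documents, so anything else is skipped
    return content_type.startswith("text/")

def fetch_if_text(url):
    # First round trip: HEAD transfers headers only, no body.
    with urlopen(Request(url, method="HEAD")) as resp:
        ctype = resp.headers.get("Content-Type", "")
    if not is_indexable(ctype):
        return None  # binary content: the expensive GET is skipped entirely
    # Second round trip: GET the body, paid only for "text/" documents.
    with urlopen(url) as resp:
        return resp.read()
```

The trade-off is exactly the one described: every indexable document now costs two round trips instead of one, in exchange for never transferring a binary body.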

Wouldn't it make a lot of sense to implement
the "NoContents" directive for the HTTP method
in exactly the same way as it is implemented for the
filesystem method? That way no request would
be needed at all to discard "jpg", "gif", "gz",
"mp3" and other such documents (and without
exec'ing the swishspider perl script).
I am pretty sure that the gain in performance,
especially when using the HTTP method to index
localhost, would be pretty impressive...
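Filtering on the URL alone, analogous to how the filesystem method skips files by extension, could look roughly like this (a sketch; the extension list and function name are examples, not swish-e configuration):

```python
# Illustrative sketch: decide from the URL's extension whether to make
# any HTTP request at all, analogous to NoContents-style filtering in
# the filesystem method. The extension set is an example only.
SKIP_EXTENSIONS = {".jpg", ".gif", ".gz", ".mp3"}

def should_request(url):
    # Strip any query string, then test the path's extension.
    path = url.split("?", 1)[0].lower()
    return not any(path.endswith(ext) for ext in SKIP_EXTENSIONS)
```

Since the check is pure string matching, it costs no network traffic and no subprocess at all for discarded URLs.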

Yann Stettler

TheNet - Internet Services AG / CohProg SaRL
Anime and Manga Services
Received on Tue Dec 8 12:34:43 1998