
Re: HTTP method and swishspider

From: Didier Bringer <dbringer(at)>
Date: Mon Sep 25 2000 - 15:34:32 GMT

I tried the spider once but was not really happy with the speed.
Since then I have used httrack, which is very fast at making a mirror,
and then I index with swish (which is very, very nice).

Below is the kind of command line httrack can accept:

httrack          -%F "" -C2 -*.html?* -*.gif -*.jpeg -*.jpg -c48 -D -z -w -O /home/httpd/www/web_fige/

For example, it doesn't copy the JPEG and GIF files. With this kind of
command line it also skips pages that look like index.html?p=1.
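Those exclusion rules behave like shell globs applied to each URL. A minimal Python sketch of that matching logic (illustrative only, not httrack's actual code; note that in `fnmatch` a `?` in the pattern matches any single character, which still covers the literal `?` in `index.html?p=1`):

```python
from fnmatch import fnmatch

# Exclusion patterns mirroring the -*.html?* -*.gif -*.jpeg -*.jpg filters above.
EXCLUDE = ["*.html?*", "*.gif", "*.jpeg", "*.jpg"]

def should_fetch(url: str) -> bool:
    """Return False when the URL matches any exclusion glob."""
    return not any(fnmatch(url, pat) for pat in EXCLUDE)

print(should_fetch("http://example.com/index.html"))      # plain page: kept
print(should_fetch("http://example.com/index.html?p=1"))  # query page: skipped
print(should_fetch("http://example.com/logo.gif"))        # image: skipped
```

So a plain `index.html` is mirrored, while `index.html?p=1` and the image files are filtered out, matching the behaviour described above.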

It does an excellent job for me.

 wrote:
> Hi all,
> I had never used the HTTP feature before, but I finally used it to
> check Bryan's problem with swishspider (see previous posts).
> I have noticed that this option is slow, and I am wondering why. As you
> know, an external perl program is called to get each page from
> the server. Obviously, each time swishspider is called, a perl
> interpreter must be loaded into memory. It also needs to load the
> program and the required modules. Installing the required perl
> modules (Digest-MD5, libnet, libwww-perl, HTML-Parser, HTML-Tagset,
> MIME-Base64, URI) is also tedious, or perhaps I did not do it
> correctly.
> I am wondering if there is a way to avoid using swishspider. I
> saw a reference to libwww in the discussion list (from Mark Gaulin). I
> do not know if the effort is worth it.
> Any comments?
> cu
> Jose
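The per-page cost Jose describes is mostly interpreter startup, not the fetch itself. A quick way to see that (a Python sketch, assuming a `python3` binary on the PATH; not swish-e's actual code, which spawns perl instead):

```python
import subprocess
import time

RUNS = 5

# Spawning a fresh interpreter per page, as swish-e does with swishspider,
# pays the startup cost on every call.
start = time.time()
for _ in range(RUNS):
    subprocess.run(["python3", "-c", "pass"], check=True)
spawn_cost = (time.time() - start) / RUNS

# Doing the same (empty) work in-process avoids that cost entirely.
start = time.time()
for _ in range(RUNS):
    pass
inproc_cost = (time.time() - start) / RUNS

print(f"per-call spawn overhead: {spawn_cost:.4f}s "
      f"vs in-process: {inproc_cost:.6f}s")
```

The spawned case is slower by orders of magnitude per call, which is why a mirror-then-index approach (or keeping the fetcher in one long-lived process, as the libwww suggestion implies) is so much faster than launching swishspider once per page.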
Received on Mon Sep 25 15:34:58 2000