I tried the spider once but was not really happy with the speed.
Since then I have used httrack, which is very fast at making a mirror, and
then I index with swish
(which is very, very nice).
Below is the kind of command line httrack accepts:
httrack http://127.0.0.1/web5/ -%F "" -C2 -*.html?* -*.gif
-*.jpeg -*.jpg -c48 -D -z -w -O /home/httpd/www/web_fige/
For example, it doesn't copy the jpeg and gif files.
With this kind of command line, it also doesn't fetch pages that look
like index.html?p=1.
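The filtering the flags above perform (rejecting query-string URLs like
index.html?p=1 and image files) can be sketched in Python. This is a purely
illustrative re-implementation of the filter rules, not httrack's own code;
the function name `should_mirror` is my own:

```python
from urllib.parse import urlparse

# Extensions the example command line excludes (-*.gif -*.jpeg -*.jpg).
SKIPPED_EXTENSIONS = (".gif", ".jpeg", ".jpg")

def should_mirror(url: str) -> bool:
    """Return True if the URL would pass the filters in the command above."""
    parsed = urlparse(url)
    # -*.html?* rejects pages carrying a query string, e.g. index.html?p=1
    if parsed.query:
        return False
    # -*.gif, -*.jpeg, -*.jpg reject image files
    if parsed.path.lower().endswith(SKIPPED_EXTENSIONS):
        return False
    return True

print(should_mirror("http://127.0.0.1/web5/index.html"))      # True
print(should_mirror("http://127.0.0.1/web5/index.html?p=1"))  # False
print(should_mirror("http://127.0.0.1/web5/logo.gif"))        # False
```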
It does an excellent job for me.
> Hi all,
> I had never used the HTTP feature before, but I have finally used it to
> check Bryan's problem with swishspider (see previous posts).
> I have noticed that this option is slow, and I am wondering why. As you
> know, an external perl program is called to fetch each page from the
> server. Obviously, each time swishspider is called, a perl
> interpreter must be loaded into memory. It also needs to load the
> program and the required modules. Installing the required perl
> modules (Digest-MD5, libnet, libwww-perl, HTML-Parser, HTML-Tagset,
> MIME-Base64, URI) is also tedious, or perhaps I did not do it right.
> I am wondering if there is a way to avoid the use of swishspider. I
> saw a reference to libwww in the discussion list (from Mark Gaulin). I
> do not know if the effort is worth it.
> Any comments?
Received on Mon Sep 25 15:34:58 2000