At 11:42 PM 2/20/2002 -0800, Cristiano Corsani wrote:
>Now I would like to try spidering. I searched for a while in the
>documentation and discussions, and it is not very clear how it
>works. I also looked in the source (http.c) and saw that swish-e
>needs the Perl script spider.pl, which is not distributed with
It should be in the Win32 distribution. Look in the prog-bin directory.
Also, search the archives. I think I posted an example session on how to
index on Windows with spider.pl just a week or so ago.
>Are you considering the possibility of writing a C version of
Not considering it at all!
>Perl is not widely used on Win32
But it's easy to get and install: http://activestate.com. Of course,
Linux isn't that hard to install, either. ;)
>... Well, you could say
>that I can write my own C spider program to
>use with the -S prog argument. It is not a simple thing to do :-)
Which is why Perl is used.
>Can you suggest something?
You are going to need Perl and the libwww-perl (LWP) libraries installed
regardless of the spidering method. -S http uses a Perl program to fetch
each document and extract the links.
Now that swish can use libxml2, you could write an entire spider
in C and build it into swish. zlib could be linked in to uncompress
documents, and I'm sure there are MD5 libraries in C that would let swish
do checksum-based duplicate detection internally. It's just so much easier
to write and maintain that kind of thing in Perl than in C. But if someone
wants to write one in C and add it to swish, I think that would be great.
Received on Thu Feb 21 14:10:56 2002