Re: Spidering under win32

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Feb 21 2002 - 14:10:19 GMT
At 11:42 PM 2/20/2002 -0800, Cristiano Corsani wrote:

>Now I would like to try spidering. I searched the documentation
>and discussions for a while, and it is not clear how it works.
>I also looked in the source (http.c) and saw that swish-e
>needs the Perl script spider.pl, which is not distributed with
>the win32 version.

It should be in the Win32 distribution.  Look in the prog-bin directory.

Also, search the archives.  I think I posted an example session on how to
index on Windows with spider.pl just a week or so ago.

>Are you considering the possibility of writing a C version of
>the spider?

Not considering it at all!

>Perl is not widely used on win32

But it's easy to get and install.  http://activestate.com.  Of course,
Linux isn't that hard to install, either. ;)


>... Well, you can say
>that I can write a C spider program myself and use it
>with the -S prog argument. It is not a simple thing to do :-)

Which is why Perl is used.

>Can you suggest something?

You are going to need Perl and the libwww-perl (LWP) libraries installed
regardless of the spidering method.  -S http uses a Perl program to fetch
the document and extract the links.
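
To make the "fetch the document and extract the links" step concrete, here is a minimal sketch of that idea. It is written in Python purely for illustration and is not the Perl code that swish-e ships; the names `fetch`, `LinkExtractor` and `extract_links` are hypothetical, and only standard-library calls are used.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one document and return its body as text (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class LinkExtractor(HTMLParser):
    """Collect absolute, fragment-free URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and drop "#fragment".
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                if absolute not in self.links:
                    self.links.append(absolute)


def extract_links(html, base_url):
    """Return the unique links found in html, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

A spider would call `extract_links(fetch(url), url)` for each page and queue whatever comes back that it has not visited yet.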


Now that libxml2 is linked into swish, you could write an entire spider
in C and have it built into swish.  zlib can be linked in to uncompress
documents, and I'm sure there are MD5 C libraries that could add that kind
of feature internally to swish.  It's just so much easier to write and
maintain that kind of thing in Perl than in C.  But if someone wants to
write one in C and add it to swish, that would be great, I'd think.
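
As a rough illustration of the zlib and MD5 pieces mentioned above, the sketch below uncompresses a fetched body and uses its MD5 digest to skip content that has already been seen. Treating MD5 as a duplicate-detection aid is an assumption about the feature meant here, and the names `body_digest` and `DuplicateFilter` are hypothetical. It is in Python only to keep it short; a C version would make the equivalent zlib and MD5 library calls.

```python
import hashlib
import zlib


def body_digest(raw_body, compressed=False):
    """Return the MD5 hex digest of a document body, uncompressing it first if needed."""
    if compressed:
        # MAX_WBITS | 32 lets zlib auto-detect a zlib or gzip header.
        raw_body = zlib.decompress(raw_body, zlib.MAX_WBITS | 32)
    return hashlib.md5(raw_body).hexdigest()


class DuplicateFilter:
    """Remember digests of bodies already seen so identical content is indexed once."""

    def __init__(self):
        self._seen = set()

    def is_new(self, raw_body, compressed=False):
        digest = body_digest(raw_body, compressed)
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True
```

With this, the same page reached through two different URLs, or served once compressed and once uncompressed, would be passed to the indexer only the first time.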

Bill Moseley
mailto:moseley@hank.org
Received on Thu Feb 21 14:10:56 2002