Re: spider a database

From: Bill Moseley <moseley(at)>
Date: Sat Nov 05 2005 - 23:18:59 GMT
On Sat, Nov 05, 2005 at 11:09:43AM -0800, Michael Porcaro wrote:
> Ok I think I am understanding now. I was confused because I didn't
> realize there are 2 different configuration files.  One for parameters
> which is much simpler (swish.conf) and another for, which
> requires perl knowledge (a perl config file).  So there are 2 config
> files, am I correct on this?

Kind of.  The spider's job is to fetch documents from websites.  So,
as you might expect, there's a config file to tell the spider what
URLs to spider, which to skip, and maybe how to filter non-text
documents.  The spider outputs the files it fetches in a format that's
read by swish.
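
To make that concrete, here is a rough sketch of what a spider config
file can look like.  It assumes the @servers layout and the base_url,
email, and test_url options as described in the spider.pl
documentation; the file name, host, and email address are placeholders,
so check the docs for the exact option names before relying on it.

```perl
# spider.conf (hypothetical file name): an untested sketch of a
# spider config, assuming the @servers layout from the spider.pl docs.

@servers = (
    {
        # Where spidering starts (www.example.com is a placeholder).
        base_url => 'http://www.example.com/',

        # Contact address for the spider (placeholder).
        email    => 'you@example.com',

        # Called for each URL found: return true to fetch it,
        # false to skip it.  Here, common image files are skipped.
        test_url => sub {
            my ( $uri, $server ) = @_;
            return $uri->path !~ /\.(?:gif|jpe?g|png)$/i;
        },
    },
);

# End with a true value so the program loading this file can tell
# that it was read successfully.
1;
```

Note that this is plain Perl that the spider program loads as data.
It is not a script you run on its own.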

Swish-e's job is to take documents and parse out the words and index
them.  So there's a config file for controlling how swish deals with
its input.

If you want to think of both of those activities as one thing with two
config files, that's up to you.

> Finally, where is this custom config perl file supposed to go?  Under
> what directory?  I tried running it in my cgi-bin (local website) but it
> didn't work.

Interesting.  So, why do you think the configuration file is a cgi

Bill Moseley

Received on Sat Nov 5 15:19:12 2005