Re: spider a database

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Sat Nov 05 2005 - 23:18:59 GMT

On Sat, Nov 05, 2005 at 11:09:43AM -0800, Michael Porcaro wrote:
> Ok I think I am understanding now. I was confused because I didn't
> realize there are 2 different configuration files.  One for parameters
> which is much simpler (swish.conf) and another for spider.pl, which
> requires perl knowledge (a perl config file).  So there are 2 config
> files, am I correct on this?

Kind of.  The spider's job is to fetch documents from websites.  So,
as you might expect, there's a config file to tell the spider what
URLs to spider, which to skip, and maybe how to filter non-text
documents.  The spider outputs the files it fetches in a format that's
read by swish.

Swish-e's job is to take documents and parse out the words and index
them.  So there's a config file for controlling how swish deals with
its input.

If you want to think of both of those activities as one thing with two
config files, that's up to you.
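
As a rough sketch of how the two pieces fit together, a spider config
might look something like the following.  The file name spider.conf,
the URL, and the email address are placeholders I've made up for
illustration, and the swish.conf lines in the comments are the usual
way of pointing swish at the spider, so check them against the docs
for your version.

```perl
# spider.conf: a Perl file read by spider.pl (placeholder values throughout).
# It is not run on its own; swish.conf would refer to it with something like:
#
#     IndexDir spider.pl
#     SwishProgParameters spider.conf
#
# and indexing would then be started with:  swish-e -c swish.conf -S prog

@servers = (
    {
        base_url => 'http://www.example.com/',   # where spidering starts
        email    => 'you@example.com',           # contact address for the spider
        # Skip URLs that are obviously images.
        test_url => sub { $_[0]->path !~ /\.(?:gif|jpe?g|png)$/i },
    },
);

1;   # a Perl file loaded this way must return a true value
```

The point is only that the spider's settings live in the Perl file,
while everything about parsing and indexing stays in swish.conf.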

> Finally, where is this custom config perl file supposed to go?  Under
> what directory?  I tried running it in my cgi-bin (local website) but it
> didn't work.

Interesting.  So, why do you think the configuration file is a cgi
script?

-- 
Bill Moseley
moseley@hank.org

Unsubscribe from or help with the swish-e list: 
   http://swish-e.org/Discussion/

Help with Swish-e:
   http://swish-e.org/current/docs
   swish-e@sunsite.berkeley.edu
Received on Sat Nov 5 15:19:12 2005