Re: [Ignor Abuse] Swish Spider Configuration

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed Feb 11 2004 - 18:09:54 GMT

On Wed, Feb 11, 2004 at 05:34:48PM +1030, Ahmad, Zeeshan (FMC) wrote:
> I have the spider config file setup with IndexDir set to my web server and
> that works fine. I want to modify SwishSpiderConfig.pl to use test_url
> function. I think I need to muck around with @servers hash in order to that.

@servers is an array (of hashes).

> Should BaseUrl entry be same as the IndexDir entry?

No.  With -S prog, IndexDir is the name of the program to run.

> How does spider figure
> out which hash entry to use for my IndexDir URL?

It doesn't work that way.

IndexDir tells swish-e what program to run.  It's up to the program to
decide what to index -- it could be web pages (spider.pl) or files in
the directory tree (DirTree.pl) or HTML generated by hypermail
(index_hypermail.pl) or data in a database (MySQL.pl).  All swish-e
knows is it is running the program (or programs) you specify with -i or
with IndexDir.
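
As a rough sketch of how the pieces fit together (not part of the
original message): a spider config file with a single @servers entry and
a test_url callback might look something like the following.  The URL,
email address, and file-extension pattern are placeholders, and the key
names base_url and email and the SwishProgParameters directive are
assumptions based on the spider.pl and swish-e documentation, so check
them against your installed version.

```perl
# SwishSpiderConfig.pl -- illustrative sketch, not from the original post.
#
# Assumed swish-e config lines that would run the spider with -S prog:
#     IndexDir spider.pl
#     SwishProgParameters SwishSpiderConfig.pl

# @servers is an array; each element is a hash reference describing
# one site to spider.
@servers = (
    {
        # Placeholder values: substitute your own site and contact address.
        base_url => 'http://www.example.com/',
        email    => 'admin@example.com',

        # Assumption: test_url is called with a URI object for each link
        # found; return true to fetch the URL, false to skip it.
        test_url => sub {
            my $uri = shift;
            return $uri->path !~ /\.(?:gif|jpe?g|png)$/i;
        },
    },
);

1;
```

Here swish-e only knows that it is running spider.pl; which URLs get
fetched is decided entirely by the entries in @servers.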

> Would delay_sec entry
> override Delay entry in spider config file? 

Delay doesn't go in the spider config; it goes in the swish-e config file.
And no, it doesn't override, because they are for two different indexing
modes.  The Delay option applies when running -S http, and delay_sec is an
option in spider.pl and thus only used when running the spider.pl
program with -S prog.
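
To make the split concrete, here is a minimal sketch of where each
setting would live.  The URL, email address, and the value 5 are
placeholders, and the key names base_url and email are assumed from the
spider.pl documentation rather than taken from this thread.

```perl
# Sketch only: where each delay setting lives.
#
# -S http mode: Delay is a swish-e config directive, e.g. in swish.conf:
#     IndexDir http://www.example.com/
#     Delay 5
#
# -S prog mode with spider.pl: delay_sec is a key in a @servers entry
# in the spider config file, as below.
@servers = (
    {
        base_url  => 'http://www.example.com/',   # placeholder URL
        email     => 'admin@example.com',         # placeholder address
        delay_sec => 5,                           # seconds between requests
    },
);

1;
```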


> How does it all work?

Mirrors and magic.

-- 
Bill Moseley
moseley@hank.org
Received on Wed Feb 11 10:09:55 2004