Problem with spider?

From: Bruce Bowler <bbowler(at)not-real.bigelow.org>
Date: Tue Jan 26 1999 - 20:46:41 GMT
Hi,

I'm a swish-e newbie, so please forgive me. I searched the archive but didn't
find anything that looked relevant. Maybe my expectations are off...

I run swish-e as follows:

# /usr/local/bin/swish-e -S http -c bcb.config
Indexing Data Source: "HTTP-Crawler"
retrieving http://www.bigelow.org/ (0)...
 (122 words)

Removing very common words... no words removed.
Writing main index... 96 unique words indexed.
Writing file index... 1 file indexed.
Running time: 1 minute, 7 seconds.
Indexing done!
#

It's possible that there are only 122 words on the main page, but the page
also has lots of links that I expected to be followed, and apparently they
weren't: only one file was indexed.

What I would like swish-e to do is take a starting point (like
http://www.bigelow.org/) and index all of the local pages referenced from
there, either directly or indirectly.
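To make that expected behavior concrete, here is a minimal Python sketch of a
breadth-first crawl that starts at one URL and follows only links that stay
on the same site, treating a list of extra hostnames as equivalent (loosely
mirroring what EquivalentServer is meant to do). This is not swish-e's own
spider code. The link graph is a hypothetical in-memory dictionary standing
in for fetched pages, the equivalent servers are given as bare hostnames
rather than URLs, and max_depth=0 is assumed here to mean "no limit".

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def crawl(start_url, get_links, equivalent_hosts=(), max_depth=0):
    """Return pages reachable from start_url on the same (or an equivalent) host.

    get_links(url) -> list of href strings on that page. It is a hypothetical
    stand-in; a real spider would fetch the page and extract its links.
    max_depth=0 is treated as "no depth limit" in this sketch (an assumption).
    """
    local_hosts = {urlparse(start_url).netloc.lower()}
    local_hosts.update(h.lower() for h in equivalent_hosts)

    seen = {start_url}
    order = []
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if max_depth and depth >= max_depth:
            continue  # do not expand links past the depth limit
        for href in get_links(url):
            # Resolve relative links against the current page, drop #fragments.
            target = urljoin(url, href).split("#", 1)[0]
            if urlparse(target).netloc.lower() not in local_hosts:
                continue  # skip off-site links
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return order


# Hypothetical site layout, for illustration only.
START = "http://www.bigelow.org/"
SITE = {
    "http://www.bigelow.org/": ["about.html", "http://alpha1.bigelow.org/data/",
                                "http://example.com/"],
    "http://www.bigelow.org/about.html": ["staff.html#top", "/"],
    "http://www.bigelow.org/staff.html": [],
    "http://alpha1.bigelow.org/data/": [],
}

result = crawl(START, lambda u: SITE.get(u, []), equivalent_hosts=["alpha1.bigelow.org"])
print(result)
```

Here the off-site link to example.com is skipped, the link back to "/" is not
revisited, and staff.html is reached indirectly through about.html.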

My config file looks like this:

	IndexDir http://www.bigelow.org/
	IndexFile ./index.swishe
	IndexName "Bigelow Index"
	IndexDescription "This is the index of our site."
	IndexPointer "http://www.bigelow.org/swish/index.html"
	IndexAdmin "Bruce Bowler (bbowler@bigelow.org)"
	MetaNames first author
	IndexReport 3
	FollowSymLinks yes
	IgnoreLimit 50 1000
	IndexComments 0
	MaxDepth 0
	Delay 60
	TmpDir /tmp
	SpiderDirectory /usr/users/bowler/swishe/src
	EquivalentServer http://www.bigelow.org http://alpha1.bigelow.org
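One setting worth keeping in mind is Delay. Assuming it is the number of
seconds the spider pauses between successive requests to the server (check
the swish-e documentation for your version), a crawl of N pages takes at
least (N - 1) x Delay seconds before any fetch or indexing time. The helpers
below are hypothetical and only do that arithmetic for Delay 60.

```python
def min_crawl_seconds(pages: int, delay_seconds: int) -> int:
    """Lower bound on crawl time, assuming one pause of delay_seconds between
    consecutive requests and ignoring fetch/indexing time (an assumption)."""
    if pages <= 1:
        return 0
    return (pages - 1) * delay_seconds


def format_hms(seconds: int) -> str:
    """Render a number of seconds as 'Xh Ym Zs'."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes}m {secs}s"


for pages in (10, 50, 200):
    print(pages, "pages:", format_hms(min_crawl_seconds(pages, 60)))
```

Under that assumption, a 200-page site would take at least 3h 19m with
Delay 60, so a smaller value may be worth using while testing.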

I'm using Perl 5.00404, and I think I've installed all of the modules that
are documented as being needed.

Any ideas?

Bruce

Bruce Bowler                             207.633.9600 (voice)
Research Associate                       207.633.9641 (fax)
Bigelow Laboratory for Ocean Sciences    bbowler@bigelow.org
West Boothbay Harbor ME  04575           http://www.bigelow.org/
Received on Tue Jan 26 12:46:21 1999