On Sat, 2003-02-22 at 02:27, Dave-CBC wrote:
> but can't get it to index more than just links found
> on the pages it spiders
A web spider can only index URLs it knows about... It can't guess what
other URLs might exist. Well, I suppose it could but it wouldn't get
far.
Do you have filesystem access to the web server? If so then use the
filesystem method rather than a spider. If no filesystem access then
you'll have to give the spider a list of URLs.
spider.config:
@servers = (
    {
        base_url => 'http://localhost/this/',
        email    => 'me@myself.com',
    },
    {
        base_url => 'http://localhost/that/',
        email    => 'me@myself.com',
    },
);
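Since spider.config is ordinary Perl, a longer list of starting URLs does not have to be typed out one hash at a time. A minimal sketch, assuming every URL shares the same email setting; @start_urls and $contact_email are names made up for this illustration and are not spider.pl options:

```perl
# Hypothetical helper variables (not part of spider.pl):
# one entry per URL the spider should start from.
my @start_urls = qw(
    http://localhost/this/
    http://localhost/that/
);
my $contact_email = 'me@myself.com';

# Build one hash per starting URL, equivalent to writing
# each { base_url => ..., email => ... } block by hand.
@servers = map { +{ base_url => $_, email => $contact_email } } @start_urls;
```

The result is the same two-entry @servers list as the hand-written version; only the way it is built differs.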
swish.config:
IndexDir ./spider.pl
SwishProgParameters spider.config
DefaultContents HTML2
StoreDescription HTML2 <BODY> 100000
Create SWISH-E index like this:
swish-e -c swish.config -S prog -E ./swishError.log
There is an example spider config SwishSpiderConfig.pl in the prog-bin
directory. It has perldoc documentation and many comments.
> my $file = "$swish_binary -w $query -d :: -v 3 -H 9 -f D:/PROGRA~1/SWISH-E/index.swish";
> if ( $pid = open( SWISH, "$file|" ) ) {
> if ( $pid = open( SWISH, '-|' ) ) {
What version of SWISH-E? SWISH-E's current search.cgi doesn't use
open() on Windows. I don't recall when Bill fixed that but I think it's
been quite a while.
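If upgrading isn't an option, one way to avoid the piped open() is the core IPC::Open2 module, passing the command as a list so no shell has to parse the query. This is only a sketch of that idea, not necessarily what search.cgi does; run_command is a name made up here:

```perl
use IPC::Open2;

# Hypothetical helper: run a command given as a list of arguments
# (no shell involved) and return the lines it prints on STDOUT.
sub run_command {
    my @command = @_;
    my ( $reader, $writer );

    # open2 raises an exception if the program can't be started.
    my $pid = open2( $reader, $writer, @command );

    close $writer;          # nothing to send to the child's STDIN
    my @lines = <$reader>;  # collect everything the child prints
    close $reader;
    waitpid $pid, 0;        # reap the child process

    return @lines;
}

# Possible use with the variables from the quoted code ($index_file is
# an assumed variable holding the path to index.swish):
# my @results = run_command( $swish_binary, '-w', $query, '-d', '::',
#                            '-H', '9', '-f', $index_file );
```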
Latest SWISH-E builds are here:
http://www.swish-e.org/Download/
and here:
http://www.webaugur.com/wares/files/swish-e/
Odd minor numbers are development builds. 2.2.x is a release, 2.3.x is
development. Current release is 2.2.3 (2002-12-11).
--
David Norris
http://www.webaugur.com/dave/
ICQ - 412039
Received on Sat Feb 22 08:50:01 2003