Re: spidering an intranet that requires login

From: <moseley(at)not-real.hank.org>
Date: Fri Aug 22 2003 - 13:29:02 GMT
On Thu, Aug 21, 2003 at 02:41:45PM -0700, Bill Conlon wrote:

> http://myintranet.org/login.php?_function=checkpw&username=swishe&password=spider

> But this bombs out with
> 
> swish-e -S prog -c spider.config
> Indexing Data Source: "External-Program"
> Indexing "spider.pl"
> sh: line 1: username=swishe: command not found
> sh: line 1: .password=swishe: command not found
> /usr/local/lib/swish-e/spider.pl: Reading parameters from 'default'

Programs run with -S prog are opened with a popen() call, which goes
through the shell.  So you need to escape shell metachars such as the
"&" in your URL.  There are a few ways to do this.

(This is for normal Unix shells -- Windows users would probably need to reverse the quotes.)

  IndexDir spider.pl
  SwishProgParameters "default 'http://myintranet.org/login.php?_function=checkpw&username=swishe&password=spider'"

Or just quote the one parameter:

  IndexDir spider.pl
  SwishProgParameters default "'http://myintranet.org/login.php?_function=checkpw&username=swishe&password=spider'"

Or backslash each "&" -- need double backslashes to escape, but I wouldn't do it this way:

  IndexDir spider.pl
  SwishProgParameters default http://myintranet.org/login.php?_function=checkpw\\&username=swishe\\&password=spider
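If you want to see what the shell does with that URL outside of swish-e, here is a minimal sketch.  It assumes a POSIX sh, and it uses printf as a stand-in for spider.pl (that substitution is mine, just for illustration) to print each argument the program would receive, once with the URL unquoted and once inside single quotes:

```shell
#!/bin/sh
# The URL from this thread; '&' is a shell metacharacter.
url='http://myintranet.org/login.php?_function=checkpw&username=swishe&password=spider'

# Unquoted: the shell cuts the command line at each '&', so the
# program (printf here, standing in for spider.pl) only sees the
# URL up to the first '&'.
unquoted=$(sh -c "printf '[%s]\n' default $url" 2>/dev/null)

# Single-quoted inside the command string: the URL stays one argument.
quoted=$(sh -c "printf '[%s]\n' default '$url'")

printf 'unquoted:\n%s\n\nquoted:\n%s\n' "$unquoted" "$quoted"
```

The unquoted run should show the URL cut off after "checkpw", while the quoted run shows the full URL as a single argument.  The exact error messages for the leftover pieces can differ between shells, so the sketch discards them.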



-- 
Bill Moseley
moseley@hank.org
Received on Fri Aug 22 13:29:21 2003