Re: spider and cgi problems

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Sat Feb 22 2003 - 13:34:33 GMT
On Sat, 22 Feb 2003, Dave Morton wrote:

> > spider.config:
> >            @servers = (
> >                {
> >                    base_url    => ˙http://localhost/this/%ff,
> >                    email       => ˙me@myself.com˙,
> >                },
> >                {
> >                    base_url    => ˙http://localhost/that/%ff,
> >                    email       => ˙me@myself.com˙,
> >                },
> >            );

I seem to have an encoding problem on this machine -- anyway, those should
be either ascii single or double quotes.

You can also just do this:

  @servers = (
    {
        base_url => [qw (
              http://firstsite.com/
              http://secondsite.com/
              http://thirdsite.com/
        )],
        email => ...

That "qw" is the "quote words" operator, which is a shortcut for writing:

       base_url => [
              'http://firstsite.com/',
              'http://secondsite.com/',
              'http://thirdsite.com/',
       ],
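
To see the equivalence for yourself, here is a small standalone sketch (not
part of the spider config; the variable names are made up for illustration).
It builds the same list both ways and compares them.  Note that qw splits on
whitespace only, so do not put commas or quotes inside it.

```perl
use strict;
use warnings;

# List built with qw: words are split on whitespace, no quotes or commas.
my @quoted = qw(
    http://firstsite.com/
    http://secondsite.com/
    http://thirdsite.com/
);

# The same list written out with explicit single-quoted strings.
my @explicit = (
    'http://firstsite.com/',
    'http://secondsite.com/',
    'http://thirdsite.com/',
);

# Joining each list with spaces gives the same string for both.
print "identical\n" if "@quoted" eq "@explicit";
```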

> I wasn't aware that PERL code can go right into the config file. I'll
> have to remember this. That is, if I'm reading the above lines
> correctly.

Yes, it is Perl code.  That means you have to be a little more careful with
syntax (which you can check with perl -c myconfig.pl), but it also means
you can do things like set base_url from a database, uncompress gzipped
files, convert pdf, and many other things right in the config file.
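
As a rough sketch of what "Perl in the config file" can look like, the
fragment below builds @servers from a list of URLs instead of writing each
entry by hand.  The here-doc is only a stand-in for wherever the URLs really
come from (a file, a database query, and so on), and the file name
myconfig.pl is just the one used in the perl -c example above.  The base_url
and email keys are the ones from the config quoted earlier; everything else
is an assumption for illustration.

```perl
use strict;
use warnings;

# The spider reads @servers from this file; "our" keeps strict happy.
our @servers;

# Stand-in for an external source of URLs (assumption for this sketch).
my $url_list = <<'END_URLS';
http://localhost/this/
# http://localhost/disabled/
http://localhost/that/
END_URLS

# Keep non-empty lines that are not commented out.
my @urls;
for my $line (split /\n/, $url_list) {
    $line =~ s/^\s+|\s+$//g;
    next if $line eq '' or $line =~ /^#/;
    push @urls, $line;
}

# One server entry per URL, all sharing the same contact address.
@servers = map { +{ base_url => $_, email => 'me@myself.com' } } @urls;

1;
```

Running perl -c myconfig.pl on a file like this only checks that it
compiles; it does not fetch anything.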

If you want to review the swish.cgi docs online they can be found at:

   http://swish-e.org/dev/docs/swish.html

That includes step-by-step instructions for building an index and
installing and running the search script.  The example is using Apache on
Linux, so I expect it will be different on NT.  It would be very helpful
if you could post the changes required for setting it up on NT.

I have also never used Apache 2 on Windows so it would be helpful to note
any changes with respect to Apache that are needed.

If you read the swish.cgi docs you may see:

      This script should work on Windows, but security may be an
      issue.

The specific concern behind that warning is that user data is passed through
the shell on Windows, which should be considered insecure.  I spent a year
asking around on Windows support groups for a secure method to run an
external program on Windows but never got an answer.  I suppose the best
answer would be to make the Swish-e library available as a ppm-installable
module and use the library to access the swish-e index instead of the
swish-e.exe program.
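
For contrast, this is roughly what "not going through the shell" looks like
on a Unix-like system: a piped open in list form, where the program and each
argument are separate list elements.  It is only a sketch, not the code in
swish.cgi.  The child here is perl itself ($^X) standing in for swish-e, and
the query string is made up.  Whether the list form is available or helps on
Windows depends on the Perl build, and this sketch does not settle that
question.

```perl
use strict;
use warnings;

# Hypothetical user input containing shell metacharacters.
my $query = 'foo; echo injected';

# List form of a piped open: the command and each argument are passed as
# separate list elements rather than as one string for a shell to parse.
# $^X is the running perl binary, used here as a stand-in for swish-e.
open(my $fh, '-|', $^X, '-e', 'print $ARGV[0]', $query)
    or die "Cannot run child process: $!";
my $output = do { local $/; <$fh> };
close($fh) or die "Child process failed: $?";

# The child echoes back the single argument it received.
print "child received: $output\n";
```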


-- 
Bill Moseley moseley@hank.org
Received on Sat Feb 22 13:35:08 2003