Re: Indexing protected area

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Dec 07 2006 - 02:03:16 GMT
On Thu, Dec 07, 2006 at 02:54:29PM +1300, Lesley Walker wrote:
> Bill Moseley wrote:
> > - credentials
> > 
> > You may specify a username and password to be used 
> > automatically when spidering:
> > 
> >     credentials => 'username:password',
> 
> Thanks Bill, this is just what I needed. I can see I've got more reading to
> do and a few changes to make.  I've not done anything with indexing before,
> and the site is currently indexed via
> "swish-e -c <config_file>" which just didn't give me enough clues.

It can get confusing since there are so many ways to do things and since
different programs handle different parts of the indexing.

The spider and swish are separate programs.  You can either have
swish-e call the spider, or pipe the output of the spider into swish.
It's basically the same thing.  So that means you can work in steps:

    spider.pl default http://yoursite.to.index/ > out.txt

(You can pipe the output through gzip there if you want.)

Then you can look at out.txt and see the data that will be sent to
swish for indexing.  The 'default' argument tells the spider to use its
built-in default settings.  Later you can use a config file to customize
how the spider runs.  See "Running the spider" in the docs.
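
Working in steps might look something like the sketch below.  The
function names and the out.txt.gz file name are made up for
illustration, and the index name is set explicitly just to be clear
about where it ends up.  It assumes spider.pl and swish-e are on your
PATH, and that your version accepts "-S prog -i stdin" for reading
spider output from standard input (check the docs for your release).

```shell
# Hypothetical file names; change them to suit.
SPIDER_OUT=out.txt.gz
INDEX=index.swish-e

# Step 1: run the spider on its own and keep a compressed copy of its output.
spider_to_file() {
    spider.pl default "$1" | gzip > "$SPIDER_OUT"
}

# Step 2: page through the captured output to see what swish will be given.
inspect_output() {
    gunzip -c "$SPIDER_OUT" | ${PAGER:-less}
}

# Step 3: feed the saved output to swish-e on stdin and write the index.
index_from_file() {
    gunzip -c "$SPIDER_OUT" | swish-e -S prog -i stdin -f "$INDEX"
}

# Alternative: skip the intermediate file and pipe the spider straight in.
index_direct() {
    spider.pl default "$1" | swish-e -S prog -i stdin -f "$INDEX"
}
```

For example: spider_to_file http://yoursite.to.index/ and then
inspect_output, and only once the output looks right, index_from_file.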


In your case I might be tempted to download a new copy of swish and
install it in your home directory.

    ./configure --prefix=$HOME/local
    make install

Then add $HOME/local/bin to your PATH.
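
One way to do that, as a rough sketch for a Bourne-style shell (the
PREFIX variable is just a name picked here to match the --prefix
above).  It only prepends the directory if it is not already on PATH,
so it is safe to put in something like ~/.profile and source more than
once.

```shell
# Must match the --prefix given to ./configure.
PREFIX="$HOME/local"

# Prepend $PREFIX/bin to PATH unless it is already there.
case ":$PATH:" in
    *":$PREFIX/bin:"*) ;;
    *) PATH="$PREFIX/bin:$PATH" ;;
esac
export PATH

# Report which swish-e (if any) the shell will now find.
if command -v swish-e >/dev/null 2>&1; then
    echo "using $(command -v swish-e)"
else
    echo "swish-e not found on PATH yet"
fi
```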

Once you have an index you can then search it with swish from the
command line.
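
A command-line search could be wrapped up roughly like this.  The
search_index name, the index file name, and the limit of 10 results
are all just choices for the example; -f, -w and -m are the swish-e
switches for the index file, the query words, and the maximum number
of results, but check the docs for your version.

```shell
# Hypothetical index location; use whatever name you indexed to.
INDEX=index.swish-e

# Search the index for the query given as the first argument.
search_index() {
    # Fail early with a clear message if nothing has been indexed yet.
    if [ ! -f "$INDEX" ]; then
        echo "no index at $INDEX; run the indexing step first" >&2
        return 1
    fi
    swish-e -f "$INDEX" -w "$1" -m 10
}
```

Then something like search_index 'foo AND bar' should print the
matching documents.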

Then after that is working you can think about interfacing with a
search script.

-- 
Bill Moseley
moseley@hank.org

Received on Wed Dec 6 18:03:17 2006