Re: Indexing protected area

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Dec 07 2006 - 04:44:23 GMT

On Wed, Dec 06, 2006 at 07:16:30PM -0800, Lesley Walker wrote:
> >     spider.pl default http://yoursite.to.index/ > out.txt
> 
> Thanks, I hadn't read far enough to know about that "default" option. I was
> busy setting up a config file based on the minimal example - if I'd seen
> that line in the docs first I would have done that straight away.

It's shown on the first line of the first section of the spider docs. ;)

> My mission is to allow searching in some password-protected sub-sites that
> aren't linked from the main page so I think I'll have to do them each
> individually.

Are you going to include a description from the document in search
results?  It kind of defeats the purpose of password protection if the
content can be read from the search results.

> Would it make sense to maintain a separate index for each one rather than
> put it all in together with the main index, even though they're all pretty
> small?

Is that so you can limit searches to specific areas?  I'm not sure it
makes much difference -- if they can be identified by the path then
you can use ExtractPath to create a metaname for searching each or all
sites.  Or, you could use separate indexes.  Probably doesn't matter,
although I'd probably have one index.
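
As a rough sketch of the single-index approach, something like the
following could go in the indexing config.  It assumes each protected
sub-site lives under its own top-level directory; the metaname "site",
the default value "main", and that URL layout are made up for
illustration, so adjust the pattern to your real paths and check the
directive syntax against the SWISH-CONFIG docs for your version.

```
# Hypothetical layout: http://yoursite.to.index/<subsite>/...
# Index the first path segment of each URL under the metaname "site".
ExtractPath site regex !^https?://[^/]+/([^/]+)/.*$!$1!

# Value indexed under "site" when the pattern above does not match.
ExtractPathDefault site main
```

With that in place a query such as "site=subsiteA and keyword" should
limit results to one area, and leaving off the "site=" term searches
everything in the index.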

> I think I like the idea of leaving the main site index as it is and treating
> the new bits separately.

There's a little extra overhead when searching multiple indexes, but for
a small number of records it won't make much difference.

-- 
Bill Moseley
moseley@hank.org

Received on Wed Dec 6 20:44:30 2006