Re: [swish-e] Can I index by filename and directory over http

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Tue Feb 12 2008 - 14:55:22 GMT

On 02/12/2008 08:47 AM, David Annis wrote:
> I am a longtime user of htdig and would like to switch to swish-e, but I
> need to be able to index part of sites in several ways.  I need to be able
> to do particular page(s) on one site, a directory on a second and a set of
> pages on a third that all use a common naming convention, but the page that
> links to them does not.
> 
> Here's an example and how I think the swish configuration might work.  I
> want to index:
> 
> http://site1.com/flowers.html,
> anything in http://www.my-site.com/flowers/
> And all of the pages linked from http://www.athirdsite.org/products.html
> that match flowers_*.html
> 
> I think that the first two would be:
> IndexDir http://www.site1.com/flowers.html
> IndexDir http://www.my-site.com/flowers/
> 
> But the third line of the config is harder.  I don't see how to start at one
> page (products.html) that I really don't care to have indexed but follow its
> links or how to use a regex on the results only from the links on that
> particular page.  Is this doable with swish-e?

If you use spider.pl to fetch your docs, you can apply a regex check to each URL inside a
callback function:

http://swish-e.org/docs/spider.html#callback_functions
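
As a rough, untested sketch of that approach for the third site, a spider.pl config file
might look something like the following. The email address is a placeholder, and the
option and flag names (test_url, test_response, max_depth, no_index) are as I recall them
from the spider docs linked above, so check them against that page before relying on this.

```perl
# Hypothetical spider.pl config for the third site (untested sketch).
@servers = (
    {
        # Start at the page that links to the flowers_*.html pages.
        base_url  => 'http://www.athirdsite.org/products.html',
        email     => 'you@example.com',    # placeholder contact address

        # Assumption: only follow links one level down from the start page.
        max_depth => 1,

        # Called for each URL before it is fetched; return true to keep it.
        test_url => sub {
            my ( $uri, $server ) = @_;
            my $path = $uri->path;
            return 1 if $path eq '/products.html';         # the start page
            return $path =~ m{/flowers_[^/]*\.html$};      # flowers_*.html only
        },

        # Called after the response headers arrive.
        # Spider products.html for its links, but keep it out of the index.
        test_response => sub {
            my ( $uri, $server, $response ) = @_;
            $server->{no_index}++ if $uri->path eq '/products.html';
            return 1;
        },
    },
);

1;
```

The test_url callback does the regex filtering, and the no_index flag is what lets the
starting page be followed without being indexed. The other two sites could be further
entries in the same @servers list.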

You might also consider creating 3 separate indexes, one for each site, and then merging
them (swish-e -M). Each site's config can then be tested and debugged on its own.

-- 
Peter Karman  .  peter(at)not-real.peknet.com  .  http://peknet.com/

Received on Tue Feb 12 09:55:22 2008