On 02/12/2008 08:47 AM, David Annis wrote:
> I am a longtime user of htdig and would like to switch to swish-e, but I
> need to be able to index part of sites in several ways. I need to be able
> to do particular page(s) on one site, a directory on a second and a set of
> pages on a third that all use a common naming convention, but the page that
> links to them does not.
> Here's an example and how I think the swish configuration might work. I
> want to index:
> the single page http://www.site1.com/flowers.html
> anything in http://www.my-site.com/flowers/
> And all of the pages linked from http://www.athirdsite.org/products.html
> that match flowers_*.html
> I think that the first two would be:
> IndexDir http://www.site1.com/flowers.html
> IndexDir http://www.my-site.com/flowers/
> But the third line of the config is harder. I don't see how to start at one
> page (products.html) that I really don't care to have indexed but follow its
> links or how to use a regex on the results only from the links on that
> particular page. Is this doable with swish-e?
If you use spider.pl to aggregate your docs, you can define a regex check
in a callback such as test_url, which decides whether a link is followed.
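
Something along these lines might work as a spider config file (a rough,
untested sketch). It assumes the test_url and test_response callbacks, the
max_depth option and the no_index flag behave as described in the spider.pl
docs for your version. The email value is a placeholder, and whether test_url
is also applied to the base_url may vary, so products.html is allowed
explicitly.

```perl
# SwishSpiderConfig.pl: a sketch under the assumptions above, not a tested config.
my $email = 'you@example.com';    # placeholder contact address (assumption)

@servers = (
    # 1. One page only: a max_depth of zero should keep the spider on base_url.
    {
        base_url  => 'http://www.site1.com/flowers.html',
        email     => $email,
        max_depth => 0,
    },

    # 2. Everything under /flowers/: reject links that leave the directory.
    {
        base_url => 'http://www.my-site.com/flowers/',
        email    => $email,
        test_url => sub {
            my ( $uri, $server ) = @_;
            return $uri->path =~ m{^/flowers/};
        },
    },

    # 3. Start at products.html, follow only flowers_*.html links from it,
    #    and keep products.html itself out of the index.
    {
        base_url  => 'http://www.athirdsite.org/products.html',
        email     => $email,
        max_depth => 1,
        test_url  => sub {
            my ( $uri, $server ) = @_;
            my $path = $uri->path;
            return $path eq '/products.html'
                || $path =~ m{/flowers_[^/]*\.html$};
        },
        test_response => sub {
            my ( $uri, $server, $response ) = @_;
            # no_index: skip indexing this document but still extract its links
            $server->{no_index}++ if $uri->path eq '/products.html';
            return 1;
        },
    },
);

1;
```

In the swish-e config you would then point IndexDir at spider.pl and pass this
file via SwishProgParameters, and run swish-e with -S prog.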
You might even consider creating 3 separate indexes, one for each site, and then merging
them. Might be easier to debug, etc.
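
A sketch of the separate-indexes route, assuming one swish-e config file per
site (the .conf and .index names below are made up for illustration). It is a
dry run by default and only prints the commands; set RUN to an empty string
to actually execute them.

```shell
#!/bin/sh
# Dry run by default: each command is printed, not executed.
# Use `RUN= sh build.sh` to really run swish-e (build.sh is a hypothetical name).
RUN=${RUN-echo}

build_and_merge() {
    # One index per site, each built from its own (hypothetical) config file.
    for site in site1 my-site athirdsite; do
        $RUN swish-e -S prog -c "$site.conf" -f "$site.index" || return 1
    done
    # -M merges the listed indexes; the last file name is the output index.
    $RUN swish-e -M site1.index my-site.index athirdsite.index merged.index
}

build_and_merge
```

If one site misbehaves you can rebuild and inspect just that index before
merging again.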
Peter Karman . peter(at)not-real.peknet.com . http://peknet.com/
Received on Tue Feb 12 09:55:22 2008