Re: [swish-e] Terms in URL?

From: Kevin Porter <kev(at)>
Date: Sun Jan 20 2008 - 10:59:33 GMT
I've managed to do it with MetaNames swishdocpath.

I couldn't get the ExtractPath approach to work though. I'd appreciate
some help to see where I've gone wrong.

In my config file I now have:

ExtractPath exclusion regex !^.*?(widgetType=BlogArchive).*?$!$1!
ExtractPathDefault exclusion no

I expected to be able to search for "exclusion=no term", to only search
docs without "widgetType=BlogArchive" in the URL.
But instead I get "err: no results".
I've tried variations on this regex too, e.g. '.?' and '.+' instead of '.*?'.

What have I done wrong?
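
For reference, here is a rough Python sketch of what the two config lines above are intended to do. It is not swish-e itself: Python's `re` module stands in for swish-e's regex engine (which may well treat '.*?' differently), the helper name `extracted_value` is made up, and the URLs are hypothetical examples. It assumes that a path matching the pattern is replaced by $1 and that a non-matching path falls back to the ExtractPathDefault value.

```python
import re

# Pattern and default copied from the config above. Python's `re` is only a
# stand-in for swish-e's regex engine, which may handle `.*?` differently.
PATTERN = re.compile(r"^.*?(widgetType=BlogArchive).*?$")
DEFAULT = "no"  # value given to ExtractPathDefault


def extracted_value(path: str) -> str:
    """Return the string expected to be indexed under the 'exclusion' metaname."""
    value, count = PATTERN.subn(r"\1", path)
    # Assumption: when the pattern does not match, the default value is used.
    return value if count else DEFAULT


# Hypothetical URLs, for illustration only.
urls = [
    "http://example.com/blog?widgetType=BlogArchive&year=2008",
    "http://example.com/blog/2008/01/post.html",
]
for url in urls:
    print(url, "->", extracted_value(url))
```

Under these assumptions the archive URL would be indexed as "widgetType=BlogArchive" and every other URL as "no", which is why a search on exclusion=no was expected to return only the non-archive docs.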


- Kev

Kevin Porter wrote:
> Thanks Bill, I've read the relevant parts of the docs and am working on 
> it. Looks like ExtractPath (with ExtractPathDefault) is the way to go, 
> because it looks like I can't search for documents where the search 
> phrase is not in the meta swishdocpath?
> Re-indexing is no problem. Unfortunately I can't simply remove the docs 
> from my collection because I haven't figured out how to do that yet with 
> the crawler I'm using! (I'll figure out how to do it when I'm more 
> familiar with the source code).
> regards,
> - Kev
> Bill Moseley wrote:
>> On Sat, Jan 19, 2008 at 01:15:03PM +0000, Kevin Porter wrote:
>>> Hi,
>>> I've somehow ended up with a few duplicates in my index, and need to 
>>> remove them, or filter them out of the search results. Before 
>>> implementing it on the web front-end side, I'd like to know if it's 
>>> possible to filter them out with a command line option to swish-e, or to 
>>> remove them totally? The problem URLs contain the string 
>>> "widgetType=BlogArchive". I'm not even sure if swish-e matches terms 
>>> against the URL, or can be made to.
>> If you use
>>     MetaNames swishdocpath
>> then the path will be indexed.  So then you could likely
>> filter on that string.
>> If you want finer control check out ExtractPath.
>> But, both of those would require re-indexing so in that case you might
>> as well not index the files you don't want to include in the index.

Kevin Porter
Advanced Web Construction Ltd

Received on Sun Jan 20 05:59:40 2008