I've managed to do it with MetaNames swishdocpath.
I couldn't get the ExtractPath approach to work though. I'd appreciate
some help to see where I've gone wrong.
In my config file I now have:
ExtractPath exclusion regex !^.*?(widgetType=BlogArchive).*?$!$1!
ExtractPathDefault exclusion no
I expected that searching for "exclusion=no term" would return only
docs without "widgetType=BlogArchive" in the URL.
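For reference, the command I'm running is roughly the following (the index
file name here is just a placeholder, and I'm using the metaname=(...) query
syntax from the swish-e docs):

```
# hypothetical invocation -- index file name is an example
swish-e -f index.swish-e -w 'exclusion=(no) AND term'
```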
But instead I get "err: no results".
I've tried variations on this regex too, e.g. '.?' and '.+' instead of '.*?'.
What have I done wrong?
Kevin Porter wrote:
> Thanks Bill, I've read the relevant parts of the docs and am working on
> it. Looks like ExtractPath (with ExtractPathDefault) is the way to go,
> since, as far as I can tell, I can't search for documents where the
> search phrase is *not* in the swishdocpath meta name.
> Re-indexing is no problem. Unfortunately I can't simply remove the docs
> from my collection because I haven't figured out how to do that yet with
> the crawler I'm using! (I'll figure out how to do it when I'm more
> familiar with the source code).
> - Kev
> Bill Moseley wrote:
>> On Sat, Jan 19, 2008 at 01:15:03PM +0000, Kevin Porter wrote:
>>> I've somehow ended up with a few duplicates in my index, and need to
>>> remove them, or filter them out of the search results. Before
>>> implementing it on the web front-end side, I'd like to know if it's
>>> possible to filter them out with a command line option to swish-e, or to
>>> remove them totally? The problem URLs contain the string
>>> "widgetType=BlogArchive". I'm not even sure if swish-e matches terms
>>> against the URL, or can be made to.
>> If you use
>> MetaNames swishdocpath
>> then the path will be indexed. So then you could likely
>> filter on that string.
>> If you want finer control check out ExtractPath.
>> But, both of those would require re-indexing so in that case you might
>> as well not index the files you don't want to include in the index.
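For anyone following along, I believe the swishdocpath route Bill describes
boils down to something like this (untested; the index file name is just an
example, and the metaname=(...) query syntax is from the swish-e docs):

```
# in the swish-e config file:
MetaNames swishdocpath

# then, after re-indexing, exclude the duplicate URLs at query time:
swish-e -f index.swish-e -w 'term AND NOT swishdocpath=(BlogArchive)'
```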
Advanced Web Construction Ltd
Users mailing list
Received on Sun Jan 20 05:59:40 2008