Thanks Bill, I've read the relevant parts of the docs and am working on
it. It looks like ExtractPath (with ExtractPathDefault) is the way to
go, because as far as I can tell I can't search for documents where the
search phrase is absent from the swishdocpath metaname. Is that right?
Re-indexing is no problem. Unfortunately I can't simply remove the docs
from my collection because I haven't figured out how to do that yet with
the crawler I'm using! (I'll figure out how to do it when I'm more
familiar with the source code).
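
For reference, a minimal and untested sketch of the ExtractPath approach
mentioned above. The metaname "dupflag" and the values "duplicate" and
"unique" are made up for illustration; only the directive names and the
"widgetType=BlogArchive" string come from this thread.

```
# Declare the (hypothetical) metaname so it can be searched
MetaNames dupflag

# URLs containing the problem string get "duplicate" indexed under dupflag
ExtractPath dupflag regex !^.*widgetType=BlogArchive.*$!duplicate!

# Every other document gets "unique" indexed under dupflag instead
ExtractPathDefault dupflag unique
```

After re-indexing, a query along the lines of
swish-e -w 'searchterm and dupflag=unique' should then return only the
documents whose URL did not match the pattern, assuming the directives
behave as the docs describe.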
Bill Moseley wrote:
> On Sat, Jan 19, 2008 at 01:15:03PM +0000, Kevin Porter wrote:
>> I've somehow ended up with a few duplicates in my index, and need to
>> remove them, or filter them out of the search results. Before
>> implementing it on the web front-end side, I'd like to know if it's
>> possible to filter them out with a command line option to swish-e, or to
>> remove them totally? The problem URLs contain the string
>> "widgetType=BlogArchive". I'm not even sure if swish-e matches terms
>> against the URL, or can be made to.
> If you use
> MetaNames swishdocpath
> then the path will be indexed. So then you could likely
> filter on that string.
> If you want finer control check out ExtractPath.
> But, both of those would require re-indexing so in that case you might
> as well not index the files you don't want to include in the index.
Advanced Web Construction Ltd
Received on Sat Jan 19 16:00:56 2008