Thanks Bill, I've read the relevant parts of the docs and am working on
it. It looks like ExtractPath (with ExtractPathDefault) is the way to
go, since it seems I can't search for documents where the search phrase
is not in the swishdocpath metaname?
Re-indexing is no problem. Unfortunately I can't simply remove the docs
from my collection because I haven't figured out how to do that yet with
the crawler I'm using! (I'll figure out how to do it when I'm more
familiar with the source code).
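Until then, filtering on the web front-end side is a possible stopgap.
A minimal sketch in Python, assuming the search results are already
available as a list of URL strings (the function name and the sample
URLs are made up for illustration, and nothing here touches swish-e
itself):

```python
def drop_blog_archive(urls, marker="widgetType=BlogArchive"):
    """Return the URLs that do not contain the marker string."""
    # Plain substring test: the duplicate URLs are identified only by
    # the fact that they contain "widgetType=BlogArchive".
    return [url for url in urls if marker not in url]


if __name__ == "__main__":
    # Hypothetical result list, for illustration only.
    results = [
        "http://example.com/2008/01/post.html",
        "http://example.com/search?widgetType=BlogArchive&id=1",
    ]
    for url in drop_blog_archive(results):
        print(url)
```

This only hides the duplicates from the displayed results; they stay
in the index, so hit counts reported by the search would still include
them.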
regards,
- Kev
Bill Moseley wrote:
> On Sat, Jan 19, 2008 at 01:15:03PM +0000, Kevin Porter wrote:
>
>> Hi,
>>
>> I've somehow ended up with a few duplicates in my index, and need to
>> remove them, or filter them out of the search results. Before
>> implementing it on the web front-end side, I'd like to know if it's
>> possible to filter them out with a command line option to swish-e, or to
>> remove them totally? The problem URLs contain the string
>> "widgetType=BlogArchive". I'm not even sure if swish-e matches terms
>> against the URL, or can be made to.
>>
>
> If you use
>
> MetaNames swishdocpath
>
> then the path will be indexed. So then you could likely
> filter on that string.
>
> If you want finer control check out ExtractPath.
>
> But, both of those would require re-indexing so in that case you might
> as well not index the files you don't want to include in the index.
>
>
--
Kevin Porter
Advanced Web Construction Ltd
http://www.9ballpool.co.uk
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Sat Jan 19 16:00:56 2008