Re: URL case during multiple index search

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Mon Sep 30 2002 - 01:25:12 GMT
At 05:55 PM 09/29/02 -0700, Trond Nilsen wrote:
>Am I right in assuming that when Swish-E performs a search on multiple
>indexes that when the results are merged, they are done so with case
>sensitivity?

Results are not merged when searching multiple indexes.

~/swish-e/src $ ./swish-e -i index.c -f 1 -v0
~/swish-e/src $ ./swish-e -i index.c -f 2 -v0 
~/swish-e/src $ ./swish-e -w not dkdkd -f 1 2 -H0
1000 index.c "index.c" 81446
1000 index.c "index.c" 81446

Are you talking about -M type of merge where indexes are merged before
searching the combined single index?

>So, is there any way to get Swish to ignore case when merging? I'm having 
>trouble spidering a large site over which I have no editorial control, where 
>the writers have been lazy and specified pages with both cases. I can solve 
>the problem with some post-processing, but I figured I'd check first :)

If you are talking about -M merge then check out:

http://swish-e.org/current/docs/SWISH-CONFIG.html#item_PropertyNamesCompareCase

I think you can set the swishdocpath as case-insensitive.

The other thing is to lowercase the URL when spidering by editing the
spider program ( swishspider or spider.pl ).
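
As a rough illustration of that lowercasing step (not code from swishspider
or spider.pl, which are Perl), here is a minimal Python sketch. The function
names lowercase_url and dedupe_urls are made up for this example, and it
assumes the server treats paths case-insensitively, so folding the path to
lower case still reaches the same page. The query string and fragment are
left alone because they are often case-sensitive.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_url(url: str) -> str:
    """Lowercase the scheme, host, and path of a URL.

    Hypothetical helper: assumes the path is case-insensitive on the
    server. Query and fragment are passed through unchanged.
    """
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path.lower(),
        parts.query,
        parts.fragment,
    ))


def dedupe_urls(urls):
    """Return the normalized URLs in first-seen order, without duplicates."""
    seen = set()
    unique = []
    for url in urls:
        key = lowercase_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```

The same function could be used for the post-processing approach instead:
run the collected paths through dedupe_urls before indexing or displaying
them.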

The right solution is to convince the site owner to fix their broken URLs.
All it would take is a short Perl script...



-- 
Bill Moseley
mailto:moseley@hank.org
Received on Mon Sep 30 01:28:56 2002