Re: ReplaceRules not working as advertised

From: Colin Kuskie <ckuskie(at)>
Date: Mon Apr 22 2002 - 23:27:11 GMT
From: Bill Moseley <>
Date: Mon, 22 Apr 2002 12:01:57 -0700 (PDT)

>At 11:42 AM 04/22/02 -0700, Colin Kuskie wrote:
>>I found that I was getting "duplicate" results when indexing:
>>1000 "Sunset Presbyterian Men's Ministry Page" 29670
>>1000 "Sunset Presbyterian Men's Ministry Page" 29670
>Two different URLs.

to the same information, since index.html is a pretty common default.

>Yes, perhaps not the best wording.
>You can change the name of the path stored in the index with
>ReplaceRules, but it doesn't affect what is sent to swish for indexing.
>That's before indexing, not before spidering a URL.
>In other words think of it as a pipe
>   spider | swish
>spider is just passing files to swish, and swish can tell spider

So exactly at what point do the ReplaceRules take place?  If they
were implemented before swish-e invoked the swishspider, then the
system should work as described.
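
To make the distinction concrete, here is a rough sketch of what
filtering URLs before they are fetched could look like. This is not how
swish-e or swishspider is implemented; the function names and the list
of default document names are assumptions made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed default document names; a real server may use others.
DEFAULT_DOCS = ("index.html", "index.htm")


def canonical_url(url):
    """Map '.../index.html' and '.../' to the same URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    head, _, tail = path.rpartition("/")
    if tail in DEFAULT_DOCS:
        # Drop the default document name, keep the directory.
        path = head + "/"
    # Lowercase the host and discard any fragment.
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))


def unique_urls(urls):
    """Yield each canonical URL once, in first-seen order."""
    seen = set()
    for url in urls:
        key = canonical_url(url)
        if key not in seen:
            seen.add(key)
            yield key
```

Applied before spidering, a filter like this would mean the directory URL
and its index.html are fetched, and therefore indexed, only once.
Rewriting the stored path after the fetch cannot do that, because both
documents have already been passed along for indexing.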

>-S prog with is a lot more flexible.  And probably faster,
>too, since it avoids compiling a perl program for every URL.

I'll look at, and I'll try to use Randal's pslinky program
to do the downloading for me, just to kick it up another notch.

