Re: ReplaceRules not working as advertised

From: Colin Kuskie <ckuskie(at)>
Date: Mon Apr 22 2002 - 23:27:11 GMT
---------- Original Message ----------------------------------
From: Bill Moseley <>
Date: Mon, 22 Apr 2002 12:01:57 -0700 (PDT)

>At 11:42 AM 04/22/02 -0700, Colin Kuskie wrote:
>>I found that I was getting "duplicate" results when indexing:
>>1000 "Sunset Presbyterian Men's Ministry Page" 29670
>>1000 "Sunset Presbyterian Men's Ministry Page" 29670
>Two different URLs.

Two different URLs, but they point to the same information, since
index.html is a pretty common default.

>Yes perhaps not the best wording.
>You can change the name of the path stored in the index with
>ReplaceRules, but it doesn't affect what is sent to swish for indexing.
>That's before indexing, not before spidering a URL.
>In other words think of it as a pipe
>   spider | swish
>spider is just passing files to swish, and swish can tell spider

So at exactly what point do the ReplaceRules take effect?  If they
were implemented before swish-e invoked the swishspider, then the
system should work as described.
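
To make concrete what I mean by rewriting before spidering, here is a
rough sketch in Python (not swish-e or swishspider code). It collapses
".../index.html" and ".../" into one form and drops repeats before
anything is fetched. The function names, the DEFAULT_DOCS list, and the
example.com URLs are all made up for illustration; a real setup would
need to match whatever default documents the web server actually uses.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical list of default document names; adjust for the server.
DEFAULT_DOCS = ("index.html", "index.htm")


def canonical_url(url):
    """Collapse '.../index.html' and '.../' into a single form."""
    parts = urlsplit(url)
    path = parts.path or "/"
    for name in DEFAULT_DOCS:
        if path.endswith("/" + name):
            path = path[: -len(name)]  # keep the trailing slash
            break
    # Drop the fragment; it never changes which document is fetched.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


def unique_urls(urls):
    """Yield each canonical URL once, in first-seen order."""
    seen = set()
    for url in urls:
        canon = canonical_url(url)
        if canon not in seen:
            seen.add(canon)
            yield canon


if __name__ == "__main__":
    found = [
        "http://example.com/mens/",
        "http://example.com/mens/index.html",
    ]
    for url in unique_urls(found):
        print(url)
```

With a filter like this sitting in front of the indexer, both forms of
the URL would reach it as a single document, instead of being renamed
only after both copies have already been indexed.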

>-S prog with is a lot more flexible.  And probably faster,
>too, since it avoids compiling a perl program for every URL.

I'll look at it, and I'll try to use Randal's pslinky program
to do the downloading for me, just to kick it up another notch.

Received on Mon Apr 22 23:27:16 2002