Re: Identical Documents

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Sep 16 2004 - 21:50:33 GMT
On Thu, Sep 16, 2004 at 02:47:11PM -0700, Sebastian Jayaraj wrote:
> Hi,
> 
> I recently installed Swish-e and it's doing a really good job of 
> searching PDFs. I would like to know if there are any configuration 
> switches to eliminate duplicate results.

If you are using spider.pl, look at its MD5 option (use_md5), which
skips any document whose content has already been spidered.

If you already know which files are duplicates, you could also just
exclude them in a test_url() callback.
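
A minimal sketch of both approaches in a spider.pl config file might
look something like the following. The base_url, email address, and
duplicate paths are placeholders, and %known_duplicates is a
hypothetical helper, so adjust all of them for your site and check the
spider.pl documentation for your version before relying on it.

```perl
# Hypothetical spider.pl config file (for example SwishSpiderConfig.pl).

# Paths already known to be duplicates (placeholder values).
my %known_duplicates = map { $_ => 1 } qw(
    /docs/manual-copy.pdf
    /archive/old/manual.pdf
);

@servers = (
    {
        base_url => 'http://www.example.com/docs/',   # placeholder
        email    => 'admin@example.com',              # placeholder

        # Fingerprint each fetched document with MD5 and skip any
        # document whose content has already been seen.
        use_md5  => 1,

        # Called with a URI object before each request; return false
        # to skip the URL entirely.
        test_url => sub {
            my ( $uri, $server ) = @_;
            return !$known_duplicates{ $uri->path };
        },
    },
);

1;
```

use_md5 catches identical content served under different URLs, while
the test_url() check avoids fetching the known copies at all.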

-- 
Bill Moseley
moseley@hank.org

Received on Thu Sep 16 14:50:45 2004