
Re: Identical Documents

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Sep 16 2004 - 22:54:37 GMT
[I'm sending this back to the list]


On Thu, Sep 16, 2004 at 03:37:54PM -0700, Sebastian Jayaraj wrote:
> Hi Bill,
> 
> Thanks for the quick response. It looks like the spider.pl program does
> the MD5 filtering only for HTTP-like URLs. In my case, the
> documents reside on a Windows server, Samba-mounted on a Unix machine
> on which I run the swish-e program using the -S fs option to index them.
> 
> Is there a way to run spider.pl on the local file system? The other
> (roundabout) option I was thinking of was to expose my source dirs on a
> web server and then run the spider to index them.

Well, if you were really lucky you might be able to "spider" locally
with file:///path/to/whatever/index.html -- but I've never tried that.

If you don't mind a tiny bit of Perl programming, you could use the
DirTree.pl program and do your own MD5 checking in that program.
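The MD5 check suggested above boils down to this: walk the directory tree, hash each file's contents, and skip any file whose digest has already been seen. DirTree.pl itself is Perl, and the following is not its code. It is only a minimal Python sketch of the same duplicate check, assuming that "identical" means byte-for-byte identical contents. The function name `unique_files` is hypothetical.

```python
import hashlib
import os


def unique_files(root):
    """Walk `root` and yield paths whose contents have not been seen yet.

    Files are compared by the MD5 digest of their full contents, so when
    two identical documents live under different names, only the first
    path encountered is yielded and later copies are skipped.
    """
    seen = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # Sort in place so os.walk descends in a deterministic order.
        dirnames.sort()
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            digest = hashlib.md5()
            with open(path, "rb") as fh:
                # Read in chunks so large files need not fit in memory.
                for chunk in iter(lambda: fh.read(65536), b""):
                    digest.update(chunk)
            key = digest.hexdigest()
            if key in seen:
                continue  # same content already seen under another path
            seen.add(key)
            yield path
```

For example, `for p in unique_files("/mnt/docs"): print(p)` would list one path per distinct file (the mount point is only a placeholder). Handing the surviving paths to swish-e, for instance through a -S prog feeder such as DirTree.pl, is not shown here.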

Or can you just list the files that need to be skipped with FileRules?

-- 
Bill Moseley
moseley@hank.org
Received on Thu Sep 16 15:54:50 2004