

Re: Select files for incremental indexing

From: Dmitri V. Ivanov <dima(at)not-real.intex.spb.ru>
Date: Mon Oct 09 2006 - 17:28:49 GMT
On Mon, Oct 09, 2006 at 08:16:09AM -0500, Peter Karman wrote:
> thanks for this thorough report.
> 
> Sounds like what you're saying is that ctime is not a sufficient measure of 
> whether a file has changed or not. Something more like:
> 
> http://search.cpan.org/~jeremy/File-Signature-1.009/Signature.pm

I understand that my English is bad, and I'm sorry. But IMHO you are
wrong there. To make or check a signature we need to read the entire
file. That is not much different from indexing it a second time.
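
To make the cost difference concrete, here is a minimal Python sketch (not
Swish-e code). The function names, the choice of SHA-256, and the particular
stat fields are assumptions for illustration only. A content signature has to
read every byte of the file, while a stat()-based one reads only metadata.

```python
import hashlib
import os

def content_signature(path, chunk_size=65536):
    """Hash the file contents; returns (hex digest, number of bytes read)."""
    digest = hashlib.sha256()  # hash choice is an assumption for this sketch
    bytes_read = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
            bytes_read += len(chunk)
    return digest.hexdigest(), bytes_read

def stat_signature(path):
    """Metadata only: no file contents are read."""
    st = os.stat(path)
    return (st.st_ino, st.st_size, st.st_mtime_ns, st.st_ctime_ns)
```

The number of bytes read by content_signature() always equals the file size,
which is the same I/O the indexer would do to index the file again.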

Yes, I mean that ctime isn't a sufficient measure. But to get it we call
stat() anyway, and the data it provides is sufficient. We must, however,
remember the (name, inode number) pairs for all directories and monitor
their changes. The only problem is keeping our remembered data in sync
with the order in which we read files and directories. I used sorting
for that.
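
The idea of remembering (name, inode) pairs per directory can be sketched in
Python as follows. This is only an illustration under assumptions: the
function names are hypothetical, and the saved listing is held in memory
rather than in whatever storage a real implementation would use.

```python
import os

def dir_listing(path):
    """Sorted (name, inode) pairs for one directory, from its entries."""
    with os.scandir(path) as entries:
        return sorted((e.name, e.inode()) for e in entries)

def listing_changed(saved, path):
    """True if the directory's (name, inode) pairs differ from the saved ones."""
    return saved != dir_listing(path)
```

Sorting both the saved and the fresh listing gives them the same order, so
they can be compared directly. A rename within the directory changes the name
but keeps the inode number, so it shows up in the listing even if the file's
own timestamps are not a reliable signal.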

Yes, with Perl it's too slow, but not with C. The overhead is the sum of
O(N log N) over all directories, where N is the number of subdirectories
in each directory. It seems to grow linearly with tree size. That is why
I deliberately made the slash sort lower than any other character.
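
One plausible reading of the slash rule, sketched in Python: if "/" compares
lower than every other character, sorted full paths come out in the same order
as a depth-first walk that visits each directory's entries in sorted order.
The key function below is hypothetical, and the sample paths are made up. It
relies on the fact that POSIX path names cannot contain a NUL byte.

```python
def path_sort_key(path):
    # NUL cannot occur in a POSIX path, so it is a safe "lowest" stand-in for "/".
    return path.replace("/", "\x00")

paths = ["a-b", "a/c", "a", "a/b/x", "a.txt", "a/b"]

# Plain byte order: "-" and "." sort before "/", so "a" is separated
# from its own contents by unrelated siblings.
print(sorted(paths))
# ['a', 'a-b', 'a.txt', 'a/b', 'a/b/x', 'a/c']

# With "/" lowest: directory "a" is followed immediately by its subtree.
print(sorted(paths, key=path_sort_key))
# ['a', 'a/b', 'a/b/x', 'a/c', 'a-b', 'a.txt']
```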

By saving all files (not only directories) we can also reliably find
files that have disappeared.
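
A minimal Python sketch of how a saved list of all files could reveal
disappeared ones: walk the old and new sorted lists in step, as in a merge.
The function name, the shape of the records, and the action labels are
assumptions for illustration. The second element of each record stands for
whatever stat() data is remembered.

```python
def diff_snapshots(old, new):
    """old/new: lists of (path, stat_data), both sorted by path.
    Yields (action, path) with action in {"add", "update", "remove"}."""
    i = j = 0
    while i < len(old) and j < len(new):
        old_path, old_sig = old[i]
        new_path, new_sig = new[j]
        if old_path == new_path:
            if old_sig != new_sig:
                yield ("update", new_path)
            i += 1
            j += 1
        elif old_path < new_path:
            # Present before, missing now: the file disappeared.
            yield ("remove", old_path)
            i += 1
        else:
            yield ("add", new_path)
            j += 1
    # Whatever is left on either side has no counterpart.
    for old_path, _ in old[i:]:
        yield ("remove", old_path)
    for new_path, _ in new[j:]:
        yield ("add", new_path)
```

Because both lists are sorted the same way, one linear pass is enough. If a
custom sort order is used for the lists, the path comparison here would have
to use that same order.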

> 
> Undoubtedly this approach would require more overhead in the fs method, and 
> perhaps it would be easier to implement in something like DirTree.pl or 
> similar. Each file's signature would have to be stored/cached somewhere, 
> either in the index or in some external file/db.
> 
> My opinion is that this level of checking is a Good Idea but should not be 
> a feature of Swish-e itself. Swish-e is an indexer/search tool, and really 
> ought to index whatever it is asked to. The -S prog method implements this 
> idea.

Yes and No.

From one point of view, an indexer can be just a utility that must
understand three simple commands: (add|update|remove) <filepath>.
Everything else is not really necessary.

From another point of view, there is a db that can have some sort order.
I don't know yet how the swish db is arranged, but there may be reasons.

> 
> I have some more ideas on this approach which I'll post as I start to make 
> Swish3 development visible on this list.
> 
> Again, that's my opinion and I welcome discussion here on the topic.
> 

I understand that my English is terrible. First of all I need to write
a good explanation. I haven't yet. Sorry. I will try.

WBR
Dmitri Ivanov
Received on Mon Oct 9 10:28:57 2006