
Re: swish-e indexing deletion

From: Dobrica Pavlinusic <dpavlin(at)not-real.rot13.org>
Date: Fri May 13 2005 - 07:13:49 GMT
On Thu, May 12, 2005 at 11:50:35AM -0700, Bill Moseley wrote:
> On Thu, May 12, 2005 at 02:20:55PM -0400, John Paige wrote:
> > So, if someone is deleting files from the index as often as adding
> > them (for example, a user's mailbox), the best approach would be to
> > use the incremental -r option to delete, and periodically re-index and
> > remove the old index file.
> 
> Incremental is good for a mailing list where you never delete.
> Searching an active mailbox is another question.  I've been thinking
> about setting up swish on my mail for a long time.  But, I get
> hundreds of emails each day and delete almost that many.  Actually, I
> get thousands -- but most get dropped or rejected early.  So it would
> be hard to keep up with all the updates.  Plus, I often move messages
> around -- from one folder to another.

I have this little toy that I have been playing with (and breaking
swish-e incremental indexing while doing so) called Mail::Box Web Search.

The svnweb view of the code is at

http://svn.rot13.org/~dpavlin/svnweb/index.cgi/mws/browse/trunk/

but I can assemble some kind of tar.gz if anybody is interested. It's
basically a thin wrapper that ties together the Mail::Box module, code
that produces a swish-e index from it, and a local HTTP server that uses
the SWISH Perl API.
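For readers wondering what the "code that produces a swish-e index" step might look like, here is a minimal Python sketch (MWS itself is Perl and is not reproduced here). It assumes swish-e's `-S prog` input method, which, as we understand its documentation, reads records made of `Path-Name`, `Content-Length` (body length in bytes) and `Last-Mtime` headers, a blank line, then the document body. The function names `swish_record` and `feed_mbox`, and the `mbox#key` document naming scheme, are hypothetical.

```python
import mailbox
import sys
import time


def swish_record(path_name: str, body: bytes, mtime: int) -> bytes:
    """Format one document as a header block plus body, in the layout we
    assume swish-e's -S prog input method expects."""
    header = (
        f"Path-Name: {path_name}\n"
        f"Content-Length: {len(body)}\n"  # length of the body in bytes
        f"Last-Mtime: {mtime}\n"
        "\n"  # blank line separates headers from the document
    )
    return header.encode("utf-8") + body


def feed_mbox(mbox_path: str, out=None) -> int:
    """Write every message of an mbox file as one record; return the count."""
    if out is None:
        out = sys.stdout.buffer
    now = int(time.time())
    count = 0
    for key, msg in mailbox.mbox(mbox_path).items():
        # Hypothetical document name: mbox path plus the mailbox key.
        path_name = f"{mbox_path}#{key}"
        out.write(swish_record(path_name, msg.as_bytes(), now))
        count += 1
    return count


if __name__ == "__main__" and len(sys.argv) > 1:
    feed_mbox(sys.argv[1])
```

Such a script would be handed to swish-e through its prog input method; check the swish-e documentation for the exact command line. Because every message in an mbox shares one file, the path plus key is only a stand-in for a real per-message identifier.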

A quick warning: the Mail::Box modules are big and bulky. I should
probably rewrite it using the newer Email modules on CPAN, but I just
haven't had the time. It currently doesn't support incremental indexing,
but adding that should be a fun weekend project.

> I guess I'd use incremental indexing and when searching make sure the
> mail still exists before presenting the results.  What's a few stat
> calls?
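Bill's "few stat calls" idea can be sketched in a few lines of Python. This assumes each indexed hit is the path of a file holding exactly one message (as in maildir); the function name `existing_hits` is hypothetical. A message moved to another folder also disappears from its old path, so it is dropped too until the next indexing run picks it up at its new location.

```python
import os
from typing import Iterable, List


def existing_hits(paths: Iterable[str]) -> List[str]:
    """Drop search hits whose file no longer exists on disk.

    Only meaningful when every message is its own file (e.g. maildir);
    with mbox, many messages share one file, so a stat tells you nothing.
    """
    return [p for p in paths if os.path.exists(p)]
```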

I'm also using mbox format for my archive, so I can't just stat a file to
see if a message has been deleted. I guess I could convert it to maildir,
but I'm just lazy. I had also planned to support remote IMAP and POP
servers (with the Mail::Box module they basically come for free).
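With mbox, one possible substitute for the stat check is to compare hits against the Message-ID headers currently present in the mbox. The Python sketch below assumes the indexer stored each message's Message-ID alongside the hit; `message_ids` and `live_hits` are hypothetical names. The cost is a full read of the mbox per check, which is exactly what a per-file stat avoids.

```python
import mailbox
from typing import Iterable, List, Set


def message_ids(mbox_path: str) -> Set[str]:
    """Collect the Message-ID of every message currently in an mbox file."""
    ids: Set[str] = set()
    for msg in mailbox.mbox(mbox_path):
        mid = msg["Message-ID"]
        if mid:
            ids.add(mid.strip())
    return ids


def live_hits(hit_ids: Iterable[str], mbox_path: str) -> List[str]:
    """Keep only hits whose Message-ID is still present in the mbox."""
    present = message_ids(mbox_path)
    return [h for h in hit_ids if h in present]
```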

> Also, I've thought about installing Mairix since it's just an apt-get
> away.  http://www.rpcurnow.force9.co.uk/mairix/

I'm using mairix with mutt. It's very fast, and I thought about adding it
as an indexing engine to MWS (which has an abstraction layer for indexing
engines, with some limited support for Plucene and CLucene), but I haven't
written a Perl wrapper around it yet.
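The idea of an abstraction layer over indexing engines can be illustrated with a small Python sketch. This is not the MWS code: the `SearchBackend` interface and the toy `InMemoryBackend` are hypothetical, meant only to show how swish-e, mairix or Plucene wrappers could sit behind one common interface.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Set


class SearchBackend(ABC):
    """Hypothetical engine-neutral interface that each engine wrapper implements."""

    @abstractmethod
    def add(self, doc_id: str, text: str) -> None: ...

    @abstractmethod
    def search(self, query: str) -> List[str]: ...


class InMemoryBackend(SearchBackend):
    """Toy backend: an inverted index from lower-cased words to document ids."""

    def __init__(self) -> None:
        self._index: Dict[str, Set[str]] = {}

    def add(self, doc_id: str, text: str) -> None:
        for word in text.lower().split():
            self._index.setdefault(word, set()).add(doc_id)

    def search(self, query: str) -> List[str]:
        # AND semantics: a document must contain every query word.
        words = query.lower().split()
        if not words:
            return []
        hits = set.intersection(*(self._index.get(w, set()) for w in words))
        return sorted(hits)
```

A real mairix or swish-e backend would translate `add` and `search` into calls to the external engine; the rest of the application would not need to change.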

However, using mairix is somewhat limiting for MWS, because I plan to add
file-system search (locate just doesn't work for me any more), so I really
need swish-e :-)

If I let my imagination run wild, a daemon running in the background and
indexing changed files on the file system would make it even more useful.
So would an RSS feeder that builds a searchable database of the weblogs I
read, or of the web pages I've browsed (by reading the Firefox history or
using tricks with wwwoffle, for example).
Oh, why do I need to re-invent Google Desktop, dammit? :-))
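A background indexer needs to know which files changed since its last pass. One simple approach, sketched here in Python under the assumption that polling modification times is acceptable, is to compare a fresh `os.walk`/`os.stat` snapshot with the previous one. The function name `scan_changes` is hypothetical, and a real daemon might prefer an OS change-notification facility to polling.

```python
import os
from typing import Dict, List, Tuple


def scan_changes(root: str, seen: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Compare file mtimes under `root` with the previous snapshot `seen`.

    Returns (changed_or_new, deleted) and replaces `seen` with the new snapshot.
    """
    current: Dict[str, float] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                current[path] = os.stat(path).st_mtime
            except FileNotFoundError:
                continue  # file removed between listing and stat
    changed = sorted(p for p, m in current.items() if seen.get(p) != m)
    deleted = sorted(p for p in seen if p not in current)
    seen.clear()
    seen.update(current)
    return changed, deleted
```

A daemon would call this in a loop with a sleep between passes, sending changed files to the indexer and deleted ones to whatever removal mechanism the index supports.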

-- 
Dobrica Pavlinusic               2share!2flame            dpavlin@rot13.org
Unix addict. Internet consultant.             http://www.rot13.org/~dpavlin
Received on Fri May 13 00:13:50 2005