Re: lucene/plucene/clucene

From: David L Norris <dave(at)>
Date: Mon May 09 2005 - 19:29:56 GMT
On Mon, 2005-05-09 at 12:05 -0700, Keith Ivey wrote:
> Are you talking about large index files, or indexing large files?  The 
> references to "source" and "offset" make it seem like these options address the 
> latter, but maybe I'm misunderstanding something.

Any file on a UNIX system can be larger than 2 GB, so this covers both
cases.  Large File support on UNIX systems replaces file offsets, and
all the functions which use them, with 64-bit versions.  For most
programs it's just a matter of defining those two variables at compile
time and recompiling.
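
As a rough sketch of what that looks like in C, assuming the two
variables are the usual feature-test macros _FILE_OFFSET_BITS and
_LARGEFILE_SOURCE (my assumption, they are not named above), and using
a made-up helper name, seek_to():

```c
/* Must appear before any system header.  Equivalent to passing
 * -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE to the compiler.
 * Assumption: these are the "two variables" under discussion. */
#define _FILE_OFFSET_BITS 64
#define _LARGEFILE_SOURCE

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: seek to an absolute offset and report where the
 * stream ended up, or -1 if the seek failed.  No data is written. */
static off_t seek_to(FILE *fp, off_t target)
{
    /* With the macros above, off_t should be 64 bits wide. */
    assert(sizeof(off_t) == 8);

    if (fseeko(fp, target, SEEK_SET) != 0)
        return -1;
    return ftello(fp);
}
```

Calling seek_to(fp, (off_t)3 << 30) asks for an offset of 3 GB, which
does not fit in a signed 32-bit offset.  The point is only that fseeko()
and ftello() take and return off_t, so recompiling with the macros
widens every offset in one go.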

The Windows POSIX subsystem doesn't implement quite the same API as
modern UNIX systems for 64-bit file offsets.  So I wrote a wrapper that
implements the UNIX LargeFile API using the functions Windows provides.
There's still an issue with fseek, if I recall.
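
To give an idea of the shape of such a wrapper, here is a minimal
sketch.  It is not the actual Swish-e code: the names lf_off_t,
lf_fseek and lf_ftell are invented for illustration, and the Windows
branch assumes the Microsoft C runtime's _fseeki64() and _ftelli64(),
which may not be the functions the real wrapper uses.

```c
#define _FILE_OFFSET_BITS 64
#define _LARGEFILE_SOURCE

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical 64-bit offset type shared by both platforms. */
typedef int64_t lf_off_t;

/* Seek with a 64-bit offset; returns 0 on success, like fseeko(). */
static int lf_fseek(FILE *fp, lf_off_t offset, int whence)
{
#ifdef _WIN32
    return _fseeki64(fp, offset, whence);      /* MSVC runtime call */
#else
    assert(sizeof(off_t) >= sizeof(lf_off_t)); /* LFS must be enabled */
    return fseeko(fp, (off_t)offset, whence);
#endif
}

/* Report the current position as a 64-bit offset, or -1 on error. */
static lf_off_t lf_ftell(FILE *fp)
{
#ifdef _WIN32
    return _ftelli64(fp);
#else
    return (lf_off_t)ftello(fp);
#endif
}
```

The rest of the program then calls lf_fseek() and lf_ftell() everywhere
and never touches the platform functions directly.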

> I don't care about indexing individual files larger than a couple of megabytes, 
> but I have run into a 2 GB limit on the size of the index file.  Still, I might 
> be better off keeping my indexes split up into less-than-2GB parts anyway, since 
> I fear that larger index files lead to larger memory use during searching.

As far as I know, index size should have little or no effect on
Swish-e's memory usage.  Someone would have to test that to know for
sure, though.

 David Norris
  ICQ - 412039
