Re: Re: Document properties - code sample

From: Einar Indridason <einari(at)>
Date: Wed Sep 08 1999 - 11:06:37 GMT
> The index file is packed to be as small as possible, presumably to handle
> a larger number of documents with smaller files and to get some kind of
> performance advantage. I'm not sure that it has much of an effect on
> performance, but I could imagine it, since read-ahead caching comes into
> play. That has to be balanced against the extra CPU work to decompress things.
> Hopefully someone did a benchmark on it way back (when?) when the
> compression part was introduced.
> As far as an ASCII index goes, that would be slower because of all of the
> numerical values that would have to be converted back from text strings to
> numbers on each and every search.  Also, the index uses lots of "pointers"
> (file position info) to refer to objects within the file and a pure ASCII
> index would be too tempting to just "tweak" by hand, corrupting the pointer
> offsets.
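
To illustrate the "pointer" concern, here is a minimal sketch in Python. The record layout (one line per word, followed by document numbers) and the names build_index and lookup are made up for this example and are not the real index format. It only shows how a hand edit that changes the length of one record makes every stored offset after it wrong.

```python
def build_index(words):
    """Build a toy ASCII index: returns (offsets, body).

    body holds one record per line; offsets maps each word to the
    byte position where its record starts (the "pointer").
    """
    offsets = {}
    body = b""
    for word, doc_ids in words.items():
        offsets[word] = len(body)
        line = word + " " + " ".join(str(d) for d in doc_ids) + "\n"
        body += line.encode("ascii")
    return offsets, body


def lookup(body, offsets, word):
    """Seek to the stored offset and parse the record found there."""
    start = offsets[word]
    end = body.index(b"\n", start)
    fields = body[start:end].decode("ascii").split()
    return fields[0], [int(f) for f in fields[1:]]


offsets, body = build_index({"apple": [1, 2], "banana": [3]})

# A hand "tweak" that adds a document number to the first record
# shifts the second record, but the stored offsets are not updated.
edited = body.replace(b"apple 1 2", b"apple 1 2 300")

good = lookup(body, offsets, "banana")
bad = lookup(edited, offsets, "banana")
```

With the untouched body the lookup lands on the "banana" record; with the edited body the same offset lands in the middle of the "apple" record, so the result is garbage even though the file still looks fine in a text editor.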

Would the speed difference be great enough to affect normal usage of

Binary index file:
	the "correct" architecture can handle the data faster
	more complex
	needs specific tools to deal with it
	a small corruption can corrupt the whole file
	the "wrong" architecture needs to convert the file anyway

ASCII index file:
	readable with other tools
	(and fixable if there is the required "know-how")
	portable between all architectures, as all architectures need
	to convert the file to an internal form anyway.
	it might be more tempting to "fix" the index file by hand
	somewhat slower.
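
A small Python sketch of the architecture point, under the assumption that a binary index stores plain 32-bit integers in the byte order of the machine that wrote it (I have not checked what the real index does). The value 1999 is arbitrary.

```python
import struct

value = 1999

# Binary form: four bytes whose order depends on the writing machine.
little = struct.pack("<I", value)   # as written on a little-endian machine
big = struct.pack(">I", value)      # as written on a big-endian machine

# The "wrong" architecture reads the same bytes in the other order
# and gets a different number unless it converts first.
misread = struct.unpack(">I", little)[0]
converted = struct.unpack("<I", little)[0]

# ASCII form: the same digits on every machine, at the cost of a
# text-to-number conversion on every read.
text = str(value).encode("ascii")
parsed = int(text.decode("ascii"))
```

The binary read is cheap only when writer and reader agree on byte order; otherwise the reader has to convert anyway, which is the same kind of work the ASCII form always requires.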

Today I would definitely go for an ASCII file.

But that is just my opinion.

Received on Wed Sep 8 04:01:10 1999