Re: Re: Document properties - code sample

From: Einar Indridason <einari(at)not-real.complex.is>
Date: Wed Sep 08 1999 - 11:06:37 GMT
> The index file is packed to be as small as possible, presumably to handle
> a larger number of documents with smaller files and to get some kind of
> performance advantage. I'm not sure that it has much of an effect on
> performance, but I could imagine it since read-ahead caching comes into
> play. That has to be balanced against the extra CPU work to decompress things.
> Hopefully someone did a benchmark on it way back (when?) when the
> compression part was introduced.
> 
> As far as an ASCII index goes, that would be slower because of all of the
> numerical values that would have to be converted back from text strings to
> numbers on each and every search.  Also, the index uses lots of "pointers"
> (file position info) to refer to objects within the file and a pure ASCII
> index would be too tempting to just "tweak" by hand, corrupting the pointer
> offsets.

*shrug*
Would the speed difference be great enough to affect normal usage of
swish?

Binary index file:
advantages:
	smaller
	the "correct" architecture can handle the data faster
disadvantages:
	more complex
	needs specific tools to deal with it
	a small corruption can corrupt the whole file
	the "wrong" architecture needs to convert the file anyway
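
To illustrate the "wrong architecture" point above, here is a minimal
sketch in Python. The record layout (a word count and a file offset, both
32-bit unsigned) is purely hypothetical and is not swish's actual index
format. It only shows that a binary record is compact, but that a reader
assuming the other byte order has to convert it before the numbers make
sense.

```python
import struct

# Hypothetical index record: (word count, file offset), both 32-bit unsigned.
# This layout is an assumption for illustration, not the real swish format.
values = (1999, 70000)

# Write the record in a fixed byte order (big-endian here).
packed_be = struct.pack(">II", *values)

# A reader that uses the matching byte order gets the values back directly.
right = struct.unpack(">II", packed_be)

# A reader that assumes little-endian sees different numbers and must
# byte-swap (convert) before it can use the record.
wrong = struct.unpack("<II", packed_be)

# The same record as plain text, for a size comparison.
text = " ".join(str(v) for v in values) + "\n"

print(len(packed_be), "bytes binary,", len(text.encode("ascii")), "bytes text")
print("matching byte order:", right)
print("mismatched byte order:", wrong)
```

With these two small numbers the size difference is only a few bytes; how
much it matters for a real index depends on how many values are stored and
how large they are.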


ASCII index file:
advantages:
	readable with other tools
	(and fixable if there is the required "know-how")
	portable between all architectures, as all architectures need
	to convert the file to an internal form anyway.
disadvantages:
	larger
	it might be more tempting to "fix" the index file by hand
	somewhat slower.
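
The hand-"fixing" risk can be sketched as follows, again in Python and
again with a made-up format: one "word count" record per line, plus a
table of byte offsets pointing at the start of each record. Nothing here
is taken from the real swish index; it only shows how an innocent-looking
edit shifts every later record away from its stored offset.

```python
# Hypothetical text index: one "word count" record per line.
records = ["apple 3\n", "banana 12\n", "cherry 7\n"]


def build_offsets(lines):
    """Return the byte offset at which each record starts."""
    offsets, pos = [], 0
    for line in lines:
        offsets.append(pos)
        pos += len(line.encode("ascii"))
    return offsets


def read_at(data, offset):
    """Read one record starting at a stored byte offset."""
    end = data.index(b"\n", offset)
    return data[offset:end].decode("ascii")


data = "".join(records).encode("ascii")
offsets = build_offsets(records)

# Hand edit: change "apple 3" to "apple 300" without rebuilding the offsets.
edited = data.replace(b"apple 3\n", b"apple 300\n")

before = read_at(data, offsets[2])    # offset still points at the record
after = read_at(edited, offsets[2])   # offset now lands inside "banana 12"

print("before edit:", repr(before))
print("after edit: ", repr(after))
```

An ASCII index that stores such offsets would need a tool to rebuild them
after any edit, or would have to avoid absolute offsets altogether.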


Today I would definitely go for an ASCII file.

But that is just my opinion.

Cheers,
--
einari@complex.is
Received on Wed Sep 8 04:01:10 1999