> The index file is packed to be as small as possible, presumably to handle
> larger numbers of documents with smaller files and to get some kind of
> performance advantage. I'm not sure that it has much of an effect on
> performance, but I could imagine it, since read-ahead caching comes into
> play. That has to be balanced against the extra CPU work to decompress things.
> Hopefully someone did a benchmark on it way back (when?) when the
> compression part was introduced.
> As far as an ASCII index goes, that would be slower because of all of the
> numerical values that would have to be converted back from text strings to
> numbers on each and every search. Also, the index uses lots of "pointers"
> (file position info) to refer to objects within the file and a pure ASCII
> index would be too tempting to just "tweak" by hand, corrupting the pointers.
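
To put the text-to-number argument in concrete terms, here is a minimal sketch
in C. The struct layout and field names are invented for illustration and are
not the real index format: with a binary index a whole record can be read
straight into a struct with one fread(), while an ASCII index has to re-parse
every numeric field (including the file-position "pointers") with sscanf() on
every lookup.

    #include <stdio.h>

    struct doc_entry {           /* hypothetical fixed-width index record   */
        long doc_id;             /* document identifier                     */
        long word_count;         /* number of indexed words                 */
        long data_offset;        /* "pointer": byte position in the db file */
    };

    /* Binary index: the bytes on disk already are the struct. */
    int read_binary(FILE *fp, struct doc_entry *e)
    {
        return fread(e, sizeof(*e), 1, fp) == 1;
    }

    /* ASCII index: three text-to-number conversions per record, per search. */
    int read_ascii(FILE *fp, struct doc_entry *e)
    {
        char line[256];
        if (!fgets(line, sizeof(line), fp))
            return 0;
        return sscanf(line, "%ld %ld %ld",
                      &e->doc_id, &e->word_count, &e->data_offset) == 3;
    }
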
Would the speed difference be great enough to affect normal usage?
Binary index file:
  - the "correct" architecture can handle the data faster
  - needs specific tools to deal with it
  - a small corruption can corrupt the whole file
  - the "wrong" architecture needs to convert the file anyway
    (see the byte-order sketch after this list)

ASCII index file:
  - readable with other tools
    (and fixable if there is the required "know-how")
  - portable between all architectures, as all architectures need
    to convert the file to an internal form anyway
  - it might be more tempting to "fix" the index file by hand
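
To illustrate the "wrong architecture" point from the list above, here is
another small C sketch with a purely invented on-disk layout. If the binary
index commits to one byte order (here, least-significant byte first), any host
with a different native order has to swap every multi-byte field while reading
it; an ASCII index sidesteps the question, since text has no byte order.

    #include <stdint.h>
    #include <stdio.h>

    /* Decode a 32-bit value stored least-significant-byte first,
     * no matter what the host architecture happens to be. */
    uint32_t decode_le32(const unsigned char *p)
    {
        return (uint32_t)p[0]
             | (uint32_t)p[1] << 8
             | (uint32_t)p[2] << 16
             | (uint32_t)p[3] << 24;
    }

    int main(void)
    {
        unsigned char raw[4] = { 0x2A, 0x00, 0x00, 0x00 };  /* 42, stored LE */
        printf("offset = %lu\n", (unsigned long)decode_le32(raw));
        return 0;
    }
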
Today I would definitely go for an ASCII file.
But that is just my opinion.