Mhh, IMO a standard-ASCII index file is not possible: e.g. you have
to store special characters (e.g. German umlauts) from TITLE tags...
This means you have to take care of converting these special
characters or use a standard character set like ISO xxxx.
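To make the character-set point concrete, here is a small Python sketch. The title string and the `encode_title` helper are hypothetical, for illustration only, and ISO-8859-1 (Latin-1) is simply used as an example of a standard 8-bit character set. It shows that a German title cannot be written as 7-bit ASCII at all, fits one byte per character in ISO-8859-1, or can be escaped into pure ASCII at the cost of converting it back on every read.

```python
import html

# Hypothetical TITLE-tag text containing German umlauts and sharp s.
title = "Übersicht der Größen"

def encode_title(text, charset):
    """Return `text` encoded in `charset`, or None if the charset cannot represent it."""
    try:
        return text.encode(charset)
    except UnicodeEncodeError:
        return None

# 7-bit ASCII has no code points for Ü, ö or ß, so this yields None.
ascii_bytes = encode_title(title, "ascii")

# ISO-8859-1 (Latin-1) stores each of these characters in a single byte.
latin1_bytes = encode_title(title, "iso-8859-1")

# Alternative: keep the file pure ASCII by escaping the special characters,
# which means every reader has to convert them back.
escaped = title.encode("ascii", "xmlcharrefreplace").decode("ascii")
restored = html.unescape(escaped)

print(ascii_bytes)        # None
print(len(latin1_bytes))  # 20
print(escaped)            # &#220;bersicht der Gr&#246;&#223;en
```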
Also, our 200 MB index file for our IntraNet server might grow too
much.
What would make sense, IMO, is a tool to import or export index files...
-- rainer
-----Original Message-----
From: Einar Indridason
Sent: Wednesday, September 08, 1999 1:01 PM
To: Multiple recipients of list
Subject: [SWISH-E] Re: Document properties - code sample
> The index file is packed to be as small as possible, presumably to handle
> a larger number of documents with smaller files and to get some kind of
> performance advantage. I'm not sure that it has much of an effect on
> performance, but I could imagine it, since read-ahead caching comes into
> play. That has to be balanced against the extra CPU work to decompress
> things. Hopefully someone did a benchmark on it way back (when?) when the
> compression part was introduced.
>
> As far as an ASCII index goes, that would be slower because of all of the
> numerical values that would have to be converted back from text strings to
> numbers on each and every search. Also, the index uses lots of "pointers"
> (file position info) to refer to objects within the file, and a pure ASCII
> index would be too tempting to just "tweak" by hand, corrupting the pointer
> offsets.
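Both points in the quoted paragraph can be seen in a toy Python sketch. The layout (a header line of offsets followed by one text record per word) is invented for illustration and is not SWISH-E's actual index format. Every lookup has to turn text back into integers with `int()`, and a single hand edit that lengthens one record leaves the stored offsets pointing into the wrong place.

```python
# Hypothetical toy ASCII index: a header line of offsets, then one record per line.
# Each record is "word number number ...". This is NOT the real SWISH-E format.

def build_index(records):
    """Serialize records, storing each record's offset (relative to the body) in the header."""
    offsets, pos = [], 0
    for rec in records:
        offsets.append(pos)
        pos += len(rec) + 1  # +1 for the newline
    header = " ".join(str(o) for o in offsets)
    body = "".join(rec + "\n" for rec in records)
    return header + "\n" + body

def lookup(index, i):
    """Follow the i-th stored offset and parse the record found there."""
    header, body = index.split("\n", 1)
    offsets = [int(x) for x in header.split()]  # text -> number on every lookup
    start = offsets[i]
    end = body.index("\n", start)
    word, *numbers = body[start:end].split()
    return word, [int(n) for n in numbers]     # more text -> number conversions

index = build_index(["swish 3 17 42", "index 5 8", "umlaut 1 99"])
print(lookup(index, 1))    # ('index', [5, 8])

# A harmless-looking hand edit makes the first record two characters longer...
tweaked = index.replace("swish 3", "swish-e 3")
# ...but the header was not updated, so offset 14 now lands inside record 0.
print(lookup(tweaked, 1))  # ('2', [])
```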
*shrug*
Would the speed difference be great enough to affect normal usage of swish?
Binary index file:
   advantages:
      smaller
      the "correct" architecture can handle the data faster
   disadvantages:
      more complex
      needs specific tools to deal with it
      a small corruption can corrupt the whole file
      the "wrong" architecture needs to convert the file anyway
ASCII index file:
   advantages:
      readable with other tools
      (and fixable if you have the required "know-how")
      portable between all architectures, as every architecture needs
         to convert the file to an internal form anyway
   disadvantages:
      larger
      it might be more tempting to "fix" the index file by hand
      somewhat slower
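The "smaller" versus "larger" entries can be checked roughly with a Python sketch. It packs a few hypothetical integers (think word positions or file offsets) with a generic 7-bit variable-length encoding, an assumption chosen for illustration and not necessarily the compression SWISH-E uses, and compares the result with the same numbers written as space-separated decimal text. Reading either form back costs some work: bit operations for the binary form, decimal parsing for the text form.

```python
# Generic unsigned LEB128-style varint: 7 data bits per byte, high bit = "more bytes follow".
# Used here only as an example of a packed binary encoding.

def encode_varint(n):
    """Encode a non-negative integer as 1 or more bytes, least significant 7 bits first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # last byte of this number
            return bytes(out)

def decode_varints(data):
    """Decode a concatenation of varints back into a list of integers."""
    values, n, shift = [], 0, 0
    for b in data:
        n |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            values.append(n)
            n, shift = 0, 0
    return values

numbers = [3, 17, 42, 100000, 4000000000]  # hypothetical positions/offsets

binary = b"".join(encode_varint(n) for n in numbers)
text = " ".join(str(n) for n in numbers)

print(len(binary), len(text))  # 11 25
```

Small values cost about the same either way; the gap comes mostly from large offsets, which take 10 decimal digits but only 5 packed bytes here.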
Today I would definitely go for an ASCII file.
But that is just my opinion.
Cheers,
--
einari@complex.is
Received on Wed Sep 8 04:22:26 1999