RE: Re: Document properties - code sample

From: Rainer Scherg <Rainer.Scherg(at)not-real.rexroth.de>
Date: Wed Sep 08 1999 - 11:27:22 GMT
Mhh, IMO a standard-ASCII index file is not possible - e.g. you have
to store special characters (e.g. German umlauts) from TITLE-tags...

This means you have to take care of converting these special
characters or use a standard character set like ISO xxxx.
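To illustrate the two options, here is a minimal Python sketch. The title
string is made up, and ISO-8859-1 is only one possible 8-bit character set,
picked as an assumption for the example; it is not a statement about what
swish-e does or should do.

```python
# Hypothetical TITLE text containing one umlaut (u with diaeresis).
title = "M\u00fcller"

# Pure 7-bit ASCII cannot represent the umlaut at all.
try:
    title.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Option 1: convert the special character to an ASCII-only escape.
# The file stays ASCII, but the text gets longer.
escaped = title.encode("ascii", errors="xmlcharrefreplace")

# Option 2: store the text in an 8-bit character set
# (ISO-8859-1 is an assumption made for this sketch).
latin1 = title.encode("iso-8859-1")

print(ascii_ok, escaped, len(escaped), len(latin1))
```

Either way, whatever reads the index has to know which convention was used
when it was written.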

Also, our 200 MB index file for our IntraNet server might grow too
much.

What would make sense, IMO, is a tool to import or export index files...

-- rainer



-----Original Message-----
From:	Einar Indridason
Sent:	Wednesday, September 08, 1999 1:01 PM
To:	Multiple recipients of list
Subject:	[SWISH-E] Re:  Document properties - code sample

> The index file is packed to be as small as possible, presumably to
> handle a larger number of documents with smaller files and to get some
> kind of performance advantage. I'm not sure that it has much of an
> effect on performance, but I could imagine it, since read-ahead caching
> comes into play. That has to be balanced against the extra CPU work to
> decompress things. Hopefully someone did a benchmark on it way back
> (when?) when the compression part was introduced.
>
> As far as an ASCII index goes, that would be slower because of all of
> the numerical values that would have to be converted back from text
> strings to numbers on each and every search. Also, the index uses lots
> of "pointers" (file position info) to refer to objects within the file,
> and a pure ASCII index would be too tempting to just "tweak" by hand,
> corrupting the pointer offsets.
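
The pointer problem mentioned above can be shown with a small Python
sketch. The record layout and the helper names `build` and `read_at` are
invented for the example and are not the swish-e index format; the point is
only that a hand edit which changes the length of one record silently
invalidates every stored offset behind it.

```python
# Hypothetical toy index: newline-terminated text records, plus a list of
# byte offsets that point at the start of each record.
records = [b"apple 3 7 9\n", b"banana 1 4\n", b"cherry 2\n"]

def build(records):
    """Return (offsets, body); offsets[i] is the byte position of record i."""
    offsets, pos = [], 0
    for r in records:
        offsets.append(pos)
        pos += len(r)
    return offsets, b"".join(records)

def read_at(body, offset):
    """Read one record starting at a byte offset, up to the next newline."""
    end = body.index(b"\n", offset)
    return body[offset:end]

offsets, body = build(records)
intact = read_at(body, offsets[2])

# A hand "tweak" that lengthens the first record without fixing the offsets.
edited = body.replace(b"apple 3 7 9", b"apple 3 7 9 12", 1)
broken = read_at(edited, offsets[2])

print(intact, broken)
```

After the edit, the stored offset for the third record lands in the middle
of the second one, so the lookup returns a fragment instead of the record.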

*shrug*
Would the speed difference be great enough to affect normal usage of
swish?

Binary index file:
advantages:
	smaller
	the "correct" architecture can handle the data faster
disadvantages:
	more complex
	needs specific tools to deal with it
	a small corruption can corrupt the whole file
	the "wrong" architecture needs to convert the file anyway


ASCII index file:
advantages:
	readable with other tools
	(and fixable if there is the required "know-how")
	portable between all architectures, as all architectures need
	to convert the file to an internal form anyway.
disadvantages:
	larger
	it might be more tempting to "fix" the index file by hand
	somewhat slower.
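
A rough Python sketch of the size and architecture points in the two lists
above. The positions are made-up values, and the fixed 4-byte layout is an
assumption for illustration, not the packed format swish-e actually uses.

```python
import struct

# Hypothetical file positions such as an index might store.
positions = [0, 4096, 1234567, 200000000]

# Text form: decimal digits separated by spaces, parsed with int() on load.
text = " ".join(str(p) for p in positions).encode("ascii")
from_text = [int(tok) for tok in text.split()]

# Binary form: fixed 4-byte unsigned integers in little-endian byte order.
binary = struct.pack("<4I", *positions)
from_binary = list(struct.unpack("<4I", binary))

# Reading the same bytes with the other byte order yields different numbers,
# so a machine with the "wrong" byte order has to convert while loading.
misread = list(struct.unpack(">4I", binary))

print(len(text), len(binary), misread == positions)
```

The text form is larger and needs a parsing step, but it reads the same on
every machine; the binary form is smaller, but only correct when the reader
knows the byte order it was written with.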


Today I would definitely go for an ASCII file.

But that is just my opinion.

Cheers,
--
einari@complex.is

Received on Wed Sep 8 04:22:26 1999