At 10:41 AM 11/15/00 -0800, email@example.com wrote:
>Just another point of view. If the summary is stored with
>the filepath, all the file related data is contiguous in the
>index file, making retrievals faster (less I/O may be expected).
>If we use properties, we need at least one extra I/O operation
>because the data is not contiguous.
Oh, I see. I need to review the index file format again -- if I can figure
>BTW, this makes me wonder why swish-e uses just a single
>index file. The only reason that comes to mind is simplicity, but...
>- The total index file is limited to 2GB (well, I know our sites are
>probably not like Google).
>- Updating, inserting and deleting are really hard to do. They should be
>easier with several files. E.g.: one for the header and words, another
>one for word data, another one for file data and another one for
>What do you think?
I can't see it being any problem. Frankly, I like a single file, but for
no good reason. I was wondering about this some time back, not so much
about swish, but about Perl's DBM usage and how BerkeleyDB seemed to use
one file and ndbm (sdbm?) used .pag and .dir files.
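
To make the contiguous-versus-properties point above concrete, here is a
minimal Python sketch. Everything in it is an assumption made for
illustration: the record layout, the CountingFile helper and the function
names are hypothetical and are not swish-e's actual index format. It only
counts how many seeks each layout needs to fetch one file's path and summary.

```python
import io
import struct

PATH = "/docs/a.html"
SUMMARY = "A short summary"


class CountingFile:
    """Hypothetical wrapper that counts seeks on a binary file object."""

    def __init__(self, raw):
        self.raw = raw
        self.seeks = 0

    def read_at(self, offset, size):
        self.raw.seek(offset)
        self.seeks += 1
        return self.raw.read(size)


def pack_str(text):
    """Encode a string as a 2-byte little-endian length plus UTF-8 bytes."""
    data = text.encode("utf-8")
    return struct.pack("<H", len(data)) + data


def unpack_str(buf, pos):
    """Decode a length-prefixed string; return it and the next position."""
    (n,) = struct.unpack_from("<H", buf, pos)
    start = pos + 2
    return buf[start:start + n].decode("utf-8"), start + n


def read_contiguous():
    """Path and summary stored back to back: one seek fetches both."""
    blob = pack_str(PATH) + pack_str(SUMMARY)
    f = CountingFile(io.BytesIO(blob))
    buf = f.read_at(0, len(blob))
    path, pos = unpack_str(buf, 0)
    summary, _ = unpack_str(buf, pos)
    return path, summary, f.seeks


def read_split():
    """Path record holds an offset to a summary stored elsewhere."""
    path_part = pack_str(PATH)
    prop = pack_str(SUMMARY)
    rec_len = len(path_part) + 4           # path plus 4-byte offset field
    prop_offset = rec_len + 64             # 64 bytes of unrelated data between
    blob = path_part + struct.pack("<I", prop_offset) + b"\0" * 64 + prop
    f = CountingFile(io.BytesIO(blob))
    buf = f.read_at(0, rec_len)            # first seek: the path record
    path, pos = unpack_str(buf, 0)
    (offset,) = struct.unpack_from("<I", buf, pos)
    pbuf = f.read_at(offset, len(prop))    # second seek: the property
    summary, _ = unpack_str(pbuf, 0)
    return path, summary, f.seeks


print(read_contiguous())
print(read_split())
```

Under these assumptions the contiguous layout needs one seek and the split
layout needs two, which is the extra I/O operation mentioned in the quoted
text. Whether that matters in practice depends on caching and on how far
apart the data really sits on disk.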
Received on Wed Nov 15 19:01:55 2000