The old news...
As you have read in previous posts to this list, swish-e 2.x
consumes a very large amount of memory during the indexing
process. Much of this memory is used for storing the
- file number (index to the file info)
- metaname (1 if there is no metaname; 2, 3, ... for the rest)
- structure (stores if the word is in head, body, title ...)
- frequency (the number of occurrences of the word in the file)
- positions (the positions of the word in the file) This can be a
Each of these values needs 4 bytes.
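
To make the cost above concrete, here is a minimal sketch of the memory needed for one word-in-file entry when every value takes 4 bytes. The function and constant names are hypothetical and are not taken from the swish-e source; the layout only assumes what the list states (four fixed values plus one position per occurrence).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants for illustration only (not swish-e identifiers). */
enum {
    FIELD_BYTES  = 4, /* each stored value needs 4 bytes */
    FIXED_FIELDS = 4  /* file number, metaname, structure, frequency */
};

/* Bytes needed for one word-in-file entry without compression:
 * the four fixed values plus one 4-byte position per occurrence. */
size_t entry_bytes_uncompressed(uint32_t frequency)
{
    assert(frequency > 0); /* a word listed for a file occurs at least once */
    return (size_t)FIELD_BYTES * ((size_t)FIXED_FIELDS + (size_t)frequency);
}
```

Under these assumptions a word that occurs once in a file costs 20 bytes, and one that occurs three times costs 28 bytes, even though most of the stored values are small numbers.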
Now, the new and good news...
Much of that info can be compressed to save memory. So I
decided to give it a try and modify the code to handle it. Here are
The test case contains 10000 files and 35000 different words.
Each file contains about 70 words with 7 fields (metaNames) and 5
The test box is a Sun Solaris 2.6 machine (400 MHz) with 512 MB.
(Note: All the files are in memory cache to minimize the effect of
the filesystem I/O).
swish-e-2.0.1 needed 33 MB of RAM and the index time was 33
"Modified" swish-e 2.x (including new index engine and beta
compression option) needed 20 MB of RAM and the index time was
Both output index files are identical (except for the date/time in
the header info).
As you see, there is a reduction in memory usage of about 40%.
I do not know if this is enough. Of course, it depends on how many
docs are being indexed and how powerful your machine is.
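
This post does not say which compression scheme the modified code uses. As an illustration only, one common way to shrink small integers such as metaname IDs, frequencies and positions is variable-byte encoding: 7 data bits per byte, with the high bit marking "more bytes follow". The sketch below is an assumption about the general technique, not the actual swish-e code, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode value into out, least significant 7 bits first.
 * The high bit of each byte is set when more bytes follow.
 * out must have room for 5 bytes. Returns the bytes written (1..5). */
size_t varint_encode(uint32_t value, unsigned char *out)
{
    size_t n = 0;
    assert(out != NULL);
    while (value >= 0x80) {
        out[n++] = (unsigned char)((value & 0x7F) | 0x80);
        value >>= 7;
    }
    out[n++] = (unsigned char)value;
    return n;
}

/* Decode one value written by varint_encode. Assumes well-formed input
 * (at most 5 bytes). Stores the result in *value and returns the
 * number of bytes consumed. */
size_t varint_decode(const unsigned char *in, uint32_t *value)
{
    uint32_t result = 0;
    unsigned shift = 0;
    size_t n = 0;
    assert(in != NULL && value != NULL);
    while (in[n] & 0x80) {
        result |= (uint32_t)(in[n++] & 0x7F) << shift;
        shift += 7;
    }
    result |= (uint32_t)in[n++] << shift;
    *value = result;
    return n;
}
```

With this kind of encoding any value below 128 fits in 1 byte instead of 4, and values below 16384 fit in 2 bytes. How much memory is actually saved depends on the distribution of the values being stored.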
I will release these modifications after completing them (I still need
to add them to the merge option).
Now, it is time for my vacation.
cu on Sept 17
Received on Thu Aug 31 13:44:38 2000