Re: Limiting size of index file (was: error indexing pdf files)

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Sat Dec 13 2003 - 18:42:39 GMT
On Sat, Dec 13, 2003 at 11:56:24AM -0500, bethsarah wrote:
> So what "variables" control the size of the index file, i would assume word
> count, how many files, whether or not to store description, etc.

Yes, it's those things.

> I have a disk space from my hosting company, while I want to to still
> maintain the structure of my config file, i would like to limit the size of
> the index.  If this not possible, by adjusting the source code then my only
> alternative would be to change my config, which i would rather not do.

I'm having a hard time understanding your question.  The index contains 
the things you wanted in the index, nothing more.  So there's nothing to 
remove without removing features.  

The index holds words, their positions, and some data about their
structure (whether the word is in <title>, <h1>, <em>, <b>, and so on);
text is stored in the .prop file.  You can reduce the size of the .prop
file by storing fewer properties.  You can reduce the size of the main
index by indexing fewer docs.  Sorry, but there are no simple #define
statements you can change to, for example, limit the number of words
indexed per document, or to disable phrase searching and store only a
word and its associated files without all the structure and position
data.
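
To make that trade-off concrete, here is a minimal Python sketch.  It is
not Swish-e code and says nothing about the real index format; the
function names and the max_words parameter are made up for illustration.
It only shows why keeping positions (needed for phrase searching) stores
more data than a plain word-to-files mapping, and what a per-document
word limit would do.

```python
from collections import defaultdict

def build_positional_index(docs, max_words=None):
    """Map word -> {doc_id: [positions]}; positions allow phrase search.

    max_words is a hypothetical per-document cap on indexed words.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        words = text.lower().split()[:max_words]
        for pos, word in enumerate(words):
            index[word][doc_id].append(pos)
    return index

def build_simple_index(docs, max_words=None):
    """Map word -> set of doc_ids; no positions, so no phrase search."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split()[:max_words]:
            index[word].add(doc_id)
    return index

def positional_entries(index):
    """Count stored positions (one per indexed word occurrence)."""
    return sum(len(p) for postings in index.values() for p in postings.values())

def simple_entries(index):
    """Count stored (word, document) pairs."""
    return sum(len(doc_ids) for doc_ids in index.values())

docs = {1: "the cat sat on the mat", 2: "the dog sat"}
print(positional_entries(build_positional_index(docs)))
print(simple_entries(build_simple_index(docs)))
print(positional_entries(build_positional_index(docs, max_words=2)))
```

The positional index keeps one entry per word occurrence, while the
simple one keeps one entry per distinct word per document, so the gap
grows with longer, more repetitive documents.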

My guess is that after you have spent time hacking around in the code to
remove features, you will wish that you had just asked your ISP for more
disk space.


-- 
Bill Moseley
moseley@hank.org
Received on Sat Dec 13 18:42:47 2003