Re: Failed to uncompress Property. zlib uncompress

From: Weir James K Contr ASC/ENOI <James.Weir(at)not-real.wpafb.af.mil>
Date: Tue May 04 2004 - 11:48:18 GMT
> -----Original Message-----
> From: swish-e@sunsite.berkeley.edu 
> [mailto:swish-e@sunsite.berkeley.edu] On Behalf Of Bill Moseley
> Sent: Monday, May 03, 2004 2:21 PM
> To: Multiple recipients of list
> Subject: [SWISH-E] Re: Failed to uncompress Property. zlib uncompress retu
> 
> 
> On Mon, May 03, 2004 at 11:02:57AM -0700, Weir James K Contr ASC/ENOI wrote:
> > > Pushing the limits with 3 million files, I suppose.  How long
> > > does it take to index?
> > It takes about 3 days to index.
> 
> Are you running out of RAM?  I assume you are using -e when indexing.
I do not believe I am running out of RAM; I have 3 GB of RAM and 59 GB of drive space.
This is the information at the end of the log file:
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 12611125 words alphabetically
Writing header ...
Writing index entries ...
  Writing word text: Complete
  Writing word hash: Complete
  Writing word data: Complete
12611125 unique words indexed.
5 properties sorted.
3152637 files indexed.  853492536 total bytes.  628026074 total words.
Elapsed time: 42:08:36 CPU time: 42:08:35
Indexing done!
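For anyone following along, economy mode is enabled with the -e switch at index time. A minimal sketch of such an invocation (the config name, source directory, and index file below are placeholders, not Jim's actual setup):

```shell
# -e (economy mode) buffers word data on disk instead of in in-memory
# hash tables, which keeps RAM use flat on very large file sets.
# swish.conf, /data/archive, and archive.index are hypothetical names.
swish-e -e -c swish.conf -S fs -i /data/archive -f archive.index
```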


> 
> Swish-e uses a bunch of hash-based stores, and they are not scalable.
> When indexing large file sets without -e, you can really see indexing
> slow down over time.
> 
> I once wrote a small program to generate random documents using words
> from a dictionary.  The program generated progress reports on the number
> of words per minute indexed.  Without -e, indexing really slowed down at
> about a million documents.  Using -e started out slower but didn't slow
> down as much.  IIRC, it took about an hour or so to index a million of
> those simple "documents".
> 
> Anyway, three days is not an acceptable amount of time for indexing.  If
> it's not something obvious (like running out of RAM), then you might want
> to look into other indexing systems that are designed for larger
> indexing jobs.
This is not a problem for us. We are only indexing these text files once; this is old archive data that we need to search.
I have tried other Windows-based indexing systems, and Swish-e seems to work the best of all. I was wondering if I should break the
indexes up into smaller deptsyms (remember the email about two deptsyms coming up when you select one; for example, you search on "ET" and both "ET" and "ETG" come back). From the ASP I believe I can select the deptsym and concatenate it to the index file name and search that way (example: ET = ET_fuzzy006.index). Then if you want to search across different deptsyms, just string the index files together. I'm not sure if this would fix most of my problems or not.
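A sketch of what that per-deptsym scheme might look like from the command line (the directory layout, index names, and query below are made up for illustration; swish-e's -f switch does accept more than one index file when searching):

```shell
# Build one index per deptsym (hypothetical paths and index names):
swish-e -e -S fs -i /data/archive/ET  -f ET_fuzzy006.index
swish-e -e -S fs -i /data/archive/ETG -f ETG_fuzzy006.index

# Search a single deptsym by picking its index file:
swish-e -w 'some query' -f ET_fuzzy006.index

# Search across deptsyms by stringing the index files together:
swish-e -w 'some query' -f ET_fuzzy006.index ETG_fuzzy006.index
```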


Jim 
> -- 
> Bill Moseley
> moseley@hank.org
> 
Received on Tue May 4 04:48:22 2004