Re: Swish 2.4.4 breaks on large xls file -- yes it does

From: Gertjan Hofman <gertjan_hofman(at)not-real.yahoo.com>
Date: Wed Oct 25 2006 - 00:21:45 GMT
2.4.4 is still not working right. First, I have to
remove the TruncateDocSize directive from the conf file.
But then swish-e 2.4.4 (with -e) eats up memory until
the entire machine hangs. I added a 4 GB swap file
and watched it gobble that up too.

If I change nothing but the execution command so that it
calls swish-e 2.4.3 instead, it indexes fine, using at
most about 5% of my 750 MB of memory.

The problem is easily demonstrated even on a directory
containing a single file (a fairly large one, 20 MB).

I am very happy to run any test anyone can think of.
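
For anyone trying to reproduce this, one way to keep a runaway
indexer from hanging the whole machine is to cap its address space
before launching it. The following is only a minimal sketch, assuming
Linux and Python 3; the helper name run_with_memory_cap and the 1 GB
figure in the usage note are illustrative and are not part of swish-e.

```python
import resource
import subprocess
import sys


def run_with_memory_cap(cmd, max_bytes):
    """Run cmd with its virtual address space capped at max_bytes.

    Returns the child's exit code. A process that tries to grow past the
    cap gets allocation failures instead of eating all RAM and swap.
    """
    def set_limit():
        # Runs in the child just before exec; the limit applies only there.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    completed = subprocess.run(cmd, preexec_fn=set_limit)
    return completed.returncode
```

Calling it with the same command line as in the quoted message, for
example run_with_memory_cap([".../swish-e", "-e", "-c",
"swish_conf.run"], 1024**3), should make a leaking run fail early
rather than take the machine down. How swish-e reacts to a failed
allocation is not something I have checked.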

Cheers

Gertjan




--- Gertjan Hofman <gertjan_hofman@yahoo.com> wrote:

> 
> I was about to detail how Swish 2.4.4 is grinding to a
> halt on a large file when I realized the source of the
> problem - still a bug, but I now know why.  Perhaps
> read my draft e-mail first:
>  ------------------------------------
> Hi,
> 
> After the fork/exec thing seemed to do the job, I let
> swish run over the weekend. It bailed on all the file
> servers I tried to index. I managed to find 1 file
> that demonstrates the problem but I am not sure how to
> debug it further.
> 
> 1. The file is parsed by xls2csv, and xls2csv finishes
> fine when run on it from the command line.
> 
> 2. The 2.4.3 version has no issue with the file.
> 
> 3. The file is relatively large (11Mb xls).
> 
> 4. 2.4.4 just hangs: -T INDEXED_WORDS shows it
> chugging along and then just halting. No core, and gdb
> doesn't give me a stack.
> 
> My execution line is:
> /home/ghofman/tmp/swish-e-2.4.4/src/swish-e -e -T
> PARSED_TEXT -T  INDEXED_WORDS -c swish_conf.run -v 3
> >& swish.log
> 
> I am still running it from the build directory as you
> can see.
> ---------------------------------------------------
> 
> Ok. I just realized I have a 
> TruncateDocSize 5000000
> statement in my conf file.
> 
> It looks like this is why swish-e is hanging. 2.4.3
> does not hang.  Somehow, when the limit is exceeded,
> the result isn't handled properly. Remove the limit and
> 2.4.4 runs through the whole thing.
> 
> Which reminds me - if you are not using DirTree, is
> there any way to limit the size of a file that
> swish-e looks at?  Seems like a useful option.
> 
> Cheers
> 
> Gertjan
> 
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam
> protection around 
> http://mail.yahoo.com 
> 
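
On the question in the quoted message about limiting the size of the
files swish-e looks at: one possible workaround is to pick the files
outside of swish-e and only hand over those below a size threshold.
The following is only a rough sketch in Python 3; the function name
files_within_size_limit is hypothetical, and how the resulting paths
are passed to the indexer is left open, since that depends on the
setup.

```python
import os


def files_within_size_limit(root, max_bytes):
    """Yield paths under root whose size is at most max_bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # Unreadable or vanished file: skip it rather than abort.
                continue
            if size <= max_bytes:
                yield path
```

With max_bytes set to 5000000 this would skip the large xls file that
triggers the hang, at the cost of not indexing it at all, which is
different from truncating it.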


Received on Tue Oct 24 17:21:50 2006