Swish 2.4.4 breaks on large xls file -- ah, no it doesn't

From: Gertjan Hofman <gertjan_hofman(at)not-real.yahoo.com>
Date: Mon Oct 23 2006 - 18:41:38 GMT
I was about to detail how Swish 2.4.4 is grinding to a
halt on a large file when I realized the source of the
problem. It is still a bug, but I now know why. Perhaps
read my draft e-mail first:
 ------------------------------------
Hi,

After the fork/exec change seemed to do the job, I let
swish run over the weekend. It bailed on all the file
servers I tried to index. I managed to find one file
that demonstrates the problem, but I am not sure how to
debug it further.

1. The file is parsed by xls2csv, which finishes fine
when run from the command line.

2. The 2.4.3 version has no issue with the file.

3. The file is relatively large (an 11 MB xls).

4. 2.4.4 just hangs: a -T INDEXED_WORDS trace shows it
chugging along and then simply halting. There is no
core file, and gdb doesn't give me a stack.

My execution line is:

/home/ghofman/tmp/swish-e-2.4.4/src/swish-e -e -T PARSED_TEXT -T INDEXED_WORDS -c swish_conf.run -v 3 >& swish.log

I am still running it from the build directory as you
can see.
---------------------------------------------------

Ok. I just realized I have a 
TruncateDocSize 5000000
statement in my conf file.

It looks like this is why swish-e is hanging; 2.4.3
does not hang with the same setting. Somehow, when the
limit is exceeded, the result isn't handled properly.
Remove the limit and 2.4.4 runs through the whole thing.

Which reminds me: if you are not using DirTree, is
there any way to limit the size of the files that
swish-e looks at? Seems like a useful option.
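A possible workaround for the size question, sketched here and not tested against swish-e itself: swish-e's `-S prog` input method reads documents from an external program, each written as a few headers (`Path-Name`, `Content-Length`, `Last-Mtime`), a blank line, and then the body. A program in that style can skip oversized files before swish-e ever sees them. The script name `skip_large.py` and the `MAX_BYTES` cap are my own hypothetical choices, not swish-e options.

```python
import os
import stat
import sys

# Hypothetical size cap (not a swish-e setting); 5000000 mirrors the
# TruncateDocSize value mentioned above.
MAX_BYTES = 5_000_000


def iter_small_files(root, max_bytes=MAX_BYTES):
    """Yield (path, mtime) for each regular file under root no larger than max_bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # vanished or unreadable entry: skip it
            if stat.S_ISREG(st.st_mode) and st.st_size <= max_bytes:
                yield path, int(st.st_mtime)


def emit_document(out, path, mtime):
    """Write one document: headers, a blank line, then the raw file body."""
    with open(path, "rb") as fh:
        body = fh.read()  # read first, so a failed read writes nothing
    # Content-Length is taken from the bytes actually read, so it matches the body.
    header = (b"Path-Name: " + os.fsencode(path) + b"\n"
              + b"Content-Length: " + str(len(body)).encode("ascii") + b"\n"
              + b"Last-Mtime: " + str(mtime).encode("ascii") + b"\n\n")
    out.write(header)
    out.write(body)


def main(roots):
    if not roots:
        print("usage: skip_large.py DIR [DIR ...]", file=sys.stderr)
        return
    out = sys.stdout.buffer
    for root in roots:
        for path, mtime in iter_small_files(root):
            try:
                emit_document(out, path, mtime)
            except OSError:
                continue  # file became unreadable after the size check
    out.flush()


if __name__ == "__main__":
    main(sys.argv[1:])
```

It can be tried outside swish-e first, e.g. `python3 skip_large.py /some/share | less`. Hooking it up would mean naming it as the program for `-S prog`; how the directory arguments get passed to it should be checked against the swish-e prog documentation.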

Cheers

Gertjan

Received on Mon Oct 23 11:41:44 2006