swish-e core dump while indexing XML documents

From: Bill Schell <friedfish(at)not-real.optonline.net>
Date: Mon Mar 22 2004 - 18:25:31 GMT
Hi Everyone.

I recently used swish-e to index a set of about 65K XML documents, each of
which is only a page or two in length, and got a segmentation fault for
reasons unknown. The machine in question is an x86 running Slackware Linux
(2.4.23) with 4GB of memory. The swish-e process itself isn't running out of
memory; it is only around 55MB in size when it dies.

I've included my swish-e config file below, followed by a gdb stack trace.
Any advice is greatly appreciated.

Thanks,
	Bill Schell
--------------------------------------------------------------------------------------------
config file:

DefaultContents XML
MetaNames HeadLine distributor DateAndTime
ParserWarnLevel 3
PropertyNameAlias swishtitle HeadLine
PropertyNames DateAndTime
StoreDescription XML <body.content> 200

----------------------------------------------------------------------------------------------
gdb /usr/local/bin/swish-e
GNU gdb 5.3
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-slackware-linux"...
(gdb) run -f /tmp/docs.index -c ~/swish.docs.conf -i ~/docs/030106 ~/docs/030107
Starting program: /usr/local/bin/swish-e -f /tmp/docs.index -c ~/swish.docs.conf -i ~/docs/030106 ~/docs/030107
[New Thread 16384 (LWP 6498)]
Indexing Data Source: "File-System"
Indexing "/home/bill/docs/030106"

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 16384 (LWP 6498)]
0x4024ef2d in _int_malloc () from /lib/libc.so.6
(gdb) where
#0  0x4024ef2d in _int_malloc () from /lib/libc.so.6
#1  0x4024e78c in calloc () from /lib/libc.so.6
#2  0x401ade76 in zcalloc () from /usr/lib/libz.so.1
#3  0x401aa5b7 in deflateInit2_ () from /usr/lib/libz.so.1
#4  0x401aa40a in deflateInit_ () from /usr/lib/libz.so.1
#5  0x401a8b0d in compress2 () from /usr/lib/libz.so.1
#6  0x080614bf in compress_property (prop=0x82d2a98, sw=0x80c3f78, buf_len=0xbffff364,
    uncompressed_len=0xbffff368) at docprop_write.c:163
#7  0x0806139d in WritePropertiesToDisk (sw=0x80c3f78, fi=0xbffff3b0)
    at docprop_write.c:100
#8  0x08055c2c in do_index_file (sw=0x80c3f78, fprop=0x82228a8) at index.c:994
#9  0x080505ce in printfile (sw=0x80c3f78, filename=0x82228a8 "�r\bh(\"\b�\r\b")
    at fs.c:601
#10 0x08050683 in printfiles (sw=0x80c3f78, e=0x80d7ca0) at fs.c:642
#11 0x08050276 in indexadir (sw=0x80c3f78, dir=0x80d7d48 "/home/bill/docs/030106")
    at fs.c:445
#12 0x0805af86 in indexpath (sw=0x80c3f78, path=0x80d7d48 "/home/bill/docs/030106")
    at file.c:217
#13 0x0804d098 in cmd_index (sw=0x80c3f78, params=0x80d1f70) at swish.c:1351
#14 0x0804bbb8 in main (argc=8, argv=0xbffff584) at swish.c:209
#15 0x401ecd06 in __libc_start_main () from /lib/libc.so.6
(gdb) frame 6
#6  0x080614bf in compress_property (prop=0x82d2a98, sw=0x80c3f78, buf_len=0xbffff364,
    uncompressed_len=0xbffff368) at docprop_write.c:163
163         zlib_status = compress2( (Bytef *)PropBuf, &dest_size, prop->propValue, prop->propLen, sw->PropCompressionLevel);
(gdb) p *prop
$1 = {propLen = 200, propValue = "Q"}
(gdb) p *sw
$2 = {ResultOutput = 0x0, Filter = 0x80c5708, ResultSort = 0x80c4af8,
  Entities = 0x80c5718, Db = 0x80c4a40, Index = 0x4030e008, FS = 0x80c8290,
  HTTP = 0x80c82e0, SwishWords = 0x80c5308, Prog = 0x80d1f60, indexlist = 0x80d1fb8,
  Prop_IO_Buf = 0x82d7850 
"x\234-\216\n�\020\005\177�225\021B@,D\024�005�020��\026.\227\211o\024\a��232?ϸ-\232\027J\006\233`\003�222<\236\225=�231\235pF�fS�t<\224\f\0329\215\027\e�225xs3p\225<\200\222�\217\022\030-�\216Q�\2202<g�023\231\036N�Q�/12\204I\021de�\226�215�(gSq�\207Ye\0176\vDI'\tc\214�\217u.\037", 
PropIO_allocated = 65356, PropCompressionLevel = -1, TotalWords = 0,
  TotalFiles = 0, verbose = 1, headerOutVerbose = 1, lasterror = 0,
  lasterrorstr = '\0' <repeats 500 times>, isvowellookuptable = {0 <repeats 65 times>,
    1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1,
    0 <repeats 11 times>, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
    1, 0 <repeats 138 times>}, dirlist = 0x80d7d40, suffixlist = 0x0, mtime_limit = 0,
  truncateDocSize = 0, replaceRegexps = 0x0, pathExtractList = 0x0,
  nocontentslist = 0x0, DefaultDocType = 3, indexcontents = 0x0, indexComments = 0,
  storedescription = 0x80dde40, ignoremetalist = 0x0, dontbumpstarttagslist = 0x0,
  dontbumpendtagslist = 0x0, UndefinedMetaTags = UNDEF_META_DISABLE,
  UndefinedXMLAttributes = UNDEF_META_DISABLE, parser_warn_level = 3,
  obeyRobotsNoIndex = 0, links_meta = 0x0, images_meta = 0x0, IndexAltTag = 0,
  IndexAltTagMeta = 0x0, AbsoluteLinks = 0, XMLClassAttributes = 0x0,
  header_names = 0x0, index_names = 0x0, temp_string_buffer = 0x0,
  temp_string_buffer_len = 0, stemmed_word = 0x0, stemmed_word_len = 0,
  structure_map_set = 0, structure_map = {0 <repeats 256 times>}, ref_count_ptr = 0x0}
(gdb) p *buf_len
$3 = 15
(gdb) p *uncompressed_len
$4 = 0
(gdb)
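
A note on reading the "p *prop" output above: it reports propLen = 200 but
shows only "Q" for propValue. One plausible explanation, assuming propValue is
declared as a one-element trailing array with the real payload allocated past
the end of the struct (an assumption; the declaration is not shown in this
post), is that gdb prints only the single declared element. The sketch below
uses a hypothetical demo_prop type and helper names to illustrate that layout;
it is not the swish-e source. If the assumption holds, something like
"p prop->propValue[0]@200" or "x/200xb prop->propValue" in gdb should display
the whole buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the property record. The field names mirror the
 * gdb output; the one-element array declaration is an assumption. */
typedef struct {
    unsigned int  propLen;       /* number of payload bytes that follow */
    unsigned char propValue[1];  /* only the first payload byte is declared */
} demo_prop;

/* Start of the payload, computed from the allocation base so that reads past
 * the declared one-element array stay inside the malloc'd block. */
static unsigned char *demo_prop_bytes(demo_prop *p)
{
    assert(p != NULL);
    return (unsigned char *)p + offsetof(demo_prop, propValue);
}

/* Allocate the header and len payload bytes as one block and copy data in.
 * Returns NULL on allocation failure, NULL data, or len == 0. */
static demo_prop *demo_prop_new(const unsigned char *data, unsigned int len)
{
    demo_prop *p;

    if (data == NULL || len == 0)
        return NULL;
    /* sizeof(demo_prop) already covers one payload byte plus padding, so
     * this slightly over-allocates; that keeps the sketch simple. */
    p = malloc(sizeof(demo_prop) + len);
    if (p == NULL)
        return NULL;
    p->propLen = len;
    memcpy(demo_prop_bytes(p), data, len);
    return p;
}
```

With a 200-byte payload, a debugger that prints the struct by its declared
type would show propLen = 200 and a single character for propValue, even
though all 200 bytes are reachable through demo_prop_bytes().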
Received on Mon Mar 22 10:25:32 2004