Re: Get segmentation fault with this URL using -S http method

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Tue Jul 01 2003 - 20:01:38 GMT
On Tue, Jul 01, 2003 at 11:37:12AM -0700, Ken-Yu Lin wrote:
> Using swish-e-2.2.3 (make with libxml2.so.2.5.7) on a Sun machine.
> 
> Whenever I try to index this URL
> (http://groups.yahoo.com/group/SB-r-us/message/79), I get
> segmentation fault (core dumped).

Sorry, I can't duplicate.

> But with other websites, swish-e works just fine.
> 
> BTW, I didn't use any special setting.

What un-special things did you use?  Can you provide enough details to 
reproduce your problem, or do I have to guess? ;)


moseley@bumby:~$ cat spider.config
@servers = (

    {
        base_url    => 'http://groups.yahoo.com/group/SB-r-us/message/79',
        agent       => 'swish-e spider http://swish-e.org/',
        email       => 'spider@hank.org',
        max_indexed => 1,
    },
);    
1;


moseley@bumby:~$ /usr/local/lib/swish-e/spider.pl spider.config > test.html
/usr/local/lib/swish-e/spider.pl: Reading parameters from 'spider.config'
/usr/local/lib/swish-e/spider.pl: Max indexed files Reached

Summary for: http://groups.yahoo.com/group/SB-r-us/message/79
    Duplicates:     2  (0.2/sec)
Off-site links:     7  (0.6/sec)
   Total Bytes: 3,495  (291.2/sec)
    Total Docs:     1  (0.1/sec)
   Unique URLs:     3  (0.2/sec)
moseley@bumby:~$ head test.html
Path-Name: http://groups.yahoo.com/group/SB-r-us/auth?check=G&done=%2Fgroup%2FSB-r-us%2Fmessage%2F79
Content-Length: 3495
Document-Type: html*


<HTML>

<HEAD>

        
moseley@bumby:~$ cat test.html | swish-e -S prog -i stdin
Indexing Data Source: "External-Program"
Indexing "stdin"
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 58 words alphabetically
Writing header ...
Writing index entries ...
  Writing word text: Complete
  Writing word hash: Complete
  Writing word data: Complete
58 unique words indexed.
4 properties sorted.                                              
1 file indexed.  3495 total bytes.  78 total words.
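
If it helps narrow this down: the test.html produced above is in the input format that "-S prog" reads on stdin. Each document is a small header block (Path-Name, Content-Length, and here Document-Type), a blank line, and then, as I understand the prog input format, exactly Content-Length bytes of content. So you can save a copy of the page yourself and feed it to swish-e without the http method involved at all. Below is a rough sketch, assuming a POSIX shell. emit_prog_doc is a hypothetical helper name, not something shipped with swish-e, and the Document-Type value is simply copied from the spider output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of swish-e): emit_prog_doc URL FILE
# Wraps one local file in the header format seen in test.html above,
# so it can be piped into: swish-e -S prog -i stdin
emit_prog_doc() {
    path=$1
    file=$2
    # Byte count of the file; the $(( )) strips the leading padding
    # that some wc implementations (e.g. Solaris, BSD) print.
    len=$(( $(wc -c < "$file") ))
    printf 'Path-Name: %s\n' "$path"
    printf 'Content-Length: %d\n' "$len"
    # Same parser selection the spider emitted above.
    printf 'Document-Type: html*\n'
    # A blank line ends the headers; the raw content follows.
    printf '\n'
    cat "$file"
}
```

For example, after saving the page as page.html, run: emit_prog_doc http://groups.yahoo.com/group/SB-r-us/message/79 page.html | swish-e -S prog -i stdin. If that also crashes, the problem is in parsing that document; if it doesn't, it points at the http fetching side.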



-- 
Bill Moseley
moseley@hank.org
Received on Tue Jul 1 20:01:44 2003