Re: Get segmentation fault with this URL using -S http method (More details included)

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Tue Jul 01 2003 - 22:03:10 GMT
On Tue, Jul 01, 2003 at 01:49:23PM -0700, Ken-Yu Lin wrote:
> Hi! Bill, I use the config file below to index just one URL using -S
> http method.
> 
> and get the following result:
> 
> 0> /home/kenyulin/swish-e/swish-e -S http -c test.config
> Indexing Data Source: "HTTP-Crawler"
> Indexing "http://groups.yahoo.com/group/SB-r-us/message/79"
> Segmentation fault (core dumped)

You will be happier in the long run using -S prog with spider.pl.

The problem seems to be this:

moseley@bumby:~/swish-e/src$ GET http://groups.yahoo.com/robots.txt
User-agent: *
Disallow: 

So swish-e isn't dealing correctly with an empty Disallow value.  Let me look....

Yuck, that's some ugly code in there.  Your classic buffer overrun.
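
To make the failure concrete, here is a minimal standalone sketch of the pre-patch logic, copied from the lines the patch removes. The name isolatevalue_unpatched, the helper function, and the exact input string are assumptions for illustration: in particular, it assumes the robots.txt line reaches the function with its trailing space and newline still attached, which is not confirmed here.

```c
#include <ctype.h>
#include <string.h>

/* Pre-patch logic, renamed (hypothetical name) so it is not mistaken
 * for the real function in httpserver.c. */
char *isolatevalue_unpatched(char *line, const char *keyword, int *plen)
{
    /* Skip the keyword, then any whitespace that follows it. */
    for (line += strlen(keyword); isspace((unsigned char)*line); line++) {
    }

    /* Strip trailing whitespace. When the value is empty, *plen starts
     * at 0, so this loop inspects line[-1], line[-2], ... and walks the
     * length below zero. */
    for (*plen = (int)strlen(line);
         isspace((unsigned char)*(line + *plen - 1));
         (*plen)--) {
    }

    return line;
}

/* Hypothetical helper: the length reported for an empty Disallow line
 * that still carries a trailing space and newline (an assumption). */
int unpatched_length_for_empty_disallow(void)
{
    char buf[] = "Disallow: \n";
    int len = 0;

    isolatevalue_unpatched(buf, "Disallow:", &len);
    return len; /* -2 for this input */
}
```

With this input the backward scan stays inside the buffer but leaves the length at -2. What the caller then does with a negative length is not shown here; a copy or allocation sized from it would be one plausible route to the crash.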

You might try this patch:

moseley@bumby:~/swish-e/src$ cvs diff httpserver.c
Index: httpserver.c
===================================================================
RCS file: /cvsroot/swishe/swish-e/src/httpserver.c,v
retrieving revision 1.14
diff -u -r1.14 httpserver.c
--- httpserver.c        28 Mar 2003 16:31:35 -0000      1.14
+++ httpserver.c        1 Jul 2003 21:59:55 -0000
@@ -364,13 +364,18 @@
 
 static char *isolatevalue(char *line, char *keyword, int *plen)
 {
-       /* Find the beginning of the value
-    **/
-    for (line += strlen(keyword); isspace((int)((unsigned char)*line)); line++ ) { /* cast to int 2/22/00 */
+
+    /* Find the beginning of the value  **/
+    for (line += strlen(keyword); *line && isspace((int)((unsigned char)*line)); line++ ) { /* cast to int 2/22/00 */
+    }
+
+    if ( !strlen(line) )
+    {
+        *plen = 0;
+        return line;
     }
 
-    /* Strip off trailing spaces
-    **/
+    /* Strip off trailing spaces  **/
     for (*plen = strlen(line); isspace((int)((unsigned char)*(line + *plen - 1))); (*plen)--) { /* cast to int 2/22/00 */
     }
 
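
For anyone who wants to try the change outside of swish-e, the patched logic can be sketched as a standalone function. The names isolatevalue_patched and check_patched and the sample input strings are made up for illustration; only the loop and the early return come from the diff above.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Patched logic from the diff, under a hypothetical name. */
char *isolatevalue_patched(char *line, const char *keyword, int *plen)
{
    /* Find the beginning of the value. */
    for (line += strlen(keyword);
         *line && isspace((unsigned char)*line);
         line++) {
    }

    /* Empty value: report length 0 and never look before the start. */
    if (!strlen(line)) {
        *plen = 0;
        return line;
    }

    /* Strip trailing whitespace. At least one non-space character is
     * present here, so the scan stops at or after line[0]. */
    for (*plen = (int)strlen(line);
         isspace((unsigned char)*(line + *plen - 1));
         (*plen)--) {
    }

    return line;
}

/* Hypothetical self-check covering an empty and a non-empty value. */
void check_patched(void)
{
    char empty[] = "Disallow: \n";
    char path[] = "Disallow: /private/ \n";
    int len = -1;
    char *value;

    value = isolatevalue_patched(empty, "Disallow:", &len);
    assert(len == 0 && *value == '\0');

    value = isolatevalue_patched(path, "Disallow:", &len);
    assert(len == 9 && strncmp(value, "/private/", 9) == 0);
}
```

The returned pointer is not NUL-terminated at the reported length, so a caller has to honor *plen rather than treat the result as a plain C string.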



-- 
Bill Moseley
moseley@hank.org
Received on Tue Jul 1 22:03:17 2003