On Tue, Mar 20, 2007 at 04:15:32PM -0500, Matthew Stanislawski wrote:
> ...loading dock)<br/></td></tr><tr><td> H5</td><td>DCL Hallway...
>
>
> White-space found word 'dock)H5DCL'
> Adding:[120:swishdefault(1)] 'dock' Pos:289 Stuct:0x1 ( FILE )
> Adding:[120:details(13)] 'dock' Pos:289 Stuct:0x1 ( FILE )
> Adding:[120:swishdefault(1)] 'h5dcl' Pos:290 Stuct:0x1 ( FILE )
> Adding:[120:details(13)] 'h5dcl' Pos:290 Stuct:0x1 ( FILE )
Hmm, can't seem to duplicate it. Can you try these -- and/or put the
output from your script someplace?
You might turn up ParserWarnLevel to see if the parser is getting confused.
How are you specifying "details" as a metaname?
Maybe a problem with your version of libxml2?
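For reference, here is a minimal config sketch for trying the first two suggestions. It assumes a config file called swish.conf (a made-up name) and that "details" is declared as a plain metaname; I haven't seen your actual config, so treat the values as placeholders:

```
# swish.conf (hypothetical file name, adjust to your setup)

# Parse everything with the libxml2-based HTML parser
DefaultContents HTML2

# Raise the parser warning level above the default; 9 is assumed
# here to be the most verbose setting, check the docs for your version
ParserWarnLevel 9

# Declare "details" as a metaname (assumption: no aliases or other options)
MetaNames details
```

Running something like "swish-e -c swish.conf -v0 -T indexed_words -i test.html" with that config should show whether the parser complains about the markup.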
moseley@bumby:~$ cat test.html
<html>
<head>
<title>hello</title>
</head>
<body>
<table>
<tr>
<tr><td> H5</td><td>DCL Hallway</td>
</tr>
</table>
</body>
</html>
moseley@bumby:~$ swish-e -v0 -T indexed_words -i test.html
Adding:[1:swishdefault(1)] 'hello' Pos:5 Stuct:0x7 ( HEAD TITLE FILE )
Adding:[1:swishdefault(1)] 'h5' Pos:16 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'dcl' Pos:19 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'hallway' Pos:20 Stuct:0x9 ( BODY FILE )
Maybe it has something to do with -S prog?? Nope:
moseley@bumby:~$ /usr/local/lib/swish-e/spider.pl default file:///home/moseley/test.html 2>/dev/null | swish-e -S prog -i stdin -T indexed_words -v 0
Adding:[1:swishdefault(1)] 'hello' Pos:5 Stuct:0x7 ( HEAD TITLE FILE )
Adding:[1:swishdefault(1)] 'h5' Pos:16 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'dcl' Pos:19 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'hallway' Pos:20 Stuct:0x9 ( BODY FILE )
--
Bill Moseley
moseley@hank.org
Received on Tue Mar 20 17:31:39 2007