Re: [swish-e] "-S prog" mashing up words in HTML files

From: Matthew Stanislawski <stnslwsk(at)not-real.uiuc.edu>
Date: Wed Mar 21 2007 - 00:19:42 GMT
Bill Moseley wrote:
> On Tue, Mar 20, 2007 at 04:15:32PM -0500, Matthew Stanislawski wrote:
> Hum, can't see to duplicate it.  Can you try these -- and/or put the
> output from your script someplace?

Tried your indexing methods with your test.html (even subbing in the 
much longer original line), and they index properly.  Hm.

Here's the output of my script, for this particular document:
http://mattstan.net/spew.out

> Might turn up the ParserWarnLevel to see if it's getting confused.

Getting a lot of errors like this:

https://opcenter-test.cso.uiuc.edu/doc/DOORCODES:33: error: Entity 'nbsp' not defined

I also get one "xmlParseEntityRef: expecting ';'" error for a URL 
where a & .cgi argument separator wasn't escaped, but that's probably 
not relevant.
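
For illustration only: both messages are the kind an XML parser reports on HTML-only constructs (a named entity such as &nbsp; with no DTD, and a bare & in a URL). Whether the content here is actually being parsed as XML rather than HTML is an assumption, not something established above. The sketch below is in Python, not the Perl script mentioned in this message; it uses the standard-library XML parser rather than libxml2, and the function names and sample fragment are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities every XML parser knows without a DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# An '&' that does not start a numeric or named entity reference.
_BARE_AMP = re.compile(r"&(?!#[0-9]+;|#[xX][0-9A-Fa-f]+;|[A-Za-z][A-Za-z0-9]*;)")
# A named entity reference such as &nbsp;
_NAMED = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_xml_safe(fragment):
    """Escape bare ampersands and turn HTML named entities into numeric ones."""
    fragment = _BARE_AMP.sub("&amp;", fragment)

    def numeric(match):
        name = match.group(1)
        # Leave XML's own entities and unknown names untouched.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return _NAMED.sub(numeric, fragment)


def parses_as_xml(text):
    """Return True if the text is well-formed XML for the stdlib parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical fragment showing both problems at once.
raw = '<details><div>Door&nbsp;codes <a href="x.cgi?a=1&b=2">link</a></div></details>'
cleaned = to_xml_safe(raw)
```

Here `raw` is rejected by the XML parser while `cleaned` is accepted, which is one way to check whether entity handling alone accounts for the warnings.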

> How are you specifying "details" as a metaname?

From my swish.cfg:

MetaNames keywords swishtitle swishdocpath details history netid phone problem source owner catsid excerpt
MetaNameAlias details comments updates
MetaNamesRank -2 details

In my Perl script, I wrap the chunk of HTML from our CMS (the content 
of the document, which sits inside a <div> block) in <details> tags.

> 
> Maybe a problem with your version of libxml2?

Given the first error above, that's what it looks like.  I didn't 
install libxml2 myself, however, so I don't exactly know my way around 
it.  How can I troubleshoot libxml2 directly?

Thanks,
-mes
Received on Tue Mar 20 20:19:42 2007