Re: [swish-e] (null) problem - yes, I've read the FAQ

From: Tomasz Chmielewski <mangoo(at)not-real.wpkg.org>
Date: Thu Nov 29 2007 - 15:57:38 GMT
Bill Moseley wrote:
> On Wed, Nov 28, 2007 at 11:47:12PM +0100, Tomasz Chmielewski wrote:
>> Peter Karman wrote:
>>> Tomasz Chmielewski wrote on 11/26/07 2:39 AM:
>>>
>>>> Without the asterisk (*), it works just fine for me.
>>>> What difference does it make?
>>>>
>>> if you have libxml2 installed, then XML* == XML2
>>> if not, then XML* == XML
>>>
>>> So if you have compiled with libxml2, and you specify one directive as XML and
>>> the other as XML*, then you are using 2 different parsers.
>> Hmm, with the asterisk (*), like below, it doesn't work for me:
>>
>> IndexContents HTML* .html
>>
>>
>> # swish-e -c mysite.conf
>>
>> Indexing Data Source: "File-System"
>> Indexing "/srv/www/vhosts/wpkg.org/mailman/archives/public"
>> /srv/www/vhosts/wpkg.org/mailman/archives/public/wpkg-announce/2005-December/000000.html:7: error: htmlParseEntityRef: expecting ';'
>> s.wpkg.org?Subject=%5BWpkg-announce%5D%20wpkg-0.9.2-test1%20released&In-Reply-To
>>  
>>          ^
> 
> Are you correctly escaping your URLs?  & should be &amp;
> 
> Run your page through an HTML validator.

It's generated by Mailman, so I can't do much about it. Mailman 
generates "almost" valid HTML though.

The URL in question is:


     <A 
HREF="mailto:blah-announce%40lists.wpkg.org?Subject=%5BWpkg-announce%5D%20blah%20released&In-Reply-To="
        TITLE="subject">blah at wpkg.org
        </A>

swish-e parses the entire Mailman archive just fine when I use
"IndexContents HTML .html"; if I add an asterisk ("IndexContents HTML* .html"),
it reports these errors.
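
Since the Mailman output itself can't easily be changed, one possible workaround is to escape the bare ampersands before the files reach the indexer. The following is only a minimal sketch in Python, not something swish-e or Mailman provides; the function name escape_bare_ampersands and the regular expression are assumptions made for illustration. It rewrites every "&" that does not already start a character or entity reference, and it does so everywhere in the text, so it is not safe for pages containing scripts or CDATA sections.

```python
import re

# Hypothetical helper, not part of swish-e or Mailman.
# Matches an ampersand that does NOT begin a reference such as
# &amp;, &#38; or &#x26; (a name or number followed by a semicolon).
BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(html: str) -> str:
    """Replace unescaped '&' characters with '&amp;'."""
    return BARE_AMP.sub("&amp;", html)


if __name__ == "__main__":
    # Shortened version of the link from the archive page.
    sample = (
        '<A HREF="mailto:blah-announce%40lists.wpkg.org'
        '?Subject=blah%20released&In-Reply-To=" TITLE="subject">'
    )
    print(escape_bare_ampersands(sample))
```

Already-escaped references are left alone, so running the function twice over the same file should not double-escape anything.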


-- 
Tomasz Chmielewski
http://wpkg.org
Received on Thu Nov 29 10:57:43 2007