It seems to apply to invalid entities too, e.g.:
&asdf;
And it doesn't just ignore the text in the current MetaName; it ignores
the rest of the XML. So if an & appears in the title, any other MetaNames
after the closing </title> tag get ignored as well.
This is obviously my fault; I should be escaping my input. But it's a
subtle issue that can catch idiots like me, so perhaps it deserves a
mention in the docs?
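The thread doesn't say how swish-e's XML parser handles the bad input internally, only that text after the bare & is silently dropped. Here is a minimal sketch that uses only Python's standard library (nothing swish-e specific) to do two things: escape raw text before it goes into an element, and check that a document is well-formed with a strict parser before handing it to the indexer. The helper names `title_xml` and `is_well_formed` are hypothetical. A strict parser rejects both the bare `&` and the undefined `&asdf;` entity instead of quietly skipping text.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def title_xml(text: str) -> str:
    """Hypothetical helper: wrap raw text in <title>, escaping &, < and >."""
    return "<title>" + escape(text) + "</title>"


def is_well_formed(xml: str) -> bool:
    """Hypothetical helper: True if the strict expat-based parser accepts it."""
    try:
        ET.fromstring(xml)
        return True
    except ET.ParseError:
        return False


raw = "CEO & CFO - assistant"
bad = "<title>" + raw + "</title>"   # unescaped &, as in the original post
good = title_xml(raw)                # & becomes &amp;

print(good)                          # <title>CEO &amp; CFO - assistant</title>
print(is_well_formed(bad))           # False
print(is_well_formed(good))          # True
print(is_well_formed("<title>&asdf;</title>"))  # False: undefined entity
```

Running a check like `is_well_formed` over the logged XML would flag the offending records up front, instead of finding out later that only "CEO" got indexed.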
On Sat, 2005-02-12 at 15:11 -0800, Mark Maunder wrote:
> Ok I got it. And it's a little monster. I am a moron and forgot to
> escape my &'s in the XML I was passing to swish. And swish was being a
> little so-and-so and silently ignoring all text after any unescaped
> ampersand in the XML. So with the following:
>
> <title>CEO & CFO - assistant</title>
>
> Only the word CEO gets indexed.
>
> Aaaaah. Life is good again.
>
> On Sat, 2005-02-12 at 00:25 -0800, Bill Moseley wrote:
> > On Fri, Feb 11, 2005 at 11:37:12PM -0800, Mark Maunder wrote:
> > > Thanks for all the help so far.
> > >
> > > Peter I added this to swish.conf:
> > > MetaNameAlias job swishdefault
> > >
> > > And got the following error:
> > > err: MetaNameAlias - name 'swishdefault' is already a MetaName or
> > > MetaNameAlias
> >
> > Try the docs again. Let me know if it's not clear.
> >
> > > I'm going to log all the XML I index to a text file (should come out at
> > > 9 GB by my estimates) and work with that.
> >
> > Start small first. You already know of a document (or more) that you
> > think should not be in results but is. Just use that one document for
> > testing first.
> >
>
Received on Sat Feb 12 15:19:26 2005