Re: Indexing differs for 2 lines swapped in file

From: <moseley(at)not-real.hank.org>
Date: Wed Oct 29 2003 - 03:24:30 GMT
On Tue, Oct 28, 2003 at 06:04:41PM -0800, Dominique Phommahaxay wrote:

> > and you will see how many errors are generated.
> Somehow I could not redirect the warning to a file using the '>'.

I think they get sent to stderr.
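
A plain '>' only redirects stdout, which is why the file came out empty.
In POSIX shells and in cmd.exe, '2>' redirects stderr instead, e.g.
"swish-e -c swish.conf 2> warnings.txt" (both file names here are just
placeholders).  If the shell won't cooperate, a small wrapper script can
split the two streams.  The following is only a sketch: the helper name
run_and_split and the stand-in child command are made up for
illustration and are not part of swish-e.

```python
import subprocess
import sys


def run_and_split(cmd, stdout_path, stderr_path):
    """Run cmd, sending stdout and stderr to separate files.

    Returns the child's exit code.
    """
    with open(stdout_path, "w") as out, open(stderr_path, "w") as err:
        return subprocess.run(cmd, stdout=out, stderr=err).returncode


if __name__ == "__main__":
    # Hypothetical stand-in for the indexer: one line on each stream.
    child = [
        sys.executable,
        "-c",
        "import sys; print('indexed'); "
        "print('warning: bad markup', file=sys.stderr)",
    ]
    run_and_split(child, "out.txt", "warnings.txt")
```

Replace the stand-in command list with the real indexing command and its
arguments; the warnings should then end up in the second file.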

> >  DefaultContents TXT*
> > 
> > That avoids using libxml2.  Let me know if that fixes your problem.
> Yes, using DefaultContents TXT* does fix the problem (J2Ee is now indexed and found).
> 
> How else can I help to contribute to the correction of this issue with libxml2?

I didn't spend much time looking at it.  I just noticed that even with
-T parsed_words the output from libxml2 had stopped.  I then looked
at your source document to see if there was anything odd in that file,
which there wasn't.  Bottom line is I don't have a compiler for Windows
so I can't test to be sure.  But, the same swish-e code is used to feed
the libxml2 parser and the txt2 parser so it seems likely that it's
libxml2.

You would want to confirm that it happens with libxml2 alone.  The libxml2
package has a few test programs.  I'm not sure if those are also
included with the Windows port (Dave might be able to answer about
that).  Those were available when building from source -- I don't see
them listed in the debian packages, either.  Something like SAX.c.  

When first working with libxml2 I had a number of odd problems, and I
used those programs for testing -- it's easier to post problems on
the libxml2 list that can be reproduced using their own test programs.
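
If the test programs are not available, the xmllint tool that ships with
libxml2 can serve a similar purpose: its --html option runs the HTML
parser and --noout suppresses the parsed output, leaving only the
diagnostics.  The sketch below wraps that call; the function names are
made up for illustration, and it assumes xmllint is on the PATH.
swish-e may feed the parser differently than xmllint does, so a clean
run here would not by itself rule libxml2 out.

```python
import shutil
import subprocess


def xmllint_html_cmd(path, tool="xmllint"):
    """Build a command that parses path with libxml2's HTML parser."""
    return [tool, "--html", "--noout", path]


def parse_outside_swish(path, tool="xmllint"):
    """Return (exit_code, stderr_text), or None if the tool is missing."""
    if shutil.which(tool) is None:
        return None
    result = subprocess.run(
        xmllint_html_cmd(path, tool),
        capture_output=True,
        text=True,
        errors="replace",  # parser messages may echo non-UTF-8 bytes
    )
    return result.returncode, result.stderr
```

Running parse_outside_swish() on both orderings of the two swapped lines
and comparing the results would give something concrete to post.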

Still, as I said, you are indexing a text file with an HTML parser, so I 
doubt it would be a high priority fix for the libxml2 folks.


-- 
Bill Moseley
moseley@hank.org
Received on Wed Oct 29 03:36:43 2003