Re: Issue with indexing

From: John Kelley <john(at)not-real.atypica.com>
Date: Thu Jan 13 2005 - 17:47:31 GMT
OK, I did this and found that 'kiddush' IS being indexed, but it does not
come up when I search for it.

Here are the lines around the word getting indexed:
     Adding:[1:summary(13)]   'ritual'   Pos:601  Stuct:0x89 ( META BODY FILE )
     Adding:[1:summary(13)]   'kiddush'   Pos:602  Stuct:0xC9 ( EM META BODY FILE )
     Adding:[1:summary(13)]   'sanctification'   Pos:603  Stuct:0xC9 ( EM META BODY FILE )

So the problem appears to be on the search end.
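
For anyone repeating this check on other words, here is a minimal sketch of how the debug output can be filtered instead of paged through by hand. It runs on a saved copy of the three lines above so that it works without swish-e installed. The file name words.txt, the variable names, and the index name in the comments are placeholders of mine, not anything from my setup. The reading of the bracketed label as "file number:metaname(id)" is also an assumption.

```shell
# Sample of the -T indexed_words output, copied from the lines above.
# In real use the file would come from something like:
#   swish-e -c config -i test.doc -T indexed_words > words.txt
cat > words.txt <<'EOF'
     Adding:[1:summary(13)]   'ritual'   Pos:601  Stuct:0x89 ( META BODY FILE )
     Adding:[1:summary(13)]   'kiddush'   Pos:602  Stuct:0xC9 ( EM META BODY FILE )
     Adding:[1:summary(13)]   'sanctification'   Pos:603  Stuct:0xC9 ( EM META BODY FILE )
EOF

word=kiddush

# Count the lines that add exactly this word. -F matches a fixed string,
# and the surrounding single quotes keep it from matching longer words.
hits=$(grep -c -F "'$word'" words.txt)
echo "$word: indexed on $hits line(s)"

# Pull out the bracketed label from the matching line.
label=$(grep -F "'$word'" words.txt | sed 's/.*Adding:\[\([^]]*\)].*/\1/')
echo "label: $label"

# Possible follow-up searches (index.swish-e is a placeholder index name):
#   swish-e -f index.swish-e -w kiddush
#   swish-e -f index.swish-e -w summary=kiddush
# The second form only applies if "summary" really is a metaname in the config.
```

If the label does name a metaname, it may be worth comparing a plain search against a search limited to that metaname, since the two could behave differently.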

John

At 12:37 PM 1/13/2005, you wrote:
>On Thu, Jan 13, 2005 at 12:31:04PM -0500, John Kelley wrote:
> > The problem is that if you go to:
><link removed>
> > and search for, say, a*, you get hits.  If you search for p*, you get
> > none.  If you search for specific words in the document, it's hit or
> > miss.  I've set it up so there is only one document in the index, and you
> > can do a "View Source" on it to see it's nothing special.  The indexer
> > counts over 500 words indexed, but it seems only a select few are
> > searchable.
>
>Ok, then here's what you do:
>
>Say you know "foo" is in your file, but you can't search for it.
>
>So you look at "foo" in the doc and notice where it is and what words
>are around it.  Then you do:
>
>    swish-e -c config -i test.doc -T indexed_words | less
>
>and search for foo.  If it's not there then you look for words around
>it so you can figure out why it's not being indexed.  Maybe "foo" is
>getting split up into "f" and "oo" or maybe you have some bad markup,
>or maybe there's a character that couldn't be converted to 8859-1
>encoding and that caused foo to get lost.
>
>Then you find the smallest bit of html that shows the problem and
>create a new test document and post it to the list.
>
>
>--
>Bill Moseley
>moseley@hank.org
Received on Thu Jan 13 09:47:31 2005