Re: Issue with indexing

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Jan 13 2005 - 17:39:10 GMT

On Thu, Jan 13, 2005 at 12:31:04PM -0500, John Kelley wrote:
> The problem is that if you go to:
> <link removed>
> and search for, say, a*, you get hits.  If you search for p*, you get 
> none.  If you search for specific words in the document, it's hit or 
> miss.  I've set it up so there is only one document in the index, and you 
> can do a "View Source" on it to see it's nothing special.  The indexer 
> counts over 500 words indexed, but it seems only a select few are 
> searchable.

Ok, then here's what you do:

Say you know "foo" is in your file, but you can't search for it.

So you look at "foo" in the doc and notice where it is and what words
are around it.  Then you do:

   swish-e -c config -i test.doc -T indexed_words | less

and search the output for foo.  If it's not there, look for the words
around it to figure out why it's not being indexed.  Maybe "foo" is
getting split up into "f" and "oo", maybe you have some bad markup,
or maybe there's a character that couldn't be converted to ISO-8859-1
encoding and that caused foo to get lost.
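
If paging through the output by hand gets tedious, something like
this might help.  It's only a rough sketch: find_word is a made-up
helper name, not part of swish-e, and it makes no assumption about
the exact layout of the -T indexed_words output.  It just does a
case-insensitive substring match and prints a couple of lines of
context around each hit.

```shell
# find_word WORD
# Hypothetical helper (not part of swish-e).  Reads indexer debug
# output on stdin and prints every line containing WORD, numbered,
# with two lines of context on either side.  -F treats WORD as a
# literal string rather than a regular expression.
find_word() {
    grep -i -n -F -C 2 -- "$1" || echo "'$1' not found in indexed words"
}
```

You would pipe the same command into it, for example
"swish-e -c config -i test.doc -T indexed_words | find_word foo".
Since it is a substring match, "foo" will also match "food", so read
the hits with that in mind.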

Then you find the smallest bit of HTML that still shows the problem,
put it in a new test document, and post that to the list.


-- 
Bill Moseley
moseley@hank.org
Received on Thu Jan 13 09:39:10 2005