Re: Probably dumb newbie question.

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Aug 26 2004 - 15:16:42 GMT

On Thu, Aug 26, 2004 at 04:07:04AM -0700, Nic Gibson wrote:
> I'm having an odd problem with swish-e 2.4.2. I have an index generated using 
> spider.pl. Contrary to my expectations it appears to be indexing the href content
> of html anchors. I've attached the index configuration file to this message.  The only
> odd thing I can think of about this particular website is that the URLs don't have
> file extensions (see http://pmr.corbas.co.uk/dynamic/). However, the content type
> is definitely correct.

You might set:

   ParserWarnLevel 9

When I tried that, all I saw were some errors about HTML entities that
couldn't be mapped to ISO-8859-1.

Otherwise, can you show the text of the hrefs that is being indexed?
You will likely get better help if you can provide a working example.

I added "/" to WordCharacters (along with a-z0-9), indexed with -T
indexed_words, and didn't see anything that looked like a URL path.
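
For reference, that check could be set up roughly as follows. This is only
a sketch, not the exact configuration used: it assumes the lines are added
to an existing spider.pl config, the file name debug.conf is hypothetical,
and the WordCharacters value is written out in full on the assumption that
"a-z0-9" above is shorthand rather than literal syntax.

```
# Sketch only: lines added to an existing spider.pl config
# (debug.conf is a hypothetical file name).

# Show parser warnings, as suggested above.
ParserWarnLevel 9

# Treat "/" as a word character so a URL path would show up as one word.
# Assumption: the characters are listed explicitly, not as ranges.
WordCharacters abcdefghijklmnopqrstuvwxyz0123456789/

# Then index with word tracing turned on, for example:
#   swish-e -S prog -c debug.conf -T indexed_words
```

If href values really were being indexed, words containing "/" should then
appear in the -T indexed_words output.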

-- 
Bill Moseley
moseley@hank.org

Received on Thu Aug 26 08:17:15 2004