Re: [swish-e] Using a storeDescription tag for HTML other than <body>

From: Peter Karman <peter(at)>
Date: Fri Nov 09 2007 - 04:44:02 GMT

Robinson Craig wrote on 11/7/07 9:33 PM:
> Hi Swish-e experts,
> Firstly, a caveat: I am a swish-e newbie.
> Secondly, the question:
> For HTML, is it possible to use a storeDescription tag other than <body>
> per:
>      StoreDescription HTML* <body> 256
> I have just indexed my site (file not http) using:
>      StoreDescription HTML* '<div id="bodytext">' 256
> Where <div id="bodytext"> is the beginning of the main body of content.
> It returns ... (aka nothing). Interestingly, if you use:
>      StoreDescription HTML* <div id="bodytext"> 256
> without single-quotes, swish-e errors when starting.

My brief tests show that you can do this (grab a property based on an attribute
value) only if you index as XML, not HTML.

Also, I see what appears to be a bug with StoreDescription in the XML case.

[karpet@pekmac:~/tmp]$ cat c
XMLClassAttributes id
MetaNames div.bodytext
StoreDescription XML2 <div.bodytext> 256
#PropertyNamesMaxLength 256 swishdescription
#PropertyNameAlias swishdescription div.bodytext
DefaultContents XML2

[karpet@pekmac:~/tmp]$ cat test.html
foo bar
<div id="bodytext"> blah </div>
more here

If you index that test doc with that config, you will see the words 'blah more
here' stored under the swishdescription property, even though only 'blah' should be.

However, if you use the morally equivalent PropertyName* config options instead
of StoreDescription, you only get 'blah'. (They are commented out in the config
above; uncomment them and comment out StoreDescription to see what I mean.)
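
Spelled out, the alternate config described above would look roughly like the
following. This is only a sketch derived from the config shown earlier, with the
comment markers swapped between StoreDescription and the two PropertyName*
lines; nothing else is changed and no other directives are assumed.

```
XMLClassAttributes id
MetaNames div.bodytext
# StoreDescription disabled for this comparison
#StoreDescription XML2 <div.bodytext> 256
# Alias the swishdescription property to the div.bodytext metaname instead
PropertyNamesMaxLength 256 swishdescription
PropertyNameAlias swishdescription div.bodytext
DefaultContents XML2
```

Indexing the same test.html with this variant is the case reported above to
store only 'blah' under swishdescription.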

