Re: Searching only a specific div class

From: Thomas Sewell <tsewell(at)>
Date: Fri Mar 12 2004 - 19:39:43 GMT
I added the following to my config:

PropertyNamesIgnoreCase swishtitle swishdocpath swishdescription div.product-authors

Now indexing reports "6 properties sorted." and searching returns "No Results" instead of the original error, so progress has been made!

I also tried adding:

DefaultContents XML*

But that simply made searches on the body and other fields return no results as well. (With the HTML parser, everything except div.product-authors worked normally.)

Perhaps I just need to figure out the correct way to convince swish-e to select the right data from the page to fill in the div.product-authors property?
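As a rough illustration of the "element.class" naming that "XMLClassAttributes class" is meant to produce (e.g. div.product-authors), here is a minimal Python sketch. It is not swish-e code. It walks the shortened example page from the original message and collects the text under each class-qualified element. ClassMetaCollector is a hypothetical helper, and the choice to record text under every enclosing class-qualified element (not just the innermost one) is an assumption made for illustration, not a statement of swish-e's behavior.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag in HTML; kept off the stack.
VOID = {"br", "hr", "img", "meta", "link", "input"}


class ClassMetaCollector(HTMLParser):
    """Hypothetical helper: collect text keyed as "tag.class"."""

    def __init__(self):
        super().__init__()
        self.stack = []   # one "tag.class" name (or None) per open element
        self.values = {}  # "tag.class" -> list of text chunks

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names lowercased, so <DIV> becomes "div".
        if tag in VOID:
            return
        cls = dict(attrs).get("class")
        self.stack.append(f"{tag}.{cls}" if cls else None)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # skip whitespace between tags
        # Assumption: record the text under every enclosing class-qualified element.
        for name in self.stack:
            if name:
                self.values.setdefault(name, []).append(text)


page = """<DIV class="content">
<div class="product-details">
<div class="product-authors">
John Doe
</div>
</div>
<div class="product-details">
<div class="product-authors">
Jane Doe
</div>
</div>
</div>"""

parser = ClassMetaCollector()
parser.feed(page)
parser.close()
print(parser.values["div.product-authors"])  # ['John Doe', 'Jane Doe']
```

The sketch only shows which text a property named div.product-authors would be expected to hold for that page; getting swish-e itself to populate it still depends on the parser and config directives discussed in this thread.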



-----Original Message-----
From: Peter Karman []
Sent: Friday, March 12, 2004 2:22 PM
To: Multiple recipients of list
Subject: [SWISH-E] Re: Searching only a specific div class

This seems related to an earlier post this week. Make sure you declare 
div.product-authors in PropertyNames as well as in MetaNames.

However, I still don't think that's going to help you. I'm not sure that 
the HTML parser (even when using libxml2, i.e. the HTML2 parser) is smart 
enough to recognize tags in the <body>. I think it only works with 
<meta> tags in the <head>.

You might have better luck using the XML2 parser in your config, which 
should treat the tags as XML instead of HTML, and thus recognize your 
special tagset.
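Before switching to the XML2 parser, it may be worth checking whether the pages are well-formed XML. In the shortened example quoted in the original message, the outer element is opened as <DIV class="content"> but closed as </div>, and XML tag names are case-sensitive. The minimal Python sketch below uses the standard-library ElementTree parser, not swish-e or libxml2, to show a strict XML parser rejecting that markup and accepting it once the case matches. How swish-e's libxml2-based parser reports or recovers from such errors is not shown here, and is_well_formed is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The shortened example page from the original message, unchanged.
snippet = """<DIV class="content">
<div class="product-details">
<div class="product-authors">
John Doe
</div>
</div>
<div class="product-details">
<div class="product-authors">
Jane Doe
</div>
</div>
</div>"""


def is_well_formed(markup: str) -> bool:
    """Hypothetical helper: True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        # The <DIV> ... </div> case mismatch is reported here.
        print("not well-formed XML:", err)
        return False
    return True


print(is_well_formed(snippet))                           # False
print(is_well_formed(snippet.replace("<DIV", "<div")))   # True
```

A case mismatch is only one possible problem; real pages may also contain HTML-only constructs (unclosed tags, entities such as &nbsp;) that a strict XML parser will not accept either.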

But Bill will probably give you a better answer than this.


Thomas Sewell supposedly wrote on 3/12/04 1:00 PM:

> I have a site structured in HTML with multiple items per page; the information about each item is delimited by div tags with a descriptive class attribute.
> Shortened Example:
> <DIV class="content">
> <div class="product-details">
> <div class="product-authors">
> John Doe
> </div>
> </div>
> <div class="product-details">
> <div class="product-authors">
> Jane Doe
> </div>
> </div>
> </div>
> Currently I am just indexing the full text of the page and the default swish properties for each page. The source is HTML, so I assume it defaults to the HTML parser.
> For example, I would like to offer a search over just the contents of the "Author" divs.
> I've been trying to define and use a property for the Author class, but without success.
> I think I need to use some combination of MetaNames in the index config file and in the search CGI, but I've been unable to figure out the exact format to use.
> I assume it's going to be something along the lines of:
> UndefinedMetaTags ignore
> XMLClassAttributes class # Not supported by the HTML parser?
> MetaNames swishtitle swishdocpath swishdescription div.product-authors
> in the index config file.
> Is this possible? Would I have to convert to strict XHTML in order to use the XML parser and treat the class attribute as a property/metatag? Or am I missing something else?
> When I try the above, indexing appears to work (it reports "4 properties sorted." without any errors), but the search script returns "Unknown property name to sort by: Property 'div.authors' is not defined in index '<my index file>'" when I try to search by div.authors.
> Anyone have an example of something like this working?
> Thanks for any help,
> Thomas Sewell

Peter Karman - Software Publications Engineer - Cray Inc
phone: 651-605-9009 -
Received on Fri Mar 12 11:39:43 2004