Re: Ignoring tags of a certain class.

From: <moseley(at)>
Date: Thu Sep 04 2003 - 15:47:01 GMT
On Thu, Sep 04, 2003 at 07:27:18AM -0700, John McGowan wrote:
> I'm indexing a site that has an entirely textual navigation system, and 
> I want to configure swish-e to ignore those menus when indexing the 
> site.  The site is at

Is your site generated from templates or dynamically?

> the menu code is in <td> tags, like the following...
> <TD CLASS="MENUTEXT"><A CLASS=MENUAT HREF="main.taf?p=0">Home</A></TD>
> but of course I don't want to ignore all TD's or all A's.

Well, if you tell swish-e that your documents are XML you can do this:

moseley@laptop:~$ cat 1.html
<table><tr><td class="foo">second</td><td>second2</td></tr></table>

moseley@laptop:~$ cat c
DefaultContents XML2
XMLClassAttributes class

moseley@laptop:~$ swish-e -c c -i 1.html -T indexed_words -v0
    Adding:[1:swishdefault(1)]   'title'   Pos:17  Stuct:0x1 ( FILE )
    Adding:[1:swishdefault(1)]   'first'   Pos:18  Stuct:0x1 ( FILE )
    Adding:[1:swishdefault(1)]   'second2'   Pos:31  Stuct:0x1 ( FILE )
    Adding:[1:swishdefault(1)]   'third'   Pos:32  Stuct:0x1 ( FILE )

Notice that "second", the contents of the <td class="foo"> cell, is not indexed.
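
To illustrate the effect outside swish-e, here is a minimal Python sketch that collects words from HTML while skipping any element carrying a given class. It uses only the standard-library html.parser module. The names ClassSkippingExtractor and indexable_words are made up for this example, and this is not how swish-e does it internally, just the same idea applied to plain HTML.

```python
from html.parser import HTMLParser


class ClassSkippingExtractor(HTMLParser):
    """Collect words, skipping text inside elements with a given class.

    Hypothetical helper for illustration only; not part of swish-e.
    """

    def __init__(self, skip_class):
        super().__init__(convert_charrefs=True)
        self.skip_class = skip_class.lower()
        self.skip_tag = None  # name of the element that started the skip
        self.depth = 0        # nesting depth of same-named elements
        self.words = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if self.skip_tag is not None:
            if tag == self.skip_tag:
                self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").lower().split()
        if self.skip_class in classes:
            self.skip_tag = tag
            self.depth = 1

    def handle_endtag(self, tag):
        if self.skip_tag is not None and tag == self.skip_tag:
            self.depth -= 1
            if self.depth == 0:
                self.skip_tag = None

    def handle_data(self, data):
        # Only keep text that is outside a skipped element.
        if self.skip_tag is None:
            self.words.extend(data.split())


def indexable_words(html, skip_class):
    """Return the words that remain once skip_class elements are dropped."""
    parser = ClassSkippingExtractor(skip_class)
    parser.feed(html)
    parser.close()
    return parser.words
```

Called on the one-line table from 1.html with skip_class "foo", it keeps second2 and drops second. The class comparison is case-insensitive, so "menutext" would also match CLASS="MENUTEXT" from the original question.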

But that only works for documents parsed as XML, and I can't say why it's 
limited to XML without spending some time looking at the code.

Bill Moseley
Received on Thu Sep 4 15:50:43 2003