Re: Ignoring tags of a certain class.

From: <moseley(at)not-real.hank.org>
Date: Thu Sep 04 2003 - 15:47:01 GMT
On Thu, Sep 04, 2003 at 07:27:18AM -0700, John McGowan wrote:
> I'm indexing a site that has an entirely textual navigation system, and 
> I want to configure swish-e to ignore those menus when indexing the 
> site.  The site is at http://www.emiliemcgowan.com/

Is your site generated from templates or dynamically?

http://swish-e.org/current/docs/SWISH-FAQ.html#How_do_I_prevent_indexing_parts_of_a_document_

> the menu code is in <td> tags, like the following...
> 
> <TD CLASS="MENUTEXT"><A CLASS=MENUAT HREF="main.taf?p=0">Home</A></TD>
> 
> but of course I don't want to ignore all TD's or all A's.

Well, if you tell swish-e that your documents are XML you can do this:

moseley@laptop:~$ cat 1.html
<html>
<head><title>Title</title>
</head>
<body>
<table><tr><td>first</td></tr></table>
<table><tr><td class="foo">second</td><td>second2</td></tr></table>
<table><tr><td>third</td></tr></table>
</body>
</html>

moseley@laptop:~$ cat c
DefaultContents XML2
XMLClassAttributes class
IgnoreMetaTags td.foo

moseley@laptop:~$ swish-e -c c -i 1.html -T indexed_words -v0
    Adding:[1:swishdefault(1)]   'title'   Pos:17  Stuct:0x1 ( FILE )
    Adding:[1:swishdefault(1)]   'first'   Pos:18  Stuct:0x1 ( FILE )
    Adding:[1:swishdefault(1)]   'second2'   Pos:31  Stuct:0x1 ( FILE )
    Adding:[1:swishdefault(1)]   'third'   Pos:32  Stuct:0x1 ( FILE )

Notice that "second" is not indexed.
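
To make the effect of that config easier to see, here is a rough Python sketch of the same idea: walk the document as XML and collect words, skipping any element whose tag and class match an ignored pair. This is only an illustration, not how swish-e implements IgnoreMetaTags; the function name `indexed_words` and its `ignored` parameter are made up for this example, and it assumes the input is well-formed XML with a single class per element.

```python
import re
import xml.etree.ElementTree as ET

# Same document as 1.html above.
SAMPLE = """<html>
<head><title>Title</title>
</head>
<body>
<table><tr><td>first</td></tr></table>
<table><tr><td class="foo">second</td><td>second2</td></tr></table>
<table><tr><td>third</td></tr></table>
</body>
</html>"""


def indexed_words(xml_text, ignored=frozenset({("td", "foo")})):
    """Return lowercased words in document order, skipping ignored subtrees.

    `ignored` holds hypothetical (tag, class) pairs, loosely mirroring
    the "td.foo" notation in the config above.
    """
    words = []

    def walk(elem):
        key = (elem.tag.lower(), elem.get("class", "").lower())
        if key not in ignored:
            # Text directly inside the element, then its children.
            words.extend(re.findall(r"\w+", (elem.text or "").lower()))
            for child in elem:
                walk(child)
        # Tail text follows the closing tag and belongs to the parent,
        # so it is kept even when the element itself is skipped.
        words.extend(re.findall(r"\w+", (elem.tail or "").lower()))

    walk(ET.fromstring(xml_text))
    return words


if __name__ == "__main__":
    print(indexed_words(SAMPLE))
```

Run against the sample it collects title, first, second2 and third, the same words swish-e reported. Passing an empty `ignored` set brings "second" back.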

But that only works when the documents are parsed as XML. I'm not sure 
why that limitation exists; I'd have to spend some time looking at the 
code to find out.


-- 
Bill Moseley
moseley@hank.org
Received on Thu Sep 4 15:50:43 2003