Re: I am trying to index only <div id="content">

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Mon Mar 08 2004 - 21:36:55 GMT
On Mon, Mar 08, 2004 at 01:15:44PM -0800, Matthew Slocum wrote:
> I am trying to index only <div id="content">
> I think it is giving me all the div tags.
> 
> in swish.conf:
> StoreDescription HTML "<div id=\"content\">"

No, that won't work, sorry.

I'd use -S prog and use either HTML::Parser or HTML::TreeBuilder to
extract out that content.
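
For illustration, here is a minimal sketch of that parser-based extraction. It is written in Python with the standard-library html.parser module rather than the Perl modules named above, so treat it as an analogue of the idea, not as the HTML::Parser API. The names ContentDivExtractor, extract_content and swish_prog_record are hypothetical. The record layout (Path-Name and Content-Length headers, a blank line, then the content) reflects my understanding of what -S prog expects; check it against the Swish-e documentation before relying on it.

```python
from html.parser import HTMLParser


class ContentDivExtractor(HTMLParser):
    """Collect the text found inside <div id="content">, nested divs included."""

    def __init__(self, target_id="content"):
        super().__init__(convert_charrefs=True)
        self.target_id = target_id
        self.depth = 0      # div nesting depth inside the target; 0 means outside
        self.chunks = []    # text fragments seen while inside the target

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            self.depth += 1                      # a div nested inside the target
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1                       # entering the target div

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1                      # drops to 0 when the target closes

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_content(html, target_id="content"):
    """Return the target div's text with whitespace collapsed."""
    parser = ContentDivExtractor(target_id)
    parser.feed(html)
    parser.close()
    # Fragments are joined with a space so words from adjacent elements
    # do not run together in the index.
    return " ".join(" ".join(parser.chunks).split())


def swish_prog_record(path, text):
    """Build one document record as bytes (header layout is an assumption)."""
    body = text.encode("utf-8")
    header = f"Path-Name: {path}\nContent-Length: {len(body)}\n\n"
    return header.encode("utf-8") + body
```

A program run through -S prog would write one such record to standard output for each page. Content-Length is computed from the encoded bytes, not from the character count.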

You might be able to use a regular expression to extract the content,
although parsing HTML with regular expressions is hard to get right.  But
that would be much faster than HTML::Parser or HTML::TreeBuilder.
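
As a rough sketch of the regular-expression route, again in Python and with hypothetical names (CONTENT_DIV, extract_content_regex), the pattern below assumes the id attribute is double-quoted and that the content div contains no nested div. The second assumption is where it breaks: the lazy match stops at the first closing div tag it meets.

```python
import re

# Matches <div ... id="content" ...> and lazily captures up to the first </div>.
CONTENT_DIV = re.compile(
    r'<div\b[^>]*\sid\s*=\s*"content"[^>]*>(.*?)</div\s*>',
    re.IGNORECASE | re.DOTALL,
)


def extract_content_regex(html):
    """Return the raw markup inside the content div, or None if it is absent."""
    match = CONTENT_DIV.search(html)
    return match.group(1) if match else None
```

With a nested div inside the content block, the captured text is cut short at the inner closing tag, which is one example of why this approach is fragile.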

-- 
Bill Moseley
moseley@hank.org
Received on Mon Mar 8 13:36:56 2004