On Mon, Mar 08, 2004 at 01:15:44PM -0800, Matthew Slocum wrote:
> I am trying to index only <div id="content">
> I think it is giving me all the div tags.
> in swish.conf:
> StoreDescription HTML "<div id=\"content\">"
No, that won't work, sorry.
I'd use -S prog with either HTML::Parser or HTML::TreeBuilder to
extract that content.
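
As a rough illustration of the parser approach, here is a minimal
sketch. It is written in Python with the standard library's html.parser
as a stand-in for the Perl modules named above, so it is not the
HTML::Parser or HTML::TreeBuilder API. The class and function names are
made up for this example, and the output step a -S prog program needs
is left out.

```python
from html.parser import HTMLParser


class ContentDivExtractor(HTMLParser):
    """Collect the text found inside <div id="content">.

    Hypothetical helper for illustration only. A depth counter tracks
    nested <div> tags so that the matching closing tag is found.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0    # 0 means we are outside the target div
        self.chunks = []  # text fragments seen inside the target div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            # A nested div inside the content div.
            self.depth += 1
        elif dict(attrs).get("id") == "content":
            # The div we want to index.
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_content(html):
    """Return the whitespace-normalized text of <div id="content">."""
    parser = ContentDivExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())
```

Text outside the content div, such as navigation or footer divs, is
skipped, and nested divs inside it are kept.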
You might be able to use a regular expression to extract the content,
although using regular expressions to parse HTML can be hard. But that
would be much faster than HTML::Parser or HTML::TreeBuilder.
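
A minimal sketch of the regular expression route, again in Python
rather than Perl, might look like the following. The pattern and the
function name are assumptions made for this example. It assumes the
opening tag is written exactly as <div id="content"> with no other
attributes, and that the content div contains no nested divs.

```python
import re

# Assumed markup: <div id="content"> ... </div> with no nested <div>.
# The non-greedy (.*?) stops at the FIRST closing </div> it finds.
CONTENT_DIV = re.compile(
    r'<div\s+id="content"\s*>(.*?)</div>',
    re.IGNORECASE | re.DOTALL,
)

# Crude tag stripper, good enough for a sketch.
TAG = re.compile(r"<[^>]+>")


def extract_content_regex(html):
    """Return the text of <div id="content">, or "" if it is not found."""
    match = CONTENT_DIV.search(html)
    if match is None:
        return ""
    text = TAG.sub(" ", match.group(1))
    return " ".join(text.split())
```

This shows why the regular expression route is hard: if the content div
contains another div, the match ends at the inner closing tag and the
rest of the content is silently dropped.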
Received on Mon Mar 8 13:36:56 2004