Re: Retrieving metatag content

From: Peter B. Ensch <peterbe(at)not-real.comcast.net>
Date: Sun Sep 12 2004 - 15:44:46 GMT
On Sunday 12 September 2004 9:28 am, Bill Moseley wrote:
> On Sun, Sep 12, 2004 at 06:39:45AM -0700, Peter B. Ensch wrote:
> > That's the solution I came up with overnight also. The only downside
> > to that approach is that it increases the index size by including
> > PropertyNames which I actually never use other than to build my web
> > form widgets (needless to say I have quite a few more Meta/Property
> > names than in my example).
>
> Maybe if your source docs were in a database then it would be much
> easier to get these lists.
>

Unfortunately, not an option.

> A question: if you want to generate a select list I assume that means
> you expect a reasonably small set of different values -- which makes
> me wonder if you don't already have a pre-defined list of acceptable
> values used when creating the docs.
>

This would be possible; however, I'm trying to make our indexing and
search form template generation independent of our site content
creators.

> Another option would be finding the options when parsing.  For
> example, if you are using spider.pl I think LWP can parse the META
> tags for you and then you could just create a list of unique values
> when spidering.

I'm not spidering. However, I just realized I can still parse the
output of 'swish-e -T properties...', capture the swishdocpath and
then explicitly parse each file for its metatags, all without
adding unneeded PropertyNames and increasing my index size
unnecessarily.
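
The second half of that idea (parse each file for its metatags and keep
the unique values) might look roughly like the sketch below. It is only
an illustration in Python using the standard html.parser module, not what
I actually run. The names MetaCollector and meta_values are made up for
this example, and it assumes the document contents have already been read
from the swishdocpath entries; extracting those paths from the swish-e
output is left out because the exact output format is not shown here.

```python
from collections import defaultdict
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect (name, content) pairs from <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        name, content = attr.get("name"), attr.get("content")
        if name and content:
            # Meta names are matched case-insensitively here (an assumption).
            self.pairs.append((name.lower(), content.strip()))


def meta_values(html_docs):
    """Map each meta name to a sorted list of the unique values seen."""
    values = defaultdict(set)
    for html in html_docs:
        parser = MetaCollector()
        parser.feed(html)
        parser.close()
        for name, content in parser.pairs:
            values[name].add(content)
    return {name: sorted(vals) for name, vals in values.items()}
```

Each resulting list could then feed one select widget in the search
form, with nothing extra stored in the index.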

Looks like I have a solution. Thanks again for your help (and for
a GREAT piece of s/w).

Peter



-- 
^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~
Peter B. Ensch (peterbe@comcast.net)    
                                        
Linux 2.4.20-4GB 10:34am Up 5 days 19:04
^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~
Received on Sun Sep 12 08:45:02 2004