RE: Sending Title from prog

From: Joseph Ferner <josephferner(at)not-real.mgfairfax.rr.com>
Date: Sat Dec 08 2001 - 02:03:44 GMT
Let me rephrase my question:
	I am trying to index a server that contains jsp files and these
jsp files contain HTML.  Is there a directive (using the "-S prog" or
"-S http" option) that would tell swish-e to treat the jsp as html so
that swish-e would read meta data such as title?

	I couldn't find such a directive, so instead I experimented with
parsing the <title> tag out using spider.pl and then sending that title
through the swish-e headers expected back from an external program.

-----Original Message-----
From: Bill Moseley [mailto:moseley@hank.org] 
Sent: Friday, December 07, 2001 8:32 PM
Subject: Re: [SWISH-E] Sending Title from prog

At 03:28 PM 12/07/01 -0800, Joseph Ferner wrote:
>Is there a way to send the title of the document from an external prog.
>For example the spider.pl knows the title of the document but can't send
>that title to swish-e through the header.

Spider.pl doesn't know the title of the page it fetches; that's part of
the content.  The only thing spider.pl knows is that text/html is an HTML
document and that it needs to extract links from the content.

I'm not clear what you are asking.  But I'll ramble on in my normal way...

True, there's no header you can pass in as the title.  If you want swish
to index a title in an HTML document you just add <title> to the content.
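A minimal sketch of that idea, in Python rather than Perl: an external
program wraps the content in HTML with a real <title> and prints it as one
document record.  The Path-Name and Content-Length headers are the ones the
"-S prog" documentation describes as I recall them, so check them against
your swish-e version; with_title() and prog_record() are hypothetical
helper names, not part of swish-e or spider.pl.

```python
from html import escape


def with_title(title, html_body):
    """Wrap an HTML fragment in a page that carries a real <title>."""
    return (
        "<html><head><title>" + escape(title) + "</title></head>"
        "<body>" + html_body + "</body></html>"
    )


def prog_record(path_name, content):
    """Build one document record: headers, a blank line, then the content.

    Content-Length is the size of the content in bytes, not characters.
    """
    body = content.encode("utf-8")
    header = f"Path-Name: {path_name}\nContent-Length: {len(body)}\n\n"
    return header.encode("utf-8") + body


if __name__ == "__main__":
    import sys

    page = with_title("My JSP page", "<p>Hello from a jsp file</p>")
    sys.stdout.buffer.write(prog_record("http://localhost/index.jsp", page))
```

The title travels inside the content, so no extra header is needed.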

If you are indexing XML, you can say

<xml>
   <swishtitle>Title</swishtitle>
   ...
</xml>

And swish will see that and store the title as swishtitle, and return it
as the title along with the <title>'s from HTML docs.
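As a rough sketch of how a program could produce such XML from a fetched
page, the Python below pulls the text of the first <title> out of an HTML
string and writes it into a <swishtitle> element.  TitleGrabber,
extract_title(), wrap_as_xml() and the <body> element name are all
made up for illustration; only <xml> and <swishtitle> come from the
example above.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape


class TitleGrabber(HTMLParser):
    """Collects the text found inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_title(html_text):
    """Return the first title with whitespace collapsed, or '' if none."""
    parser = TitleGrabber()
    parser.feed(html_text)
    parser.close()
    return " ".join("".join(parser.parts).split())


def wrap_as_xml(title, body_text):
    """Emit XML whose <swishtitle> holds the title (body tag is assumed)."""
    return (
        "<xml>\n"
        f"   <swishtitle>{escape(title)}</swishtitle>\n"
        f"   <body>{escape(body_text)}</body>\n"
        "</xml>\n"
    )
```

For example, wrap_as_xml(extract_title(page), text) would give a document
shaped like the one above, with the title escaped for XML.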

If you are indexing XML and you have something like

<xml>
   <title>Title</title>
   ...
</xml>

But you want to index that as swishtitle, you can use PropertyNamesAlias
(or whatever it's called) to say that "title" is an alias for
"swishtitle".

I suppose you could also say in HTML:

<meta name="swishtitle" content="title">

and get the title stored in the index under the swishtitle propertyname.

Actually, I think there are some confusing issues with ranking.  The HTML
parser marks words by where they appear, and that info is used somewhat
in ranking.  In config.h:

#define RANK_TITLE              4.0
#define RANK_HEADER             3.0
#define RANK_META               3.0
#define RANK_COMMENTS           1.0
#define RANK_EMPHASIZED         0.0

So a title indexed in a real <title> will rank higher than a title
entered in a metaname in HTML.  And with XML, IIRC, there's not a value
assigned, so the XML document with a "title" would rank lower.
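To make the comparison concrete, here is a small Python sketch that only
looks up the weights listed above.  It is not swish-e's ranking formula,
which isn't shown here.  Treating a position with no assigned value as 0.0
is an assumption made for illustration, and effective_weight() is a
hypothetical helper.

```python
# Weights copied from the config.h excerpt above.
RANK_WEIGHTS = {
    "title": 4.0,
    "header": 3.0,
    "meta": 3.0,
    "comments": 1.0,
    "emphasized": 0.0,
}

# Assumption: a position with no assigned value contributes nothing.
UNASSIGNED_WEIGHT = 0.0


def effective_weight(position):
    """Return the weight for a word position, or the assumed default."""
    return RANK_WEIGHTS.get(position, UNASSIGNED_WEIGHT)


if __name__ == "__main__":
    for position in ("title", "meta", "xml-title"):
        print(position, effective_weight(position))
```

Under these assumptions a word in a real <title> carries more weight than
the same word in a metaname, and an XML "title" carries the least.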

But I doubt that's what you were asking about.  Can you rephrase your
question?

Bill Moseley
mailto:moseley@hank.org
Received on Sat Dec 8 02:04:09 2001