At 03:28 PM 12/07/01 -0800, Joseph Ferner wrote:
>Is there a way to send the title of the document from an external prog.
>For example the spider.pl knows the title of the document but can't send
>that title to swish-e through the header.
spider.pl doesn't know the title of the page it fetches; that's part of the
content. All spider.pl knows is that text/html is an HTML document
and that it needs to extract links from the content.
I'm not clear what you are asking. But I'll ramble on in my normal way...
True, there's no header you can pass in for the title. If you want swish to
index a title in an HTML document, you just add a <title> to the content.
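A minimal sketch of an external program doing that, assuming swish-e's "prog" input method (swish-e -S prog), where the program prints a few header lines (here only Path-Name and Content-Length, the length in bytes), a blank line, and then the content. The helper name make_prog_record is hypothetical; the point is only that the title rides inside the content as a <title> element, not in a header.

```python
import html


def make_prog_record(path: str, title: str, body_html: str) -> bytes:
    """Build one document for swish-e's prog input method (hypothetical helper).

    There is no header for the title, so it is placed in the HTML content.
    """
    # Escape &, < and > so the title cannot break the <title> markup.
    safe_title = html.escape(title, quote=False)
    content = (
        f"<html><head><title>{safe_title}</title></head>"
        f"<body>{body_html}</body></html>"
    ).encode("utf-8")
    # Assumed headers: Path-Name and Content-Length (in bytes), then a blank line.
    header = f"Path-Name: {path}\nContent-Length: {len(content)}\n\n".encode("utf-8")
    return header + content
```

A program that does know the title would write one such record per document to standard output, e.g. sys.stdout.buffer.write(make_prog_record(url, title, page)).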
If you are indexing XML and the document has a <swishtitle> element, swish
will store its text as the swishtitle property and return it as the title,
along with the <title>s from HTML docs.
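For the XML case, a sketch of building a document whose title sits in a <swishtitle> element, using Python's standard xml.etree.ElementTree so the title text gets escaped correctly. The root element "doc" and the "body" element are hypothetical names; only swishtitle comes from the discussion above.

```python
import xml.etree.ElementTree as ET


def make_xml_doc(title: str, body_text: str) -> str:
    """Wrap a title and body in XML, with the title in <swishtitle>."""
    root = ET.Element("doc")  # hypothetical root element name
    # swish stores the text of <swishtitle> as the swishtitle property.
    ET.SubElement(root, "swishtitle").text = title
    ET.SubElement(root, "body").text = body_text  # hypothetical element name
    # encoding="unicode" returns a str instead of bytes.
    return ET.tostring(root, encoding="unicode")
```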
If you are indexing XML and the element is called <title> instead, but you
want to index it as swishtitle, you can use PropertyNameAlias to say that
"title" is an alias for "swishtitle".
I suppose you could also say in HTML:
<meta name="swishtitle" content="title">
and get the title stored in the index under the swishtitle propertyname.
Actually, I think there are some confusing issues with ranking. The HTML
parser marks words by where they appear, and that info is used somewhat in
ranking. In config.h:
#define RANK_TITLE 4.0
#define RANK_HEADER 3.0
#define RANK_META 3.0
#define RANK_COMMENTS 1.0
#define RANK_EMPHASIZED 0.0
So a title indexed in a real <title> will rank higher than a title entered
in a metaname in HTML. And with XML, IIRC, there's no value assigned,
so an XML document with a "title" would rank lower.
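To make that ordering concrete, here is a toy model, NOT swish-e's actual ranking formula: each occurrence of a word counts 1.0 plus a bonus taken from the config.h weights above for where it appeared, and a context with no assigned value (such as a plain XML element) gets no bonus. The function name toy_score and the "1.0 per occurrence" base are assumptions made only for illustration.

```python
# Bonus weights copied from the config.h #defines quoted above.
RANK_WEIGHTS = {
    "title": 4.0,       # RANK_TITLE
    "header": 3.0,      # RANK_HEADER
    "meta": 3.0,        # RANK_META
    "comments": 1.0,    # RANK_COMMENTS
    "emphasized": 0.0,  # RANK_EMPHASIZED
}


def toy_score(contexts: list[str]) -> float:
    """Hypothetical toy model, not swish-e's real ranking.

    Each occurrence counts 1.0 plus the bonus for its context; contexts
    without an assigned value (e.g. a plain XML element) get no bonus.
    """
    return sum(1.0 + RANK_WEIGHTS.get(ctx, 0.0) for ctx in contexts)


# The same title word, indexed three different ways:
print(toy_score(["title"]))        # 5.0  (real <title>)
print(toy_score(["meta"]))         # 4.0  (HTML metaname)
print(toy_score(["xml_element"]))  # 1.0  (XML element, no weight)
```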
But I doubt that's what you were asking about. Can you rephrase your question?
Received on Sat Dec 8 01:33:28 2001