Let me rephrase my question:
I am trying to index a server that contains jsp files and these
jsp files contain HTML. Is there a directive (using the "-S prog" or
"-S http" option) that would tell swish-e to treat the jsp as html so
that swish-e would read meta data such as title?
I couldn't find such a directive, so instead I experimented with
parsing the <title> tag out using spider.pl and then sending that title
through the swish-e headers expected back from an external program.
From: Bill Moseley [mailto:email@example.com]
Sent: Friday, December 07, 2001 8:32 PM
Subject: Re: [SWISH-E] Sending Title from prog
At 03:28 PM 12/07/01 -0800, Joseph Ferner wrote:
>Is there a way to send the title of the document from an external prog?
>For example the spider.pl knows the title of the document but can't send
>that title to swish-e through the header.
Spider.pl doesn't know the title of the page it fetches; that's part of the
content. The only thing spider.pl knows is that text/html is an HTML document
and that it needs to extract links from the content.
I'm not clear what you are asking. But I'll ramble on in my normal way.
True, there's no header you can pass in as the title. If you want swish to
index a title in an HTML document you just add <title> to the content.
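As a rough sketch of "just add <title> to the content", an external program could patch the title into the HTML before handing the document to swish. The header names used here (Path-Name, Content-Length, Document-Type) are what I recall from the "-S prog" documentation, so treat them as an assumption and check them against your version. The function names, the example path, and the choice of Python are hypothetical, for illustration only.

```python
import html
import re


def ensure_title(content: str, title: str) -> str:
    """Return content with a <title> element, adding one if none exists."""
    if re.search(r"<title\b", content, re.IGNORECASE):
        return content  # already has a title; leave it alone
    tag = "<title>" + html.escape(title) + "</title>"
    head = re.search(r"<head\b[^>]*>", content, re.IGNORECASE)
    if head:
        # Put the title right after the opening <head> tag.
        return content[:head.end()] + tag + content[head.end():]
    # No <head> at all: prepend the title to the content.
    return tag + content


def format_prog_document(path: str, content: str, title: str) -> bytes:
    """Build one document record: headers, a blank line, then the body."""
    body = ensure_title(content, title).encode("utf-8")
    headers = (
        f"Path-Name: {path}\n"
        # Length is counted in bytes, not characters.
        f"Content-Length: {len(body)}\n"
        # Assumed header asking swish to parse the body as HTML.
        "Document-Type: HTML\n"
        "\n"
    ).encode("utf-8")
    return headers + body
```

A program run by swish would write the result of something like format_prog_document("page.jsp", page_source, page_title) to standard output, one record per document. Because the title lives in the content, it should get the same treatment as any other <title>.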
If you are indexing XML, and you can say
And swish will see that and store the title as swishtitle, and return it as
the title along with the <title>'s from HTML docs.
If you are indexing XML and you have something like
But you want to index that as swishtitle, you can use PropertyNamesAlias
(or whatever it's called) to say that "title" is an alias for swishtitle.
I suppose you could also say in HTML:
<meta name="swishtitle" content="title">
and get the title stored in the index under the swishtitle propertyname.
Actually, I think there are some confusing issues with ranking. The HTML
parser marks words by where they appear, and that info is used somewhat in
ranking. In config.h:
#define RANK_TITLE 4.0
#define RANK_HEADER 3.0
#define RANK_META 3.0
#define RANK_COMMENTS 1.0
#define RANK_EMPHASIZED 0.0
So a title indexed in a real <title> will rank higher than a title
in a metaname in HTML. And with XML, IIRC, there's not a value,
so the XML document with a "title" would rank lower.
But I doubt that's what you were asking about. Can you rephrase your question?
Received on Sat Dec 8 02:04:09 2001