At 06:03 PM 12/07/01 -0800, Joseph Ferner wrote:
>Let me rephrase my question:
> I am trying to index a server that contains jsp files and these
>jsp files contain HTML. Is there a directive (using the "-S prog" or
>"-S http" option) that would tell swish-e to treat the jsp as html so
>that swish-e would read meta data such as title?
I think I'm still missing the point. Are you trying to index jsp
documents, or web documents created by jsp?
Swish can parse HTML and XML, but it can't parse jsp. As with any
templating format (SSI, some perl templating system, or jsp), if you can
convert it into html then swish can parse that. In general, the web server
is the best tool for doing those conversions.
If your jsp page contained
<%@ include file="/menu.html" %>
swish obviously isn't going to index the contents of menu.html as part of
the page if you are indexing the jsp pages without using a web server. Swish
doesn't understand that directive.
> I couldn't find such a directive, so instead I experimented with
>parsing the <title> tag out using spider.pl and then sending that title
>through the swish-e headers expected back from an external program.
I'm still confused, sorry. What you can send in the headers of a -S prog
is the last modified time, the path name, and the content-length.
You can also send a No-Contents: header to tell swish to only index the
file name or title (although I'm not sure I can see a use for that), and
you can send a Document-Type: header to override which parser swish will
use to parse the content. That header exists because swish uses file
extensions to determine the type of a file, whereas a web server uses
Content-type.
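
As a rough sketch of what a -S prog program hands back for each document:
a block of headers, a blank line, then the content. The header spellings
Path-Name:, Content-Length: and Last-Mtime:, and the value "HTML" for
Document-Type:, are how I recall them from the swish-e documentation, so
check them against the docs for your version. format_record is a made-up
helper name for illustration, not part of swish-e.

```perl
use strict;
use warnings;

# Build one record for swish-e's -S prog input: headers, blank line, content.
# format_record is a hypothetical helper, not a swish-e function.
# Header names are assumed from the swish-e docs; verify for your version.
sub format_record {
    my ($path, $mtime, $content, $doc_type) = @_;

    # Content-Length must match the bytes that follow the blank line.
    # This assumes $content holds raw bytes, not decoded characters.
    my $length = length $content;

    my $headers = "Path-Name: $path\n"
                . "Content-Length: $length\n"
                . "Last-Mtime: $mtime\n";

    # Optional: force the HTML parser even though the path ends in .jsp.
    $headers .= "Document-Type: $doc_type\n" if defined $doc_type;

    return $headers . "\n" . $content;
}

# Example: a rendered jsp page, flagged so it is parsed as html.
print format_record('/docs/index.jsp', 1007784000,
                    '<html><head><title>Home</title></head></html>', 'HTML');
```

Passing undef (or nothing) as the last argument leaves the Document-Type:
header out, in which case swish falls back to whatever the file extension
maps to in your config.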
If you want to send a title to swish, then just add a <title> tag to the
top of the document contents. In perl:
$document = "<title>$title</title>" . $document;
(Although it would be better to generate valid html.)
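
For the "valid html" variant, something along these lines might do. It is
only a sketch: escape_html and wrap_with_title are made-up helper names, and
it assumes $body is an html fragment that does not already carry its own
<html>, <head> or <body> tags.

```perl
use strict;
use warnings;

# Escape the characters that are special inside HTML text.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;   # must run first so later entities are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Wrap an HTML fragment in a minimal document that carries the title.
# wrap_with_title is a hypothetical helper, not part of swish-e or spider.pl.
sub wrap_with_title {
    my ($title, $body) = @_;
    my $safe_title = escape_html($title);
    return "<html><head><title>$safe_title</title></head>\n"
         . "<body>\n$body\n</body></html>\n";
}

my $document = wrap_with_title('Products & Services', '<p>Our catalog</p>');
```

If you then feed $document to swish through -S prog, remember that the
content length you report has to be computed after the wrapping, not before.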
I guess it would be easier if you posted an example of what you are
trying to index.
Received on Sat Dec 8 04:27:08 2001