Alrighty then (RE: RE: No title being returned for version 1.3.2)

From: David Norris <kg9ae(at)not-real.geocities.com>
Date: Mon May 31 1999 - 10:46:48 GMT

OK, I think this makes some sense.

If I index http://www.misma.org/contact.html using the spider, the TITLE is
set to "contact.html" in the swish index file.
HTTP Headers:
	HTTP/1.1 200 OK
	Date: Mon, 31 May 1999 10:22:55 GMT
	Server: Apache/1.2.5
	X-Server-CGI: PHP/3.0.7
	X-Resource-Indicator:
	X-Resource-Modified: 923650015
	Expires: Tue, 01 Jun 1999 10:22:55 GMT
	Cache-Control: post-check=43200,pre-check=86400
	Last-Modified: 1999-04-09T09:26:55Z
	Connection: close
	Content-Type: text/html; charset=iso-8859-1

If I index http://localhost/test/contact.html using the spider, the TITLE is
set to "Contacts - MiSMA..."
HTTP Headers:
	HTTP/1.1 200 OK
	Date: Mon, 31 May 1999 10:21:54 GMT
	Server: Apache/1.3.6 (Win32)
	Parser: PHP/3.0.6 (Win32)
	Connection: close
	Content-Type: text/html

If I index /my_documents/test/contact.html using the file system, the TITLE
is set to "Contacts - MiSMA..."
No HTTP header equivalents.

This is exactly the same file in all three cases, with Unix LF line endings
in each.  I hacked my copy of swishspider to force it to index
text/html; charset=iso-8859-1.  The Content-Type header appears to be the
only major difference that could affect the parsing.  Something, somewhere,
doesn't recognize that it should be parsing that document with the HTML
parser: there must be other code that assumes anything not exactly
text/html isn't HTML.  Forcing the spider to index the contents of
text/html; charset=... isn't enough.
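
To illustrate the kind of check I suspect is going on, here is a minimal
sketch in Python (not the actual swish or Perl module code, which I haven't
tracked down).  The function names are made up for illustration.  It
compares an exact string match against a check that looks only at the media
type and ignores parameters such as charset:

```python
def media_type(content_type: str) -> str:
    """Return the bare media type, dropping parameters such as charset."""
    return content_type.split(";", 1)[0].strip().lower()


def is_html_exact(content_type: str) -> bool:
    # Suspected behaviour: only the literal string "text/html" counts as HTML.
    return content_type == "text/html"


def is_html(content_type: str) -> bool:
    # Tolerant check: compare only the media type, ignoring parameters.
    return media_type(content_type) == "text/html"


headers = [
    "text/html; charset=iso-8859-1",  # sent by www.misma.org
    "text/html",                      # sent by localhost
]
for value in headers:
    print(f"{value!r}: exact={is_html_exact(value)} tolerant={is_html(value)}")
```

With the exact comparison, the misma.org header is rejected even though the
document is HTML, which would match the missing titles.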

So, to test this theory, I changed the Content-Type header on the misma.org
server.  Sure enough, the titles are now indexed correctly.  So this
appears to be the Content-Type 'feature' of that old Perl module.

I don't know if this helps anyone else, but I can at least hack something
to change my Content-Type header when swishspider visits a document, until
someone figures this out.
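
A rough sketch of that workaround, written in Python for illustration only
(the real server runs PHP).  The User-Agent substring "swishspider" is an
assumption; I haven't checked what the spider actually sends, so substitute
whatever shows up in the access log:

```python
# Hypothetical: substring expected in the spider's User-Agent header.
SPIDER_TOKEN = "swishspider"

DEFAULT_CONTENT_TYPE = "text/html; charset=iso-8859-1"
SPIDER_CONTENT_TYPE = "text/html"  # bare type, no parameters


def content_type_for(user_agent: str) -> str:
    """Pick the Content-Type header value to send for this client."""
    if SPIDER_TOKEN in (user_agent or "").lower():
        # Send the bare media type so the indexer treats the page as HTML.
        return SPIDER_CONTENT_TYPE
    return DEFAULT_CONTENT_TYPE
```

Ordinary browsers keep getting the charset parameter; only the spider sees
the bare text/html.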

,David Norris

World Wide Web - http://www.geocities.com/CapeCanaveral/Lab/1652/
Home Computer - http://illusionary.tzo.cc/
Page via mail - 412039@pager.mirabilis.com
ICQ Universal Internet Number - 412039
E-Mail - kg9ae@geocities.com
Received on Mon May 31 03:44:45 1999