RE: RE: LWP,HTTP and HTML modules

From: David Norris <kg9ae(at)not-real.geocities.com>
Date: Wed Jan 20 1999 - 12:58:42 GMT

Here are a few(?) good reasons why one can't assume things, one way or
another, based on file extensions:

Apache only knows the MIME type of a file based on what you, the server
administrator, put in the mime.types, magic, and various other config files.
If you fail to define a MIME type for a file, Apache doesn't have a clue and
calls it whatever you defined, in the httpd.conf, as the default MIME type,
usually text/plain.  So, configure your server correctly.  Incorrect MIME
types break everything, not just the SWISH-E spider.  Well, Internet
Explorer for Windows generally ignores MIME types, so it won't break that.

Now, if you don't know this (which apparently someone doesn't), Unix systems
normally determine a file's type from the signature bytes ("magic numbers")
at the start of the file.  Unix systems have a magic file that maps those
byte patterns to file types.  From a terminal on a Unix system, type
'file /usr/sbin/httpd'.  You should get a detailed description of the type
of that file.  On my Linux 2.0 system 'file /usr/sbin/httpd' returns "ELF
32-bit LSB executable, Intel 80386, version 1, dynamically linked, and
stripped."  Hmmm, it doesn't do that based on extension.  It reads the
signature bytes at the beginning of the file, which identify most common
file types.  Apache can do this as well, with mod_mime_magic.  File
extensions are exactly squat on Unix.  MacOS doesn't rely on them either.
File extensions still exist to make it easy to share stuff with Windows
users.  Nobody else on the planet needs them.  Many people use them as a
quick and dirty way to specify the MIME type of a file for which they do not
have a byte-pattern mapping.  Others just don't know any better.  The rest
are using Windows.

http://www.apache.org/docs/mod/mod_mime_magic.html
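
To make the idea concrete, here is a rough Python sketch of signature
sniffing.  It is only an illustration: the table below is a tiny,
hand-picked stand-in for a real magic file, and the names MAGIC_SIGNATURES,
sniff_bytes and sniff_file are made up for this example.  It is not how
file(1) or mod_mime_magic are actually implemented.

```python
# Hypothetical, hand-picked signature table.  A real magic file has far more
# entries and supports offsets, masks and nested tests.
MAGIC_SIGNATURES = [
    (b"\x7fELF", "ELF executable or object file"),
    (b"\x1f\x8b", "gzip compressed data"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-", "PDF document"),
]


def sniff_bytes(header: bytes) -> str:
    """Return a description based only on the leading bytes."""
    for signature, description in MAGIC_SIGNATURES:
        if header.startswith(signature):
            return description
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first few bytes of a file and classify them.

    The file name, including any extension, is never consulted.
    """
    with open(path, "rb") as handle:
        return sniff_bytes(handle.read(16))
```

Calling sniff_file on a gzipped file named "notes.txt" would still report
gzip data, since only the content is examined.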

You can override MIME types, as mentioned, in various locations.  The
ForceType directive would rarely need to be used on a properly configured
system.  You might use it if you wanted to force a script handler to parse
files with an extension it normally wouldn't.  Forcing PHP3, which normally
uses .php3, to handle a file with a .html extension would be an example of
this.

The file extension is almost completely irrelevant unless you are on
Windows.  On 32-bit Windows it is only relevant because of the way Windows
HTTP servers are written.  HTTP servers don't have to follow the rules of
the OS regarding much of anything.  For instance, under a not-so-unusual
Apache configuration, this file:
/www/share/index.html.gz.en
might be served at any of these URLs:
http://localhost/index
http://localhost/index.html
http://localhost/index.html.gz
http://localhost/index.html.gz.en

These URLs all point to the same location in the file system.

This file is the English version of a gzipped HTML file.  This file has the
MIME type of application/x-compressed-gzip.  However, it might be called as
/, /index, /index.html, etc. over HTTP.  Assuming it is a text/html file,
based on the presence of a .html extension, would be a disaster.
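
As a rough illustration of how shaky extension guessing is, the Python
sketch below runs the URLs from this message through the standard
mimetypes module.  The helper name guess_from_extension and the URLS list
are made up for this example, and the answers depend on Python's built-in
extension table rather than on anything the server would say.

```python
import mimetypes
from urllib.parse import urlsplit

# A private guesser built from Python's default extension table.
_GUESSER = mimetypes.MimeTypes()


def guess_from_extension(url):
    """Guess (type, encoding) purely from the URL path's extension(s)."""
    path = urlsplit(url).path
    return _GUESSER.guess_type(path)


# Example URLs taken from the message; all but the last name the same file.
URLS = [
    "http://localhost/index",
    "http://localhost/index.html",
    "http://localhost/index.html.gz",
    "http://localhost/index.html.gz.en",
    "http://localhost/sports/football/scores/11-Jan-1999/",
]

if __name__ == "__main__":
    for url in URLS:
        print(url, guess_from_extension(url))
```

The guesses for the four index URLs disagree with one another even though
the server treats them as one resource, and a URL with no extension at all
gets no guess.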

What about a URL that doesn't exist in the filesystem?  For instance:
http://localhost/sports/football/scores/11-Jan-1999/

This might refer to a handler called sports that selects football scores
for January 11, 1999 from an SQL source.  How do you determine the type of
file by its extension?  I know, then you check the MIME type.  That sounds
perfectly logical on the surface.  But, it is fundamentally flawed in the
real world.

One has to understand that you can't assume anything with HTTP.  That's why
we have standard headers and responses defined in the HTTP specs.  I would
believe the server's Content-Type headers over any guessing based on where
periods lie in the URL.  If you can't make your server send the correct
headers, then you should either fix it or hack up the script yourself.  If
your server doesn't support HTTP correctly, fire it like a bad employee.
A broken server does more damage than good.
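
A minimal Python sketch of "believe the header" might look like the
following.  It assumes the response headers are already available as a
plain dict keyed by "Content-Type", and the names INDEXABLE_TYPES,
declared_type and should_index are invented for this example.  It is not
the SWISH-E spider's code.

```python
from email.message import Message

# Hypothetical set of types a spider might choose to index.
INDEXABLE_TYPES = {"text/html", "text/plain"}


def declared_type(headers):
    """Return the media type the server declared, or None if it sent none.

    Parameters such as charset are dropped and the type is lowercased.
    Note: a malformed value (no "/") is reported as text/plain by the
    email package's parser.
    """
    raw = headers.get("Content-Type")
    if not raw:
        return None
    message = Message()
    message["Content-Type"] = raw
    return message.get_content_type()


def should_index(headers):
    """Decide from the declared type alone; the URL is never inspected."""
    return declared_type(headers) in INDEXABLE_TYPES
```

With this approach, a URL such as /sports/football/scores/11-Jan-1999/ is
indexed or skipped according to what the server says it is, whatever the
URL happens to look like.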

Just some food for thought in the great MIME debate.

People are content with what they have, until they realize what they don't
have.  Thus exists Windows.

,David Norris

World Wide Web - http://www.geocities.com/CapeCanaveral/Lab/1652/
Page via mail - 412039@pager.mirabilis.com
ICQ Universal Internet Number - 412039
E-Mail - kg9ae@geocities.com
Received on Wed Jan 20 04:58:09 1999