> This seems like too much of a debate for such a simple feature...
You can never have too much debate. Good ideas come from heavy debate,
even if the topic does seem a bit trivial on the surface.
> Allowing the person who configures swish to specify *some* file
> extensions that he does not want indexed/spidered is just a feature.
> If it were implemented not as "file extensions to avoid" but "regex
> expressions to avoid" then no one would be arguing... it is simply
> a tool to use when the situation calls for it.
You are correct. I guess it is more psychological than anything. Many
folks, particularly Windows users, immediately associate file extensions
with file type.
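To make the point in the quoted argument concrete, here is a minimal Python sketch showing that a list of "file extensions to avoid" is just a special case of a list of "regular expressions to avoid". This is not swish's configuration syntax; the setting names, patterns, and helper functions are hypothetical and only illustrate the idea.

```python
import re

# Hypothetical exclusion settings; these are NOT real swish directives.
SKIP_EXTENSIONS = [".gif", ".jpg", ".zip"]
SKIP_PATTERNS = [r"/cgi-bin/", r"\.bak$"]


def extensions_to_patterns(extensions):
    """Turn each extension into a case-insensitive regex anchored at the end."""
    return [re.compile(re.escape(ext) + r"$", re.IGNORECASE) for ext in extensions]


def should_skip(path, patterns):
    """Return True if any exclusion pattern matches the path or URL."""
    return any(p.search(path) for p in patterns)


# Extension rules and general regex rules end up in one uniform list.
patterns = extensions_to_patterns(SKIP_EXTENSIONS) + [re.compile(p) for p in SKIP_PATTERNS]

for path in ["/images/logo.GIF", "/docs/index.html", "/cgi-bin/search",
             "/notes.txt.bak", "/gifts/list.html"]:
    print(path, "skip" if should_skip(path, patterns) else "index")
```

Anything an extension list can express, a pattern list can express too. Anchoring each extension at the end of the path keeps a name like `/gifts/list.html` from being skipped just because it contains "gif".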
> We all know that some of the big search engines (altavista, infoseek,
> determined by the people who built the index.
I think their goals are drastically different from a sysadmin indexing his
little cluster of servers.
> Now that I look at it this way, adding a feature to support regex
> to be avoided sounds like a good idea, just for "completeness". (Ok,
Regex is always a good thing.
> feature is
> not wrong or a sign of stupidity, and in some cases, is highly desirable.
I agree. It is simply an option, and a potentially useful one at that.
Pattern matching can be a reliable means of determining file type on a
closed system. I just wouldn't want anyone to think that it is in any way
reliable outside of their controlled environment. The less control you have
over the environment, the less reliable this method becomes. Also, someone
with more control over that environment could easily break it, for example
by serving content under names your patterns don't expect.
If my previous message, or even this message, insulted anyone's
intelligence, that was not my intent. I single out Windows users simply
because Windows differs from nearly every other OS in this regard, and
that difference filters into all of its applications and servers. File
extensions are an easy assumption from a Windows perspective. It is easy
to forget, or never learn, that the extension in a URL served over HTTP
need not match the type of the data; the server states that type in the
Content-Type header. A central purpose of HTTP is to abstract the data away
from the system it is stored on. Believe me, I use Windows NT all day,
every day. It has a definite purpose and excellent applications. It just
isn't the norm in the HTTP server world.
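To illustrate the point about HTTP, here is a small Python sketch using made-up data: the extension table, URLs, and response headers are all hypothetical. It compares the type one would guess from a URL's extension with the media type the server declares in its Content-Type header, which is what an HTTP client is meant to rely on.

```python
from posixpath import splitext
from urllib.parse import urlparse

# Hypothetical extension-to-type table, the kind of assumption an indexer might make.
GUESS_BY_EXTENSION = {
    ".html": "text/html",
    ".htm": "text/html",
    ".txt": "text/plain",
    ".gif": "image/gif",
}


def guess_from_extension(url):
    """Guess a type purely from the extension in the URL path (None if unknown)."""
    ext = splitext(urlparse(url).path)[1].lower()
    return GUESS_BY_EXTENSION.get(ext)


def declared_type(content_type_header):
    """Return the media type from a Content-Type header, dropping parameters."""
    return content_type_header.split(";", 1)[0].strip().lower()


# Hypothetical responses: (URL, Content-Type header the server actually sent).
responses = [
    ("http://www.example.com/index.html", "text/html; charset=iso-8859-1"),
    ("http://www.example.com/cgi-bin/report.txt", "text/html"),
    ("http://www.example.com/logo", "image/gif"),
]

for url, header in responses:
    guess = guess_from_extension(url)
    actual = declared_type(header)
    print(url, "guess:", guess, "actual:", actual,
          "OK" if guess == actual else "MISMATCH")
```

The second and third URLs show a server handing out HTML under a `.txt` name and an image with no extension at all. A spider working over HTTP is on safer ground trusting the header than the name.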
World Wide Web - http://www.geocities.com/CapeCanaveral/Lab/1652/
Page via mail - firstname.lastname@example.org
ICQ Universal Internet Number - 412039
E-Mail - email@example.com
Received on Wed Jan 20 16:13:54 1999