At 08:02 AM 12/19/01 -0800, Alex Lyons wrote:
>>> So I have a long NoContents list. But there are a number of files
>>> that come up which have no extension. As these are not listed
>>> under the NoContents directive they are indexed by contents.
>> That's one of the problems with those file extension-based directives.
>It might help if _all_ decisions based on the file name (or pathname)
>were handled by regexps like FileRules does. I still get confused which
>directives only look at suffixes (IndexOnly, NoContents, IndexContents,
>FileFilter - I think) and which can do regexps on the whole pathname
>(FileRules, FileMatch - I think).
I've thought about that. Do you think it would break anything to extend
the syntax to be
IndexOnly regex /pattern/
Or would there need to be new directives:
(BTW -- I've seen a lot of configs where IndexOnly is used with NoContents,
but the extensions listed in NoContents are not also listed in IndexOnly,
which they need to be.)
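
A quick way to spot that mismatch, sketched in Python rather than anything
swish-e ships (the function name and the extension lists in the usage note
are made up for illustration):

```python
def missing_from_index_only(index_only, no_contents):
    """Return the NoContents extensions that are not also in IndexOnly.

    Both arguments are plain lists of extensions, e.g. [".html", ".txt"].
    Parsing them out of a real config file is left out of this sketch.
    """
    return sorted(set(no_contents) - set(index_only))
```

With IndexOnly set to .html .txt and NoContents set to .gif .txt, this
reports .gif as the extension that still needs adding to IndexOnly.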
What's been near the bottom of my todo list for a long time is to
deprecate the directives that take an extension, and instead switch to
content-types for all files.
Then use a mime-types file to map extensions to content-types for the file
system, and when spidering, simply pass the content-type on to swish. And
then have a directive that sets the default content type.
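
Roughly, the lookup could work like this. This is only a sketch in Python,
not swish-e code: it uses Python's standard mimetypes module in place of a
mime-types file, and the names content_type_for and DEFAULT_CONTENT_TYPE
are invented for the example.

```python
import mimetypes

# Hypothetical stand-in for a "default content type" directive.
DEFAULT_CONTENT_TYPE = "text/html"


def content_type_for(path, default=DEFAULT_CONTENT_TYPE):
    """Map a file path to a content type, falling back to a default.

    mimetypes.guess_type() returns (None, None) when the extension is
    missing or unknown, which is the case that extension-based
    directives cannot handle.
    """
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed if guessed is not None else default
```

A file with no extension, such as README, then falls through to the
default instead of slipping past an extension list.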
And to make all that work better, move to a more apache-like/xml-type of
config file, as it would be nice to be able to set the config based on what
directory or file one is looking at. For example, one might want the
default content type to be text/html, but be able to set README to text/plain.
That would be useful for many of the config settings. For example, I have
a case where I'd like to be able to redefine wordchars based on the directory.
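
To make the per-directory idea concrete, here is a small Python sketch.
Everything in it is hypothetical: the setting names, the patterns, the
wordchars value and the settings_for function are for illustration only
and do not reflect how swish-e reads its config.

```python
import re

# Hypothetical global defaults.
DEFAULTS = {
    "content_type": "text/html",
    "wordchars": "abcdefghijklmnopqrstuvwxyz0123456789",
}

# Hypothetical overrides: (regex on the whole pathname, settings to change).
# Every matching rule is applied in order, so later rules win.
OVERRIDES = [
    (re.compile(r"(^|/)README$"), {"content_type": "text/plain"}),
    (re.compile(r"^src/"), {"wordchars": DEFAULTS["wordchars"] + "_"}),
]


def settings_for(path):
    """Return the effective settings for one file path."""
    settings = dict(DEFAULTS)  # copy, so the defaults are never modified
    for pattern, override in OVERRIDES:
        if pattern.search(path):
            settings.update(override)
    return settings
```

Here docs/README gets text/plain while everything else stays text/html,
and files under src/ get an underscore added to wordchars. Note that each
file is tested against every pattern.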
Now how much that would kill indexing speed is anyone's guess.
Received on Wed Dec 19 16:23:36 2001