On Sun, Dec 07, 2003 at 02:43:43AM -0800, John Angel wrote:
> > > Hi, how to index only directories (/) and html extensions?
> > What are (/) directories?
Oh, directory listings that the server might return if you don't request
a document and the server is not configured to automatically return
index.html (if there is one):
# directory listings (maybe) - return ok if ends in a slash:
return 1 if $uri->path =~ m[/$];
# or only index .html or .htm files
return 1 if $uri->path =~ m[\.html?$];
# else skip this document
Now, that makes the assumptions that
- .htm and .html are text/html,
- a path that ends in a slash returns a directory listing and not
an audio file or some other non-text/html file, and
- there are links actually pointing to those "directories".
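Putting the two path tests together, a minimal sketch of the whole check might look like the following. The helper name wanted_path is made up for illustration and takes a plain path string so it runs without the spider; inside a real test_url callback you would pass it $uri->path.

```perl
use strict;
use warnings;

# Hypothetical helper: decide from the path alone whether to fetch a URL.
sub wanted_path {
    my ($path) = @_;

    # directory listings (maybe) - ok if the path ends in a slash
    return 1 if $path =~ m[/$];

    # .html or .htm files - note the escaped dot
    return 1 if $path =~ m[\.html?$];

    # else skip this document
    return 0;
}

for my $path ('/docs/', '/docs/index.html', '/docs/page.htm', '/img/logo.png') {
    printf "%-18s %s\n", $path, wanted_path($path) ? 'index' : 'skip';
}
```

Only the last sample path is skipped; the other three pass one of the two tests.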
You would likely follow up that test with a test_response that checks
for text/html or text/plain, of course.
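Such a follow-up check could be sketched like this. The helper name wanted_type is hypothetical and takes the Content-Type header as a plain string; in a test_response callback you would presumably get that value from the response object (for an HTTP::Response, $response->content_type), which is an assumption about your setup rather than something shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: accept only text/html or text/plain responses.
sub wanted_type {
    my ($content_type) = @_;
    return 0 unless defined $content_type;

    # Drop any parameters such as "; charset=utf-8" and normalize case.
    my ($type) = split /\s*;\s*/, lc $content_type;
    return 0 unless defined $type;

    return $type =~ m[^text/(?:html|plain)$] ? 1 : 0;
}

for my $ct ('text/html; charset=utf-8', 'text/plain', 'audio/mpeg') {
    printf "%-26s %s\n", $ct, wanted_type($ct) ? 'index' : 'skip';
}
```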
BTW -- if you return false from a test_response the connection is
aborted. This will break a Keep-Alive connection. This is because all
fetches are currently GET requests. It's been on my todo list for a
while to have an option to do HEAD requests for test_response tests,
which would allow the connection to remain open. That would only make a
difference on web servers that allowed a large number of keep-alive
requests before closing the connection.
I use "GET" because there are (were?) some servers that were not
correctly responding to HEAD requests.
> > > I am not familiar with regexp, should be something like this in
> > >
> > > return 0 if $uri->path =
> > return 0 if $uri->path =~ /\.(html|htm|shtml|asp|php|txt|phtml|cfm|jsp)$/;
> >                        ^^
> > That says to return false if the path part of the URL ends in those file
> > extensions -- meaning NOT to index those documents.
> Ok, then it should be:
> return 0 if $uri->path = /\.(html|htm|shtml|asp|php|txt|phtml|cfm|jsp)$/;
No. That's a syntax error: the match operator is =~, not =. And my
example was wrong too (a cut-n-paste slip). To index those extensions,
return true on a match:
return 1 if $uri->path =~ /\.(html|htm|shtml|asp|php|txt|phtml|cfm|jsp)$/;
Or if that's your last test, simply:
return $uri->path =~ /\.(html|htm|shtml|asp|php|txt|phtml|cfm|jsp)$/;
which returns true if it matches, else false.
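To see that last form in action, here is a small standalone sketch. The sub name is_indexable and the sample paths are made up, and a plain string stands in for $uri->path.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the last test in a test_url callback.
sub is_indexable {
    my ($path) = @_;

    # Note the =~ (bind) operator; a plain = would be an assignment,
    # not a match against $path.
    return $path =~ /\.(html|htm|shtml|asp|php|txt|phtml|cfm|jsp)$/;
}

for my $path ('/index.php', '/notes.txt', '/photo.jpg') {
    # The ?: condition calls the sub in boolean context, so the match
    # result is simply true or false.
    printf "%-12s %s\n", $path, is_indexable($path) ? 'index' : 'skip';
}
```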
> > > Will that work for queries?
> > What do you mean queries?
$uri->path in that case contains "index.php" only, and not the query string.
So, yes it will work for that.
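As a rough illustration of why the query string does not get in the way, the sketch below separates the path from the query by hand. The helper path_only and the sample request string are invented for the example; in the spider, the URI object's path method already gives you the path without the query.

```perl
use strict;
use warnings;

# Hypothetical helper: strip the query string (and any fragment) from a
# path-plus-query string, roughly what $uri->path leaves you with.
sub path_only {
    my ($path_query) = @_;
    (my $path = $path_query) =~ s/[?#].*//s;
    return $path;
}

my $request = '/index.php?id=5&page=2';
my $ext     = qr/\.(?:html|htm|shtml|asp|php|txt|phtml|cfm|jsp)$/;

print "full string matches: ", ($request            =~ $ext ? 'yes' : 'no'), "\n";
print "path only matches:   ", (path_only($request) =~ $ext ? 'yes' : 'no'), "\n";
```

The extension test fails against the full string, because it ends in the query, but succeeds against the path alone.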
Received on Sun Dec 7 14:18:20 2003