4.1 Apache Configuration
Apache configuration can be confusing. To minimize the number of things that can go wrong, it's a good idea to first configure Apache itself without mod_perl. So before we go into mod_perl configuration, let's look at the basics of Apache itself.
4.1.1 Configuration Files
Prior to Version 1.3.4, the default Apache installation used three configuration files: httpd.conf, srm.conf, and access.conf. Although there were historical reasons for having three separate files (dating back to the NCSA server), it stopped mattering which file you used for what a long time ago, and the Apache team finally decided to combine them. Apache Versions 1.3.4 and later are distributed with the configuration directives in a single file, httpd.conf. Therefore, whenever we mention a configuration file, we are referring to httpd.conf.
By default, httpd.conf is installed in the conf directory under the server root directory. The default server root is /usr/local/apache/ on many Unix platforms, but it can be any directory of your choice (within reason). Users new to Apache and mod_perl will probably find it helpful to keep to the directory layouts we use in this book.
There is also a special file called .htaccess, used for per-directory configuration. When Apache tries to access a file on the filesystem, it will first search for .htaccess files in the requested file's parent directories. If found, Apache scans .htaccess for further configuration directives, which it then applies only to that directory in which the file was found and its subdirectories. The name .htaccess is confusing, because it can contain almost any configuration directives, not just those related to resource access control. Note that if the following directive is in httpd.conf:
<Directory /> AllowOverride None </Directory>
However, you must also make sure that this file can't be accessed directly from the Web, or else you risk exposing your configuration. This is done automatically for .ht* files by Apache, but for other files you need to use:
<Files .acl> Order Allow,Deny Deny from all </Files>
Another often-mentioned file is the startup file, usually named startup.pl. This file contains Perl code that will be executed at server startup. We'll discuss the startup.pl file in greater detail later in this chapter, in Section 4.3.
Beware of editing httpd.conf without understanding all the implications. Modifying the configuration file and adding new directives can introduce security problems and have performance implications. If you are going to modify anything, read through the documentation beforehand. The Apache distribution comes with an extensive configuration manual. In addition, each section of the distributed configuration file includes helpful comments explaining how each directive should be configured and what the default values are.
If you haven't moved Apache's directories around, the installation program will configure everything for you. You can just start the server and test it. To start the server, use the apachectl utility bundled with the Apache distribution. It resides in the same directory as httpd, the Apache server itself. Execute:
panic% /usr/local/apache/bin/apachectl start
Now you can test the server, for example by accessing http://localhost/ from a browser running on the same host.
4.1.2 Configuration Directives
ServerRoot "/usr/local/apache" DocumentRoot "/usr/local/apache/docs"
You can change the port to which the server is bound by editing the Port directive. This example sets the port to 8080 (the default for the HTTP protocol is 80):
You might want to change the user and group names under which the server will run. If Apache is started by the user root (which is generally the case), the parent process will continue to run as root, but its children will run as the user and group specified in the configuration, thereby avoiding many potential security problems. This example uses the httpd user and group:
User httpd Group httpd
Make sure that the user and group httpd already exist. They can be created using useradd(1) and groupadd(1) or equivalent utilities.
Many other directives may need to be configured as well. In addition to directives that take a single value, there are whole sections of the configuration (such as the <Directory> and <Location> sections) that apply to only certain areas of the web space. The httpd.conf file supplies a few examples, and these will be discussed shortly.
4.1.3 <Directory>, <Location>, and <Files> Sections
Let's discuss the basics of the <Directory>, <Location>, and <Files> sections. Remember that there is more to know about them than what we list here, and the rest of the information is available in the Apache documentation. The information we'll present here is just what is important for understanding mod_perl configuration.
Apache considers directories and files on the machine it runs on as resources. A particular behavior can be specified for each resource; that behavior will apply to every request for information from that particular resource.
Directives in <Directory> sections apply to specific directories on the host machine, and those in <Files> sections apply only to specific files (actually, groups of files with names that have something in common). <Location> sections apply to specific URIs. Locations are given relative to the document root, whereas directories are given as absolute paths starting from the filesystem root (/). For example, in the default server directory layout where the server root is /usr/local/apache and the document root is /usr/local/apache/htdocs, files under the /usr/local/apache/htdocs/pub directory can be referred to as:
<Directory /usr/local/apache/htdocs/pub> </Directory>
or alternatively (and preferably) as:
<Location /pub> </Location>
Exercise caution when using <Location> under Win32. The Windows family of operating systems are case-insensitive. In the above example, configuration directives specified for the location /pub on a case-sensitive Unix machine will not be applied when the request URI is /Pub. When URIs map to existing files, such as Apache::Registry scripts, it is safer to use the <Directory> or <Files> directives, which correctly canonicalize filenames according to local filesystem semantics.
It is up to you to decide which directories on your host machine are mapped to which locations. This should be done with care, because the security of the server may be at stake. In particular, essential system directories such as /etc/ shouldn't be mapped to locations accessible through the web server. As a general rule, it might be best to organize everything accessed from the Web under your ServerRoot, so that it stays organized and you can keep track of which directories are actually accessible.
Locations do not necessarily have to refer to existing physical directories, but may refer to virtual resources that the server creates upon a browser request. As you will see, this is often the case for a mod_perl server.
When a client (browser) requests a resource (URI plus optional arguments) from the server, Apache determines from its configuration whether or not to serve the request, whether to pass the request on to another server, what (if any) authentication and authorization is required for access to the resource, and which module(s) should be invoked to generate the response.
For any given resource, the various sections in the configuration may provide conflicting information. Consider, for example, a <Directory> section that specifies that authorization is required for access to the resource, and a <Files> section that says that it is not. It is not always obvious which directive takes precedence in such cases. This can be a trap for the unwary.
22.214.171.124 <Directory directoryPath> ... </Directory>
Scope: Can appear in server and virtual host configurations.
<Directory> and </Directory> are used to enclose a group of directives that will apply to only the named directory and its contents, including any subdirectories. Any directive that is allowed in a directory context (see the Apache documentation) may be used.
The path given in the <Directory> directive is either the full path to a directory, or a string containing wildcard characters (also called globs). In the latter case, ? matches any single character, * matches any sequence of characters, and [ ] matches character ranges. These are similar to the wildcards used by sh and similar shells. For example:
<Directory /home/httpd/docs/foo[1-2]> Options Indexes </Directory>
will match /home/httpd/docs/foo1 and /home/httpd/docs/foo2. None of the wildcards will match a / character. For example:
<Directory /home/httpd/docs> Options Indexes </Directory>
matches /home/httpd/docs and applies to all its subdirectories.
<DirectoryMatch /home/www/.*/public> Options Indexes </DirectoryMatch>
will match /home/www/foo/public but not /home/www/foo/private. In a regular expression, .* matches any character (represented by .) zero or more times (represented by *). This is entirely different from the shell-style wildcards used by the <Directory> directive. They make it easy to apply a common configuration to a set of public directories. As regular expressions are more flexible than globs, this method provides more options to the experienced user.
If multiple (non-regular expression) <Directory> sections match the directory (or its parents) containing a document, the directives are applied in the order of the shortest match first, interspersed with the directives from any .htaccess files. Consider the following configuration:
<Directory /> AllowOverride None </Directory> <Directory /home/httpd/docs/> AllowOverride FileInfo </Directory>
Let us detail the steps Apache goes through when it receives a request for the file /home/httpd/docs/index.html:
126.96.36.199 <Files filename > ... </Files>
Scope: Can appear in server and virtual host configurations, as well as in .htaccess files.
The <Files> directive provides access control by filename and is comparable to the <Directory> and <Location> directives. <Files> should be closed with the corresponding </Files>. The directives specified within this section will be applied to any object with a basename matching the specified filename. (A basename is the last component of a path, generally the name of the file.)
<Files> sections are processed in the order in which they appear in the configuration file, after the <Directory> sections and .htaccess files are read, but before <Location> sections. Note that <Files> can be nested inside <Directory> sections to restrict the portion of the filesystem to which they apply. However, <Files> cannot be nested inside <Location> sections.
The filename argument should include a filename or a wildcard string, where ? matches any single character and * matches any sequence of characters, just as with <Directory> sections. Extended regular expressions can also be used, placing a tilde character (~) between the directive and the regular expression. The regular expression should be in quotes. The dollar symbol ($) refers to the end of the string. The pipe character (|) indicates alternatives, and parentheses (()) can be used for grouping. Special characters in extended regular expressions must be escaped with backslashes (\). For example:
<Files ~ "\.(pl|cgi)$"> SetHandler perl-script PerlHandler Apache::Registry Options +ExecCGI </Files>
would match all the files ending with the .pl or .cgi extension (most likely Perl scripts). Alternatively, the <FilesMatch regex> ... </FilesMatch> syntax can be used.
188.8.131.52 <Location URI> ... </Location>
Scope: Can appear in server and virtual host configurations.
<Location> sections are processed in the order in which they appear in the configuration file, after the <Directory> sections, .htaccess files, and <Files> sections have been interpreted.
The <Location> section is the directive that is used most often with mod_perl.
Note that URIs do not have to refer to real directories or files within the filesystem at all; <Location> operates completely outside the filesystem. Indeed, it may sometimes be wise to ensure that <Location>s do not match real paths, to avoid confusion.
The URI may use wildcards. In a wildcard string, ? matches any single character, * matches any sequences of characters, and [ ] groups characters to match. For regular expression matches, use the <LocationMatch regex> ... </LocationMatch> syntax.
The <Location> functionality is especially useful when combined with the SetHandler directive. For example, to enable server status requests (via mod_status) but allow them only from browsers at *.example.com, you might use:
<Location /status> SetHandler server-status Order Deny,Allow Deny from all Allow from .example.com </Location>
As you can see, the /status path does not exist on the filesystem, but that doesn't matter because the filesystem isn't consulted for this request—it's passed on directly to mod_status.
4.1.4 Merging <Directory>, <Location>, and <Files> Sections
Apart from <Directory>, each group is processed in the order in which it appears in the configuration files. <Directory>s (group 1 above) are processed in order from the shortest directory component to the longest (e.g., first / and only then /home/www). If multiple <Directory> sections apply to the same directory, they are processed in the configuration file order.
Sections inside <VirtualHost> sections are applied as if you were running several independent servers. The directives inside one <VirtualHost> section do not interact with directives in other <VirtualHost> sections. They are applied only after processing any sections outside the virtual host definition. This allows virtual host configurations to override the main server configuration.
If there is a conflict, sections found later in the configuration file override those that come earlier.
4.1.5 Subgrouping of <Directory>, <Location>, and <Files> Sections
Let's say that you want all files to be handled the same way, except for a few of the files in a specific directory and its subdirectories. For example, say you want all the files in /home/httpd/docs to be processed as plain files, but any files ending with .html and .txt to be processed by the content handler of the Apache::Compress module (assuming that you are already running a mod_perl server):
<Directory /home/httpd/docs> <FilesMatch "\.(html|txt)$"> PerlHandler +Apache::Compress </FilesMatch> </Directory>
The + before Apache::Compress tells mod_perl to load the Apache::Compress module before using it, as we will see later.
Using <FilesMatch>, it is possible to embed sections inside other sections to create subgroups that have their own distinct behavior. Alternatively, you could also use a <Files> section inside an .htaccess file.
Note that you can't put <Files> or <FilesMatch> sections inside a <Location> section, but you can put them inside a <Directory> section.
4.1.6 Options Directive Merging
However, if all the options on the Options directive are preceded by either a + or - symbol, the options are merged. Any options preceded by + are added to the options currently active, and any options preceded by - are removed.
For example, without any + or - symbols:
<Directory /home/httpd/docs> Options Indexes FollowSymLinks </Directory> <Directory /home/httpd/docs/shtml> Options Includes </Directory>
Indexes and FollowSymLinks will be set for /home/httpd/docs/, but only Includes will be set for the /home/httpd/docs/shtml/ directory. However, if the second Options directive uses the + and - symbols:
<Directory /home/httpd/docs> Options Indexes FollowSymLinks </Directory> <Directory /home/httpd/docs/shtml> Options +Includes -Indexes </Directory>
then the options FollowSymLinks and Includes will be set for the /home/httpd/docs/shtml/ directory.
4.1.7 MinSpareServers, MaxSpareServers, StartServers, MaxClients, and MaxRequestsPerChild
MinSpareServers, MaxSpareServers, StartServers, and MaxClients are standard Apache configuration directives that control the number of servers being launched at server startup and kept alive during the server's operation. When Apache starts, it spawns StartServers child processes. Apache makes sure that at any given time there will be at least MinSpareServers but no more than MaxSpareServers idle servers. However, the MinSpareServers rule is completely satisfied only if the total number of live servers is no bigger than MaxClients.
MaxRequestsPerChild lets you specify the maximum number of requests to be served by each child. When a process has served MaxRequestsPerChild requests, the parent kills it and replaces it with a new one. There may also be other reasons why a child is killed, so each child will not necessarily serve this many requests; however, each child will not be allowed to serve more than this number of requests. This feature is handy to gain more control of the server, and especially to avoid child processes growing too big (RAM-wise) under mod_perl.
These five directives are very important for getting the best performance out of your server. The process of tuning these variables is described in great detail in Chapter 11.