16.2 Content Headers
The following sections describe the HTTP headers that specify the type and length of the content, and the version of the content being sent. In this section we often use the term message to mean the HTTP headers together with their associated content; the content is the actual page, image, file, etc.
16.2.1 Content-Type Header
According to the standard, Content-Type should be included in every set of headers, and Apache will generate one if your code doesn't. The generated value is whatever is specified in the relevant DefaultType configuration directive, or text/plain if none is active.
16.2.2 Content-Length Header
According to section 14.13 of the HTTP specification, the Content-Length header is the number of octets (8-bit bytes) in the body of a message. If the length can be determined prior to sending, it can be very useful to include it. The most important reason is that KeepAlive requests (when the same connection is used to fetch more than one object from the web server) need a way to tell where each response ends: HTTP/1.0 clients can keep a connection alive only for responses that contain a Content-Length header, while HTTP/1.1 clients can also rely on chunked transfer encoding. In mod_perl we can write:
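A minimal sketch of such a handler fragment, assuming the mod_perl 1.x request API ($r->header_out( ), $r->send_http_header( ), $r->header_only( ), and $r->print( )). The helper names body_octets( ) and send_with_length( ) are hypothetical, and the content type is assumed to have been set by the caller. The length is taken from the encoded bytes, because the header counts octets rather than characters:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: encode the body as UTF-8 and return the bytes
# together with their count in octets (not characters).
sub body_octets {
    my ($body) = @_;
    my $bytes = Encode::encode('UTF-8', $body);
    return ($bytes, length $bytes);
}

# Hypothetical helper: $r is assumed to be a mod_perl 1.x request object
# whose content type has already been set by the caller.
sub send_with_length {
    my ($r, $body) = @_;
    my ($bytes, $length) = body_octets($body);
    $r->header_out('Content-Length', $length);
    $r->send_http_header;
    # A HEAD response carries the same headers but no body.
    $r->print($bytes) unless $r->header_only;
    return $length;
}
```

Computing the length from the same byte string that is printed keeps the header and the body in agreement, including for HEAD requests, where only the headers are sent.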
The Content-Length header can have a significant impact on caches by invalidating cache entries, as the following extract from the specification explains:
The response to a HEAD request MAY be cacheable in the sense that the information contained in the response MAY be used to update a previously cached entity from that resource. If the new field values indicate that the cached entity differs from the current entity (as would be indicated by a change in Content-Length, Content-MD5, ETag or Last-Modified), then the cache MUST treat the cache entry as stale.
It is important not to send an erroneous Content-Length header in a response to either a GET or a HEAD request.
16.2.3 Entity Tags
An entity tag (ETag) is a validator that can be used instead of, or in addition to, the Last-Modified header; it is a quoted string that can be used to identify different versions of a particular resource. An entity tag can be added to the response headers like this:
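As a sketch, assuming the mod_perl 1.x $r->header_out( ) method and two hypothetical helpers, quoted_etag( ) and add_etag( ), the header could be set along these lines. The opaque value passed in stands for whatever version identifier the application maintains:

```perl
use strict;
use warnings;

# Hypothetical helper: wrap an opaque version string in the double
# quotes that the ETag syntax requires.
sub quoted_etag {
    my ($opaque) = @_;
    die "entity tag must not contain a double quote\n" if $opaque =~ /"/;
    return qq{"$opaque"};
}

# Hypothetical helper: $r is assumed to be a mod_perl 1.x request object.
sub add_etag {
    my ($r, $opaque) = @_;
    my $etag = quoted_etag($opaque);
    $r->header_out('ETag', $etag);
    return $etag;
}
```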
mod_perl offers the $r->set_etag( ) method if we have use( )ed Apache::File. However, we strongly recommend that you don't use the set_etag( ) method! set_etag( ) is meant to be used in conjunction with a static request for a file on disk that has been stat( )ed in the course of the current request. It is inappropriate and dangerous to use it for dynamic content.
By sending an entity tag we are promising the recipient that we will not send the same ETag for the same resource again unless the content is "equal" to what we are sending now.
The pros and cons of using entity tags are discussed in section 13.3 of the HTTP specification. For mod_perl programmers, that discussion can be summed up as follows.
There are strong and weak validators. Strong validators change whenever a single bit changes in the response; i.e., when anything changes, even if the meaning is unchanged. Weak validators change only when the meaning of the response changes. Strong validators are needed for caches to allow for sub-range requests. Weak validators allow more efficient caching of equivalent objects. Digests of the response body computed with algorithms such as MD5 or SHA make good strong validators, but what is usually required when we want to take advantage of caching is a good weak validator.
A Last-Modified time, when used as a validator in a request, can be strong or weak, depending on a couple of rules described in section 13.3.3 of the HTTP standard. This is mostly relevant for range requests, as this quote from section 14.27 explains:
If the client has no entity tag for an entity, but does have a Last-Modified date, it MAY use that date in an If-Range header.
But it is not limited to range requests. As section 13.3.1 states, the value of the Last-Modified header can also be used as a cache validator.
The fact that a Last-Modified date may be used as a strong validator can be pretty disturbing if we are in fact changing our output slightly without changing its semantics. To prevent this kind of misunderstanding between us and the cache servers in the response chain, we can send a weak validator in an ETag header. This is possible because the specification states:
If a client wishes to perform a sub-range retrieval on a value for which it has only a Last-Modified time and no opaque validator, it MAY do this only if the Last-Modified time is strong in the sense described here.
In other words, by sending an ETag that is marked as weak, we prevent the cache server from using the Last-Modified header as a strong validator.
An ETag value is marked as a weak validator by prepending the string W/ to the quoted string; otherwise, it is strong. In Perl this would mean something like this:
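A minimal sketch of the formatting, with weak_etag( ) and is_weak_etag( ) as hypothetical helper names; the final comment assumes the mod_perl 1.x $r->header_out( ) method and a $version value maintained by the application:

```perl
use strict;
use warnings;

# Hypothetical helper: prepend W/ to the quoted string to mark
# the entity tag as a weak validator.
sub weak_etag {
    my ($opaque) = @_;
    return qq{W/"$opaque"};
}

# Hypothetical helper: true if the tag carries the weak marker.
sub is_weak_etag {
    my ($etag) = @_;
    return $etag =~ m{\AW/"} ? 1 : 0;
}

# In a handler, with $r as the request object (assumed):
#   $r->header_out('ETag', weak_etag($version));
```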
Consider carefully which string is chosen to act as a validator. We are on our own with this decision:
... only the service author knows the semantics of a resource well enough to select an appropriate cache validation mechanism, and the specification of any validator comparison function more complex than byte-equality would open up a can of worms. Thus, comparisons of any other headers (except Last-Modified, for compatibility with HTTP/1.0) are never used for purposes of validating a cache entry.
If we are composing a message from multiple components, it may be necessary to combine some kind of version information for all these components into a single string.
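One possible way to do this, sketched under the assumption that each component exposes a plain ASCII version string, is to join the name and version pairs in a fixed order and digest the result with the core Digest::MD5 module. The helper name combined_etag( ) and the component names in the usage example are hypothetical:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: $versions is a hash reference that maps
# component names to version strings (assumed to be plain ASCII).
sub combined_etag {
    my ($versions) = @_;
    # Sort the keys so that the same set of versions always yields
    # the same string, whatever the hash ordering happens to be.
    my $joined = join ';',
        map { "$_=$versions->{$_}" } sort keys %$versions;
    # Marked weak: the tag tracks component versions, not response bytes.
    return 'W/"' . md5_hex($joined) . '"';
}

my $etag = combined_etag({
    template => '1.4',
    data     => '2003-05-01',
    style    => '7',
});
```

The tag changes whenever any component version changes and stays the same otherwise, which is the behavior expected of a weak validator.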
If we are producing relatively large documents, or content that does not change frequently, then a strong entity tag will probably be preferred, since this will give caches a chance to transfer the document in chunks.
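A strong entity tag for such a document could be derived from a digest of the exact bytes sent, as in this sketch. It uses the core Digest::MD5 module; strong_etag( ) is a hypothetical helper name, and the body is assumed to be a byte string already (md5_hex( ) rejects wide characters):

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: any change to the bytes of the body changes
# the digest, so the resulting tag behaves as a strong validator.
sub strong_etag {
    my ($bytes) = @_;
    return '"' . md5_hex($bytes) . '"';
}
```

The cost is that the whole body must be available, and digested, before the headers are sent.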