As you learned earlier, Web servers and browsers communicate using the Hypertext Transfer Protocol (HTTP). The version of HTTP discussed here (1.1) was originally described in RFC 2616, which has since been superseded by newer RFCs. The purpose of HTTP is to support the transfer of HTML documents. HTTP is an application-level protocol. The HTTP client and server applications use the reliable TCP transport protocol to establish a connection.
Although Web communication has become extremely complex, most of that complexity relates to how the server builds the HTML content and what the browser does with the content it receives. The actual process of transferring the content through HTTP is relatively uncluttered.
When you enter a URL into the browser window, the browser first checks the scheme of the URL to determine the protocol. (As you learned earlier in this hour, Web browsers support other protocols besides HTTP.) If the browser determines that the URL refers to a resource on an HTTP site, it extracts the DNS name from the URL and initiates the name resolution process. The client computer sends the DNS lookup request to a name server and receives the server's IP address. The browser then uses the server's IP address to initiate a TCP connection with the server. (See Hour 6 for more on TCP.)
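The steps in the preceding paragraph can be sketched in Python with the standard library. This is a minimal illustration under stated assumptions, not how any particular browser works: the function names `parse_http_url` and `resolve_and_connect` are hypothetical, and port 80 is assumed because it is HTTP's well-known TCP port.

```python
import socket
from urllib.parse import urlsplit

def parse_http_url(url):
    """Hypothetical helper: split a URL into (scheme, host, port, path).

    Raises ValueError if the scheme is not HTTP, because other schemes
    are handled by other protocols.
    """
    parts = urlsplit(url)
    if parts.scheme.lower() != "http":
        raise ValueError("not an HTTP URL: " + url)
    host = parts.hostname        # the DNS name the browser must resolve
    port = parts.port or 80      # assume HTTP's well-known port if none given
    path = parts.path or "/"     # the relative URL sent in the request
    if parts.query:
        path += "?" + parts.query
    return parts.scheme.lower(), host, port, path

def resolve_and_connect(host, port):
    """Hypothetical helper: resolve the name, then open a TCP connection."""
    # getaddrinfo performs the name resolution (DNS lookup)
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    family, socktype, proto, _, address = infos[0]
    sock = socket.socket(family, socktype, proto)
    sock.connect(address)        # TCP connection setup happens here
    return sock
```

For example, `parse_http_url("http://www.example.com/watergate/tapes/transcript")` returns `("http", "www.example.com", 80, "/watergate/tapes/transcript")`. The sketch tries only the first address returned by the lookup; a real client would fall back to other addresses if the first connection attempt failed.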
By the Way
In older versions of HTTP (before version 1.1), the client and server opened a new TCP connection for each item transferred. Recent versions of HTTP allow the client and server to maintain a persistent connection.
After the TCP connection is established, the browser uses the HTTP GET command to request the Web page from the server. The GET command contains the URL of the resource the browser is requesting and the version of HTTP the browser wants to use for the transaction. The browser can send the relative URL with the GET request (rather than the full URL) because the connection with the server has already been established:
GET /watergate/tapes/transcript HTTP/1.1
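To make the request concrete, the following sketch assembles the bytes a client might send for the GET line shown above. It assumes a hypothetical server name, `www.example.com`, and a hypothetical helper, `build_get_request`. HTTP 1.1 requires a Host header field in every request, each line ends with a carriage return and line feed (CRLF), and an empty line marks the end of the header section.

```python
def build_get_request(host, path, extra_headers=None):
    """Hypothetical helper: return the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, relative URL, version
        f"Host: {host}",          # names the server; required by HTTP/1.1
    ]
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    # Each line ends in CRLF; one extra CRLF (a blank line) ends the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

request = build_get_request("www.example.com", "/watergate/tapes/transcript")
print(request)
# prints b'GET /watergate/tapes/transcript HTTP/1.1\r\nHost: www.example.com\r\n\r\n'
```

These bytes would then be written to the open TCP connection, for example with Python's `socket.sendall`.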
Table 17.4 lists some of the HTTP header fields. All fields are optional, and any field that is not understood by the browser is ignored.
As you can see from Table 17.4, some of the header fields are purely informational. Other header fields may contain information necessary to parse and process the incoming HTML document.
By the Way
The header field format used with HTTP is borrowed from the email header format specified in RFC 822.
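Because the header format follows RFC 822, each field is a name, a colon, and a value on its own line. The sketch below shows one way a client could split a response's status line and header fields; the function name `parse_response_head` and the sample response are illustrative assumptions. Field names are stored in lowercase because HTTP field names are case-insensitive. The sketch keeps every field and leaves it to the caller to ignore fields it does not understand; it does not handle folded (continuation) lines or repeated fields.

```python
def parse_response_head(head):
    """Hypothetical helper: split a response head into its parts.

    `head` is the text before the blank line that ends the header section.
    Returns (version, status_code, reason, headers), where headers maps
    lowercase field names to values. A repeated field overwrites the
    earlier one in this sketch.
    """
    status_line, *field_lines = head.split("\r\n")
    parts = status_line.split(" ", 2)          # e.g. HTTP/1.1, 200, OK
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in field_lines:
        if not line:                           # skip the trailing empty piece
            continue
        name, _, value = line.partition(":")   # RFC 822 style "Name: value"
        headers[name.strip().lower()] = value.strip()
    return version, code, reason, headers

sample = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 13\r\n"
)
print(parse_response_head(sample))
# prints ('HTTP/1.1', 200, 'OK', {'content-type': 'text/html', 'content-length': '13'})
```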
The Content-Length field is particularly important on today's Internet. In the earlier HTTP version 1.0, each request/response cycle required a new TCP connection. The client opened a connection and sent a request; the server fulfilled the request and then closed the connection. In that situation, the client knew that the server had finished sending data because the server closed the TCP connection. Unfortunately, this approach added the overhead of repeatedly opening and closing connections.
More recent versions of HTTP (HTTP 1.1 and later) allow the client and server to keep the connection open for more than a single transmission. In that case, the client needs some way of knowing when a single response is finished. The Content-Length field specifies the length, in bytes, of the object sent with the response. If the server doesn't know the length of the object it is sending (a situation increasingly common as more pages are generated dynamically), the server sends the header field Connection: close to notify the browser that the server will mark the end of the data by closing the connection.
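The two ways of finding the end of a response described above can be sketched as follows. This is an illustration under assumptions, not a complete client: `read_body` is a hypothetical helper, header names are assumed to be already lowercased, and the sketch ignores chunked transfer coding, another HTTP 1.1 mechanism for bodies of unknown length that this section does not cover. An in-memory `io.BytesIO` stream stands in for the TCP connection.

```python
import io

def read_body(stream, headers):
    """Hypothetical helper: read one response body from a binary stream.

    With Content-Length, read exactly that many bytes, leaving the
    connection positioned at the start of the next response. Otherwise,
    if the server sent Connection: close, read until the server closes
    the connection (end of stream).
    """
    if "content-length" in headers:
        length = int(headers["content-length"])
        body = stream.read(length)
        if len(body) != length:
            raise EOFError("connection closed before the body was complete")
        return body
    if headers.get("connection", "").lower() == "close":
        return stream.read()   # everything until the connection closes
    raise ValueError("cannot tell where this body ends")

# Two bodies arriving back to back on one connection (header sections
# omitted for brevity): the first is framed by Content-Length, the second
# by the server closing the connection.
conn = io.BytesIO(b"<p>first</p><p>second</p>")
first = read_body(conn, {"content-length": "12"})
second = read_body(conn, {"connection": "close"})
print(first, second)
# prints b'<p>first</p>' b'<p>second</p>'
```

Reading exactly Content-Length bytes is what lets the first response end without closing the connection, so the second response can still be read from the same stream.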