7.2 Asking the Right Questions
There is much more to a web service than writing the code and firing
up the server to run it. Until you have specified a set of questions
that covers the whole mechanism, and not just a few of its
components, it is hard to know which issues should be checked, which
components should be watched, and which software should be monitored.
The better the questions you ask, the better the coverage you should
have.
Let's raise a few questions and look at some possible answers.
Q: How long does it take to process each request? What is the request
distribution?
A: Obviously you will have more than one script and handler, and each
one might be called in different modes; the amount of processing to
be done may be different in every case. Therefore, you should
benchmark your code in all the modes in which it can be executed. It
is good to learn the average case, as well as the extremes: the worst
and best cases.
It is also very important to find out the distribution of different
requests relative to the total number of requests. You might have
only two handlers: one very slow and the other very fast. If you
optimize for the average case without finding out the request
distribution, you might end up under-optimizing your server, if in
fact the slow request handler has a much higher call rate than the
fast one. Or you might have your server over-optimized, if the slow
handler is used much less frequently than the fast handler.
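To make the point about request distribution concrete, here is a
minimal sketch that compares a plain average of two handlers with a
request-weighted average. The handler names, timings, and request
shares are hypothetical numbers chosen only for illustration.

```python
def expected_time(handlers):
    """Return the request-weighted mean processing time in seconds.

    `handlers` maps a handler name to (mean_seconds, share_of_requests).
    """
    total_share = sum(share for _, share in handlers.values())
    weighted = sum(seconds * share for seconds, share in handlers.values())
    return weighted / total_share


# Hypothetical figures: one slow handler (2.0 s), one fast handler (0.05 s).
mostly_fast = {"slow": (2.0, 0.1), "fast": (0.05, 0.9)}
mostly_slow = {"slow": (2.0, 0.9), "fast": (0.05, 0.1)}

# The plain average ignores how often each handler is actually called.
unweighted = (2.0 + 0.05) / 2

print(f"unweighted mean:        {unweighted:.3f} s")
print(f"slow handler 10% share: {expected_time(mostly_fast):.3f} s")
print(f"slow handler 90% share: {expected_time(mostly_slow):.3f} s")
```

The same two handlers give very different expected processing times
depending on which one dominates the traffic, which is why the
unweighted figure alone can mislead you in either direction.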
Remember that users can never be trusted not to do unexpected things
such as uploading huge core dump files, messing with HTML forms, and
supplying parameters and values you didn't consider.
This leads us to two conclusions. First, it is not enough to test the
code with automatic offline benchmarking, because chances are you
will forget a few possible scenarios. You should also log the
requests and their execution times on the live server and watch the
real picture. Second, after everything has been optimized, you
should add a safety margin so that your server won't be rendered
unusable when it is hit by the worst-case usage load.
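As a rough sketch of what watching the real picture might look like,
the following groups logged execution times by handler and reports
each handler's share of the traffic, its mean, and its worst case.
The records, the URL names, and the 25% margin are assumptions made
for illustration; on a real server the times would come from your
own request log.

```python
from collections import defaultdict
from statistics import mean


def summarize(records):
    """Group (handler, seconds) records into per-handler statistics."""
    by_handler = defaultdict(list)
    for handler, seconds in records:
        by_handler[handler].append(seconds)
    total = sum(len(times) for times in by_handler.values())
    return {
        handler: {
            "count": len(times),
            "share": len(times) / total,   # fraction of all requests
            "mean": mean(times),
            "worst": max(times),
        }
        for handler, times in by_handler.items()
    }


def with_margin(worst_seconds, margin=0.25):
    """Pad a worst-case figure by a safety margin (25% is an assumed value)."""
    return worst_seconds * (1 + margin)


# Hypothetical log records: (handler, execution time in seconds).
records = [("/search", 0.8), ("/search", 1.2), ("/static", 0.01), ("/search", 3.0)]
stats = summarize(records)

for handler, s in stats.items():
    print(handler, s, "plan for:", with_margin(s["worst"]))
```

How large the margin should be is a judgment call that depends on
how costly an overloaded server is for your service.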
Q: How many requests can the server process simultaneously?
A: The number of simultaneous requests you can handle is equal to the
number of web server processes you can afford to run. This in turn
depends on the amount of main memory (RAM) available to the web
server. Note that we are not talking about the amount of RAM
installed on your machine, since this number is misleading. Each
machine runs many processes in addition to the web server
processes. Most of these don't consume a lot of memory, but some do.
It is possible that your web servers share the available RAM with
big memory consumers such as SQL engines or proxy servers. The first
step is to figure out how much memory is really dedicated to your
web server.
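The arithmetic behind this answer can be sketched as follows. All
the figures are hypothetical, and the calculation makes the
simplifying assumption that every web server process costs the same
amount of memory and shares none of it with its siblings.

```python
def max_processes(total_ram_mb, other_usage_mb, per_process_mb):
    """Estimate how many web server processes fit in the remaining RAM.

    Simplifying assumption: each process costs `per_process_mb` and no
    memory is shared between processes.
    """
    available_mb = total_ram_mb - other_usage_mb
    if available_mb <= 0 or per_process_mb <= 0:
        return 0
    return int(available_mb // per_process_mb)


# Hypothetical machine: 1024 MB installed, 400 MB used by the OS,
# an SQL engine, and other processes, 12 MB per web server process.
limit = max_processes(total_ram_mb=1024, other_usage_mb=400, per_process_mb=12)
print(f"about {limit} simultaneous requests")
```

Note how the result is driven by the 624 MB that is actually left
for the web server, not by the 1024 MB installed on the machine.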
Q: How many simultaneous requests is the site expected to service? What
is the expected request rate?
A: This question sounds similar to the previous one, but it is different
in essence. You should know your server's abilities, but you also
need to have a realistic estimate of the expected load. Are you
really expecting eight million hits per day? What is the
expected peak load, and what kind of response time do you need to
guarantee? Doing market research would probably help to identify the
potential request rates, and the code you develop should be written
in a scalable way, to allow you to add a few more machines to
accommodate the possibility of rising demand.
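To put a figure such as eight million hits per day in perspective,
the sketch below converts it to an average request rate and then
estimates how many requests would be in progress at once. The peak
factor of 3 and the mean response time of 0.2 seconds are assumed
values, and the concurrency estimate uses Little's law (requests in
progress = arrival rate x mean time per request), which is a
standard queueing result rather than anything specific to this
chapter.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(hits_per_day):
    """Average requests per second, assuming hits are spread evenly."""
    return hits_per_day / SECONDS_PER_DAY


def concurrent_requests(rate_per_second, mean_response_seconds):
    """Little's law: requests in progress = arrival rate * time per request."""
    return math.ceil(rate_per_second * mean_response_seconds)


avg = average_rate(8_000_000)
peak = avg * 3                           # assumed peak-to-average ratio
needed = concurrent_requests(peak, 0.2)  # assumed 0.2 s mean response time

print(f"average: {avg:.1f} req/s, assumed peak: {peak:.1f} req/s")
print(f"requests in progress at peak: about {needed}")
```

Comparing this estimate with the number of processes your memory
allows shows quickly whether one machine is likely to be enough.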
Remember that whatever statistics you gathered during your last
service analysis might change drastically when your site gains
popularity. When the hit rate gets very high, the resource
requirements often grow much faster than linearly.
Also remember that whenever you apply code changes it is possible
that the new code will be more resource-hungry than the previous
code. The best case is when the new code requires fewer resources,
but generally this is not the case.
If your machine runs the service perfectly well under normal loads,
but the load is subject to occasional peaks—e.g., a product
announcement or a special offer—it is possible to maintain
performance without changing the web service at all. For example,
some services can be switched off temporarily to cope with a peak.
Also avoid running heavy, non-urgent processes (backups, cron jobs,
etc.) during the peak times.
Q: Who are the users?
A: Just as it is important for a public speaker to know her audience in
order to give a successful presentation and deliver the right
points, it is important to know who your users are and what can be
expected from them.
If you are administering an Intranet web service (internal to a
company, publicly inaccessible), you can tell what connection speed
most of your users have, the number of possible users, and therefore
the maximum request rate. You can be sure that the service will not
gain sudden popularity that drives the demand rate up
exponentially. Since there is a known number of users in your
company, you know the expected limit. You can optimize the Intranet
web service for high-speed connections, but don't forget that some
users might connect to the Intranet with a slower
dial-up connection. Also, you probably know at what hours your users
will use the service (unless your company has branches all over the
world, which requires 24-hour server availability) and can optimize
service during those hours.
If you are administering an Internet web service, your knowledge of
your audience is very limited. Depending on your target audience, it
may be possible to learn about usage patterns and obtain some
numerical estimates of the possible demands. You can either attempt
to do the research yourself or hire professionals to do this work
for you. Some companies also publish survey reports that are
available for purchase.
Once your service is running in the ideal way, know what to expect by
keeping up with the server statistics. This will allow you to
identify possible growth trends. Certainly, most web services cannot
withstand the so-called Slashdot Effect, which happens when a very
popular news service (Slashdot, for instance) publishes a report on
your service and suddenly all of its readers try to hit your site.
The effect is a double-edged sword: on one side you gain free
advertising, but on the other side your server may not be able to
withstand the suddenly increased load. If that's the case, most
clients may not succeed in getting through.
Just as with the Intranet server, it is possible that your users are
all located in a given time zone (e.g., for a particular
country-specific service), in which case you know that hardly any
users will be hitting your service in the early morning. The peak
will probably occur during late evening and early night hours, and
you can optimize your service during these times.
Q: How can we protect ourselves from the Slashdot Effect?
A: Use mod_throttle. mod_throttle allows you to limit the use of your
server based on different metrics, configurable per
vhost/location/file. For example, you can limit requests for the URL
/old_content to a maximum of four connections per second. Using
mod_throttle will help you prioritize different parts of your
server, allowing smart use of limited bandwidth and limiting the
effect of spikes.
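The idea of capping a URL at four connections per second can be
illustrated with a small sliding-window limiter. This is only a
sketch of the general technique: it is not mod_throttle's
implementation or configuration syntax, and the class name and
arrival times are made up for the example.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` within any `window_seconds` span."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._stamps = deque()  # arrival times of recently accepted requests

    def allow(self, now):
        # Forget accepted requests that have fallen out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()
        if len(self._stamps) < self.max_requests:
            self._stamps.append(now)
            return True
        return False  # over the limit: refuse (or delay) this request


# Hypothetical arrival times, in seconds, for requests to /old_content.
limiter = SlidingWindowLimiter(max_requests=4, window_seconds=1.0)
arrivals = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 1.05, 1.25]
decisions = [limiter.allow(t) for t in arrivals]
print(decisions)
```

The fifth and sixth requests arrive while four are already counted
in the current window and are refused; later requests are accepted
again once the oldest entries age out.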
Q: Does load balancing help in this area?
A: Yes. Load balancing, using mod_backhand, Cisco LocalDirector, or
similar products, lets you wring the most performance out of your
servers by spreading the load across a group of servers.
Q: How can we deal with the situation where we can afford only a limited
amount of bandwidth but some of the service's content is large
(e.g., streaming media or large files)?
A: mod_bandwidth is a module for the Apache web server that enables the
setting of server-wide or per-connection bandwidth limits, based on
the directory, size of files, and remote IP/domain.
Also see Akamai, which allows you to cache large content in
regionally specific areas (e.g., east/west coast in the U.S.).
The given list of questions is in no way complete, and each specific
project will have a different set of questions and answers. Some will
be retained from project to project; others will be replaced by new
ones. Remember that this is not a one-size-fits-all glove. While
partial functionality can generally be optimized using the same
method, you will have to go through this question-and-answer process
each time from scratch if you want to achieve the best performance.