Scalability

Before the Internet, many large organizations developed their infrastructures based on monolithic core business functions, such as claims processing systems running on mainframes or large ERP applications running on Sun Enterprise 10K machines. For these types of applications, which placed a CPU-intensive load on the infrastructure, it made good sense to use large, dedicated machines. This approach required a considerable initial investment in infrastructure and personnel. Let us look at how to develop a scalable Web services infrastructure at a practical cost.

To support efficient scaling, an organization should consider a strategy that allows it to add resources as needed. The recommended approach is to deploy Web services applications in independent server pools that can be joined to provide the appropriate business integration. Applications should be partitioned at a physical level, which gives flexibility when deploying multiple workloads on servers: it allows for maximal resource usage and minimal unnecessary resource reallocation.

Larger servers provide high capacity but at a reduction in flexibility of deployment. We prefer using smaller two- or four-way systems, which provide the ability to scale quickly and cheaply. This configuration has the added advantage of a better price/performance ratio than other options. With a lower acquisition cost, it becomes practical to deploy multiple redundant servers throughout your infrastructure, giving your Web services a higher level of availability.

Before considering this approach, it is important to determine the characteristics of your service. Some Web services are suitable for scaling widely, by deploying them across many identically configured servers. Other Web services may be better suited to scaling deeply, onto more powerful systems with large numbers of CPUs. The granularity of control you need over performance and availability should also become part of your decision.

Also consider taking a multi-tier architectural approach to infrastructure. Partition your infrastructure into at least four distinct tiers for a Web services environment, such as a presentation tier, a Web services tier, a business logic tier, and a data tier.

Each tier may also provide security, accounting, systems management, and other utility functions. The resource requirements of each tier will differ, so each tier can be configured for optimal performance rather than given a generic tuning approach. For example, the Web services tier may require fast network throughput and disk caching, while your mid-tier business application may be more CPU intensive. Having the right physical architecture promotes scalability.

The industry trend is away from populating each server with disk drives and toward externalizing all storage onto appliances. This approach, known as Network Attached Storage (NAS), allows CPU and storage resources to scale independently of each other and makes more efficient use of storage resources. Separating storage from processing also permits the use of redundant storage arrays, which increases availability by eliminating the need to stop server processing during backups and by providing failover capability.

Consolidating the storage function of multiple servers onto a single storage platform provides higher throughput, availability, and scalability. When multiple systems share the same storage platform, applications can be assigned resources in exact quantities, making it a more cost-effective solution. Network storage also accelerates backup and replication, because those operations no longer consume valuable CPU cycles on the application servers. Multiple servers can share a single volume, which also helps keep data synchronized without the need to make multiple copies.

The biggest inhibitor to scalability in a J2EE-based Web service is typically the EJB tier. Many tool vendors provide wizard-like code generation tools that let you simply select an EJB and automatically turn it into a Web service. J2EE has helped enterprises build and deploy Web services faster using a standards-based approach, but it has not helped architects think about the scalability of their applications. Ensuring that your Web service will scale is often a larger challenge than getting it to work in the first place. The typical Web service may update data only 10% of the time and spends most of its time reading and formatting data, so it is crucial that reading data be fast and efficient.

Many Java tutorials encourage developers to take a three-tiered approach to developing applications (Figure 16.5). The presentation tier is based on servlets or JavaServer Pages (JSP) connecting to EJBs for business logic, which in turn connect to a database using JDBC. This approach allows developers to build reusable components and provides a good separation of concerns. Wizard-style tools that generate Web services also use this model.

Figure 16.5: Typical J2EE architecture
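
To make the layering concrete, here is a minimal sketch of the call path in Figure 16.5. The QuoteService interfaces and the JNDI name are hypothetical, for illustration only: a servlet in the presentation tier locates a session bean's home interface, and each business call then travels over RMI (the bean, in turn, would read its data through JDBC).

import java.io.IOException;
import javax.ejb.CreateException;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.rmi.PortableRemoteObject;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical EJB 2.x remote interfaces for a quote session bean.
interface QuoteService extends javax.ejb.EJBObject {
    double getQuote(String symbol) throws java.rmi.RemoteException;
}

interface QuoteServiceHome extends javax.ejb.EJBHome {
    QuoteService create() throws CreateException, java.rmi.RemoteException;
}

public class QuoteServlet extends HttpServlet {
    public void doGet(HttpServletRequest req, HttpServletResponse res)
            throws ServletException, IOException {
        try {
            // Presentation tier: locate the bean's home interface in JNDI.
            Context ctx = new InitialContext();
            Object ref = ctx.lookup("java:comp/env/ejb/QuoteService");
            QuoteServiceHome home = (QuoteServiceHome)
                    PortableRemoteObject.narrow(ref, QuoteServiceHome.class);

            // Business tier: each call travels over RMI; inside the bean,
            // the data would be read from the database through JDBC.
            QuoteService service = home.create();
            double price = service.getQuote(req.getParameter("symbol"));

            res.getWriter().println("Last price: " + price);
        } catch (NamingException e) {
            throw new ServletException(e);
        } catch (CreateException e) {
            throw new ServletException(e);
        } catch (java.rmi.RemoteException e) {
            throw new ServletException(e);
        }
    }
}

Each hop in this path adds latency, which is the subject of the next paragraph.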

The separation of layers in the typical J2EE approach also has the side effect of introducing latency and can be the cause of performance problems. The latency between the client and the Web service is tied to a threading model that uses one thread per request; the allocated thread is blocked until the request returns. The traffic between the presentation tier and the EJB tier typically consists of housekeeping information, the majority of which is read-only, and the communication between these tiers uses Remote Method Invocation (RMI), which carries significant overhead. You could consolidate tiers by using local interfaces, but this requires colocating the tiers in the same JVM, which limits scalability. The EJB tier also incurs latency overhead by talking to the database to retrieve what is, most of the time, unchanged data.
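
As a sketch of the local-interface alternative (again with hypothetical names), the same bean could expose EJB 2.0 local interfaces. Calls through them avoid RMI marshalling, but the caller must be deployed in the same JVM as the bean:

// Hypothetical EJB 2.0 local interfaces for the same quote bean. Unlike
// the remote interfaces, these methods do not declare
// java.rmi.RemoteException, because no RMI marshalling takes place.
interface QuoteServiceLocal extends javax.ejb.EJBLocalObject {
    double getQuote(String symbol);
}

interface QuoteServiceLocalHome extends javax.ejb.EJBLocalHome {
    QuoteServiceLocal create() throws javax.ejb.CreateException;
}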

The solution to reducing latency is to implement sound caching strategies and/or to defer work that doesn't need to occur in real time, moving it to a background process. With a messaging approach (discussed later), a service has to wait only for the message to be recorded, not for the work to be completed, which removes blocking on the deferred work. The best way to avoid continuously rereading data is to employ a cache. Some developers make the mistake of caching frequently used data in local, static, or global variables without exercising control over those variables' lifecycles.
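
The following is a minimal sketch of such a deferral using JMS; the JNDI names and the AuditLogger class are assumptions for the example. The send() call returns once the message has been recorded by the JMS provider, and a background consumer performs the actual work later:

import javax.jms.JMSException;
import javax.jms.Queue;
import javax.jms.QueueConnection;
import javax.jms.QueueConnectionFactory;
import javax.jms.QueueSender;
import javax.jms.QueueSession;
import javax.jms.Session;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;

public class AuditLogger {
    // Defer non-real-time work (here, audit logging) to a background
    // consumer. The caller waits only until the message is recorded.
    public void logAsync(String payload) throws NamingException, JMSException {
        Context ctx = new InitialContext();
        QueueConnectionFactory factory = (QueueConnectionFactory)
                ctx.lookup("java:comp/env/jms/QueueConnectionFactory");
        Queue queue = (Queue) ctx.lookup("java:comp/env/jms/AuditQueue");

        QueueConnection connection = factory.createQueueConnection();
        try {
            QueueSession session =
                    connection.createQueueSession(false, Session.AUTO_ACKNOWLEDGE);
            QueueSender sender = session.createSender(queue);
            sender.send(session.createTextMessage(payload));
        } finally {
            connection.close(); // closing the connection closes its sessions
        }
    }
}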

If the majority of your requests are read-only, caching can satisfy requests for data immediately and eliminate latency. Consider a user who wants to retrieve stock quotes for Sun Microsystems and Univision from the Flute Bank investment Web service. Caching allows the investment Web service to answer the request without connecting to a backend quote provider, which allows the infrastructure to handle additional load. A caching strategy can also save transaction fees for Web services that use other Web services outside their organization.

Caching has several issues to consider. When developing your own approach, take into account the points in Table 16.5.

Table 16.5: Caching Considerations

Function      Description

Last image    The ability to snapshot rapidly changing information, such as stock quotes, that will be used for multiple services.

Request       The ability to snapshot dynamically created information that does not change frequently, such as news items for a particular stock.

Delta         The ability to snapshot data that represents changes at the individual message level, such as price upticks.

Events        The ability to allow applications to be notified of changes to underlying cached data.
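
As an illustration of the last-image function in Table 16.5, here is a minimal sketch of a snapshot cache with a time-to-live. The QuoteSource interface and the class name are assumptions for the example, not part of any standard API:

import java.util.HashMap;
import java.util.Map;

// A minimal "last image" cache: readers get the most recent snapshot of
// a rapidly changing value, and the backend is consulted only when the
// cached entry is older than the time-to-live.
public class LastImageQuoteCache {
    private static class Entry {
        final double price;
        final long loadedAt;
        Entry(double price, long loadedAt) {
            this.price = price;
            this.loadedAt = loadedAt;
        }
    }

    private final Map entries = new HashMap();   // symbol -> Entry
    private final QuoteSource source;            // assumed backend interface
    private final long ttlMillis;

    public LastImageQuoteCache(QuoteSource source, long ttlMillis) {
        this.source = source;
        this.ttlMillis = ttlMillis;
    }

    public synchronized double getQuote(String symbol) {
        long now = System.currentTimeMillis();
        Entry entry = (Entry) entries.get(symbol);
        if (entry == null || now - entry.loadedAt > ttlMillis) {
            // Stale or missing: take a fresh snapshot from the backend.
            entry = new Entry(source.fetchQuote(symbol), now);
            entries.put(symbol, entry);
        }
        return entry.price;
    }
}

// Assumed interface to the backend quote provider.
interface QuoteSource {
    double fetchQuote(String symbol);
}

With a short time-to-live, readers always see a recent snapshot, while the backend quote provider sees at most one request per symbol per interval.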

