Peer-to-Peer Topology

Peer-to-peer applications don't necessarily abolish the central server completely. In fact, there are a variety of peer-to-peer designs. Some are considered "pure" peer-to-peer, and don't include any central components, while others are hybrid designs.

Peer-to-Peer with a Discovery Server

One of the most common peer-to-peer compromises involves a discovery server, which is a repository that lists all the connected peers. Often, a discovery server maps user names to peer connectivity information such as an IP address. When users start the application, they're logged in and added to the registry. After this point, they must periodically contact the discovery server to confirm that they're logged in and that their connection information hasn't changed.
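The registry behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `DiscoveryRegistry` and `PeerRecord`, the 60-second expiry window, and the injectable clock are all hypothetical, and a real discovery server would expose these operations over the network rather than as local method calls.

```python
import time
from dataclasses import dataclass


@dataclass
class PeerRecord:
    ip: str
    port: int
    last_seen: float  # time of the most recent login or heartbeat


class DiscoveryRegistry:
    """In-memory registry mapping user names to connectivity information."""

    def __init__(self, timeout_seconds=60.0, clock=time.monotonic):
        self._timeout = timeout_seconds  # hypothetical expiry window
        self._clock = clock              # injectable so expiry can be tested
        self._peers = {}

    def login(self, user, ip, port):
        # Called when the application starts; adds the user to the registry.
        self._peers[user] = PeerRecord(ip, port, self._clock())

    def heartbeat(self, user, ip, port):
        # The periodic contact: confirms the login and updates the address.
        self.login(user, ip, port)

    def lookup(self, user):
        # Return (ip, port), or None if the user is unknown or has expired.
        record = self._peers.get(user)
        if record is None:
            return None
        if self._clock() - record.last_seen > self._timeout:
            del self._peers[user]  # the peer stopped checking in
            return None
        return (record.ip, record.port)
```

Expiring entries on lookup keeps the sketch short. A production server would more likely sweep stale entries on a timer.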

There is more than one way for peers to use the information in a discovery server. In a simple application, peers might download a list of nearby users and send future requests to them directly. However, it's also possible that the peer will need to communicate with a specific user (for example, in the case of a chat application). In this scenario, the discovery server can be structured to allow peer lookups by name, e-mail address, or some other fixed unique identifier. The peer interaction works like this:

1. The peer contacts the discovery server with a request to find the contact information for a specific user (for example, someone@somewhere.com).
2. The discovery server returns the user's IP address and port information.
3. The peer contacts the desired user directly.
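
The lookup-then-connect sequence above might be sketched as follows. The function name `contact_user` and both callables are hypothetical stand-ins for real network calls; the dictionary plays the part of the discovery server purely for illustration.

```python
def contact_user(user_id, query_discovery_server, open_connection):
    """Look up a user on the discovery server, then contact that user directly.

    Both callables are hypothetical stand-ins for real network calls.
    """
    # First two steps: ask the discovery server, receive (ip, port) or None.
    endpoint = query_discovery_server(user_id)
    if endpoint is None:
        raise LookupError(f"{user_id} is not registered")
    ip, port = endpoint
    # Final step: talk to the peer directly; the server is no longer involved.
    return open_connection(ip, port)


# Usage with in-memory stand-ins for the server and the network.
directory = {"someone@somewhere.com": ("192.168.0.20", 6000)}
connection = contact_user(
    "someone@somewhere.com",
    directory.get,
    lambda ip, port: f"connected to {ip}:{port}",
)
```

Note that the server appears only in the lookup. Once the endpoint is known, all further traffic flows between the two peers.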

This approach is also known as brokered or mediated peer-to-peer because the discovery server plays a central role in facilitating user interaction.

Note

This approach is much easier to scale than a pure peer-to-peer model. Although pure peer-to-peer models can be made efficient and scalable, the "plumbing" code is significantly more difficult. If you can rely on a discovery server in your applications, it will greatly simplify most solutions.

Peer-to-Peer with a Coordination Server

Some peer-to-peer applications benefit from a little more help on the server side. These applications combine peer-to-peer interaction with a central component that not only contains peer lookup information, but also includes some application-specific logic.

One example is Napster, which uses a central discovery and lookup server. In this system, peers register their available resources at periodic intervals. If a user needs to find a specific resource, the user queries the lookup server, which then returns a list of peers that have the desired resource. This helps to reduce network traffic and ensures that the peers don't waste time communicating if they have nothing to offer each other. The file transfer itself is still peer-to-peer. This blend of peer-to-peer and traditional application design can greatly improve performance. By using a centralized server intelligently for a few critical tasks, network traffic can be reduced dramatically.
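
To make the idea of a resource index concrete, here is a minimal sketch of a lookup server of this kind. It is not Napster's actual protocol or data model; the class name `LookupServer`, its methods, and the use of "host:port" strings to identify peers are assumptions chosen for illustration.

```python
from collections import defaultdict


class LookupServer:
    """Central index that maps resource names to the peers offering them."""

    def __init__(self):
        self._peers_by_resource = defaultdict(set)
        self._resources_by_peer = {}

    def register(self, peer, resources):
        # Replace the peer's previous listing so removed files disappear.
        for name in self._resources_by_peer.get(peer, ()):
            self._peers_by_resource[name].discard(peer)
        self._resources_by_peer[peer] = set(resources)
        for name in resources:
            self._peers_by_resource[name].add(peer)

    def find(self, resource):
        # Only peers that actually hold the resource are returned.
        return sorted(self._peers_by_resource.get(resource, ()))
```

The server stores only names and peer addresses. The requesting peer still has to contact one of the returned peers to transfer the file itself.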

One question that arises with this sort of design is exactly how much responsibility the central server should assume. For example, you might create a messaging application in which communication is routed through the centralized server so that it can be analyzed or even logged. Similarly, you might design a content-sharing application that caches files on the server. These designs are simpler to build, but they can also lead to massive server bottlenecks in large peer-to-peer systems. As you'll discover in this book, a key part of the art of peer-to-peer programming with .NET is choosing the right blend of pure peer-to-peer design and more traditional enterprise programming.

Pure Peer-to-Peer

A pure peer-to-peer application has no central server of any kind. A typical user only communicates with a small group of nearby peers. In this scenario, even basic message routing and caching become a challenge. Typically, every message is automatically given several pieces of information, including the following:

A GUID that uniquely identifies the message
A field that records the "number of hops"—in other words, how many peers have already forwarded this copy of the message
A setting that specifies the maximum number of hops the message is allowed to travel before it is discarded
The sender's identifier (a GUID), and optionally, its connectivity information
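
As a rough illustration, the fields listed above could be grouped into a single message type. The field names and the default limit of seven hops are assumptions made for this sketch, and the `payload` field is added only so that the message has something to carry.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class Message:
    payload: str                       # the request itself (illustrative)
    sender_id: uuid.UUID               # GUID identifying the originating peer
    sender_endpoint: Optional[Tuple[str, int]] = None  # optional (ip, port)
    message_id: uuid.UUID = field(default_factory=uuid.uuid4)  # GUID per message
    hops: int = 0                      # peers that have forwarded this copy
    max_hops: int = 7                  # hypothetical lifetime limit
```

Because `message_id` is generated by a default factory, every new message receives its own GUID even when the payload and sender are identical.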

To make a request, a peer creates a new message and sends it to its local group of peers. When a peer receives a message, it performs the following steps:

1. The peer checks that the message hasn't been recently received (probably by comparing it with a collection that caches the last 50 messages). If it has been received, the message is discarded.
2. The peer increments the number-of-hops field.
3. The peer checks the number of hops against the maximum number of hops allowed. If the number of hops exceeds the maximum, the message is discarded. This helps to prevent the same message from being continuously rerouted to the same peers over the network.
4. What happens next depends on the system. In a decentralized system such as Gnutella, the peer forwards the message to all the peers it knows about, and those peers decide for themselves whether they can satisfy the request. In a decentralized system such as Overnet, the peer instead examines the message to determine the requested resource and compares it with a collection of information compiled by other peers. This information will probably be a hashtable that maps resource names to peers. When the peer finds an entry for a peer that can fulfill the request, it forwards the message to that peer only.
5. All the peers that have received the message start the same process at step 1.
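
The steps above can be sketched for the flooding case, in which every peer forwards to all the peers it knows about. Everything here is an assumption made for illustration: the `Peer` and `Message` names, the string message IDs, the default limit of three hops, and the use of direct method calls in place of real network sends.

```python
from collections import deque
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Message:
    message_id: str
    query: str
    hops: int = 0
    max_hops: int = 3  # hypothetical default


class Peer:
    def __init__(self, name, resources=()):
        self.name = name
        self.resources = set(resources)
        self.neighbors = []
        self.recent_ids = deque(maxlen=50)  # cache of the last 50 message IDs
        self.matches = []                   # requests this peer could satisfy

    def send(self, message):
        # The originator remembers its own message so echoes are discarded.
        self.recent_ids.append(message.message_id)
        for neighbor in self.neighbors:
            neighbor.receive(message)

    def receive(self, message):
        # Step 1: discard anything seen recently.
        if message.message_id in self.recent_ids:
            return
        self.recent_ids.append(message.message_id)
        # Step 2: count this hop on the peer's own copy of the message.
        message = replace(message, hops=message.hops + 1)
        # Step 3: discard messages that have exceeded their allowed hops.
        if message.hops > message.max_hops:
            return
        # Each peer decides for itself whether it can satisfy the request.
        if message.query in self.resources:
            self.matches.append(message.message_id)
        # Step 4 (flooding): forward to every known peer.
        # Step 5: each neighbor then runs the same process from step 1.
        for neighbor in self.neighbors:
            neighbor.receive(message)
```

Because the forwarding here is a chain of nested calls, peers are visited depth-first, which a real network would not do. In an index-based design, the loop in step 4 would be replaced by a lookup in a table that maps resource names to peers.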

This branching-out process is shown in Figure 2-2.

Figure 2-2: A pure peer-to-peer search

When a peer is found that can satisfy the request, it sends back a response. Typically, this response is sent back over the network along the same path the request took to arrive, thereby increasing the likelihood that it will be able to traverse the network. Alternatively, the peer could attempt to open a direct connection to the requesting peer to notify it that it has the requested resource.
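
One way to send a response back the way the request came is for each peer to remember which neighbor delivered each message. The sketch below assumes exactly that; the `Peer` class, the `came_from` table, and the fixed `route` list that stands in for real forwarding decisions are all hypothetical.

```python
class Peer:
    def __init__(self, name):
        self.name = name
        self.came_from = {}   # message ID -> neighbor that delivered it
        self.responses = []   # responses that reached this peer as requester

    def forward_request(self, message_id, previous_peer, route):
        # Remember the previous hop, then pass the request down the route.
        # `route` is a hypothetical fixed list of the remaining peers.
        self.came_from[message_id] = previous_peer
        if route:
            route[0].forward_request(message_id, self, route[1:])
        else:
            # The last peer on the route is the one that can answer.
            self.send_response(message_id, f"{self.name} has the resource")

    def send_response(self, message_id, text, path=()):
        path = path + (self.name,)
        previous_peer = self.came_from.get(message_id)
        if previous_peer is None:
            # No previous hop recorded: this peer originated the request.
            self.responses.append((text, path))
        else:
            previous_peer.send_response(message_id, text, path)
```

Retracing recorded hops means that no peer needs to accept an unsolicited inbound connection, which is one reason the reverse path is more likely to get through.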

Using this technique, a computer can indirectly contact a large network in a short time. There is no central server, and hence no single point of failure and no central registry that can fall out of date. However, there are drawbacks. The network traffic is likely to be high, and the coding is complicated because each peer needs to maintain two things: a cache of peer-discovery data (which maps peer identifiers to peer connectivity information) and a cache of recently processed messages (which prevents a message from being rerouted to peers that have already processed it). It's also possible for some peer groups to become disconnected from the rest of the network, leading to multiple peer pockets instead of one large global network. This is most common when the number of peers is small.

One problem with pure peer-to-peer applications is the initial connection to the peer network. To find other peers, the application can use network-broadcasting techniques (such as IP multicast), but these can impose significant overhead and won't work in all network environments. These approaches are most useful in an intranet in which the infrastructure required for multicast is known to exist.

Another approach is for the peer to use a list of well-known nodes to become connected at startup. This list might be retrieved from a configuration file (which can be updated every time the application is used successfully) or from a fixed location on the network. An example of a pure peer-to-peer application that uses this approach is Gnutella.
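
A startup routine built around a list of well-known nodes might look something like the following sketch. The JSON file format, the function names, the limit of 20 stored nodes, and the `try_connect` callable are all assumptions made for illustration; this is not how Gnutella clients actually store their host lists.

```python
import json
from pathlib import Path


def load_known_nodes(path):
    # Each entry is a [host, port] pair; a missing file means no known nodes.
    file = Path(path)
    if not file.exists():
        return []
    return [tuple(entry) for entry in json.loads(file.read_text())]


def save_known_nodes(path, nodes):
    Path(path).write_text(json.dumps([list(node) for node in nodes]))


def bootstrap(path, try_connect, limit=20):
    """Try each well-known node in turn and refresh the list on success.

    `try_connect(host, port)` is a hypothetical callable that returns the
    list of (host, port) peers reported by the node, or None on failure.
    """
    known = load_known_nodes(path)
    for node in known:
        learned = try_connect(*node)
        if learned is None:
            continue  # node is offline; try the next one
        # Put the working node first, then newly learned peers, then the rest.
        merged = list(dict.fromkeys([node, *map(tuple, learned), *known]))
        save_known_nodes(path, merged[:limit])
        return node
    return None
```

Rewriting the file after each successful connection is what keeps the list useful over time: nodes that responded move to the front, and newly learned peers are available for the next startup.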