So far we've considered peer-to-peer as it relates to the dominant architecture in enterprise applications. There's another way to look at the evolution of peer-to-peer technology: in relation to the development of the early Internet.
The Internet was first envisioned in the late 1960s as a global peer-to-peer system in which all computers could participate as equals. It was assumed that computers in the early Internet would always be on and always be connected. Thus, they were assigned permanent IP addresses that were recorded in a global registry called the Domain Name System (DNS).
An IP address is a 32-bit number that uniquely identifies a computer on a network or the Internet. An IP address is typically written as four numbers from 0 to 255 separated by periods (as in 18.104.22.168). IP addresses can be tied to domain names (such as www.amazon.com) using the DNS registry, but they don't need to be. In any case, the IP address is the key to finding and communicating with another computer on a conventional network.
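To make the two notations concrete, the following Python sketch uses the standard-library `ipaddress` module to show that the dotted form and the 32-bit number are the same value. The variable names are illustrative only.

```python
import ipaddress

DOTTED = "18.104.22.168"

# Parse the dotted-quad string into an address object.
addr = ipaddress.IPv4Address(DOTTED)

# The same address viewed as a single 32-bit integer.
as_int = int(addr)

# Rebuild that integer by hand: each of the four numbers fills one byte.
octets = [int(part) for part in DOTTED.split(".")]
by_hand = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]

print(as_int == by_hand)               # both routes give the same number
print(ipaddress.IPv4Address(as_int))   # and the number converts back to dotted form
```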
Unlike today's Internet, the early Internet was much more open. Any computer could send a packet to another. Usenet allowed message-board postings to be propagated across the Internet in a manner not unlike the way today's peer-to-peer applications route their own proprietary messages. Client-server applications such as FTP and Telnet existed, but any computer could morph into a server and host the application. On the whole, the usage patterns of the early Internet were peer-to-peer.
Two trends conspired to shift the Internet into a predominantly client-server system. The first was the arrival of Mosaic, the first widely popular graphical web browser. It was at this point that a larger community of casual users began to take interest in the content that was available on the World Wide Web, and another model began to spread.
To access the Internet, a PC user needed to use a temporary dial-up connection via an Internet service provider (ISP). These PC users became second-class citizens of the Internet, interacting as clients to download information from established web servers. Because these users weren't permanently connected to the Internet, it made less sense to assign them an entry in the DNS. And because there weren't enough IP addresses available to handle the sudden onslaught of new users, ISPs began assigning IP addresses dynamically so that each user had a different IP address for every session. The DNS system was never designed for this sort of environment. The creators of the Internet assumed that changing an IP address would be a rare occurrence, and as a result, it could take days for a modification to make its way through the DNS system.
The end result was that the PC user became an invisible client on the Internet, able to receive data but not able to contribute any. With the commercialization of the Internet, this one-way pattern became the norm, and the Internet became the computer-based counterpart of newspaper and television media. Early visions of the Internet as the great equalizer of communication faded.
At the same time, the cooperative model of the Internet began to break down. Network administrators reacted to the threat of malicious users by using firewalls and network address translation (NAT). Both of these changes furthered the transformation to a client-server Internet. Computers could no longer contact each other as peers. Instead, communication could only succeed if the client inside the firewall (or behind the NAT) initiated it. Even the network infrastructure of the Internet became more and more optimized for client-server communication. Internet providers built up their networks with asymmetric bandwidth, in which download speeds are consistently higher than upload speeds.
Interestingly, much of the underlying technology that supports the Internet is still based on peer-to-peer concepts. For example, the DNS registry is not a central repository stored at a single location but a system for sharing information among peer DNS servers. Similarly, a network of mail-server peers routes e-mail. On a hardware level, the physical routers that route network traffic follow some peer-to-peer patterns: they communicate with one another and cooperate to optimize a path for data transmission. However, the infrastructure that's developed on top of this substrate is primarily client-server. In order for peer-to-peer to succeed, applications will need to reintroduce some of the ideas pioneered by the early Internet.
Recently, there's been a resurgence of peer-to-peer activity on the Internet—this time in the form of a few revolutionary applications such as Napster, SETI@Home, ICQ, and Gnutella. Not all of these are pure peer-to-peer applications. In fact, all but Gnutella rely on a central server for some tasks. Nevertheless, they all include a framework that allows significant peer interaction.
Part of the reason behind this latest change is the increasing value of the ordinary PC. When they first appeared on the Internet, PCs were primitive enough that it seemed appropriate to treat them as dumb terminals that did little more than download and display HTML pages. Today, PCs offer much more CPU power and contain more disk space (and thereby host more potentially valuable content). PCs have also swelled to be the largest single part of the Internet. What they can't offer in quality, they can offer through sheer numbers.
Even a conservative estimate of 100 million PCs on the Internet, each with only a 100 MHz chip and a 100 MB hard drive, arrives at a staggering total of 10 billion MHz of processing power and 10,000 TB of storage. The real total is almost certainly much larger.
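The arithmetic behind this estimate is easy to check. The short Python sketch below reproduces it, assuming decimal units (1 TB = 1,000,000 MB); the constant names are made up for the example.

```python
PC_COUNT = 100_000_000     # 100 million PCs
CPU_MHZ_PER_PC = 100       # one 100 MHz chip each
DISK_MB_PER_PC = 100       # one 100 MB hard drive each
MB_PER_TB = 1_000_000      # assumption: decimal units

total_mhz = PC_COUNT * CPU_MHZ_PER_PC
total_tb = PC_COUNT * DISK_MB_PER_PC // MB_PER_TB

print(f"{total_mhz:,} MHz of processing power")
print(f"{total_tb:,} TB of storage")
```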
There is one disheartening fact about all of the examples of current peer-to-peer applications. Without exception, each one has developed a proprietary system for peer discovery and communication. Some of these systems are complementary, and a few are based on more-or-less open standards. However, the next wave of peer-to-peer development will probably appear when broader standards emerge and technology companies such as Microsoft and Sun develop high-level tools that specifically address (and solve) the severe networking demands of peer-to-peer programming.
The first wave of peer-to-peer applications included instant-messaging software, which allows users to carry out real-time conversations. The key insight behind applications such as ICQ is that users would require a new kind of registry to allow them to find each other on the Internet. Or to put it another way: Communicating over the Internet is easy; locating a friend is not, because of the unreliable nature of dynamic IP addresses. ICQ solved this problem by introducing a dynamic registry that associates each user with a unique number. Later instant-messaging systems bind a user to an e-mail address or, in the case of Windows Messenger, a .NET Passport. With a dynamic registry, the user's connection information (the IP address) can be changed instantly.
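A dynamic registry of this kind can be sketched in a few lines. The following Python example is a minimal illustration, not ICQ's actual design: the class name `PresenceRegistry`, its methods, and the expiry timeout are all assumptions made for the sake of the example.

```python
import time
from typing import Dict, Optional, Tuple


class PresenceRegistry:
    """Hypothetical registry mapping a fixed user number to a changing endpoint."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self._ttl = ttl_seconds
        # user number -> (ip, port, time of last sign-in)
        self._entries: Dict[int, Tuple[str, int, float]] = {}

    def sign_in(self, user: int, ip: str, port: int,
                now: Optional[float] = None) -> None:
        """Record (or overwrite) the user's address for the current session."""
        stamp = time.time() if now is None else now
        self._entries[user] = (ip, port, stamp)

    def sign_out(self, user: int) -> None:
        """Forget the user's address when the session ends."""
        self._entries.pop(user, None)

    def lookup(self, user: int,
               now: Optional[float] = None) -> Optional[Tuple[str, int]]:
        """Return the user's endpoint, or None if unknown or stale."""
        entry = self._entries.get(user)
        if entry is None:
            return None
        ip, port, stamp = entry
        current = time.time() if now is None else now
        if current - stamp > self._ttl:
            del self._entries[user]   # entry expired; treat the user as offline
            return None
        return ip, port


registry = PresenceRegistry(ttl_seconds=60)
registry.sign_in(1234567, "10.0.0.5", 4000, now=0.0)
registry.sign_in(1234567, "10.0.0.9", 4000, now=30.0)  # new session, new address
print(registry.lookup(1234567, now=40.0))
```

The user number never changes, so a friend only needs to remember that number; the registry absorbs every change of IP address.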
However, most messaging applications are not strictly peer-to-peer because they use a central server to route messages. This allows the server to store messages for offline users and route messages through a firewall. Some messaging systems provide the option of establishing direct client-to-client connections when possible and only using the server as a fallback (ICQ), while others use direct client-to-client communication when large amounts of information must be transferred (such as sending a file in Windows Messenger). There are advantages and drawbacks to both approaches, and you'll explore them in the second part of this book when you develop an instant-messaging example.
Instant-messaging applications require their own proprietary infrastructure. However, there are at least two tools that are evolving to supply some of this infrastructure for you. One is Jabber, an open-source instant-messaging platform that began as a switching system between incompatible instant-messaging protocols. Today, you can use Jabber as an XML routing system that allows peer communication. See http://www.jabber.org and http://www.jabbercentral.com for more information.
Groove is a more ambitious platform for collaborative applications that was developed by Ray Ozzie, the creator of Lotus Notes. Groove is not an open-source project, but it's of interest to Microsoft developers because it's COM-based and includes .NET tools, which make it easy to build collaborative applications that include automatic support for routing and encryption. Essentially, Groove provides a peer-to-peer infrastructure that you can use in your own peer-to-peer applications. You will find out more about Groove in Chapter 12.
SETI@Home is an innovative project that exploits the idle time on the average personal computer. SETI@Home masquerades as an ordinary screen saver. When it runs, it processes a chunk of astronomical radio data downloaded from the SETI@Home site and scans for unusual patterns. When it's finished, it uploads the results and requests another block.
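The download, process, and upload cycle described above can be sketched as a simple loop. The Python example below is a toy model only: the `Coordinator` class, the `scan_chunk` function, and the threshold test are hypothetical stand-ins, and they are not meant to reflect SETI@Home's real protocol or signal analysis.

```python
from collections import deque


class Coordinator:
    """Hypothetical central server: hands out work units and collects results."""

    def __init__(self, samples, chunk_size):
        # Split the data set into (start offset, chunk) work units.
        self._pending = deque(
            (start, samples[start:start + chunk_size])
            for start in range(0, len(samples), chunk_size)
        )
        self.results = {}

    def request_work(self):
        """Return the next work unit, or None when nothing is left."""
        return self._pending.popleft() if self._pending else None

    def submit(self, start, hits):
        """Store the result a worker reports for one work unit."""
        self.results[start] = hits


def scan_chunk(start, chunk, threshold):
    """Toy analysis: report the positions of unusually strong samples."""
    return [start + i for i, value in enumerate(chunk) if value > threshold]


def run_worker(coordinator, threshold):
    """One peer's loop: fetch a unit, process it, upload the result, repeat."""
    while True:
        unit = coordinator.request_work()
        if unit is None:
            break
        start, chunk = unit
        coordinator.submit(start, scan_chunk(start, chunk, threshold))


data = [1, 2, 9, 1, 1, 8, 2, 1, 1, 7]
server = Coordinator(data, chunk_size=4)
run_worker(server, threshold=5)
all_hits = sorted(hit for hits in server.results.values() for hit in hits)
print(all_hits)
```

Because each work unit is independent, any number of workers could call `run_worker` against the same coordinator without needing to talk to one another.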
The idea of using multiple ordinary computers to do the work of one supercomputer is far from new. In the early days of the Internet, distributed-computing projects were used to test encryption codes. Math hobbyists and researchers sometimes did similar independent work to generate potential prime numbers or test a theory, although the efforts were never as well integrated. SETI@Home was the first to create an effective vehicle for distributing the code (a screen saver) and combine it with a problem that could easily be factored into smaller parts. Several other companies have tried, without success, to create similar projects in the commercial arena.
In some ways, SETI@Home deviates from a true peer-to-peer system because it relies on a central server that ultimately controls the entire system. However, in another respect SETI@Home represents the ideal of peer-to-peer design: Every computer participates in performing the heavy lifting. In Chapter 6, you'll learn how to design a peer-to-peer .NET application for distributed computing. Best of all, unlike SETI@Home, you'll learn how to make this program generic enough to handle a dynamically defined task.
For more information about SETI@Home, see http://setiathome.berkeley.edu.
Napster and Gnutella are examples of peer-to-peer applications designed for content sharing—specifically, for sharing MP3 music files.
Napster's genius was to combine peer-to-peer technology with a centralized peer directory. This created a hybrid system that performed and scaled extremely well. The central server never became a bottleneck because it was used for comparatively low-bandwidth activities while the actual file transfers were performed between peers on the edges of the network. Napster also exploited a niche that was particularly well suited for peer-to-peer applications: popular music. Any large group of users with music collections is certain to have a significant redundancy in catalogued songs. This redundancy allowed the overall system to work reliably, even though it was composed of thousands of unreliable clients. In other words, the chance that a given song could be found was quite high, though the chance that a given user was online was low.
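The division of labor in such a hybrid system can be illustrated with a small sketch of the central directory. This is a minimal Python model under stated assumptions: the `CentralDirectory` class, its method names, and the peer address strings are invented for the example and do not describe Napster's actual protocol. The directory only answers the question of who has a file; the transfer itself would take place directly between peers.

```python
from collections import defaultdict


class CentralDirectory:
    """Hypothetical index of which online peers share which titles."""

    def __init__(self):
        self._peers_by_title = defaultdict(set)
        self._titles_by_peer = {}

    def connect(self, peer, titles):
        """A peer comes online and announces the titles it shares."""
        self.disconnect(peer)  # drop any stale listing first
        self._titles_by_peer[peer] = set(titles)
        for title in titles:
            self._peers_by_title[title].add(peer)

    def disconnect(self, peer):
        """A peer goes offline; remove it from every listing."""
        for title in self._titles_by_peer.pop(peer, set()):
            holders = self._peers_by_title[title]
            holders.discard(peer)
            if not holders:
                del self._peers_by_title[title]

    def search(self, title):
        """Return the online peers that currently offer the title."""
        return sorted(self._peers_by_title.get(title, ()))


directory = CentralDirectory()
directory.connect("10.0.0.1:6699", ["song-a.mp3", "song-b.mp3"])
directory.connect("10.0.0.2:6699", ["song-a.mp3"])
directory.disconnect("10.0.0.1:6699")

# song-a is still available thanks to redundancy; song-b is not.
print(directory.search("song-a.mp3"))
print(directory.search("song-b.mp3"))
```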
Gnutella is a decentralized, pure peer-to-peer model that almost disappeared before being discovered by open-source developers. Unlike Napster, Gnutella doesn't use a central server, but relies on a message-based system in which peers forward communication to small groups. However, though all peers are given equal opportunity by the Gnutella software, they aren't all equal. When a computer is discovered with a higher bandwidth, it morphs into a super-node and is given a higher share of responsibility.
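To give a feel for message forwarding without a central server, here is a simplified Python simulation of a query spreading from neighbor to neighbor. It is a sketch under assumptions rather than the Gnutella protocol itself: the `Peer` class and the `link` and `flood_query` functions are hypothetical, the hop limit (`ttl`) is a simplification not described in the text above, and super-nodes are left out entirely.

```python
from collections import deque


class Peer:
    """Hypothetical peer that knows only its direct neighbors."""

    def __init__(self, name, files=()):
        self.name = name
        self.files = set(files)
        self.neighbors = []


def link(a, b):
    """Connect two peers in both directions."""
    a.neighbors.append(b)
    b.neighbors.append(a)


def flood_query(origin, filename, ttl):
    """Forward a query outward for at most `ttl` hops; return peers with the file."""
    seen = {origin.name}  # peers that have already handled this query
    queue = deque((neighbor, ttl) for neighbor in origin.neighbors)
    hits = []
    while queue:
        peer, remaining = queue.popleft()
        if remaining <= 0 or peer.name in seen:
            continue  # hop limit reached, or duplicate query dropped
        seen.add(peer.name)
        if filename in peer.files:
            hits.append(peer.name)
        for neighbor in peer.neighbors:
            queue.append((neighbor, remaining - 1))
    return hits


# A small chain of peers: a - b - c - d, where only d holds the file.
a, b, c, d = Peer("a"), Peer("b"), Peer("c"), Peer("d", ["song.mp3"])
link(a, b)
link(b, c)
link(c, d)

near = flood_query(a, "song.mp3", ttl=2)  # reaches b and c only
far = flood_query(a, "song.mp3", ttl=3)   # reaches d as well
print(near, far)
```

The hop limit keeps a query from circulating forever, at the cost of missing content that sits farther away than the limit allows.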
The Gnutella design has several well-known limitations. It does not provide any security to disguise user actions, or any anonymity for peers, or any way to verify the content of files. It also lacks the optimized routing and caching that allow more sophisticated peer-to-peer applications to dynamically correct load imbalances as they occur.
In Part Three, you'll use .NET's networking support to create a hybrid file-sharing application like Napster.
Freenet is a peer-to-peer model for a virtual pooled hard drive—with one significant difference. Freenet's goal is to ensure free and uncensored communication over the Internet. Every Freenet peer surrenders a small portion of space on their hard drive, on which encrypted data is stored. The actual content stored on a given peer changes regularly so that the most requested content is replicated while the least requested content gradually disappears. Because of its design, Freenet is quite efficient for transferring large amounts of information. It also allows any user to freely publish information to the Internet, without requiring a website. However, there is no way for a Freenet peer to determine what's being stored on the local drive. Overall, Freenet is a niche use of peer-to-peer technology, but it's an example of an elegant, completely decentralized model. For more information about Freenet, see http://freenetproject.org.
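The way popular content survives while unpopular content fades can be approximated with a fixed-size store that discards whatever has gone longest without a request. The Python sketch below is an assumption-laden illustration: the `PeerStore` class is hypothetical, it uses a plain least-recently-used policy, and it ignores the encryption and routing that Freenet actually performs.

```python
from collections import OrderedDict


class PeerStore:
    """Hypothetical fixed-size store that drops the least recently requested item."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._items = OrderedDict()  # oldest request first, newest last

    def put(self, key, blob):
        """Store a block, evicting the stalest block if space runs out."""
        self._items[key] = blob
        self._items.move_to_end(key)
        while len(self._items) > self._capacity:
            self._items.popitem(last=False)

    def get(self, key):
        """Return a block and mark it as recently requested, or None if absent."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)
        return self._items[key]

    def keys(self):
        """List the stored keys, from stalest to freshest."""
        return list(self._items)


store = PeerStore(capacity=2)
store.put("a", b"opaque block 1")
store.put("b", b"opaque block 2")
store.get("a")                      # "a" is requested, so it stays fresh
store.put("c", b"opaque block 3")   # "b" is now the stalest and is dropped
print(store.keys())
```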
One peer-to-peer application type that hasn't yet materialized is distributed searching. Currently, web search engines such as AltaVista and Google use spiders that continuously crawl through an unlimited series of websites, following links and cataloguing everything they find. When you perform a search with Google, you're searching the most recent results from the spider's search. Unfortunately, this doesn't necessarily reflect the content on the Web at that moment. Some of the results you retrieve may be months old and may point to pages that no longer exist while omitting much more important current data. And the data stored on internal networks but not published on a website will always be beyond the reach of the search.
One idea is to supplement the current generation of searching technology with real-time searches over a peer network. Unfortunately, before a peer-searching technology can work, it needs a large network of like-minded peers with valuable content, and a content-description language that can be used to advertise resources and create queries. One early attempt to standardize such a system was Infrasearch. The technology behind Infrasearch was recently purchased by Sun and incorporated into their new JXTA platform. It's not yet ready for prime time, but it promises to change the way we find information on the Internet.
For information about JXTA, go to http://search.jxta.org.
.NET Terrarium is a learning game for the Microsoft .NET platform. It allows developers to create virtual "creature" classes and insert them into a virtual ecosystem hosted by a group of peers. Like Napster and SETI@Home, .NET Terrarium is a hybrid peer-to-peer application that makes use of a central discovery server. Currently, the source code for .NET Terrarium is not available, although it's expected that some pieces will gradually appear, accompanied by helpful commentary from Microsoft's architects. You can download Terrarium at http://www.gotdotnet.com/terrarium.
Peer-to-peer applications are still in their infancy, and already some reports are predicting their demise. Most of these claims center around the inability of most peer-to-peer venture projects to make money, particularly such high-profile failures as Napster. However, peer-to-peer is not just a business model. It's also a framework that deals with current problems with distributed computer systems—problems that can't be resolved in any other way.
There are two schools of thought on the future of peer-to-peer. Some believe that pure peer-to-peer applications are the ultimate future of computing, and that the current trend of combining peer-to-peer concepts with more traditional client-server components is transitional. Others believe that peer-to-peer technology will be integrated into the current generation of applications, thereby adding new capabilities.
One interesting example is the .NET learning game Terrarium, which was initially envisioned as a straight peer-to-peer application. When the resulting network traffic became difficult to manage, the team switched to a hybrid system with server-based peer discovery. The final solution incorporates .NET web services (primarily a client-server technology) with peer-to-peer networking. Lance Olson, Terrarium's lead program manager, describes it this way:
I think that the peer-to-peer hype was sold as a new application model and an entirely new world around which we would build applications. And I think that the truth of the matter is that it's much more evolutionary…. Peer-to-peer is certainly not dead. However, the hype and the notion of peer-to-peer as just a stand-alone concept is probably … more of an evolutionary step than something that is just an entirely new model. And so the peer-to-peer world as I see it in the future is more one of applications that are more fault tolerant or are more interactive and have a better ability to contact other resources that are available on the network. So they're just like the applications today, only better in those senses.
Recently, more and more developers have been speaking out in favor of hybrid peer-to-peer designs. Quite simply, enterprise companies are unwilling to give up their servers. They need to be able to access a central component they can control, support, back up, and protect. Enterprise companies are much more interested in systems that centralize some core services but still allow for client interactions using peer-to-peer protocols.
This book focuses on the hybridization of peer-to-peer concepts. In other words, you'll learn how to create solutions that incorporate peer-to-peer design, but the book may make use of server components that aren't necessarily pure peer-to-peer systems. Pure peer-to-peer implementations require a significant amount of messy network coding, and .NET does not yet provide high-level ways to deal with these problems. (Other platforms, such as JXTA, are also evolving to tackle these problems.) Peer-to-peer—like .NET—is a compromise. It's your challenge to integrate it the best way you can for your development.
The IPv6 protocol promises to solve the address shortage and prevent the Internet from running out of IP addresses. IPv6 uses 128-bit IP addresses, written as eight groups of four hexadecimal digits separated by colons (as in 2001:0db8:85a3:0000:0000:8a2e:0370:7334). IPv6 was designed to support at least one trillion machines and one billion networks, and its 128-bit address space allows roughly 3.4 × 10^38 distinct addresses. However, it's uncertain when IPv6 will be widely implemented.
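As a quick illustration of the notation, the following Python sketch uses the standard-library `ipaddress` module to expand and compress an IPv6 address. The sample address is an assumption chosen for the example; it comes from the 2001:db8::/32 range reserved for documentation.

```python
import ipaddress

# Sample address from the range reserved for documentation (an assumption).
addr = ipaddress.IPv6Address("2001:db8:85a3::8a2e:370:7334")

# The full form always has eight groups of four hexadecimal digits.
groups = addr.exploded.split(":")
print(addr.exploded, len(groups))

# The short form drops leading zeros and collapses one run of zero groups.
print(addr.compressed)

# Like an IPv4 address, it is just a number, here 128 bits wide.
print(int(addr) < 2 ** 128)
```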