Understanding Cryptography

Before you see a full example with the System.Security.Cryptography classes, you need to understand the basics of three cryptography essentials: hash codes, encryption, and digital signatures.

Understanding Hash Codes

A hash algorithm takes a block of binary data and uses it to generate a fixed-size checksum. For example, the SHA-256 hash algorithm always creates a 256-bit (32-byte) hash for data, regardless of the size of the input data.

Hash codes serve a variety of purposes. One of the most common is to prevent data tampering. For example, consider a scenario in which you store important data on a disk file and record the hash of that data in a database. At a later point, you can open the file, recalculate the hash, and compare it with the value in the database. If the two hashes don't agree, the file has changed. If your program is the only application allowed to access that file, and if your program always records the hash value in the database after making changes, it's reasonable to assume that the file has been tampered with. You can use a similar technique to validate messages that are sent between computers.
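The file-and-database check described above can be sketched in a few lines. This is a minimal illustration in Python rather than with the System.Security.Cryptography classes; the sample data, the helper names (sha256_hex, is_unchanged), and the idea of holding the recorded hash in a variable instead of a database are all assumptions made for the example.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

def is_unchanged(current: bytes, expected_hash: str) -> bool:
    """Recalculate the hash and compare it with the recorded value."""
    return sha256_hex(current) == expected_hash

# Hypothetical file contents, and the hash recorded when they were last saved.
original = b"Quarterly totals: 1200, 950, 1430"
recorded_hash = sha256_hex(original)

# The same contents with a single character altered.
tampered = b"Quarterly totals: 1200, 950, 9430"

print(len(hashlib.sha256(original).digest()), "bytes in the hash")
print("original matches:", is_unchanged(original, recorded_hash))
print("tampered matches:", is_unchanged(tampered, recorded_hash))
```

The hash is always 32 bytes long, no matter how much data you feed in, and any change to the data causes the comparison to fail.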

Like any type of checksum, a hash algorithm works in one direction only. It's impossible to re-create the document from the hash because the hash doesn't include all the information that was in the document. However, cryptographic hash algorithms also have a key characteristic that distinguishes them from other types of checksums: They're collision resistant. Changing even a single bit in the source document has a roughly fifty-fifty chance of independently changing each bit in the hash. It's extremely difficult for an attacker to look at a hash and create a new document that will generate the same hash. (The difficulty of this task is comparable to trying to break an encrypted message through brute force.) Thus, hashes play a key role in ensuring data integrity.
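You can observe this behavior directly by hashing two documents that differ in one character and counting how many bits of the hash differ. The following is a rough sketch in Python; the two sample messages and the differing_bits helper are invented for the illustration.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

doc1 = b"Transfer $100 to account 12345"
doc2 = b"Transfer $900 to account 12345"  # one character changed

h1 = hashlib.sha256(doc1).digest()
h2 = hashlib.sha256(doc2).digest()

changed = differing_bits(h1, h2)
print(f"{changed} of {len(h1) * 8} hash bits differ")
```

With a well-designed hash algorithm, you should expect the count to land somewhere near half of the 256 bits, even though the inputs are almost identical.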

The System.Security.Cryptography namespace includes the following hash algorithms:

MD5 (implemented by the MD5CryptoServiceProvider class) generates a 128-bit hash.
SHA-1 (implemented by the SHA1CryptoServiceProvider class) generates a 160-bit hash.
SHA-256 (implemented by the SHA256Managed class) generates a 256-bit hash.
SHA-384 (implemented by the SHA384Managed class) generates a 384-bit hash.
SHA-512 (implemented by the SHA512Managed class) generates a 512-bit hash.

As a rule of thumb, the larger the hash size, the more difficult it is to find another document that will generate a duplicate hash value.
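As a quick cross-check of the sizes listed above, the same five algorithms are available in Python's hashlib module, which reports the size of each hash. This sketch assumes a Python build in which all five algorithms are enabled (some locked-down builds disable MD5).

```python
import hashlib

# hashlib's names for the five algorithms listed above.
names = ["md5", "sha1", "sha256", "sha384", "sha512"]

# digest_size is reported in bytes, so multiply by 8 to get bits.
sizes = {name: hashlib.new(name).digest_size * 8 for name in names}

for name, bits in sizes.items():
    print(f"{name}: {bits}-bit hash")
```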

Note

Using hash codes isn't enough to protect messages exchanged between computers. The problem is that an attacker can tamper with a message and simply generate a new hash code that matches the altered message. To overcome this problem, you need to combine hashing with some form of encryption to create a keyed hash or digital signature. We'll look at digital signatures later in this chapter.
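To make the idea of a keyed hash concrete, here is a minimal sketch that uses HMAC, one common keyed-hash construction, from Python's standard library. HMAC mixes a secret key into the hash calculation rather than encrypting the hash afterward. The key, the messages, and the helper names (make_tag, verify) are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret that only the two communicating peers know.
secret_key = b"shared-secret-known-to-both-peers"

def make_tag(message: bytes, key: bytes) -> bytes:
    """Compute a keyed hash (HMAC-SHA256) of the message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(message: bytes, tag: bytes, key: bytes) -> bool:
    """Recompute the keyed hash and compare it in constant time."""
    return hmac.compare_digest(make_tag(message, key), tag)

message = b"Pay 10 credits to peer B"
tag = make_tag(message, secret_key)

# An attacker alters the message and recomputes the tag without the real key.
forged = b"Pay 99 credits to peer B"
forged_tag = make_tag(forged, b"attacker-guess")

print("genuine message accepted:", verify(message, tag, secret_key))
print("altered message, old tag:", verify(forged, tag, secret_key))
print("altered message, new tag:", verify(forged, forged_tag, secret_key))
```

Because the attacker doesn't know the secret key, neither reusing the old tag nor generating a new one produces a value that the recipient will accept.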

Understanding Encryption

There are essentially two types of encryption: symmetric encryption and asymmetric encryption. In many peer-to-peer applications, you'll need to use both. Either way, the basic principle behind encryption is always the same: Encryption scrambles information so that it can only be understood by the recipient. A malicious third party might be able to intercept the message, using characteristics of the network that are beyond your control, but won't be able to decipher it.

Technically, any digitally encrypted message can be broken using a brute-force attack, which is a process by which an attacker tries every possible sequence of bytes as a key until finally one combination works. In most cases, a brute-force attack is prohibitively expensive, which is to say that the value of the data is less than the cost (in time or computer hardware) of cracking it, or the data will no longer be valid by the time it's deciphered. Very few attacks use brute force. Usually, they rely on weak or compromised passwords or flaws in the application or platform that are much easier to exploit.
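Some back-of-the-envelope arithmetic shows why brute force quickly becomes impractical as keys grow. The sketch below assumes an attacker who can test one trillion keys per second; that rate is an arbitrary figure chosen for illustration, not a measurement of any real hardware.

```python
GUESSES_PER_SECOND = 10**12            # assumed attacker speed (illustrative only)
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def years_to_search(key_bits: int, rate: int = GUESSES_PER_SECOND) -> float:
    """Years needed to try every key of the given size at the assumed rate."""
    return 2**key_bits / rate / SECONDS_PER_YEAR

for bits in (40, 56, 128, 256):
    print(f"{bits}-bit key: about {years_to_search(bits):.3g} years")
```

Each additional bit doubles the number of possible keys, so the jump from 56 bits to 128 bits changes the search time from a matter of hours to many orders of magnitude longer than any realistic attack could run.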

Symmetric encryption (also known as "secret-key" encryption) is the type of encryption that most people are familiar with. It depends on a shared, secret key that's used to encrypt and decrypt data. Technically, this secret key is a series of bytes that can be derived from a password or other information as needed. Symmetric encryption is far faster than asymmetric encryption but suffers from a significant limitation in distributed computing scenarios: Both parties need to know the secret key before the communication begins. There's no easy way to transmit the secret key information without compromising security.
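As an illustration of deriving a secret key from a password, the following sketch uses PBKDF2, a standard key-derivation function available in Python's hashlib module. The password, the 100,000-iteration count, and the derive_key helper are assumptions made for the example; the salt is a random value that isn't secret but must be known to both parties so that they derive the same key.

```python
import hashlib
import os

def derive_key(password: str, salt: bytes, key_bytes: int = 32) -> bytes:
    """Stretch a password into a fixed-length key using PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, 100_000, dklen=key_bytes
    )

salt = os.urandom(16)  # random, non-secret value stored or sent with the data
key = derive_key("correct horse battery staple", salt)

print(len(key) * 8, "bit key:", key.hex())
```

The same password and salt always produce the same 256-bit key, which is what allows both parties to arrive at the shared secret independently. The sketch doesn't solve the distribution problem described above: both sides still have to agree on the password through some other secure channel.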

The .NET Framework includes the following symmetric algorithms:

DES (implemented by the DESCryptoServiceProvider class) uses a 64-bit key.
TripleDES (implemented by the TripleDESCryptoServiceProvider class) uses a 128-bit or 192-bit key.
RC2 (implemented by the RC2CryptoServiceProvider class) uses a 40- to 128-bit key.
Rijndael (implemented by the RijndaelManaged class) uses a 128-bit, 192-bit, or 256-bit key.

The larger the key size, the harder it is for a brute-force attack to succeed. Generally, DES is supported for legacy uses only, because its 64-bit key (of which only 56 bits are actually used for encryption) is considered dangerously weak. Rijndael is the recommended encryption algorithm.

Asymmetric encryption uses a pair of mathematically related keys that includes both a public and private key. The private key is carefully guarded, while the public key is made available to the entire world. The interesting thing about asymmetric encryption is that any data encrypted with one key can only be decrypted with the other matching key. This makes asymmetric encryption very versatile.

For example, consider two peers communicating on a network. Each peer has its own key pair.

Peer A encrypts a message using the public key that belongs to Peer B.
Peer A sends the message to Peer B.
Peer B decrypts the message using the corresponding private key. No other user can decrypt this message (not even Peer A, the one who created it) because no one else has the private key.

This demonstrates how asymmetric encryption can be used to protect information without needing to exchange a shared, secret key value. This makes it possible for any two parties on a network to exchange encrypted data, even if they have never met before. The process is diagrammed in Figure 11-1.

Figure 11-1: How user A can send an encrypted message to user B
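The exchange between Peer A and Peer B can be sketched with "textbook" RSA and deliberately tiny numbers. This is a toy for illustration only: the primes are far too small to be secure, real implementations add padding, and the apply_key helper is invented for this example. It requires Python 3.8 or later for the modular inverse.

```python
# Toy key pair for Peer B, built from tiny primes so the numbers stay readable.
p, q = 61, 53
n = p * q                    # modulus, shared by the public and private keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent: e * d = 1 (mod phi)

peer_b_public = (e, n)       # published to the world
peer_b_private = (d, n)      # known only to Peer B

def apply_key(values, key):
    """Raise each value to the key's exponent, modulo the key's modulus."""
    exponent, modulus = key
    return [pow(v, exponent, modulus) for v in values]

# Peer A encrypts each byte of the message with Peer B's public key.
message = b"hello peer B"
ciphertext = apply_key(message, peer_b_public)

# Peer B decrypts with the matching private key.
recovered = bytes(apply_key(ciphertext, peer_b_private))

print("ciphertext:", ciphertext)
print("recovered: ", recovered)
```

Notice that Peer A never needs Peer B's private key, and that nothing Peer A possesses can reverse the encryption once it has been applied.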

Asymmetric encryption also underlies a special form of message validation. It works like this:

Peer A encrypts a message using its own private key.
Peer A sends the message to Peer B.
Peer B decrypts the message using the public key belonging to Peer A. Because this key is publicly available, any user can perform this step. However, because the message could only have been encrypted using the matching private key, Peer B can be confident that the message originated from Peer A (provided that Peer A's private key hasn't been compromised).

This shows you how message authentication works with asymmetric encryption. In practice, you don't need to encrypt the entire message—just a hash code, as described in the next section. Often, both validation and encryption will be combined in the same application to prevent message tampering and hide sensitive data. This is the approach taken in the peer-to-peer example shown later in this chapter.

.NET provides implementation for two asymmetric algorithms:

RSA (implemented by the RSACryptoServiceProvider class) allows key sizes from 384 to 16,384 bits (in 8-bit increments).
DSA (implemented by the DSACryptoServiceProvider class) allows key sizes from 512 to 1,024 bits (in 64-bit increments).

In most cases, you'll use RSA, because DSA can only be used for creating and verifying digital signatures, not for encrypting data. Note that asymmetric encryption allows for much larger key sizes. However, the key size can be misleading. It's estimated that a 1,024-bit RSA key (the default size) is roughly equivalent in strength to a 75-bit symmetric key.

Asymmetric encryption does have one significant shortcoming: It's slow, often hundreds of times slower than symmetric encryption. It also produces less compact ciphertext (encrypted data) than symmetric encryption. Thus, if you need to encode a large amount of information (for example in a file-sharing application), asymmetric encryption alone is probably not the approach you want. A better choice is to combine symmetric and asymmetric encryption. We'll discuss this topic a little later.

Understanding Digital Signatures

Digital signatures combine the concepts of hash codes and asymmetric encryption. Remember, hash codes are used to take a digital "fingerprint" of some data, and thereby prevent it from being altered. However, attackers can get around this defense if hash codes aren't stored in a secure location by regenerating and replacing the hash code. Digital signatures prevent this type of tampering using encryption.

To sign some data with a digital signature, a user creates a hash and then encrypts the hash using a private key. Any other user can validate the signature because the corresponding public key is freely available, but no other user can generate a new signature because they won't have the required private key. Thus, a digital signature is tamper-proof.
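The hash-then-encrypt process can be sketched with textbook RSA as follows. This is a simplified illustration, not a secure implementation: the key pair is built from two well-known Mersenne primes purely so the example needs no key-generation code, real signature schemes add padding to the hash, and the helper names (digest_as_int, sign, verify) are invented here. It requires Python 3.8 or later.

```python
import hashlib

# Toy RSA key pair built from two publicly known Mersenne primes.
# Illustration only: a real key uses secret, randomly generated primes.
p, q = 2**127 - 1, 2**521 - 1
n = p * q
e = 65537                              # public exponent
d = pow(e, -1, (p - 1) * (q - 1))      # private exponent

def digest_as_int(data: bytes) -> int:
    """Hash the data with SHA-256 and interpret the hash as an integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def sign(data: bytes) -> int:
    """Create a signature: hash the data, then transform the hash with the private key."""
    return pow(digest_as_int(data), d, n)

def verify(data: bytes, signature: int) -> bool:
    """Recover the hash with the public key and compare it with a fresh hash."""
    return pow(signature, e, n) == digest_as_int(data)

message = b"Grant supervisor rights to peer A"
signature = sign(message)

print("original message valid:", verify(message, signature))
print("altered message valid: ", verify(b"Grant supervisor rights to peer Z", signature))
```

Anyone who has the public values e and n can run verify, but only the holder of d can produce a signature that passes the check.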

Of course, life isn't quite this simple. In order for this system to work, the recipient must already know the public key of the message author. Otherwise, the signature can't be validated. Unfortunately, you can't just transmit the public key, because then it could be read and replaced by the same attacker who will attempt to tamper with the message! The solution? Use a third party that can validate users and vouch for their public keys. On the Internet, this is often performed with digital certificates. A digital certificate contains a user's public key along with identifying information, and is signed by a third-party certificate authority (CA) such as VeriSign. The matching private key isn't part of the certificate; it remains in the owner's possession. When you establish an SSL connection with a website, your computer decides to trust the website's identity because it provides a certificate signed by a trusted CA.

In a peer-to-peer application, you could use certificates (in fact, Intel's Peer-to-Peer Accelerator Kit provides exactly this feature, as described in Chapter 13). However, .NET doesn't provide any classes either for working with certificates in a user's certificate store or validating that a certificate is signed by a trusted CA. In addition, the certificate itself cannot contain application-specific information, such as whether a user should be given supervisor or guest rights in a peer-to-peer application. To get around this limitation in this chapter, we'll use our discovery service to act as a central authority for user-identity validation. It will map public keys to application-specific permissions using the database.

Note

.NET does provide classes that allow you to read some basic certificate information from a certificate file. This rudimentary functionality is found in the System.Security.Cryptography.X509Certificates namespace. In addition, the downloadable Web Services Enhancements (WSE) provides some tools for reading information from installed certificates. In future versions of the .NET Framework, these features will be more closely integrated.