Digital Versus MIDI—Sounds Great, Less Filling
There are two kinds of sounds that a computer can make: digital and synthesized. Digital sounds are basically recordings of sounds, while synthesized sounds are programmed reproductions of sounds generated by algorithms and hardware tone generators. These days, digital sounds are almost always used for sound effects, like explosions and people talking, while synthesized sounds are reserved for music. However, back in the '80s, game programmers used FM synthesizers and tone generators to make the sounds of engines, explosions, gunshots, drums, sirens, and so forth. Granted, they didn't sound as good as digitized sound effects, but they worked back then.
Digital Sound—Let the Bits Begin
Digital sound involves digitization, which means encoding data as ones and zeros, such as 110101010110. Just as an electrical signal can create sound by causing a magnetic field to move the speaker's cone, talking into a speaker creates the opposite effect: the speaker generates an electrical signal based on the vibrations it senses. This electrical signal has the sound information encoded in it as an analog, or continuously varying, voltage, as shown in Figure 10.4.
With the proper hardware, this linear voltage with the sound information encoded in it can be sampled and digitized. This is exactly how your CD player works. The information on CDs is in digital form, whereas information on tapes is analog. Digital information is much easier to process and is the only information that digital computers can process (there's a surprise). So for a computer to process sound, that sound must be converted into a digital data stream with an analog-to-digital converter, as shown in section A of Figure 10.5.
Once the sound is recorded into the memory of the computer, it can be processed or played back with a digital-to-analog converter (D/A), as shown in section B of Figure 10.5. The point is, you need to convert the sound information to digital format before you can work with it. But recording digital sound is a bit tricky. Sound has a lot of information in it. If you want to sample sound realistically, there are two factors that you must consider: frequency and amplitude.
The number of samples you record of a sound per second is called the sample rate. It must be at least twice the highest frequency contained in the original sound if you want to reproduce that sound exactly. In other words, if you're sampling a human voice that has a range of 20-2,000Hz, you must sample the sound at 4,000Hz or better!
The reasoning for this is mathematical and based on the fact that all sounds are composed of sine waves. Thus, if you can sample the highest-frequency sine wave contained in a sound, you can capture all the lower ones that compose that sound. But to sample a sine wave of frequency f, you must sample it at a rate of 2*f. At a rate of only f, you get just one sample per cycle, so you can't tell whether you caught the wave on its upward crest or its downward crest. In other words, it takes two points per cycle to reconstruct any sine wave. This result is called Shannon's Theorem, and the minimal sampling rate is called the Nyquist rate—were they roommates or something?
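You can watch the two-points-per-cycle rule break down for yourself. Here's a quick sketch in Python (the helper name is mine, just to illustrate the math—this isn't from any sound API): sample a tone above the limit and it becomes indistinguishable from a lower-frequency alias.

```python
import math

def sample_sine(freq_hz, rate_hz, count):
    """Sample a sine wave of the given frequency at the given rate."""
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(count)]

# A 6,000Hz tone sampled at only 8,000Hz (below its required rate of
# 12,000Hz) produces exactly the same samples as a mirrored 2,000Hz
# tone -- the original frequency is unrecoverable from the samples.
high  = sample_sine(6000, 8000, 8)
alias = sample_sine(-2000, 8000, 8)
matches = all(abs(a - b) < 1e-9 for a, b in zip(high, alias))
```

Since 6,000 and -2,000 differ by exactly the 8,000Hz sample rate, every sample lands on the same value—that's aliasing in a nutshell.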
Anyway, the second sampling parameter is the amplitude resolution—meaning, how many different values can the amplitude take? If you have only eight bits per sample, that means there are only 256 different possible amplitudes. This is enough for games, but for professional-quality reproduction of sound and music you need at least 16 bits of resolution, giving 65,536 different possible values.
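Here's a little sketch of what those extra bits buy you—quantize the same sample value at 8 and 16 bits and compare the round-off error (plain Python for illustration; the helper is mine, not from DirectSound or any real driver):

```python
def quantize(sample, bits):
    """Round a sample in [-1.0, 1.0] to the nearest representable level."""
    max_level = 2 ** (bits - 1) - 1   # 127 for 8-bit, 32767 for 16-bit
    return round(sample * max_level) / max_level

x = 0.3333
err_8  = abs(x - quantize(x, 8))    # roughly 0.0026
err_16 = abs(x - quantize(x, 16))   # roughly 0.0000076 -- far cleaner
```

That round-off error is what you hear as quantization noise, and 16 bits pushes it way below anything your ears will notice.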
So that's digital sound for you. Basically, it's a recording, or sampling, of sound that has been converted to digital form from an analog signal. Digital sound is great for sound effects and short sounds, but it's bad for long sounds because of its memory requirements—a single channel of 16-bit, 44.1KHz, CD-quality sound uses about 88KB a second. On the other hand, if your game is going on CD, you can spare a couple hundred megs for pure digital music. Finally, digital sound sounds far better than synthesized sound 99 percent of the time, but under DirectMusic, synthesized music sounds almost as good.
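The memory math is easy enough to work out yourself. A quick sketch (the helper name is mine):

```python
def bytes_per_second(sample_rate, bits_per_sample, channels):
    """Raw, uncompressed storage cost of digitized sound."""
    return sample_rate * (bits_per_sample // 8) * channels

mono_cd = bytes_per_second(44100, 16, 1)              # 88,200 -- the 88KB/sec figure
stereo_minute = bytes_per_second(44100, 16, 2) * 60   # about 10.5MB per minute!
```

At roughly 10MB per minute of CD-quality stereo, you can see why an hour of music only fits if you're shipping on CD.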
Synthesized Sound and MIDI
Although digital sound is currently the best-sounding, synthesized sound has been around a long time and is getting better and better. Synthesized sound isn't digitally recorded; it's a mathematical reproduction of a sound based on a description. Synthesizers use hardware and algorithms to generate sounds on-the-fly from a description of the desired sound. For example, let's say you wanted to hear a 440Hz pure concert A note. You could design a piece of hardware that generated a pure analog sine wave of any frequency from 0-20,000Hz and then instruct it to create a 440Hz tone. This is the basis of synthesis.
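Generating that concert A is exactly what a tone generator does in hardware; in software it's one line of trig. Here's a sketch (in Python for illustration—the function name is mine, not any real synthesizer's API):

```python
import math

def tone(freq_hz, sample_rate, seconds):
    """Build a buffer of samples for a pure sine tone."""
    count = int(sample_rate * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(count)]

a440 = tone(440.0, 44100, 0.01)   # 441 samples of pure concert A
```

Feed a buffer like that through a D/A converter and out comes your 440Hz tone.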
The only problem is that most people want to hear more than a single tone (unless you're listening to a musical birthday card), so hardware is needed that supports at least 16–32 different tones at the same time, as shown in Figure 10.6. This isn't bad, and a number of different video game consoles used something like this back in the '70s and '80s. But people still weren't satisfied. The problem is that most sounds contain many frequencies at once; they have undertones, overtones, and harmonics (whole-number multiples of the fundamental frequency). This is what makes them sound textured and full.
Normally, I wouldn't use the words textured and full to describe sound because it lowers my public cool factor, but I had to because they're common terms used by music people. So please bear with me.
The first attempt at better sound was FM synthesis. Remember the old Ad-Lib card? It was the precursor of the Sound Blaster and the first PC card to support multiple-channel FM synthesis. (The FM stands for frequency modulation.) An FM synthesizer can alter not only the amplitude of a sine wave sound, but also the frequency of the wave.
FM synthesis operates on the mathematical basis of modulation and feedback. An FM synthesizer uses the output of one oscillator to modulate the frequency of another—and can even feed an oscillator's output back into itself—creating harmonics and phase-shifted tones from a single sine wave. The bottom line is that the results sound much richer and more real than single pure tones.
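Mathematically, basic two-operator FM is just a sine wave whose phase is wiggled by another sine wave. Here's that formula sketched out (variable names are mine; this is the textbook form, not any particular chip's implementation):

```python
import math

def fm_sample(t, carrier_hz, modulator_hz, index):
    """One sample of two-operator FM: a carrier phase-modulated by a modulator.

    index controls how strong the modulation is; index = 0 gives back a
    plain sine wave, and larger values pile on more and more sidebands.
    """
    return math.sin(2 * math.pi * carrier_hz * t
                    + index * math.sin(2 * math.pi * modulator_hz * t))

pure = fm_sample(0.001, 440, 110, 0.0)   # no modulation: a plain 440Hz sine
rich = fm_sample(0.001, 440, 110, 5.0)   # heavy modulation: harmonics galore
```

Sweep the index over time and you get the evolving, brassy timbres that made FM cards sound so much livelier than plain tone generators.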
It's MIDI Time!
At about the same time all this FM synthesis stuff came out, a file format for music synthesis was catching on called MIDI (Musical Instrument Digital Interface). MIDI is a language that describes musical compositions as a function of time. Instead of digitizing a sound, a MIDI piece describes it as keys, instruments, and special codes. For example, a MIDI file might look like this:
Turn on Channel 1 with a B flat. Turn on Channel 2 with a C sharp. Turn off Channel 1. . . . Turn all channels off.
Of course, this information is encoded in a binary serial stream, but you get the picture. Moreover, each channel in the MIDI specification is connected to a different instrument or sound. You might have 16 channels, each one representing a different instrument such as piano, drums, guitar, bass, flute, trumpet, and so on. So MIDI is an indirect method of encoding music.
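Those channel messages really are just a couple of bytes each. Here's a sketch of how a note-on event is packed (the helper functions are mine, but the byte layout—status byte, note number, velocity—comes straight from the MIDI spec):

```python
def note_on(channel, note, velocity):
    """Pack a MIDI note-on message: status 0x9n, then note and velocity."""
    return bytes([0x90 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])

def note_off(channel, note):
    """Note-off is status 0x8n; the velocity byte is commonly just zero."""
    return bytes([0x80 | (channel & 0x0F), note & 0x7F, 0])

# Note number 69 is concert A (440Hz). "Turn on channel 1 with an A,
# full blast" -- channel 1 is channel number 0 on the wire:
msg = note_on(0, 69, 127)
```

Three bytes per note event is why MIDI files are so tiny compared to digital recordings.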
However, it leaves the synthesis up to the hardware and records only the actual musical notes and timing. Alas, MIDI on one computer may sound completely different than on another, due to the method of synthesis and the instrument data. On the other hand, a MIDI file for an hour of music might only be a few hundred kilobytes of memory, instead of requiring megabytes for the same music in digital form! So it's been worth it, in many cases.
The only problem with MIDI and FM synthesis is that they are only good for music. Sure, you can design FM synthesizers to create white noise for explosions or laser blasts, but the sounds will always be simple and won't have the organic feel that digitized sound has. So more advanced methods of hardware synthesis have been created, such as wave table and wave guide technology.