Game Programming Gurus


Working with High-Color Modes

High-color modes (modes that require more than eight bits per pixel) are of course more visually pleasing than the 256-color modes. However, they usually aren't used in software-based 3D engines, for two main reasons:

Computational speed— A standard 640x480 pixel frame buffer consists of 307,200 pixels. If each pixel is 8-bit, that means that most calculations can be done using a single byte per pixel and rasterization is simpler. On the other hand, in 16-bit or 24-bit modes, full RGB space calculations are usually employed (or very large lookup tables) and the speed is cut at least in half. Furthermore, two or three bytes per pixel must be written to the frame buffer instead of one as in 8-bit modes.

Of course, with acceleration hardware, this isn't as much of a problem for bitmapping or 3D (in fact, most 3D cards work in 24/32-bit color), but for software rasterization (which is what you're learning in this book), it's a big deal. You want to write the least amount of data per pixel as possible, and 8-bit mode meets this requirement (although it's not as pretty as 16-bit). However, with 8-bit mode, you can rest assured that someone with a Pentium 133-233 might be able to play your game, and you won't have to worry about your audience having a P4 2.4GHz with 3D acceleration at a minimum.

Memory bandwidth— This is something that people hardly ever take into consideration. Your PC has either an ISA (Industry Standard Architecture), VLB (VESA Local Bus), PCI (Peripheral Component Interconnect), or PCI/AGP (Accelerated Graphics Port) hybrid bus system. The bottom line is that everything but the AGP port is relatively slow compared to video clock rates. This means that although you may have a 500+ MHz Pentium III, it's not going to do you any good if you have a PCI bus that's bottlenecking your access to video RAM and/or acceleration hardware. Of course, a number of hardware optimizations can help in this area, such as caching, multi-port VRAM, and so forth, but there's always a fill rate limit that you can never exceed no matter what you do. The moral of the story is that as you move to higher and higher resolutions and color depths, in many cases the memory bandwidth is more of a limiting factor than the processor's speed. However, with AGP 2x and 4x this will become less of an issue.
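To put rough numbers on both points, here's a minimal sketch that computes how many bytes one frame takes at each color depth and how many bytes per second you'd push if you redrew every pixel each frame. The function names and the 30 frames-per-second figure are assumptions made for illustration, and the math assumes a perfectly linear buffer with no extra padding per line.

```cpp
#include <cassert>
#include <cstdio>

// Bytes needed for one full frame at a given resolution and color depth.
// Assumes a linear buffer where pitch == width * bytes per pixel.
constexpr long frame_bytes(long width, long height, long bits_per_pixel)
{
    return width * height * (bits_per_pixel / 8);
}

// Bytes per second written if every pixel is redrawn once per frame.
constexpr long bytes_per_second(long width, long height,
                                long bits_per_pixel, long frames_per_second)
{
    return frame_bytes(width, height, bits_per_pixel) * frames_per_second;
}

// Print the cost of each depth for a 640x480 display at 30 frames per second
// (the frame rate is an illustrative assumption, not a figure from the text).
void print_frame_costs()
{
    const long depths[] = { 8, 16, 24, 32 };
    for (long bpp : depths)
        std::printf("%2ld-bit: %7ld bytes/frame, %9ld bytes/sec at 30 fps\n",
                    bpp, frame_bytes(640, 480, bpp),
                    bytes_per_second(640, 480, bpp, 30));
}
```

Going from 8-bit to 16-bit doubles the traffic (307,200 to 614,400 bytes per frame), and 32-bit doubles it again, before you've done a single color calculation.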

However, computers today are fast enough that 16-bit and even 24-bit software engines run at acceptable speeds (though not nearly as fast as hardware, of course). Still, 8-bit color is worth thinking about if you are making simpler games that target a large audience, and it is also easier for beginners to understand and program.

Working with high-color modes is conceptually similar to working with palettized modes, with the single caveat that you aren't writing color indices into the frame buffer, but instead full RGB-encoded pixel values. This means that you must know how to create an RGB pixel encoding for the high-color modes that you want to work with. Figure 7.1 depicts a number of 16-bit pixel encodings.

Figure 7.1. 16-Bit RGB pixel encodings.

graphics/07fig01.gif

16-Bit High-Color Mode

Referring to Figure 7.1, there are a number of possible bit encodings for 16-bit modes:

Alpha.5.5.5— This mode uses a single bit at position D₁₅ to represent a possible Alpha component (transparency), and the remaining 15 bits are equally distributed with five bits for red, five bits for green, and five bits for blue. This makes a total of 2⁵ = 32 shades for each color and a palette of 32x32x32 = 32,768 colors.

X.5.5.5— This mode is similar to the Alpha.5.5.5 mode, except the MSB (most significant bit) is unused and can be anything. The color range is still 32 shades of each primary color (red, green, and blue), with a total of 32x32x32 = 32,768 colors.

5.6.5— This is the most common mode and uses all 16 bits of the WORD to define the color. The format is, of course, five bits for red, six bits for green, and five bits for blue, for a total of 32x64x32 = 65,536 colors. Now, you may ask, "Why six bits for green?" Well, my little leprechaun, the answer is that human eyes are more sensitive to green, and therefore the increased range for green is the most logical choice of the three primaries.

Now that you know the RGB bit-encoding formats, the question is how to build them up. You accomplish this task with simple bit shifting and masking operations, as shown in the following macros:

// this builds a 16 bit color value in 5.5.5 format (1-bit alpha mode)
// arguments are parenthesized so expressions can be passed safely
#define _RGB16BIT555(r,g,b) (((b) & 31) + (((g) & 31) << 5) + (((r) & 31) << 10))

// this builds a 16 bit color value in 5.6.5 format (green dominant mode)
#define _RGB16BIT565(r,g,b) (((b) & 31) + (((g) & 63) << 5) + (((r) & 31) << 11))

You'll notice from the macros and Figure 7.2 that the red bits are located in the high-order bits of the color WORD, the green bits are in the middle bits, and the blue bits are located in the low-order bits of the color WORD. This may seem backwards because PCs are little-endian and place data in low-to-high order, but in this case the bits are in big-endian format, which is much better because they follow RGB order from MSB to LSB.
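One thing to watch: the macros mask each channel, so they expect values that already fit in five or six bits. If you hand them 8-bit values (0 to 255), the high bits are thrown away, and a red of 200 turns into 200 & 31 = 8. Here's a minimal sketch of one way to scale 8-bit channels down first by dropping the low-order bits instead. The function names are hypothetical and aren't part of the book's library.

```cpp
#include <cassert>
#include <cstdint>

// Pack 8-bit-per-channel color (0..255) into a 5.6.5 WORD by discarding the
// low-order bits of each channel: 3 bits from red and blue, 2 from green.
inline std::uint16_t pack565_from_888(int r, int g, int b)
{
    return static_cast<std::uint16_t>((((r >> 3) & 31) << 11) |
                                      (((g >> 2) & 63) << 5)  |
                                       ((b >> 3) & 31));
}

// Same idea for 5.5.5: every channel loses its low 3 bits.
inline std::uint16_t pack555_from_888(int r, int g, int b)
{
    return static_cast<std::uint16_t>((((r >> 3) & 31) << 10) |
                                      (((g >> 3) & 31) << 5)  |
                                       ((b >> 3) & 31));
}

// Pull the raw 5-, 6-, and 5-bit fields back out of a 5.6.5 WORD.
inline void unpack565(std::uint16_t pixel, int &r5, int &g6, int &b5)
{
    r5 = (pixel >> 11) & 31;
    g6 = (pixel >> 5)  & 63;
    b5 =  pixel        & 31;
}
```

Shifting keeps the most significant bits of each channel, so bright stays bright and dark stays dark, which is usually what you want when converting from 24-bit source art.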

Figure 7.2. Color WORDs are big-endian.

graphics/07fig02.gif

WARNING

Before you build a quick demo of 16-bit mode, there's one more little detail that I must address—how on Earth do you detect if the video mode is 5.5.5 or 5.6.5? This is important because it's not under your control. You can tell DirectDraw to create a 16-bit mode, but the bit encoding is up to the hardware. You must know this detail because the green channel will be all jacked up if you don't take it into consideration! What you need to know is the pixel format.

Getting the Pixel Format

To figure out the pixel format of any surface, all you need to do is call the function IDirectDrawSurface7::GetPixelFormat(), shown here:

HRESULT GetPixelFormat(LPDDPIXELFORMAT lpDDPixelFormat);

You already saw the DDPIXELFORMAT structure in the previous chapter, but the fields you're interested in are

DWORD dwSize;    // the size of the structure, must be set by you
DWORD dwFlags;   // flags describing the surface, refer to Table 7.1
DWORD dwRGBBitCount; // number of bits for Red, Green, and Blue

Before you make the call, you must set the dwSize field to the size of a DDPIXELFORMAT structure. After the call, both the dwFlags and dwRGBBitCount fields will be valid and contain the informational flags, along with the number of RGB bits for the surface in question. Table 7.1 lists a subset of the possible flags contained in dwFlags.

Table 7.1. Valid Flags for DDPIXELFORMAT.dwFlags

Value Description

DDPF_ALPHA The pixel format describes an alpha-only surface.

DDPF_ALPHAPIXELS The surface has alpha channel information in the pixel format.

DDPF_LUMINANCE The pixel format describes a luminance-only or luminance-alpha surface.

DDPF_PALETTEINDEXED1 The surface is 1-bit color indexed.

DDPF_PALETTEINDEXED2 The surface is 2-bit color indexed.

DDPF_PALETTEINDEXED4 The surface is 4-bit color indexed.

DDPF_PALETTEINDEXED8 The surface is 8-bit color indexed. Most common.

DDPF_PALETTEINDEXEDTO8 The surface is 1-, 2-, or 4-bit color indexed to an 8-bit palette.

DDPF_RGB The RGB data in the pixel format structure is valid.

DDPF_ZBUFFER The pixel format describes a z-buffer surface.

DDPF_ZPIXELS The surface contains z information in the pixels.

Note that there are a lot more flags especially for D3D-related properties. Please refer to the DirectX SDK for more information.

The fields that matter the most right now are

DDPF_PALETTEINDEXED8— This indicates that the surface is an 8-bit palettized mode.

DDPF_RGB— This indicates that the surface is an RGB mode and the format can be queried by testing the value in dwRGBBitCount.

So all you need to do is write a test that looks something like this:

DDPIXELFORMAT ddpixel; // used to hold info

LPDIRECTDRAWSURFACE7 lpdds_primary; // assume this is valid

// clear our structure
memset(&ddpixel, 0, sizeof(ddpixel));

// set length
ddpixel.dwSize = sizeof(ddpixel);

// make call off surface (assume primary this time)
lpdds_primary->GetPixelFormat(&ddpixel);

// now perform tests
// check if this is an RGB mode or palettized
if (ddpixel.dwFlags & DDPF_RGB)
    {
    // RGB mode
    // what's the RGB mode
    switch(ddpixel.dwRGBBitCount)
          {
          case 15: // must be 5.5.5 mode
               {
               // use the _RGB16BIT555(r,g,b) macro
               } break;

          case 16: // must be 5.6.5 mode
               {
               // use the _RGB16BIT565(r,g,b) macro
               } break;

          case 24: // must be 8.8.8 mode
               {
               } break;

          case 32: // must be alpha(8).8.8.8 mode
               {
               } break;

          default: break;

          } // end switch

    } // end if
else
if (ddpixel.dwFlags & DDPF_PALETTEINDEXED8)
    {
    // 256 color palettized mode
    } // end if
else
    {
    // something else??? more tests
    } // end else

Fairly simple code, huh? A bit ugly granted, but that comes with the territory, baby! The real power of GetPixelFormat() comes into play when you don't set the video mode and you simply create a primary surface in a windowed mode. In that case, you'll have no idea about the properties of the video system and you must query the system. Otherwise, you won't know the color depth, pixel format, or even the resolution of the system.
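One caution about the switch statement: it counts on dwRGBBitCount coming back as 15 for 5.5.5 and 16 for 5.6.5. A driver may report 16 for both layouts, so a more defensive test is to look at the channel bit masks that DDPIXELFORMAT also carries (dwRBitMask, dwGBitMask, and dwBBitMask). The following is a minimal sketch of that idea. The enum and function names are hypothetical, and the mask is passed in as a plain integer so the snippet stands alone without the DirectX headers.

```cpp
#include <cassert>
#include <cstdint>

// The 16-bit layouts this sketch can tell apart.
enum class Layout16 { RGB555, RGB565, Unknown };

// green_mask is assumed to be the value of DDPIXELFORMAT::dwGBitMask after a
// successful GetPixelFormat() call on a 16-bit RGB surface.
inline Layout16 classify_16bit_layout(std::uint32_t green_mask)
{
    switch (green_mask)
    {
    case 0x07E0: return Layout16::RGB565;  // six green bits, D5..D10
    case 0x03E0: return Layout16::RGB555;  // five green bits, D5..D9
    default:     return Layout16::Unknown; // some other layout, test further
    }
}
```

In a real program you would call something like classify_16bit_layout(ddpixel.dwGBitMask) inside the DDPF_RGB branch and pick the matching macro from the result.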

Now that you're a 16-bit expert, here's a demo! There's nothing to creating a 16-bit application—just make the call to SetDisplayMode() with 16 bits for the color depth, and that's it. As an example, here are the steps you would take to create a full-screen, 16-bit color mode in DirectDraw:

LPDIRECTDRAW7  lpdd      = NULL; // used to get directdraw7
DDSURFACEDESC2 ddsd;             // surface description
LPDIRECTDRAWSURFACE7 lpddsprimary = NULL; // primary surface
// create IDirectDraw7 and test for error
if (FAILED(DirectDrawCreateEx(NULL, (void **)&lpdd, IID_IDirectDraw7, NULL)))
   return(0);


// set cooperation level to requested mode
if (FAILED(lpdd->SetCooperativeLevel(main_window_handle,
           DDSCL_ALLOWMODEX | DDSCL_FULLSCREEN |
           DDSCL_EXCLUSIVE | DDSCL_ALLOWREBOOT)))
    return(0);

// set the display mode to 16 bit color mode
if (FAILED(lpdd->SetDisplayMode(640,480,16,0,0)))
   return(0);

// Create the primary surface
memset(&ddsd,0,sizeof(ddsd));
ddsd.dwSize = sizeof(ddsd);
ddsd.dwFlags = DDSD_CAPS;

// set caps for primary surface
ddsd.ddsCaps.dwCaps = DDSCAPS_PRIMARYSURFACE;

// create the primary surface
lpdd->CreateSurface(&ddsd,&lpddsprimary,NULL);

And that's all there is to it. At this point, you would see a black screen (possibly garbage if the primary buffer memory has data in it).

To simplify the discussion, assume that you already tested the pixel format and found that it's RGB 16-bit 5.6.5 mode—which is correct, because you set the mode! In the worst-case scenario, however, it could have been the 5.5.5 format. Anyway, to write a pixel to the screen, you must

Lock the surface. In this example, that means locking the primary surface with a call to Lock().
Build the RGB WORD for 16-bit mode. This entails using one of the macros or doing it yourself. Basically, you're going to send the pixel-plotting function red, green, and blue values. They must be scaled and then combined into the 16-bit 5.6.5 format that the primary surface needs.
Write the pixel. This means addressing the primary buffer using a USHORT pointer and writing the pixel into the VRAM buffer.
Unlock the primary surface. A call to Unlock() is made.

Here's the code for a rough 16-bit plot pixel function:

void Plot_Pixel16(int x, int y, int red, int green, int blue,
                  LPDIRECTDRAWSURFACE7 lpdds)
{
// this function plots a pixel in 16-bit color mode
// very inefficient...

DDSURFACEDESC2 ddsd; // directdraw surface description

// first build up color WORD
USHORT pixel = _RGB16BIT565(red,green,blue);

// now lock video buffer
DDRAW_INIT_STRUCT(ddsd);

lpdds->Lock(NULL,&ddsd,DDLOCK_WAIT |
            DDLOCK_SURFACEMEMORYPTR,NULL);

// write the pixel

// alias the surface memory pointer to a USHORT ptr
USHORT *video_buffer = (USHORT *)ddsd.lpSurface;

// write the data
video_buffer[x + y*(ddsd.lPitch >> 1)] = pixel;

// unlock the surface
lpdds->Unlock(NULL);

} // end Plot_Pixel16

Notice the use of DDRAW_INIT_STRUCT(ddsd), which is a simple macro that zeros out the structure and sets its dwSize field. I'm getting tired of doing it the long way. Here's the macro definition:

// the backslashes continue the macro across lines so that
// the preprocessor treats it as a single line
#define DDRAW_INIT_STRUCT(ddstruct) \
{ memset(&ddstruct,0,sizeof(ddstruct)); \
  ddstruct.dwSize=sizeof(ddstruct); }

For example, to plot a pixel on the primary surface at (10,30) with RGB values (255,0,0), you would do something like this:

Plot_Pixel16(10,30,   // x,y
             255,0,0, // rgb
             lpddsprimary); // surface to draw on

Although the function seems reasonably simple, it's extremely inefficient. There are a number of optimizations that you can take advantage of. The first problem is that the function locks and unlocks the sent surface each time. This is totally unacceptable. Locking/unlocking can take hundreds of microseconds on some video cards, and maybe even longer. The bottom line is that in a game loop, you should lock a surface once, do all the manipulation you're going to do with it, and unlock it when you're done, as shown in Figure 7.3. That way you don't have to keep locking/unlocking, zeroing out memory, etc. For example, the memory fill of the DDSURFACEDESC2 structure probably takes longer than the pixel plot! Not to mention that the function isn't inline and the function overhead is probably killing you.

Figure 7.3. DirectDraw surfaces should be locked as little as possible.

graphics/07fig03.gif

These are the types of things that a game programmer needs to keep in mind. You aren't writing a word processor program here—you need speed! Here's another version of the function with a little bit of optimization, but it can still be 10 times faster:

inline void Plot_Pixel_Fast16(int x, int y,
                              int red, int green, int blue,
                              USHORT *video_buffer, int lpitch)
{
// this function plots a pixel in 16-bit color mode
// assuming that the caller already locked the surface
// and is sending a pointer and byte pitch to it

// first build up color WORD
USHORT pixel = _RGB16BIT565(red,green,blue);

// write the data
video_buffer[x + y*(lpitch >> 1)] = pixel;

} // end Plot_Pixel_Fast16

I still don't like the multiply and shift, but this new version isn't bad. You can get rid of both the multiply and shift with a couple of tricks. First, the shift is needed because lPitch is memory width in bytes. However, because you're assuming that the caller already locked the surface and queried the memory pointer and pitch from the surface, it's a no-brainer to add one more step to the process to compute a WORD or 16-bit strided version of lpitch, like this:

int lpitch16 = (lpitch >> 1);

Basically, lpitch16 is now the number of 16-bit WORDs that make up a video line. With this new value, you can rewrite the function once again, like this:

inline void Plot_Pixel_Faster16(int x, int y,
                                int red, int green, int blue,
                                USHORT *video_buffer, int lpitch16)
{
// this function plots a pixel in 16-bit color mode
// assuming that the caller already locked the surface
// and is sending a pointer and a pitch in 16-bit WORDs to it

// first build up color WORD
USHORT pixel = _RGB16BIT565(red,green,blue);

// write the data
video_buffer[x + y*lpitch16] = pixel;

} // end Plot_Pixel_Faster16

That's getting there! The function is inline and has a single multiply, addition, and memory access. Not bad, but it could be better! The final optimization is to use a huge lookup table to get rid of the multiply, but this may not be needed because integer multiplies are getting down to single cycles on newer Pentium X architectures. It is a way to speed things up, however.
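If you want to try the lookup table idea, here's a minimal sketch of what it could look like. It precomputes the starting index of every video line once, so the per-pixel work becomes a table read and an addition. The names build_row_table and plot_pixel_table16 are hypothetical, and the table has to be rebuilt any time the surface pitch or height changes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Build a table where entry y holds y * lpitch16, the index of the first
// pixel of row y. Only additions are used while filling it in.
inline std::vector<int> build_row_table(int height, int lpitch16)
{
    std::vector<int> row_start(height);
    int offset = 0;
    for (int y = 0; y < height; y++)
    {
        row_start[y] = offset;
        offset += lpitch16; // advance one full video line
    }
    return row_start;
}

// Plot a 16-bit pixel using the table instead of a multiply.
// The caller is assumed to have locked the surface and clipped x and y.
inline void plot_pixel_table16(int x, int y, std::uint16_t pixel,
                               std::uint16_t *video_buffer,
                               const std::vector<int> &row_start)
{
    video_buffer[x + row_start[y]] = pixel;
}
```

Whether this beats a plain multiply depends on the processor and on whether the table stays in cache, so time it on your target machine before committing to it.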

On the other hand, you can get rid of the multiply by using a number of shift-adds. For example, assuming a perfectly linear memory mode (without any extra stride per line), you know that it's exactly 1,280 bytes from one video line to another in a 640x480 16-bit mode. Therefore, you need to multiply y by 640 because the array access will use automatic pointer arithmetic and scale anything in the [] array operator by a factor of 2 (2 bytes per USHORT WORD). Anyway, here's the math:

y*640 = y*512 + y*128

512 is equal to 2⁹, and 128 is equal to 2⁷. Therefore, if you were to shift y to the left 9 times and then add that to y shifted to the left 7 times, the result should be equivalent to y*640, or mathematically:

y*640 = y*512 + y*128
      = (y << 9) + (y << 7)

That's it! If you aren't familiar with this trick, take a look at Figure 7.4. Basically, shifting any binary-encoded number to the right by one bit is the same as dividing by 2, and shifting to the left by one bit is the same as multiplying by 2. Furthermore, multiple shifts accumulate. Hence, you can use this property to perform very fast multiplication by numbers that are powers of 2. If the multiplier isn't a power of 2, you can always break it into a sum of powers of 2, as in the previous case. Now, optimizations like these aren't really important on Pentium II+ processors because they can usually multiply in a single clock, but on older processors or other platforms like the Game Boy Advance, knowing these tricks always comes in handy.
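If you'd like to convince yourself that the shift-add form really matches the multiply, here's a minimal sketch that checks it for every row of a 480-line display. The helper names are made up for this example, and the check assumes non-negative y values, which is all you'll have for screen coordinates.

```cpp
#include <cassert>

// y * 640 written as shift-adds: 640 = 512 + 128 = 2^9 + 2^7.
inline int times640(int y)
{
    return (y << 9) + (y << 7);
}

// Returns true when the shift-add form equals a plain multiply
// for every row index from 0 up to (but not including) rows.
inline bool shift_add_matches(int rows)
{
    for (int y = 0; y < rows; y++)
        if (times640(y) != y * 640)
            return false;
    return true;
}
```

Keep in mind that this hard-codes a line width of 640 pixels, so it only works when the pitch has no extra padding per line.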

Figure 7.4. Using binary shifting to multiply and divide.

graphics/07fig04.gif

NOTE

You'll see a lot more of these tricks when you get to Chapter 11, "Algorithms, Data Structures, Memory Management, and Multithreading."

For an example of using the 16-bit modes to write pixels to the screen, take a look at DEMO7_1.CPP|EXE on the CD. The program basically implements what you've done here and blasts random pixels to the screen. Take a look at the code and note that you don't need a palette anymore, which is kind of nice <BG>. By the way, the code is in the standard T3D Game Engine template, so the only things you need to really look at are Game_Init() and Game_Main(). The contents of Game_Main() are shown here:

int Game_Main(void *parms = NULL, int num_parms = 0)
{
// this is the main loop of the game, do all your processing
// here

// for now test if user is hitting ESC and send WM_CLOSE
if (KEYDOWN(VK_ESCAPE))
   SendMessage(main_window_handle,WM_CLOSE,0,0);
// plot 1000 random pixels to the primary surface and return
// clear ddsd and set size, never assume it's clean
DDRAW_INIT_STRUCT(ddsd);

// lock the primary surface
if (FAILED(lpddsprimary->Lock(NULL, &ddsd,
                   DDLOCK_SURFACEMEMORYPTR | DDLOCK_WAIT,
                   NULL)))
   return(0);

// now ddsd.lPitch is valid and so is ddsd.lpSurface

// make a couple aliases to make code cleaner, so we don't
// have to cast
int lpitch16 = (int)(ddsd.lPitch >> 1);
USHORT *video_buffer = (USHORT *)ddsd.lpSurface;

// plot 1000 random pixels with random colors on the
// primary surface, they will be instantly visible
for (int index=0; index < 1000; index++)
    {
    // select random position and color for 640x480x16
    int red   = rand()%256;
    int green = rand()%256;
    int blue  = rand()%256;
    int x = rand()%640;
    int y = rand()%480;

    // plot the pixel
    Plot_Pixel_Faster16(x,y,red,green,blue,video_buffer,lpitch16);

    } // end for index

// now unlock the primary surface
if (FAILED(lpddsprimary->Unlock(NULL)))
   return(0);

// return success or failure or your own return code here
return(1);

} // end Game_Main

24/32-Bit High-Color Mode

Once you've mastered 16-bit mode, 24-bit and 32-bit modes are trivial. I'll begin with 24-bit mode because it's simpler than 32-bit mode, which is not a surprise! 24-bit mode uses exactly one byte per channel for red, green, and blue. Thus, there's no loss and a total of 256 shades per channel, giving a total possible number of colors of 256x256x256 = 16.7 million. The bits for red, green, and blue are encoded just as they were in 16-bit mode, except that you don't have to worry about one channel using more bits than another.

Because there's one byte per channel and three channels, there are three bytes per pixel. This makes for really ugly addressing, as shown in Figure 7.5. Alas, writing pixels in pure 24-bit mode is rather contrived, as shown in the following 24-bit version of the pixel-writing function:

inline void Plot_Pixel_24(int x, int y,
                          int red, int green, int blue,
                          UCHAR *video_buffer, int lpitch)
{
// this function plots a pixel in 24-bit color mode
// assuming that the caller already locked the surface
// and is sending a pointer and byte pitch to it

// in byte or 8-bit math the proper address is: 3*x + y*lpitch
// this is the address of the low order byte, which is the Blue channel,
// because the bytes sit in memory in blue, green, red order
DWORD pixel_addr = (x+x+x) + y*lpitch;

// write the data, first blue
video_buffer[pixel_addr]   = blue;

// now green
video_buffer[pixel_addr+1] = green;

// finally red
video_buffer[pixel_addr+2] = red;

} // end Plot_Pixel_24

Figure 7.5. Three-byte RGB addressing is ugly.

graphics/07fig05.gif

WARNING

Many video cards don't support 24-bit color mode. They support only 32-bit color, which is usually 8 bits of alpha transparency and then 24 bits of color. This is due to addressing constraints. So DEMO7_2.EXE may not work on your system.

The function takes as parameters the x,y position, along with the RGB color, and finally the video buffer starting address and the memory pitch in bytes. There's no point in sending the memory pitch or the video buffer in some WORD length because there isn't any data type that's three bytes long. Hence, the function basically starts addressing the video buffer at the requested pixel location and then writes the blue, green, and red bytes for the pixel. Here's a macro to build an RGB 24-bit word:

// this builds a 24 bit color value in 8.8.8 format
#define _RGB24BIT(r,g,b) ((b) + ((g) << 8) + ((r) << 16) )

For an example of 24-bit mode, take a look at DEMO7_2.CPP|EXE on the CD. It basically mimics the functionality of DEMO7_1.CPP, but in 24-bit mode.
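The _RGB24BIT macro hands you a packed value, but a 24-bit surface still has to be written one byte at a time. Here's a minimal sketch of how a packed 8.8.8 color could be split back into bytes and stored. The function name plot_packed24 is hypothetical, and the buffer is just a byte pointer, so you can try it on ordinary memory before pointing it at a locked surface.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Split a packed 8.8.8 color (red in bits 16..23, green in bits 8..15, blue
// in bits 0..7) into three bytes and store them in blue, green, red order.
// lpitch is the byte pitch of one video line, which may include padding.
inline void plot_packed24(int x, int y, std::uint32_t rgb24,
                          std::uint8_t *video_buffer, int lpitch)
{
    std::uint8_t *pixel = video_buffer + 3 * x + y * lpitch;
    pixel[0] = static_cast<std::uint8_t>( rgb24        & 0xFF); // blue
    pixel[1] = static_cast<std::uint8_t>((rgb24 >> 8)  & 0xFF); // green
    pixel[2] = static_cast<std::uint8_t>((rgb24 >> 16) & 0xFF); // red
}
```

Notice that the row offset uses lpitch rather than 3 times the width, because a line can be padded beyond the pixels it actually displays.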

Moving on to 32-bit color, the pixel setup is a little different, as shown in Figure 7.6. In 32-bit mode, the pixel data is arranged in the following two formats:

Alpha(8).8.8.8— This format uses eight bits for alpha or transparency information (or sometimes other information) and then eight bits for each channel: red, green, and blue. However, where simple bitmapping is concerned, you can usually disregard the alpha information and simply write eights to it. The nice thing about this mode is that it's 32 bits per pixel, which is the fastest possible memory addressing mode for a Pentium.

X(8).8.8.8— Similar to the preceding mode, except in this mode the upper eight bits of the color WORD are "don't cares" or irrelevant. However, I still suggest setting them to zeroes to be safe. You may say, "This mode seems like a 24-bit mode, so why have it?" The answer is that many video cards can't address on three-byte boundaries, so the fourth byte is just for alignment.

Figure 7.6. 32-bit RGB pixel encodings.

graphics/07fig06.gif

Now, take a look at a macro to create a 32-bit color WORD:

// this builds a 32 bit color value in A.8.8.8 format (8-bit alpha mode)
#define _RGB32BIT(a,r,g,b) ((b) + ((g) << 8) + ((r) << 16) + ((a) << 24))

Then all you need to do is change your pixel-plotting function to use the new macro and take advantage of the four-byte-per-pixel data size. Here it is:

inline void Plot_Pixel_32(int x, int y,
                          int alpha,int red, int green, int blue,
                          UINT *video_buffer, int lpitch32)
{
// this function plots a pixel in 32-bit color mode
// assuming that the caller already locked the surface
// and is sending a pointer and DWORD aligned pitch to it

// first build up color WORD
UINT pixel = _RGB32BIT(alpha,red,green,blue);

// write the data
video_buffer[x + y*lpitch32] = pixel;

} // end Plot_Pixel_32

This should look familiar. The only thing hidden is the fact that lpitch32 is the byte pitch divided by four, so it's a DWORD or 32-bit WORD stride. With that all in mind, check out DEMO7_3.CPP|EXE. It's the same pixel-plotting demo, but in 32-bit mode. It should work on your machine because more video cards support 32-bit mode than pure 24-bit mode.
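To make the lpitch32 step concrete, here's a minimal sketch that takes the byte pitch you'd get back from Lock(), converts it to a DWORD stride, and plots into a plain memory buffer. The function names are hypothetical. The packing function also casts each channel to an unsigned 32-bit value before shifting, which is a defensive choice on my part: shifting a signed int alpha of 128 or more left by 24 bits can overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Pack alpha, red, green, and blue (0..255 each) into an A.8.8.8 DWORD.
// Each channel is converted to unsigned before shifting.
inline std::uint32_t pack_argb32(int a, int r, int g, int b)
{
    return  (static_cast<std::uint32_t>(b) & 0xFF)        |
           ((static_cast<std::uint32_t>(g) & 0xFF) << 8)  |
           ((static_cast<std::uint32_t>(r) & 0xFF) << 16) |
           ((static_cast<std::uint32_t>(a) & 0xFF) << 24);
}

// Plot a 32-bit pixel given the pitch in bytes (as reported by the surface).
// The caller is assumed to have locked the surface and clipped x and y.
inline void plot_pixel32_bytes(int x, int y, std::uint32_t pixel,
                               std::uint32_t *video_buffer, int lpitch_bytes)
{
    const int lpitch32 = lpitch_bytes >> 2; // four bytes per pixel
    video_buffer[x + y * lpitch32] = pixel;
}
```

In a loop that plots many pixels you'd compute lpitch32 once after locking and pass it in, exactly as Plot_Pixel_32() expects.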

All righty, then! I think I've belabored high-color modes enough that you can work with them and convert any 8-bit color code that you want. Remember, I can't assume that everyone has a Pentium IV 2.0GHz with a GeForce III 3D Accelerator. Sticking to 8-bit color is a good way to get your programs running; then you can move to 16-bit or higher modes.
