Efficient Data Transfer
The data transfer stage can yield impressive improvement ratios and is relatively straightforward to optimize, making it an ideal target for us. Again, we will talk about techniques for geometry and textures separately because each deserves a different solution.
Texture optimization must always start with a serious diagnostic of where the problem comes from. To begin with, all textures should be in GPU memory whenever possible, and texture swapping must be kept to a minimum. Thus, start by computing the memory size of each texture map and build a complete texture budget that summarizes how much memory your textures are consuming. Most likely, they will not all fit in video memory, in which case swapping will be slowing your application down.
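The texture budget described above can be sketched in a few lines. This is only a minimal illustration: the TextureInfo structure, the function names, and the restriction to uncompressed whole-byte formats are assumptions made for the example, not part of any API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one texture map (names are illustrative).
struct TextureInfo {
    std::uint32_t width;
    std::uint32_t height;
    std::uint32_t bitsPerPixel;  // e.g. 32 for RGBA8, 16 for RGB5_A1
};

// Bytes occupied by the base level of one texture (mipmaps not included).
std::size_t textureBytes(const TextureInfo& t) {
    assert(t.bitsPerPixel % 8 == 0);  // this sketch handles whole-byte formats only
    return static_cast<std::size_t>(t.width) * t.height * (t.bitsPerPixel / 8);
}

// Sum of all texture sizes: the texture budget.
std::size_t textureBudget(const std::vector<TextureInfo>& textures) {
    std::size_t total = 0;
    for (const TextureInfo& t : textures) {
        total += textureBytes(t);
    }
    return total;
}

// True when the whole texture set fits in the given amount of video memory.
bool fitsInVideoMemory(const std::vector<TextureInfo>& textures,
                       std::size_t videoMemoryBytes) {
    return textureBudget(textures) <= videoMemoryBytes;
}
```

Running this over the full material list gives a single number to compare against the video memory you expect to have available for textures on the target platform.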
Then it is time to optimize texture formats, so less space is required for them. Take a look at your target platform and API: Which efficient internal representation formats are supported? Most APIs support many creative encoding formats that will reduce memory footprints. Using palettized maps is just one of the options, but you can also choose to implement texture compression algorithms, such as DXT, which store textures compressed in video memory. Another issue to watch out for is the abuse of alpha channels. A texture map that is to be used for alpha testing (not blending) should have no more than one bit for alpha. An interesting format is RGB5_A1, defined as five bits each for R, G, and B (providing 32 intensity levels for each component) and one bit for alpha. This format uses 16 bits per pixel, whereas encoding in traditional RGBA mode would take up 32. Some quality might be lost in the process, but 15 bits for color is perfectly appropriate for most textures.
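As an illustration of the RGB5_A1 layout, the following sketch packs an 8-bit-per-channel color into 16 bits. The function name is hypothetical, the alpha threshold of 128 is an arbitrary choice for the example, and the bit order (red in the top five bits, alpha in the lowest bit) is assumed to follow OpenGL's GL_UNSIGNED_SHORT_5_5_5_1 packed type; other APIs may order the fields differently.

```cpp
#include <cassert>
#include <cstdint>

// Pack 8-bit R, G, B, A values (each 0..255) into one 16-bit RGB5_A1 texel.
// Assumed layout: RRRRR GGGGG BBBBB A, most significant bit first.
std::uint16_t packRGB5A1(unsigned r, unsigned g, unsigned b, unsigned a) {
    assert(r <= 255 && g <= 255 && b <= 255 && a <= 255);
    unsigned r5 = r >> 3;              // keep the 5 most significant bits
    unsigned g5 = g >> 3;
    unsigned b5 = b >> 3;
    unsigned a1 = (a >= 128) ? 1 : 0;  // one bit: opaque or transparent
    return static_cast<std::uint16_t>((r5 << 11) | (g5 << 6) | (b5 << 1) | a1);
}
```

Dropping the three low bits of each channel is the quality loss mentioned above; the single alpha bit is enough for alpha testing, where each texel is either kept or discarded.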
Once the right format is found, it is time to work on texture sizes. The larger the map, the better the quality, but also the more memory it will take and the longer it will take to download to the graphics subsystem. Thus, it is very important to audit each and every texture to make sure it has the smallest possible size. Build a test application and test the same object with two textures, the original and one reduced in size. Consider the added quality versus reduced size equation, and make sure textures are as small as they can be but no smaller. We do not want texture maps to blur, but unnecessary detail will certainly hurt performance.
Sometimes, a large texture can be avoided with two smaller maps and some multipass trickery. In Chapter 18, "Texture Mapping," we discussed how a detail texture could be alpha-blended with the base map to create a large, nontiling texture map. This is very important in game consoles, where memory is a major concern. Using multitextured detail textures is an extremely popular way of offering great texture quality with low memory footprint.
Another issue to watch out for is the use and abuse of mipmapping. Mipmapping allows for better filtering by storing several prefiltered copies of the same texture map, each half the width and height of the previous one. Then, the one that most closely matches the size of the onscreen primitive will be used. The good news is that quality is significantly increased. The downside is that a mipmapped texture map takes up about one-third more memory than the original version (each level holds a quarter of the texels of the previous one, so the whole chain adds up to roughly 4/3 of the base map), so mipmapping must be used with care. It only makes sense in games using large viewing distances, and many times, it can simply be avoided by using detail textures that fade away with distance.
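To put a number on the memory cost of mipmapping, the sizes of all levels in the chain can simply be summed. The sketch below assumes an uncompressed format with a whole number of bytes per pixel and a full chain that goes down to 1x1; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Total bytes for a texture plus its full mipmap chain down to 1x1.
// Each level halves width and height, never going below one texel.
std::size_t mipChainBytes(std::uint32_t width, std::uint32_t height,
                          std::uint32_t bytesPerPixel) {
    assert(width > 0 && height > 0);
    std::size_t total = 0;
    for (;;) {
        total += static_cast<std::size_t>(width) * height * bytesPerPixel;
        if (width == 1 && height == 1) break;  // smallest level reached
        if (width > 1) width /= 2;
        if (height > 1) height /= 2;
    }
    return total;
}
```

For a 256x256 map at 4 bytes per pixel, this gives 349,524 bytes against 262,144 bytes for the base level alone, which is roughly one-third more.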
Some additional advice on texture maps is to make sure you consolidate your texture lists. Too often, modeling packages apply the same texture to several objects, creating more than one texture identifier for the same map. If not treated with care, this may result in the same texture map being loaded into memory more than once. This problem can always be avoided by preprocessing the material list and storing unique identifiers only.
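One possible way to consolidate the texture list is to key every map by its file name and hand out a single identifier per name. The TextureRegistry class below is a hypothetical sketch; it assumes file names have already been normalized (same case, same path style), otherwise two spellings of the same file would still count as two textures.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hands out one identifier per distinct texture file name.
class TextureRegistry {
public:
    // Returns the identifier for this file, registering it on first sight.
    int idFor(const std::string& fileName) {
        auto it = ids_.find(fileName);
        if (it != ids_.end()) return it->second;  // already known: reuse the id
        int id = static_cast<int>(files_.size());
        ids_.emplace(fileName, id);
        files_.push_back(fileName);
        return id;
    }

    // Number of distinct textures that actually need to be loaded.
    std::size_t uniqueCount() const { return files_.size(); }

    // File name behind an identifier previously returned by idFor.
    const std::string& fileFor(int id) const {
        assert(id >= 0 && static_cast<std::size_t>(id) < files_.size());
        return files_[static_cast<std::size_t>(id)];
    }

private:
    std::unordered_map<std::string, int> ids_;
    std::vector<std::string> files_;
};
```

During preprocessing, each material asks the registry for its texture identifier, and only the unique file list is loaded at run time.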
Once all these techniques are in place, you might still have to code a texture caching engine if your maps do not fit in video memory. This way you can decide which texture maps are stored in video memory, so performance is maximized. Both simple and more involved solutions are usually available. In the simplest form, you will only need to assign priorities to texture maps, so the API's caching engine knows which maps should be prioritized into video memory. Under OpenGL, texture priorities are supported via the glPrioritizeTextures interface, which has the syntax:
void glPrioritizeTextures(GLsizei n, const GLuint *textures, const GLclampf *priorities);
Using this interface, we pass the number of textures to assign priorities to, an array with their texture identifiers, and a second array with priority values in the [0..1] range. Higher priority values mean a higher probability of being stored in video memory, and a texture with priority set to zero will most likely never be stored in video memory. Similar functionality is available in DirectX using the following method, which is a member of the IDirect3DResource8 interface:
DWORD SetPriority(DWORD PriorityNew);
This method can be applied to regular textures (IDirect3DTexture8), cube maps (IDirect3DCubeTexture8), volume textures (IDirect3DVolumeTexture8), and so on. Again, higher values mark a texture that should be kept in video memory as often as possible.
Another, more involved solution is to manually build a texture cache engine from scratch, overriding the one supplied by the API. You can add a code layer that keeps a fixed number of texture maps in video memory and swaps them in and out as needed. Both OpenGL and DirectX provide mechanisms to determine whether a given texture map is actually stored in video or system memory. Under OpenGL, the glAreTexturesResident call will take a list of texture identifiers as a parameter and return a list detailing which ones are currently stored in video memory. Under DirectX, the same functionality is available.
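A hand-built texture cache of this kind could be organized around a least-recently-used (LRU) policy. The sketch below is one possible design, not the method of any particular engine: the TextureCache class and its members are hypothetical, the budget is expressed in bytes rather than as a count of maps, and the actual upload and release calls are left as comments because they depend on the API in use.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>

// Tracks which textures are resident within a fixed video memory budget,
// evicting the least recently used ones when room is needed.
class TextureCache {
public:
    explicit TextureCache(std::size_t capacityBytes) : capacity_(capacityBytes) {}

    // Request a texture for rendering. Returns true on a hit (already
    // resident) and false when it had to be brought in, possibly after
    // evicting older textures.
    bool request(int textureId, std::size_t sizeBytes) {
        assert(sizeBytes <= capacity_);  // a single map must fit in the budget
        auto it = index_.find(textureId);
        if (it != index_.end()) {
            // Hit: move the entry to the front (most recently used).
            lru_.splice(lru_.begin(), lru_, it->second);
            return true;
        }
        while (used_ + sizeBytes > capacity_) {
            const Entry& victim = lru_.back();  // least recently used
            used_ -= victim.sizeBytes;
            index_.erase(victim.id);
            lru_.pop_back();  // a real engine would release the video memory copy here
        }
        lru_.push_front(Entry{textureId, sizeBytes});  // a real engine would upload here
        index_[textureId] = lru_.begin();
        used_ += sizeBytes;
        return false;
    }

    bool isResident(int textureId) const { return index_.count(textureId) != 0; }
    std::size_t usedBytes() const { return used_; }

private:
    struct Entry {
        int id;
        std::size_t sizeBytes;
    };
    std::size_t capacity_;
    std::size_t used_ = 0;
    std::list<Entry> lru_;  // front = most recently used
    std::unordered_map<int, std::list<Entry>::iterator> index_;
};
```

The renderer would call request for every texture it binds in a frame. Other replacement policies, such as weighting by priority or by texture size, fit in the same structure by changing how the victim is chosen.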
Tuning the geometry transfer must also start with an auditing phase: How much does each vertex weigh? Are you using the right format to ensure minimal download time? Colors as floats are often a waste, and unsigned bytes are perfectly valid. Other times, you will be able to precompute the lighting, so you will not pass per-vertex normals at all.
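The weight of a vertex can be audited directly from its declaration. The two structures below are hypothetical layouts chosen for illustration, and the byte counts in the comments assume 4-byte floats with no padding, which holds on common platforms but should be checked on yours.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// A "heavy" vertex: everything stored as floats (48 bytes under the stated assumptions).
struct FatVertex {
    float position[3];
    float normal[3];
    float color[4];  // RGBA as floats: 16 bytes
    float uv[2];
};

// A leaner vertex: byte colors, lighting precomputed so no normal (24 bytes).
struct LeanVertex {
    float position[3];
    std::uint8_t color[4];  // RGBA as unsigned bytes: 4 bytes
    float uv[2];
};

// Bytes that must be transferred for a mesh with the given vertex size.
std::size_t bufferBytes(std::size_t vertexCount, std::size_t vertexSize) {
    assert(vertexSize > 0);
    return vertexCount * vertexSize;
}
```

For a 10,000-vertex mesh, bufferBytes(10000, sizeof(FatVertex)) and bufferBytes(10000, sizeof(LeanVertex)) give 480,000 and 240,000 bytes respectively under these assumptions, so the lean format halves the transfer.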
Another, more aggressive option is to send compressed data down the pipeline and expand it internally using a shader. Vertices can sometimes be sent as bytes instead of 32-bit floats, thus dividing the memory footprint by four. This approach would require a shader-based quantization/dequantization routine, which can only be implemented under some architectures.
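The quantization and dequantization steps can be sketched on the CPU as follows. This is a minimal illustration that assumes each coordinate is mapped linearly from a known bounding range to one byte; the function names are hypothetical, and in the scheme described above the dequantize step would run in a vertex shader rather than in C++.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Map a coordinate in [minV, maxV] to a byte in [0, 255].
// Values outside the range are clamped to its ends.
std::uint8_t quantize(float v, float minV, float maxV) {
    assert(maxV > minV);
    float t = (v - minV) / (maxV - minV);
    if (t < 0.0f) t = 0.0f;
    if (t > 1.0f) t = 1.0f;
    return static_cast<std::uint8_t>(std::lround(t * 255.0f));
}

// Inverse mapping: recover an approximate coordinate from its byte.
float dequantize(std::uint8_t q, float minV, float maxV) {
    return minV + (static_cast<float>(q) / 255.0f) * (maxV - minV);
}
```

The round trip is lossy: with 256 steps across the range, the worst-case error is about half a step, so this only suits meshes whose bounding range is small relative to the precision required.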
The most powerful way of optimizing the geometry transfer is to work on a GPU-based cache memory, which ensures that geometry is transferred only when needed. Most modern accelerators allow developers to place geometry directly in GPU memory, so it is transferred only once and reused through many rendering loops. Under OpenGL, this is achieved by using compiled vertex arrays or, even better, through proprietary extensions such as NV_vertex_array_range from NVIDIA or ATI_vertex_array_object from ATI. Under DirectX, the same behavior can be triggered by working on the properties of the vertex buffer. For static geometry, we will use write-only vertex buffers and fill them only once. This tells the DirectX driver that the data will not be updated at a later stage, and thus can be optimized.
Similar tricks can be performed on dynamic geometry as well, but be careful: Video memory is often slow to write, and thus careful planning is required. As an example, PC video memory is uncached; thus, the best way to get decent performance is to write it sequentially so you can make the most of the write combiners present in modern-day architectures. Therefore, it is common to use a double-buffered scheme. One buffer stored in main memory is used to work on the geometry, and once the data is ready, it is sent to video memory using block transfers to maximize performance. As usual, know your architecture to ensure that you do the right thing.
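The double-buffered scheme can be sketched as a staging buffer in main memory that is flushed with one sequential block copy. The class and member names below are hypothetical, and the destination is an ordinary pointer standing in for whatever address the API returns when a vertex buffer is locked or mapped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct Vertex {
    float x, y, z;
};

// Geometry is assembled in main memory, then copied out in a single pass.
class StagedVertexBuffer {
public:
    explicit StagedVertexBuffer(std::size_t capacity) : capacity_(capacity) {
        staging_.reserve(capacity);
    }

    void clear() { staging_.clear(); }

    // Random-access work (animation, culling, sorting) happens on this copy.
    void push(const Vertex& v) {
        assert(staging_.size() < capacity_);
        staging_.push_back(v);
    }

    // Copy the finished geometry to the destination as one sequential block.
    // Returns the number of bytes written.
    std::size_t flush(void* destination, std::size_t destinationBytes) const {
        std::size_t bytes = staging_.size() * sizeof(Vertex);
        assert(bytes <= destinationBytes);
        if (bytes > 0) {
            std::memcpy(destination, staging_.data(), bytes);
        }
        return bytes;
    }

private:
    std::size_t capacity_;
    std::vector<Vertex> staging_;
};
```

Writing the destination front to back in one call, and never reading from it, is what lets the write combiners mentioned above do their job.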