Tuning the Rasterizer Stage
Rasterization is where the raw power of the graphics subsystem shows. However, some clever planning can significantly enhance performance here as well. Keep in mind that we are talking about drawing pixels onscreen. In an ideal world, we would paint each pixel exactly once, achieving zero overdraw. Under these circumstances, and assuming the screen is SX pixels wide and SY pixels high, we would be painting exactly SX*SY pixels per frame.
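One common way to put numbers on this is the average depth complexity D. It is the number of fragments F rasterized in a frame divided by the number of screen pixels, so zero overdraw means D = 1. The resolution of 640x480, the value D = 3, and the rate of 60 frames per second used below are illustrative assumptions, not measurements:

```latex
\begin{aligned}
D &= \frac{F}{S_X S_Y}, \qquad D_{\text{ideal}} = 1 \\
F &= D \, S_X S_Y = 3 \cdot 640 \cdot 480 = 921\,600 \ \text{fragments per frame} \\
F_{\text{per second}} &= 921\,600 \cdot 60 = 55\,296\,000
\end{aligned}
```

Every fragment beyond the ideal 307,200 per frame is fill rate spent on pixels that end up hidden or overwritten.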
But this is seldom the case. We end up redrawing the same pixels over and over again, consuming fill rate and making the Z-buffer work overtime in the process. To begin with, we need to diagnose the situation so we can understand how much overdraw there is in our rendering loop. Therefore, we need to compute a depth-complexity image (see Figure A.4), an image that graphically depicts how many times each pixel is drawn.
To compute such an image, all we have to do is activate additive blending with glEnable(GL_BLEND) and glBlendFunc(GL_ONE, GL_ONE). Then render all primitives with texturing disabled, each colored in the same dim, constant color such as a dark gray. Every layer adds the same amount of brightness, so the result is an image that is brighter in the areas where overdraw exists. Plain white would saturate after the very first layer and hide the overdraw completely. Darker areas mean we are rendering with less overdraw. Now, take a look at the picture in Figure A.4 and try to understand what is causing overdraw. Can you rearrange your code in such a way that overdraw is reduced? Turning back-face culling on, for example, discards the triangles facing away from the camera. For closed objects that is roughly 50 percent of all incoming triangles, so it will surely reduce overdraw. But what else can be done? Do you need to add specific occlusion-detection code? Maybe your hardware supports occlusion queries, and you can get them up and running in very little time.
Another factor that has an impact on rasterization speed is primitive Z-ordering. Because of the way Z-buffers work, it is more efficient to send primitives front-to-back than the other way around. The reason becomes obvious if you focus for a second on a specific pixel of the screen. If you paint back-to-front, each new primitive passes the depth test, so it needs to be textured and lit, and it overwrites the depth value stored in the Z-buffer. This means lots of overdraw. Now, if you start with the closest primitive, all subsequent primitives covering that pixel fail the depth test against the closer value already stored, so they are discarded early in the pipeline. This means part of the lighting and texturing work will be saved, and performance will increase. Sometimes it is not easy to sort all scene triangles front-to-back exactly, or it is too costly. But it is not that hard to do it roughly and still get a significant performance increase. If you use a BSP, you can traverse it so you are rendering front-to-back. Quadtrees and octrees can be treated the same way by visiting the child nodes closest to the camera first. Portal systems can also be rendered front-to-back if you use the recursion level as your sorting mechanism. For example, paint each room as the recursion reaches it, starting with the room the camera is in, and you will ensure front-to-back ordering. These approaches do not always produce an exact per-triangle ordering (octrees and portal systems, for instance, sort nodes or rooms rather than individual triangles), but making sure most triangles arrive in the right order will increase performance significantly.
You can also disable the Z-buffer completely for many large objects, thus increasing performance. A skybox, for example, will certainly be behind everything else. Therefore, it is safe to paint it first, with the Z-buffer disabled. Keep in mind that large primitives take longer to render, so taking the skybox out of the way is a wise move. The same applies to onscreen menus and scoreboards and, generally speaking, to all items that will definitely be closest to the player. These items will never be occluded, and thus can be rendered last, with the Z-buffer disabled as well. Z-buffer testing and writing are expensive, so try to find ways to switch them off whenever possible. Some algorithms, such as BSPs or some of the portal variants, order triangles internally, so there is no need to have the Z-buffer on. Calling glDisable(GL_DEPTH_TEST) skips both the depth test and all Z-buffer updates. A call like glDepthFunc(GL_ALWAYS), by contrast, keeps the depth test enabled but lets every fragment pass. Z-values are still written (as long as the depth mask is GL_TRUE), but no fragment is ever rejected. This is useful when the geometry is already ordered but objects drawn later still need valid depth values.