8.2. Performance Considerations
After you have done all the right things from a software engineering standpoint, your shader may or may not have acceptable performance. Here are some ideas for eking out better performance from your carefully crafted shader.
8.2.1. Consider Computational Frequency
Shading computations can occur in three areas: on the CPU, on the vertex processor, and on the fragment processor. It is tempting to put most of your shader computation in the fragment shader because it is executed for every fragment that is drawn, and you will, therefore, get the highest-quality image. But if performance is a concern, you may be able to identify computations that can be done with acceptable quality per vertex instead of per fragment. By moving the computation to the vertex shader, you can make your fragment shader faster. In some cases, there may be no visible difference between doing the computation in the vertex shader versus doing it in the fragment shader. This might be the case with fog computations, for example.
One way to think about the problem is to implement rapidly changing characteristics in the fragment shader and to implement characteristics that don't change as rapidly in the vertex shader. For instance, diffuse lighting effects change slowly over a surface and so can usually be computed with sufficient quality in the vertex shader. Specular lighting effects might need to be implemented in the fragment shader to achieve high quality. If a particular value changes linearly across an object's surface, computing it per vertex and letting a varying variable interpolate it gives the same result as computing it at each fragment. In this case, you may as well have the vertex shader do the computation. Unless you are rendering very small triangles, your fragment shader will execute far more times than your vertex shader will, so it is more efficient to do the computation in the vertex shader.
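The claim about linearly varying values can be checked on the CPU. The following is a minimal sketch in C, not shader code: it mimics the interpolation of a varying variable with barycentric weights on a flat 2D triangle and ignores perspective correction. The type Vec2, the two sample functions, and per_vertex_error are hypothetical names invented for this illustration.

```c
/* A 2D point on a triangle (hypothetical type for this sketch). */
typedef struct { float x, y; } Vec2;

/* A value that varies linearly over the surface: a*x + b*y + c. */
float linear_value(Vec2 p)
{
    return 0.25f * p.x + 0.5f * p.y + 0.1f;
}

/* A value that does not vary linearly over the surface. */
float nonlinear_value(Vec2 p)
{
    return p.x * p.x + p.y * p.y;
}

/* Blend three per-vertex numbers with barycentric weights, which is
   roughly what happens to a varying variable during rasterization. */
float interpolate(float a, float b, float c, float w0, float w1, float w2)
{
    return w0 * a + w1 * b + w2 * c;
}

/* Absolute difference between evaluating f at the fragment position
   and interpolating the three per-vertex evaluations of f. */
float per_vertex_error(float (*f)(Vec2), Vec2 v0, Vec2 v1, Vec2 v2,
                       float w0, float w1, float w2)
{
    Vec2 p;
    float per_fragment, per_vertex, diff;

    /* Position of the fragment inside the triangle. */
    p.x = interpolate(v0.x, v1.x, v2.x, w0, w1, w2);
    p.y = interpolate(v0.y, v1.y, v2.y, w0, w1, w2);

    per_fragment = f(p);
    per_vertex = interpolate(f(v0), f(v1), f(v2), w0, w1, w2);

    diff = per_fragment - per_vertex;
    return diff < 0.0f ? -diff : diff;
}
```

For weights that sum to 1, per_vertex_error with linear_value stays at the level of floating-point rounding, whereas with nonlinear_value it can be large. The second case is the kind of computation that may need to stay in the fragment shader.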
Similarly, you may be able to find computations that can be done once on the CPU and remain constant for a great many executions of your vertex shader or fragment shader. You can often save shader instruction space or improve shader performance (or both) by precomputing values in your application code and passing them to your shader as uniform variables. Sometimes you can spot these things by analyzing your shader code. If you pass length in as a uniform variable and your shader always computes sqrt(length), you're better off doing the computation once on the host CPU and passing that value to your shader rather than computing the value for every execution of your shader. If your shader needs both length and sqrt(length), you can pass both values in as uniform variables.
Deciding where to perform computation also involves knowing where the computational bottleneck occurs for a particular rendering operation. You need to speed up only the slowest part of the system to see an improvement in performance. Conversely, you shouldn't spend time improving the performance of something that isn't a bottleneck, because you won't see any gain in performance.
8.2.2. Analyze Your Algorithm
You can often make your shader more efficient just by understanding the math it uses. For instance, you might want to limit the range of the variable finalcolor to [0,1]. But if you know that you are only adding values to compute this variable and the values that you're adding are never negative, there's really no need to check the result against 0. A call like min(finalcolor, 1.0) clamps the result at 1.0, and it likely performs better than a call like clamp(finalcolor, 0.0, 1.0) because it needs to compare values against only one number instead of two. If you define the valid range of all the variables in your shader, you can more easily see the boundary conditions that you need to handle.
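The reasoning about min and clamp can be sketched in C. The helpers below are hypothetical stand-ins for the shading language built-ins of similar name, operating on a single float rather than a color vector, and the two sum functions model a shader that accumulates several contributions into finalcolor.

```c
/* Stand-in for min(a, b): one comparison. */
float min_value(float a, float b)
{
    return a < b ? a : b;
}

/* Stand-in for clamp(x, lo, hi): two comparisons. */
float clamp_range(float x, float lo, float hi)
{
    float raised = x > lo ? x : lo;
    return raised < hi ? raised : hi;
}

/* Accumulate n terms and limit the result to [0,1] with a full clamp. */
float sum_with_clamp(const float *terms, int n)
{
    float total = 0.0f;
    int i;
    for (i = 0; i < n; i++)
        total += terms[i];
    return clamp_range(total, 0.0f, 1.0f);
}

/* Accumulate n terms and limit only the upper end. This is safe only
   if no term is ever negative. */
float sum_with_min(const float *terms, int n)
{
    float total = 0.0f;
    int i;
    for (i = 0; i < n; i++)
        total += terms[i];
    return min_value(total, 1.0f);
}
```

As long as every term is zero or greater, the two functions agree. If a negative term can appear, sum_with_min may return a value below 0, which is why knowing the valid range of each input matters before dropping the lower bound.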
8.2.3. Use the Built-in Functions
Whenever possible, use the built-in functions to implement the effect that you're trying to achieve. Built-in functions are intended to be implemented in an optimal way by the graphics hardware vendor. If your shader hand-codes the same effect as a built-in function, there's little chance that it will be faster than the built-in function but a good chance that it will be slower.
8.2.4. Use Vectors
The OpenGL Shading Language lets you express vector computations naturally, and underlying graphics hardware is often built to operate simultaneously on a vector of values. Therefore, you should take advantage of this and use vectors for calculations whenever possible. On the other hand, you shouldn't use vectors that are bigger than the computations require. Such use can waste registers, hardware interpolators (in the case of varying variables), processing bandwidth, or memory bandwidth.
8.2.5. Use Textures to Encode Complex Functions
Because fragment processing is now programmable, textures can be used for a lot more than just image data. You might want to consider storing a complex function in a texture and doing a single lookup rather than a complex computation within the fragment shader. This is illustrated in Chapter 15, in which we encode a noise function as a 3D texture. This approach takes advantage of the specialized high-performance hardware that performs texture access, and it can also take advantage of texture filtering hardware to interpolate between values encoded in the texture.
8.2.6. Review the Information Logs
One of the main ways that an OpenGL implementation can provide feedback to an application developer is through the shader object and program object information logs (see Section 7.6). During shader development, you should review the messages in the information logs for compiler and linker errors, but you should also review them to see if they include any performance or functionality warnings or other descriptive messages. These information logs are one of the primary ways for OpenGL implementations to convey implementation-dependent information about performance, resource limitations, and so on.