2.3. OpenGL Programmable Processors
The introduction of programmable vertex and fragment processors is the biggest change to OpenGL since its inception and is the reason a high-level shading language is needed. In Chapter 1, we discussed the OpenGL pipeline and the fixed functionality that implements vertex processing and fragment processing. With the introduction of programmability, the fixed functionality vertex processing and fixed functionality fragment processing are disabled when an OpenGL Shading Language program is made current (i.e., made part of the current rendering state).
Figure 2.1 shows the OpenGL processing pipeline when the programmable processors are active. In this case, the fixed functionality vertex and fragment processing shown in Figure 1.1 are replaced by programmable vertex and fragment processors. All other parts of the OpenGL processing pipeline remain the same.
Figure 2.1. OpenGL logical diagram showing programmable processors for vertex and fragment shaders rather than fixed functionality
This diagram illustrates the stream processing nature of OpenGL made possible by the programmable processors that are defined as part of the OpenGL Shading Language. Data flows from the application to the vertex processor, on to the fragment processor, and ultimately to the frame buffer. The OpenGL Shading Language was carefully designed to allow hardware implementations to perform parallel processing of both vertices and fragments. This gives graphics hardware vendors the opportunity to add more parallel processors, and so produce faster graphics hardware, with each new generation.
2.3.1. Vertex Processor
The VERTEX PROCESSOR is a programmable unit that operates on incoming vertex values and their associated data. The vertex processor usually performs traditional graphics operations such as vertex transformation, normal transformation, and lighting.
Because of its general purpose programmability, this processor can also be used to perform a variety of other computations. Shaders that are intended to run on this processor are called vertex shaders. Vertex shaders can specify a completely general sequence of operations to be applied to each vertex and its associated data. Vertex shaders that perform some of the computations in the preceding list must contain the code for all desired functionality from the preceding list. For instance, it is not possible to have the existing fixed functionality perform the vertex and normal transformation but to have a vertex shader perform a specialized lighting function. The vertex shader must be written to perform all three functions.
The vertex processor does not replace graphics operations that require knowledge of several vertices at a time or that require topological knowledge. OpenGL operations that remain as fixed functionality in between the vertex processor and the fragment processor include perspective divide and viewport mapping, primitive assembly, frustum and user clipping, backface culling, two-sided lighting selection, polygon mode, polygon offset, selection of flat or smooth shading, and depth range.
Figure 2.2 shows the data values that are used as inputs to the vertex processor and the data values that are produced by the vertex processor. Vertex shaders express the algorithm that executes on the vertex processor to produce output values based on the provided input values. Type qualifiers that are defined as part of the OpenGL Shading Language manage the input to the vertex processor and the output from it.
Figure 2.2. Vertex processor inputs and outputs
Variables defined in a vertex shader can be qualified as ATTRIBUTE VARIABLES. These represent values that are frequently passed from the application to the vertex processor. Because this type of variable is used only for data from the application that defines vertices, it is permitted only as part of a vertex shader. Applications can provide attribute values between calls to glBegin and glEnd or with vertex array calls, so they can change as often as every vertex.
There are two types of attribute variables: built in and user defined. Standard attribute variables in OpenGL include things like color, surface normal, texture coordinates, and vertex position. The OpenGL calls glColor, glNormal, glVertex, and so on, and the OpenGL vertex array drawing commands can send standard OpenGL vertex attributes to the vertex processor. When a vertex shader is executing, it can access these data values through built-in attribute variables named gl_Color, gl_Normal, gl_Vertex, and so on.
Because this method restricts vertex attributes to the set that is already defined by OpenGL, a new interface allows applications to pass arbitrary per-vertex data. Within the OpenGL API, generic vertex attributes are defined and referenced by numbers from 0 up to some implementation-dependent maximum value. The command glVertexAttrib sends generic vertex attributes to OpenGL by specifying the index of the generic attribute to be modified and the value for that generic attribute.
Vertex shaders can access these generic vertex attributes through user-defined attribute variables. Another new OpenGL command, glBindAttribLocation, allows an application to tie together the index of a generic vertex attribute and the name with which to associate that attribute in a vertex shader.
UNIFORM VARIABLES pass data values from the application to either the vertex processor or the fragment processor. Uniform variables typically provide values that change relatively infrequently. A shader can be written so that it is parameterized with uniform variables. The application can provide initial values for these uniform variables, and the end user can manipulate them through a graphical user interface to achieve a variety of effects with a single shader. But uniform variables cannot be specified between calls to glBegin and glEnd, so they can change at most once per primitive.
The OpenGL Shading Language supports both built-in and user-defined uniform variables. Vertex shaders and fragment shaders can access current OpenGL state through built-in uniform variables containing the reserved prefix "gl_". Applications can make arbitrary data values available directly to a shader through user-defined uniform variables. glGetUniformLocation obtains the location of a user-defined uniform variable that has been defined as part of a shader. Data can be loaded into this location with another new OpenGL command, glUniform. Variations of this command facilitate loading of floating-point, integer, Boolean, and matrix values, as well as arrays of these.
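The attribute and uniform interfaces described above can be sketched as follows. This is a minimal illustration rather than a complete application: the shader is held as a C string, the names Displacement and Scale and the attribute index 1 are hypothetical choices made for this sketch, and the OpenGL calls appear only in a comment because they need a current rendering context and a program object to run.

```c
#include <assert.h>
#include <string.h>

/* GLSL vertex shader source. Displacement and Scale are hypothetical
   names chosen for this sketch; they are not built-in variables. */
static const char *const kVertexSource =
    "attribute float Displacement; /* user-defined attribute, per vertex */\n"
    "uniform float Scale;          /* user-defined uniform */\n"
    "void main()\n"
    "{\n"
    "    vec4 p = gl_Vertex + vec4(gl_Normal * Displacement * Scale, 0.0);\n"
    "    gl_Position = gl_ModelViewProjectionMatrix * p;\n"
    "}\n";

/* Generic vertex attribute index picked by the application (hypothetical). */
enum { kDisplacementIndex = 1 };

/* With a current context and a program object `program`, the application
 * side would look roughly like this:
 *
 *   glBindAttribLocation(program, kDisplacementIndex, "Displacement");
 *   glLinkProgram(program);            // bindings take effect at link time
 *   glUseProgram(program);
 *   glUniform1f(glGetUniformLocation(program, "Scale"), 2.0f);
 *   glVertexAttrib1f(kDisplacementIndex, 0.5f);   // may change per vertex
 */

/* Returns 1 if the text `decl` occurs in the shader source, else 0. */
int source_contains(const char *source, const char *decl)
{
    assert(source != NULL && decl != NULL);
    return strstr(source, decl) != NULL;
}
```

The helper source_contains is only a convenience for checking that the names the application binds actually appear in the shader text; it is not part of OpenGL.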
Another new feature is the capability of vertex processors to read from texture memory. This allows vertex shaders to implement displacement mapping algorithms, among other things. (However, the minimum number of vertex texture image units required by an implementation is 0, so texture-map access from the vertex processor still may not be possible on all implementations that support the OpenGL Shading Language.) For accessing mipmap textures, level of detail can be specified directly in the shader. Existing OpenGL parameters for texture maps define the behavior of the filtering operation, borders, and wrapping.
Conceptually, the vertex processor operates on one vertex at a time (but an implementation may have multiple vertex processors that operate in parallel). The vertex shader is executed once for each vertex passed to OpenGL. The design of the vertex processor is focused on the functionality needed to transform and light a single vertex. Output from the vertex shader is accomplished partly with special output variables. Vertex shaders must compute the homogeneous position of the vertex in clip space and store the result in the special output variable gl_Position. Values to be used during user clipping and point rasterization can be stored in the special output variables gl_ClipVertex and gl_PointSize.
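As a rough CPU-side illustration of the one required output, the following C sketch performs the arithmetic behind the common GLSL statement gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex. The type Vec4 and the function transform_vertex are names invented for this sketch, and the matrix is assumed to be stored in column-major order, the layout OpenGL uses.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z, w; } Vec4;

/* Multiplies a column-major 4x4 matrix (element at column c, row r is
   m[c * 4 + r]) by a homogeneous vertex, yielding a clip-space position. */
Vec4 transform_vertex(const float m[16], Vec4 v)
{
    assert(m != NULL);
    Vec4 r;
    r.x = m[0] * v.x + m[4] * v.y + m[8]  * v.z + m[12] * v.w;
    r.y = m[1] * v.x + m[5] * v.y + m[9]  * v.z + m[13] * v.w;
    r.z = m[2] * v.x + m[6] * v.y + m[10] * v.z + m[14] * v.w;
    r.w = m[3] * v.x + m[7] * v.y + m[11] * v.z + m[15] * v.w;
    return r;
}
```

A shader would run this computation once per vertex, with no access to any other vertex, which is what allows an implementation to process many vertices in parallel.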
Variables that define data that is passed from the vertex processor to the fragment processor are called VARYING VARIABLES. Both built-in and user-defined varying variables are supported. They are called varying variables because the values are potentially different at each vertex and perspective-correct interpolation is performed to provide a value at each fragment for use by the fragment shader. Built-in varying variables include those defined for the standard OpenGL color and texture coordinate values. A vertex shader can use a user-defined varying variable to pass along anything that needs to be interpolated: colors, normals (useful for per-fragment lighting computations), texture coordinates, model coordinates, and other arbitrary values.
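To make the phrase "perspective-correct interpolation" concrete, here is a minimal C sketch for a single scalar varying along the edge between two vertices. It assumes the usual formulation in which the value is weighted by the reciprocal of each vertex's clip-space w; the function name and parameter names are illustrative, and a real rasterizer interpolates across a whole primitive rather than one edge.

```c
#include <assert.h>

/* Interpolates a varying between two vertices.
   a, b   : the varying's value at each vertex
   wa, wb : the clip-space w of each vertex (assumed nonzero, same sign)
   t      : fraction of the way from a to b, measured in window space */
float interpolate_varying(float a, float wa, float b, float wb, float t)
{
    assert(wa != 0.0f && wb != 0.0f);
    float ia = (1.0f - t) / wa;   /* window-space weight of a, divided by w */
    float ib = t / wb;            /* window-space weight of b, divided by w */
    return (a * ia + b * ib) / (ia + ib);
}
```

When both vertices have the same w, the expression reduces to ordinary linear interpolation. When they differ, the result is pulled toward the value at the vertex with the smaller w, that is, the one nearer the viewer.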
There is actually no harm (other than a possible loss of performance) in having a vertex shader calculate more varying variables than are needed by the fragment shader. A warning may be generated if the fragment shader consumes fewer varying variables than the vertex shader produces. But you may have good reasons to use a somewhat generic vertex shader with a variety of fragment shaders. The fragment shaders can be written to use a subset of the varying variables produced by the vertex shader. Developers of applications that manage a large number of shaders may find that reducing the costs of shader development and maintenance is more important than squeezing out a tiny bit of additional performance.
The vertex processor output (special output variables and user-defined and built-in varying variables) is sent to subsequent stages of processing that are defined exactly the same as they are for fixed-function processing: primitive assembly, user clipping, frustum clipping, perspective divide, viewport mapping, polygon offset, polygon mode, shade mode, and culling.
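Two of the fixed-function stages listed above, perspective divide and viewport mapping, are simple enough to sketch in C. The sketch assumes the standard OpenGL viewport and depth-range formulas; the types and the function clip_to_window are names made up for illustration, and the viewport and depth-range parameters correspond to the values an application sets with glViewport and glDepthRange.

```c
#include <assert.h>

typedef struct { float x, y, z, w; } Vec4;
typedef struct { float x, y, z; } Vec3;

/* Converts a clip-space position (the value a vertex shader writes to
   gl_Position) to window coordinates.
   vx, vy, vw, vh        : viewport origin and size
   depth_near, depth_far : depth range */
Vec3 clip_to_window(Vec4 clip, float vx, float vy, float vw, float vh,
                    float depth_near, float depth_far)
{
    assert(clip.w != 0.0f);

    /* Perspective divide: clip space to normalized device coordinates. */
    float ndc_x = clip.x / clip.w;
    float ndc_y = clip.y / clip.w;
    float ndc_z = clip.z / clip.w;

    /* Viewport mapping: [-1, 1] to the viewport rectangle and depth range. */
    Vec3 win;
    win.x = vx + (ndc_x + 1.0f) * 0.5f * vw;
    win.y = vy + (ndc_y + 1.0f) * 0.5f * vh;
    win.z = depth_near + (ndc_z + 1.0f) * 0.5f * (depth_far - depth_near);
    return win;
}
```

Because these steps stay fixed, a vertex shader stops at clip space and never performs the divide itself.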
2.3.2. Fragment Processor
The FRAGMENT PROCESSOR is a programmable unit that operates on fragment values and their associated data. The fragment processor usually performs traditional graphics operations such as texture access, texture application, and fog.
A wide variety of other computations can be performed on this processor. Shaders that are intended to run on this processor are called fragment shaders. Fragment shaders express the algorithm that executes on the fragment processor and produces output values based on the input values that are provided. A fragment shader cannot change a fragment's x/y position. Fragment shaders that perform some of the computations from the preceding list must perform all desired functionality from the preceding list. For instance, it is not possible to use the existing fixed functionality to compute fog but have a fragment shader perform specialized texture access and texture application. The fragment shader must be written to perform all three functions.
The fragment processor does not replace graphics operations that require knowledge of several fragments at a time. To support parallelism at the fragment-processing level, fragment shaders are written in a way that expresses the computation required for a single fragment, and access to neighboring fragments is not allowed. An implementation may have multiple fragment processors that operate in parallel.
The fragment processor can perform operations on each fragment that is generated by the rasterization of points, lines, polygons, pixel rectangles, and bitmaps. If images are first downloaded into texture memory, the fragment processor can also be used for pixel processing that requires access to a pixel and its neighbors. A rectangle can be drawn with texturing enabled, and the fragment processor can read the image from texture memory and apply it to the rectangle while performing traditional operations such as pixel zoom, scale and bias, color table lookup, convolution, and color matrix operations.
The fragment processor does not replace the fixed functionality graphics operations that occur at the back end of the OpenGL pixel processing pipeline such as coverage, pixel ownership test, scissor test, stippling, alpha test, depth test, stencil test, alpha blending, logical operations, dithering, and plane masking.
Figure 2.3 shows the values that provide input to the fragment processor and the data values that are produced by the fragment processor.
Figure 2.3. Fragment processor inputs and outputs
The primary inputs to the fragment processor are the interpolated varying variables (both built in and user defined) that are the results of rasterization. User-defined varying variables must be defined in a fragment shader, and their types must match those defined in the vertex shader.
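The matching rule for user-defined varying variables can be illustrated with a small shader pair, held here as C strings. The names Normal and LightDirection are hypothetical, and the shaders are a sketch of simple diffuse lighting rather than a recommended lighting model. The helper declared_in_both only checks the source text; the actual type check is performed by OpenGL when the program is linked.

```c
#include <assert.h>
#include <string.h>

/* Vertex shader: writes the user-defined varying Normal once per vertex. */
static const char *const kVaryingVertexSource =
    "varying vec3 Normal;\n"
    "void main()\n"
    "{\n"
    "    Normal = gl_NormalMatrix * gl_Normal;\n"
    "    gl_Position = ftransform();\n"
    "}\n";

/* Fragment shader: declares Normal with the same name and type, and
   reads the interpolated value. LightDirection is a hypothetical uniform. */
static const char *const kVaryingFragmentSource =
    "varying vec3 Normal;\n"
    "uniform vec3 LightDirection;\n"
    "void main()\n"
    "{\n"
    "    float d = max(dot(normalize(Normal), normalize(LightDirection)), 0.0);\n"
    "    gl_FragColor = vec4(vec3(d), 1.0);\n"
    "}\n";

/* Returns 1 if the declaration text appears in both shader sources. */
int declared_in_both(const char *vs, const char *fs, const char *decl)
{
    assert(vs != NULL && fs != NULL && decl != NULL);
    return strstr(vs, decl) != NULL && strstr(fs, decl) != NULL;
}
```

The fragment shader renormalizes Normal because interpolating unit vectors does not, in general, produce a unit vector.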
Values computed by fixed functionality between the vertex processor and the fragment processor are made available through special input variables. The window coordinate position of the fragment is communicated through the special input variable gl_FragCoord. An indicator of whether the fragment was generated by rasterizing a front-facing primitive is communicated through the special input variable gl_FrontFacing.
Just as in the vertex shader, existing OpenGL state is accessible to a fragment shader through built-in uniform variables. All of the OpenGL state that is available through built-in uniform variables is available to both vertex and fragment shaders. This makes it easy to implement traditional vertex operations such as lighting in a fragment shader.
User-defined uniform variables allow the application to pass relatively infrequently changing values to a fragment shader. The same uniform variable can be accessed by both a vertex shader and a fragment shader if both shaders declare the variable using the same data type.
One of the biggest advantages of the fragment processor is that it can access texture memory an arbitrary number of times and combine in arbitrary ways the values that it reads. A fragment shader is free to read multiple values from a single texture or multiple values from multiple textures. The result of one texture access can be used as the basis for performing another texture access (a DEPENDENT TEXTURE READ). There is no inherent limitation on the number of such dependent reads that are possible, so ray-casting algorithms can be implemented in a fragment shader.
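A dependent texture read can be emulated on the CPU to show the data flow. In this C sketch, one-dimensional textures are plain float arrays sampled with nearest filtering and clamping at the edges; both simplifications, and the function names, are assumptions made for illustration. In a shader the same pattern is two nested texture lookups.

```c
#include <assert.h>

/* Nearest-neighbor lookup into a 1D texture of `size` texels, with the
   coordinate s nominally in [0, 1) and out-of-range values clamped. */
float sample_nearest(const float *texels, int size, float s)
{
    assert(texels != 0 && size > 0);
    int i = (int)(s * (float)size);
    if (i < 0) i = 0;
    if (i > size - 1) i = size - 1;
    return texels[i];
}

/* The value read from `first` is used as the coordinate for `second`. */
float dependent_read(const float *first, int first_size,
                     const float *second, int second_size, float s)
{
    float coord = sample_nearest(first, first_size, s);
    return sample_nearest(second, second_size, coord);
}
```

Chaining more calls in the same way gives longer sequences of dependent reads, which is the pattern a ray-casting shader relies on.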
The OpenGL parameters for texture maps continue to define the behavior of the filtering operation, borders, wrapping, and texture comparison modes. These operations are applied when a texture is accessed from within a shader. The shader is free to use the resulting value however it chooses. The shader can read multiple values from a texture and perform a custom filtering operation. It can also use a texture to perform a lookup table operation.
The fragment processor defines almost all the capabilities necessary to implement the fixed-function pixel transfer operations defined in OpenGL, including those in the imaging subset. This means that advanced pixel processing is supported with the fragment processor. Lookup table operations can be done with 1D texture accesses, allowing applications to fully control their size and format. Scale and bias operations are easily expressed through the programming language. The color matrix can be accessed through a built-in uniform variable. Convolution and pixel zoom are supported by accessing a texture multiple times to compute the proper result. Histogram and minimum/maximum operations are left to be defined as extensions because these prove to be quite difficult to support at the fragment level with high degrees of parallelism.
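The idea of computing a convolution by accessing a texture multiple times can be sketched in C for a 3 x 3 kernel. The image is a single-channel float array standing in for a texture, reads outside the image are clamped to the edge, and the kernel is applied without flipping (which makes no difference for symmetric kernels). All names here are illustrative; a fragment shader would do the same nine reads with texture lookups at offset coordinates.

```c
#include <assert.h>

/* Reads one texel from a width x height image, clamping to the edge. */
static float read_texel(const float *image, int width, int height,
                        int x, int y)
{
    if (x < 0) x = 0;
    if (x > width - 1) x = width - 1;
    if (y < 0) y = 0;
    if (y > height - 1) y = height - 1;
    return image[y * width + x];
}

/* Computes the filtered value for the single pixel (x, y): nine reads,
   each weighted by the corresponding kernel entry (row-major kernel). */
float convolve3x3(const float *image, int width, int height,
                  const float kernel[9], int x, int y)
{
    assert(image != 0 && width > 0 && height > 0);
    float sum = 0.0f;
    for (int ky = -1; ky <= 1; ++ky) {
        for (int kx = -1; kx <= 1; ++kx) {
            sum += kernel[(ky + 1) * 3 + (kx + 1)]
                 * read_texel(image, width, height, x + kx, y + ky);
        }
    }
    return sum;
}
```

Each output pixel depends only on reads from the source image, never on other output pixels, so the computation fits the one-fragment-at-a-time model.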
For each fragment, the fragment shader may compute color, depth, and arbitrary values (writing these values into the special output variables gl_FragColor, gl_FragDepth, and gl_FragData) or completely discard the fragment. If the fragment is not discarded, the results of the fragment shader are sent on for further processing. The remainder of the OpenGL pipeline remains as defined for fixed-function processing. Fragments are submitted to coverage application, pixel ownership testing, scissor testing, alpha testing, stencil testing, depth testing, blending, dithering, logical operations, and masking before ultimately being written into the frame buffer. The back end of the processing pipeline remains as fixed functionality because it is easy to implement in nonprogrammable hardware. Making these functions programmable is more complex because read/modify/write operations can introduce significant instruction scheduling issues and pipeline stalls. Most of these fixed functionality operations can be disabled, and alternative operations can be performed within a fragment shader if desired (albeit with possibly lower performance).
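As one example of performing a back-end operation inside the shader instead, the C sketch below emulates a fragment shader that stands in for an alpha test with a "greater than" comparison. The struct and function names are invented for this illustration, and the threshold is assumed to arrive as a user-defined uniform.

```c
#include <assert.h>

typedef struct { float r, g, b, a; } Color;

typedef struct {
    int discarded;      /* nonzero if the shader executed `discard` */
    Color frag_color;   /* value written to gl_FragColor otherwise */
} FragmentResult;

/* Roughly equivalent GLSL, with a hypothetical uniform AlphaThreshold:
 *
 *   if (gl_Color.a <= AlphaThreshold)
 *       discard;
 *   gl_FragColor = gl_Color;
 */
FragmentResult shade_fragment(Color interpolated, float alpha_threshold)
{
    assert(alpha_threshold >= 0.0f && alpha_threshold <= 1.0f);
    FragmentResult result;
    result.frag_color = interpolated;
    result.discarded = (interpolated.a <= alpha_threshold);
    return result;
}
```

A discarded fragment produces no output at all, so none of the later fixed-function stages see it.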