1.9. Coordinate Transforms
The purpose of the OpenGL graphics processing pipeline is to convert three-dimensional descriptions of objects into a two-dimensional image that can be displayed. In many ways, this process is similar to using a camera to convert a real-world scene into a two-dimensional print. To accomplish the transformation from three dimensions to two, OpenGL defines several coordinate spaces and transformations between those spaces. Each coordinate space has some properties that make it useful for some part of the rendering process. The transformations defined by OpenGL afford applications a great deal of flexibility in defining the 3D-to-2D mapping. For success at writing shaders in the OpenGL Shading Language, understanding the various transformations and coordinate spaces used by OpenGL is essential.
In computer graphics, MODELING is the process of defining a numerical representation of an object that is to be rendered. For OpenGL, this usually means creating a polygonal representation of an object so that it can be drawn with the polygon primitives built into OpenGL. At a minimum, a polygonal representation of an object needs to include the coordinates of each vertex in each polygon and the connectivity information that defines the polygons. Additional data might include the color of each vertex, the surface normal at each vertex, one or more texture coordinates at each vertex, and so on.
In the past, modeling an object was a painstaking effort, requiring precise physical measurement and data entry. (This is one of the reasons the Utah teapot, modeled by Martin Newell in 1975, has been used in so many graphics images. It is an interesting object, and the numerical data is freely available. Several of the shaders presented in this book are illustrated with this object; see, for example, Color Plate 24.) More recently, a variety of modeling tools have become available, both hardware and software, and this has made it relatively easy to create numerical representations of three-dimensional objects that are to be rendered.
Three-dimensional object attributes, such as vertex positions and surface normals, are defined in OBJECT SPACE. This coordinate space is one that is convenient for describing the object that is being modeled. Coordinates are specified in units that are convenient to that particular object. Microscopic objects may be modeled in units of angstroms, everyday objects may be modeled in inches or centimeters, buildings might be modeled in feet or meters, planets could be modeled in miles or kilometers, and galaxies might be modeled in light years or parsecs. The origin of this coordinate system (i.e., the point (0, 0, 0)) is also something that is convenient for the object being modeled. For some objects, the origin might be placed at one corner of the object's three-dimensional bounding box. For other objects, it might be more convenient to define the origin at the centroid of the object. Because of its intimate connection with the task of modeling, this coordinate space is also often referred to as MODEL SPACE or the MODELING COORDINATE SYSTEM. Coordinates are referred to equivalently as object coordinates or modeling coordinates.
To compose a scene that contains a variety of three-dimensional objects, each of which might be defined in its own unique object space, we need a common coordinate system. This common coordinate system is called WORLD SPACE or the WORLD COORDINATE SYSTEM, and it provides a common frame of reference for all objects in the scene. Once all the objects in the scene are transformed into a single coordinate system, the spatial relationships between all the objects, the light sources, and the viewer are known. The units of this coordinate system are chosen in a way that is convenient for describing a scene. You might choose feet or meters if you are composing a scene that represents one of the rooms in your house, but you might choose city blocks as your units if you are composing a scene that represents a city skyline. The choice for the origin of this coordinate system is also arbitrary. You might define a three-dimensional bounding box for your scene and set the origin at the corner of the bounding box such that all of the other coordinates of the bounding box have positive values. Or you may want to pick an important point in your scene (the corner of a building, the location of a key character, etc.) and make that the origin.
After world space is defined, all the objects in the scene must be transformed from their own unique object coordinates into world coordinates. The transformation that takes coordinates from object space to world space is called the MODELING TRANSFORMATION. If the object's modeling coordinates are in feet but the world coordinate system is defined in terms of inches, the object coordinates must be scaled by a factor of 12 to produce world coordinates. If the object is defined to be facing forward but in the scene it needs to be facing backwards, a rotation must be applied to the object coordinates. A translation is also typically required to position the object at its desired location in world coordinates. All of these individual transformations can be put together into a single matrix, the MODEL TRANSFORMATION MATRIX, that represents the transformation from object coordinates to world coordinates.
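As a minimal sketch of how the individual modeling transformations combine into one matrix, the following C code builds a scale, a rotation, and a translation and multiplies them together. The type and function names (Mat4, Vec4, mat4_mul, make_model_matrix, and so on) are hypothetical helpers written for this illustration and are not part of OpenGL. The column-major layout follows OpenGL's convention; the 180-degree rotation about the y axis and the translation values are assumptions chosen for the example.

```c
#include <assert.h>

/* Hypothetical helper types for illustration; not part of OpenGL. */
typedef struct { float m[16]; } Mat4;   /* column-major: m[col * 4 + row] */
typedef struct { float x, y, z, w; } Vec4;

static Mat4 mat4_identity(void)
{
    Mat4 r = {{0.0f}};
    r.m[0] = r.m[5] = r.m[10] = r.m[15] = 1.0f;
    return r;
}

/* Returns a * b: applied to a point, b acts first and a acts second. */
static Mat4 mat4_mul(Mat4 a, Mat4 b)
{
    Mat4 r;
    int row, col, k;
    for (col = 0; col < 4; ++col) {
        for (row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (k = 0; k < 4; ++k)
                sum += a.m[k * 4 + row] * b.m[col * 4 + k];
            r.m[col * 4 + row] = sum;
        }
    }
    return r;
}

/* Uniform scale, e.g. 12 to convert feet to inches. */
static Mat4 mat4_scale(float s)
{
    Mat4 r = mat4_identity();
    r.m[0] = r.m[5] = r.m[10] = s;
    return r;
}

/* Rotation by 180 degrees about the y axis: x and z change sign. */
static Mat4 mat4_rotate_y_180(void)
{
    Mat4 r = mat4_identity();
    r.m[0] = -1.0f;
    r.m[10] = -1.0f;
    return r;
}

static Mat4 mat4_translate(float tx, float ty, float tz)
{
    Mat4 r = mat4_identity();
    r.m[12] = tx;
    r.m[13] = ty;
    r.m[14] = tz;
    return r;
}

/* Multiplies the column vector v by m. */
static Vec4 mat4_transform(Mat4 m, Vec4 v)
{
    Vec4 r;
    r.x = m.m[0] * v.x + m.m[4] * v.y + m.m[8]  * v.z + m.m[12] * v.w;
    r.y = m.m[1] * v.x + m.m[5] * v.y + m.m[9]  * v.z + m.m[13] * v.w;
    r.z = m.m[2] * v.x + m.m[6] * v.y + m.m[10] * v.z + m.m[14] * v.w;
    r.w = m.m[3] * v.x + m.m[7] * v.y + m.m[11] * v.z + m.m[15] * v.w;
    return r;
}

/* Model matrix = translate * rotate * scale, so the scale is applied first. */
static Mat4 make_model_matrix(float scale, float tx, float ty, float tz)
{
    assert(scale > 0.0f);
    return mat4_mul(mat4_translate(tx, ty, tz),
                    mat4_mul(mat4_rotate_y_180(), mat4_scale(scale)));
}
```

With a scale of 12 and a translation of (100, 0, 50), the object-space point (1, 0, 0) is scaled to (12, 0, 0), turned around to (-12, 0, 0), and finally placed at (88, 0, 50) in world space. The order of multiplication matters: translating before scaling would scale the translation as well.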
After the scene has been composed, the viewing parameters must be specified. One aspect of the view is the vantage point (i.e., the eye or camera position) from which the scene will be viewed. Viewing parameters also include the focus point (also called the look-at point, which determines the direction in which the camera is pointed) and the up direction (e.g., the camera may be held sideways or upside down).
The viewing parameters collectively define the VIEWING TRANSFORMATION, and they can be combined into a matrix called the VIEWING MATRIX. A coordinate multiplied by this matrix is transformed from world space into EYE SPACE, also called the EYE COORDINATE SYSTEM. By definition, the origin of this coordinate system is at the viewing (or eye) position. Coordinates in this space are called eye coordinates. The spatial relationships in the scene remain unchanged, but orienting the coordinate system in this way makes it easy to determine the distance from the viewpoint to various objects in the scene.
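One way to picture the viewing matrix is as a change of basis followed by a shift of the origin to the eye position. The sketch below is an illustration under stated assumptions, not OpenGL's own code: Vec3, Mat4, make_view_matrix, and view_transform_point are hypothetical names, and the camera axes (right, up, forward) are assumed to be already unit length and mutually perpendicular. A utility such as gluLookAt derives such axes from the eye position, the look-at point, and the up direction; that normalization step is omitted here.

```c
#include <assert.h>

/* Hypothetical helper types for illustration; not part of OpenGL. */
typedef struct { float x, y, z; } Vec3;
typedef struct { float m[16]; } Mat4;   /* column-major: m[col * 4 + row] */

static float dot3(Vec3 a, Vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Builds a world-to-eye matrix from the eye position and an orthonormal
   camera basis. Assumption: right, up, and forward are unit vectors that
   are perpendicular to one another. In eye space the camera looks down
   the negative z axis, so the third row uses -forward. */
static Mat4 make_view_matrix(Vec3 eye, Vec3 right, Vec3 up, Vec3 forward)
{
    Mat4 v;
    float ru = dot3(right, up);

    /* Crude sanity check on the assumed perpendicular axes. */
    assert(ru > -1e-4f && ru < 1e-4f);

    v.m[0] = right.x;    v.m[4] = right.y;    v.m[8]  = right.z;
    v.m[1] = up.x;       v.m[5] = up.y;       v.m[9]  = up.z;
    v.m[2] = -forward.x; v.m[6] = -forward.y; v.m[10] = -forward.z;
    v.m[3] = 0.0f;       v.m[7] = 0.0f;       v.m[11] = 0.0f;

    /* The translation column moves the eye position to the origin. */
    v.m[12] = -dot3(right, eye);
    v.m[13] = -dot3(up, eye);
    v.m[14] = dot3(forward, eye);
    v.m[15] = 1.0f;
    return v;
}

/* Transforms a world-space point (implicit w = 1) into eye space. */
static Vec3 view_transform_point(Mat4 v, Vec3 p)
{
    Vec3 r;
    r.x = v.m[0] * p.x + v.m[4] * p.y + v.m[8]  * p.z + v.m[12];
    r.y = v.m[1] * p.x + v.m[5] * p.y + v.m[9]  * p.z + v.m[13];
    r.z = v.m[2] * p.x + v.m[6] * p.y + v.m[10] * p.z + v.m[14];
    return r;
}
```

For a camera at (10, 0, 0) looking toward the world origin, with forward (-1, 0, 0), up (0, 1, 0), and right (0, 0, -1), the eye position itself maps to (0, 0, 0) and the world origin maps to (0, 0, -10). The distance from the viewpoint to the origin can then be read directly from the z coordinate.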
Although some 3D graphics APIs allow applications to separately specify the modeling matrix and the viewing matrix, OpenGL combines them into a single matrix called the MODELVIEW MATRIX. This matrix is defined to transform coordinates from object space into eye space (see Figure 1.2).
Figure 1.2. Coordinate spaces and transforms in OpenGL
You can manipulate a number of matrices in OpenGL. Call the glMatrixMode function to select the modelview matrix or one of OpenGL's other matrices. Load the current matrix with the identity matrix by calling glLoadIdentity, or replace it with an arbitrary matrix by calling glLoadMatrix. Be sure you know what you're doing if you specify an arbitrary matrix: the transformation might give you a completely incomprehensible image! You can also multiply the current matrix by an arbitrary matrix by calling glMultMatrix.
Applications often start by setting the current modelview matrix to the view matrix and then add on the necessary modeling matrices. You can set the modelview matrix to a reasonable viewing transformation with the gluLookAt function. (This function is not part of OpenGL proper but is part of the OpenGL utility library that is provided with every OpenGL implementation.) OpenGL actually supports a stack of modelview matrices, and you can duplicate the topmost matrix and copy it onto the top of the stack with glPushMatrix. When this is done, you can concatenate other transformations to the topmost matrix with the functions glScale, glTranslate, and glRotate to define the modeling transformation for a particular three-dimensional object in the scene. Then, pop this topmost matrix off the stack with glPopMatrix to get back to the original view transformation matrix. Repeat the process for each object in the scene.
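The push, concatenate, and pop pattern described above can be imitated in plain C without an OpenGL context. The following is only an emulation for illustration: MatrixStack, stack_push, stack_pop, stack_mult, and stack_translate are hypothetical names that loosely mirror the behavior of glPushMatrix, glPopMatrix, glMultMatrix, and glTranslate, and the stack capacity of 32 is an assumption made for this sketch.

```c
#include <assert.h>
#include <string.h>

#define STACK_DEPTH 32   /* assumed capacity for this sketch */

/* Hypothetical helper types for illustration; not part of OpenGL. */
typedef struct { float m[16]; } Mat4;   /* column-major: m[col * 4 + row] */
typedef struct { Mat4 items[STACK_DEPTH]; int top; } MatrixStack;

/* Starts with a single identity matrix on the stack. */
static void stack_init(MatrixStack *s)
{
    memset(s, 0, sizeof *s);
    s->items[0].m[0] = s->items[0].m[5] = 1.0f;
    s->items[0].m[10] = s->items[0].m[15] = 1.0f;
}

static Mat4 *stack_top(MatrixStack *s)
{
    return &s->items[s->top];
}

/* Duplicates the topmost matrix onto the top of the stack. */
static void stack_push(MatrixStack *s)
{
    assert(s->top + 1 < STACK_DEPTH);
    s->items[s->top + 1] = s->items[s->top];
    s->top++;
}

/* Discards the topmost matrix, exposing the one saved beneath it. */
static void stack_pop(MatrixStack *s)
{
    assert(s->top > 0);
    s->top--;
}

/* Replaces the topmost matrix with (top * b). */
static void stack_mult(MatrixStack *s, const Mat4 *b)
{
    Mat4 a = *stack_top(s);
    Mat4 r;
    int row, col, k;
    for (col = 0; col < 4; ++col) {
        for (row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (k = 0; k < 4; ++k)
                sum += a.m[k * 4 + row] * b->m[col * 4 + k];
            r.m[col * 4 + row] = sum;
        }
    }
    *stack_top(s) = r;
}

/* Concatenates a translation onto the topmost matrix. */
static void stack_translate(MatrixStack *s, float x, float y, float z)
{
    Mat4 t;
    memset(&t, 0, sizeof t);
    t.m[0] = t.m[5] = t.m[10] = t.m[15] = 1.0f;
    t.m[12] = x;
    t.m[13] = y;
    t.m[14] = z;
    stack_mult(s, &t);
}
```

A typical sequence is to initialize the stack, establish the view (here a translation of (0, 0, -5) can stand in for a real viewing transformation), call stack_push, concatenate the modeling transformation for one object, draw it, and then call stack_pop. After the pop, the topmost matrix is once again the unmodified view matrix, ready for the next object.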
At the time light source positions are specified with the glLight function, they are transformed by the current modelview matrix. Therefore, light positions are stored within OpenGL as eye coordinates. You must set up the modelview matrix to perform the proper transformation before light positions are specified or you won't get the lighting effects that you expect. The lighting calculations that occur in OpenGL are defined to happen on a per-vertex basis in the eye coordinate system. For the necessary reflection computations, light positions and surface normals must be in the same coordinate system. OpenGL implementations often choose to do lighting calculations in eye space; therefore, the incoming surface normals have to be transformed into eye space as well. You accomplish this by transforming surface normals by the inverse transpose of the upper leftmost 3 × 3 matrix taken from the modelview matrix. At that point, you can apply the per-vertex lighting formulas defined by OpenGL to determine the lit color at each vertex.
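To see why normals need the inverse transpose rather than the modelview matrix itself, it can help to compute it by hand. The sketch below is an illustration, not how any particular OpenGL implementation does it: Mat3, make_normal_matrix, and transform_normal are hypothetical names, and the inverse transpose is obtained from the cofactor matrix divided by the determinant. The result is not renormalized; a real lighting computation would normally rescale the normal to unit length afterward.

```c
#include <assert.h>

/* Hypothetical helper type for illustration; not part of OpenGL. */
typedef struct { float m[3][3]; } Mat3;   /* m[row][col] */

/* Computes the inverse transpose of the upper-left 3 x 3 block of a
   column-major 4 x 4 modelview matrix. Returns 1 on success, or 0 if
   the block is singular and has no inverse. */
static int make_normal_matrix(const float mv[16], Mat3 *out)
{
    float a[3][3];
    float det;
    int i, j;

    /* Extract the upper-left 3 x 3 block as a[row][col]. */
    for (i = 0; i < 3; ++i)
        for (j = 0; j < 3; ++j)
            a[i][j] = mv[j * 4 + i];

    /* Cofactor matrix; the cyclic indices take care of the signs. */
    for (i = 0; i < 3; ++i) {
        for (j = 0; j < 3; ++j) {
            int i1 = (i + 1) % 3, i2 = (i + 2) % 3;
            int j1 = (j + 1) % 3, j2 = (j + 2) % 3;
            out->m[i][j] = a[i1][j1] * a[i2][j2] - a[i1][j2] * a[i2][j1];
        }
    }

    /* Determinant by expansion along the first row. */
    det = a[0][0] * out->m[0][0] + a[0][1] * out->m[0][1]
        + a[0][2] * out->m[0][2];
    if (det == 0.0f)
        return 0;

    /* Inverse transpose = cofactor matrix / determinant. */
    for (i = 0; i < 3; ++i)
        for (j = 0; j < 3; ++j)
            out->m[i][j] /= det;
    return 1;
}

/* Transforms an object-space normal into eye space (not renormalized). */
static void transform_normal(const Mat3 *n, const float in[3], float out[3])
{
    int i;
    assert(in != out);
    for (i = 0; i < 3; ++i)
        out[i] = n->m[i][0] * in[0] + n->m[i][1] * in[1] + n->m[i][2] * in[2];
}
```

Consider a modelview matrix that scales x by 2. A surface tangent (1, 1, 0) becomes (2, 1, 0). Its normal (1, -1, 0) becomes (0.5, -1, 0) under the inverse transpose, which is still perpendicular to the transformed tangent. Multiplying the normal by the modelview matrix directly would give (2, -1, 0), which is not.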
After coordinates have been transformed into eye space, the next step is to define a viewing volume. This is the region of the three-dimensional scene that is visible in the final image. The transformation that takes the objects in the viewing volume into CLIP SPACE (also known as the CLIPPING COORDINATE SYSTEM, a coordinate space that is suitable for clipping) is called the PROJECTION TRANSFORMATION. In OpenGL, you establish the projection transformation by calling glMatrixMode to select the projection matrix and then setting this matrix appropriately. Parameters that may go into creating an appropriate projection matrix are the field of view (how much of the scene is visible), the aspect ratio (the horizontal field of view may differ from the vertical field of view), and near and far clipping planes to eliminate things that are too far away or too close (for perspective projections, weirdness will occur if you try to draw things that are at or behind the viewing position). Three functions set the projection matrix: glOrtho, glFrustum, and gluPerspective. The difference between these functions is that glOrtho defines a parallel projection (i.e., parallel lines in the scene are projected to parallel lines in the final two-dimensional image), whereas glFrustum and gluPerspective define perspective projections (i.e., parallel lines in the scene are foreshortened to produce a vanishing point in the image, such as railroad tracks converging to a point in the distance).
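As a concrete illustration of a perspective projection, the sketch below fills in the matrix that the OpenGL reference documentation gives for glFrustum and applies it to an eye-space point. The names Vec4, make_frustum, and eye_to_clip are hypothetical helpers for this example; the parameters (left, right, bottom, top, near, far) follow the glFrustum argument order, and the near and far distances are assumed to be positive.

```c
#include <assert.h>

/* Hypothetical helper type for illustration; not part of OpenGL. */
typedef struct { float x, y, z, w; } Vec4;

/* Fills m (column-major, m[col * 4 + row]) with the perspective matrix
   documented for glFrustum. n and f are the positive distances to the
   near and far clipping planes. */
static void make_frustum(float m[16], float l, float r, float b, float t,
                         float n, float f)
{
    int i;
    assert(n > 0.0f && f > n && r != l && t != b);
    for (i = 0; i < 16; ++i)
        m[i] = 0.0f;
    m[0]  = 2.0f * n / (r - l);
    m[5]  = 2.0f * n / (t - b);
    m[8]  = (r + l) / (r - l);
    m[9]  = (t + b) / (t - b);
    m[10] = -(f + n) / (f - n);
    m[11] = -1.0f;                    /* copies -z_eye into w_clip */
    m[14] = -2.0f * f * n / (f - n);
}

/* Transforms an eye-space position into clip space. */
static Vec4 eye_to_clip(const float m[16], Vec4 e)
{
    Vec4 c;
    c.x = m[0] * e.x + m[4] * e.y + m[8]  * e.z + m[12] * e.w;
    c.y = m[1] * e.x + m[5] * e.y + m[9]  * e.z + m[13] * e.w;
    c.z = m[2] * e.x + m[6] * e.y + m[10] * e.z + m[14] * e.w;
    c.w = m[3] * e.x + m[7] * e.y + m[11] * e.z + m[15] * e.w;
    return c;
}
```

With a frustum of left = -1, right = 1, bottom = -1, top = 1, near = 1, and far = 10, a point on the near plane such as (0, 0, -1) yields a clip-space z equal to -w, and a point on the far plane such as (0, 0, -10) yields a clip-space z equal to +w. The w component ends up holding the distance in front of the eye, which is what later produces foreshortening.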
FRUSTUM CLIPPING is the process of eliminating any graphics primitives that lie outside an axis-aligned cube in clip space. This cube is defined such that the x, y, and z components of the clip space coordinate are less than or equal to the w component for the coordinate, and greater than or equal to -w (i.e., -w ≤ x ≤ w, -w ≤ y ≤ w, and -w ≤ z ≤ w). Graphics primitives (or portions thereof) that lie outside this cube are discarded. Frustum clipping is always performed on all incoming primitives in OpenGL. USER CLIPPING, on the other hand, is a feature that can be enabled or disabled by the application. Applications can call glClipPlane to specify one or more clipping planes that further restrict the size of the viewing volume, and each clipping plane can be individually enabled with glEnable. At the time user clipping planes are specified, OpenGL transforms them into eye space using the inverse of the current modelview matrix. Each plane specified in this manner defines a half-space, and only the portions of primitives that lie within the intersection of the view volume and all of the enabled half-spaces defined by user clipping planes are drawn.
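The two clipping tests described above reduce to simple comparisons per vertex. The following sketch only classifies individual points; it does not split primitives that straddle a boundary, which a real clipper must do. Vec4, inside_clip_volume, inside_user_plane, and count_vertices_inside are hypothetical names for this illustration, and the user plane is assumed to be already expressed in eye space as coefficients (a, b, c, d).

```c
#include <assert.h>

/* Hypothetical helper type for illustration; not part of OpenGL. */
typedef struct { float x, y, z, w; } Vec4;

/* Frustum test: the point is kept when -w <= x, y, z <= w in clip space. */
static int inside_clip_volume(Vec4 c)
{
    return -c.w <= c.x && c.x <= c.w &&
           -c.w <= c.y && c.y <= c.w &&
           -c.w <= c.z && c.z <= c.w;
}

/* User clipping test: the eye-space plane (a, b, c, d) is stored in
   plane.x .. plane.w. The point is kept when it lies in the half-space
   a*x + b*y + c*z + d*w >= 0. */
static int inside_user_plane(Vec4 plane, Vec4 eye_pos)
{
    float dist = plane.x * eye_pos.x + plane.y * eye_pos.y
               + plane.z * eye_pos.z + plane.w * eye_pos.w;
    return dist >= 0.0f;
}

/* Counts how many clip-space vertices pass the frustum test. */
static int count_vertices_inside(const Vec4 *verts, int count)
{
    int i, inside = 0;
    assert(verts != 0 && count >= 0);
    for (i = 0; i < count; ++i)
        inside += inside_clip_volume(verts[i]);
    return inside;
}
```

A primitive whose vertices all pass both tests can be drawn unchanged, whereas one with a mix of passing and failing vertices has to be cut along the offending boundary before rasterization.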
The next step in the transformation of vertex positions is the perspective divide. This operation divides each component of the clip space coordinate by the homogeneous coordinate w. The resulting x, y, and z components lie in the range [-1, 1], and the resulting w coordinate is always 1, so it is no longer needed. In other words, all the visible graphics primitives are transformed into a cubic region between the point (-1, -1, -1) and the point (1, 1, 1). This is the NORMALIZED DEVICE COORDINATE SPACE, which is an intermediate space that allows the viewing area to be properly mapped onto a viewport of arbitrary size and depth.
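The perspective divide itself is a one-line operation per component. A minimal sketch, using hypothetical Vec4 and Vec3 types and a hypothetical perspective_divide function, and assuming the point has already survived clipping so that w is nonzero:

```c
#include <assert.h>

/* Hypothetical helper types for illustration; not part of OpenGL. */
typedef struct { float x, y, z, w; } Vec4;
typedef struct { float x, y, z; } Vec3;

/* Converts a clip-space coordinate to normalized device coordinates.
   The resulting w would be 1, so it is simply dropped. */
static Vec3 perspective_divide(Vec4 clip)
{
    Vec3 ndc;
    assert(clip.w != 0.0f);   /* assumed: point already passed clipping */
    ndc.x = clip.x / clip.w;
    ndc.y = clip.y / clip.w;
    ndc.z = clip.z / clip.w;
    return ndc;
}
```

For example, the clip-space coordinate (2, -4, 1, 4) becomes (0.5, -1, 0.25) in normalized device coordinates. Because clipping guarantees that x, y, and z lie between -w and w, the quotients always fall within [-1, 1].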
Pixels within a window on the display device aren't referred to with floating-point coordinates from -1 to 1; they are usually referred to with coordinates defined in the WINDOW COORDINATE SYSTEM, where x values range from 0 to the width of the window minus 1, and y values range from 0 to the height of the window minus 1. Therefore, one more transformation step is required. The VIEWPORT TRANSFORMATION specifies the mapping from normalized device coordinates into window coordinates. You specify this mapping by calling the OpenGL functions glViewport, which specifies the mapping of the x and y coordinates, and glDepthRange, which specifies the mapping of the z coordinate. Graphics primitives are rasterized in the window coordinate system.
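The viewport transformation is a scale and offset applied to each normalized device coordinate. The sketch below follows the mapping defined in the OpenGL specification for the values supplied to glViewport (x, y, width, height) and glDepthRange (near, far); the Viewport and DepthRange structures and the ndc_to_window function are hypothetical names introduced for this illustration.

```c
#include <assert.h>

/* Hypothetical helper types for illustration; not part of OpenGL. */
typedef struct { float x, y, z; } Vec3;
typedef struct { int x, y, width, height; } Viewport;   /* glViewport values */
typedef struct { float n, f; } DepthRange;              /* glDepthRange values */

/* Maps normalized device coordinates in [-1, 1] to window coordinates. */
static Vec3 ndc_to_window(Vec3 ndc, Viewport vp, DepthRange dr)
{
    Vec3 win;
    assert(vp.width >= 0 && vp.height >= 0);
    win.x = (ndc.x + 1.0f) * 0.5f * (float)vp.width  + (float)vp.x;
    win.y = (ndc.y + 1.0f) * 0.5f * (float)vp.height + (float)vp.y;
    win.z = 0.5f * (dr.f - dr.n) * ndc.z + 0.5f * (dr.f + dr.n);
    return win;
}
```

With a viewport of (0, 0, 640, 480) and a depth range of 0 to 1, the center of the normalized device coordinate cube (0, 0, 0) maps to window coordinates (320, 240, 0.5), the corner (-1, -1, -1) maps to (0, 0, 0), and the corner (1, 1, 1) maps to (640, 480, 1).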