Varieties of Learning Components
Both batch and incremental techniques are methods for processing (sensory) data. This section discusses particular ways to extract the essential information and find the right solution.
Supervised learning uses examples of mappings from inputs X to outputs Y, where Y is the desired output. This can be expressed formally as the relation X → Y. This approach is extremely explicit; given an input, it's clear what the answer should be. This captures the essence of reactive techniques.
An example of supervised learning would be to build a decision tree from a collection of examples gathered from expert play. For supervised learning to work well, it's best to gather lots of data. This is true of learning generally; more data gives the AI system a better opportunity to find the right solution.
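As a minimal sketch of this idea (not this book's implementation), the following trains a one-rule "decision stump" — a degenerate decision tree — from a handful of hypothetical expert-play examples. The feature names and actions are invented for illustration:

```python
# Supervised learning as a direct X -> Y mapping: pick the single
# feature test that best reproduces the expert's recorded outputs.

def train_stump(examples):
    """Find the feature whose per-value majority vote best fits the data."""
    best = None
    for feature in examples[0][0]:
        votes = {}
        for x, y in examples:
            votes.setdefault(x[feature], {}).setdefault(y, 0)
            votes[x[feature]][y] += 1
        # Majority output for each observed value of this feature.
        rule = {v: max(counts, key=counts.get) for v, counts in votes.items()}
        correct = sum(1 for x, y in examples if rule.get(x[feature]) == y)
        if best is None or correct > best[0]:
            best = (correct, feature, rule)
    return best[1], best[2]

# Hypothetical examples gathered from expert play: situation -> action.
examples = [
    ({"enemy_visible": True,  "low_health": False}, "attack"),
    ({"enemy_visible": True,  "low_health": True},  "flee"),
    ({"enemy_visible": False, "low_health": False}, "patrol"),
    ({"enemy_visible": False, "low_health": True},  "patrol"),
]

feature, rule = train_stump(examples)

def predict(x):
    return rule.get(x[feature], "patrol")
```

A full decision-tree learner would recurse on the remaining features, but the core supervised step — fitting a mapping to explicit input/output pairs — is the same.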
Supervised learning is very efficient, because it can be handled directly by reactive components. It's literally about adjusting a direct mapping, which is arguably the fastest form of learning. The quality of the results is also very good, because most AI techniques have been fine-tuned over the past decades of research.
Reinforcement learning problems are less explicit. They require the animat to try an action Y in a certain situation X, and to receive a reward according to the outcome. This is formally modeled as the relation X × Y → R, where R is the set of real numbers, to which the reward value belongs.
Taking another example from reactive movement, the animat would be given a positive reward signal for moving forward, and a negative one for bumping into obstacles.
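A minimal sketch of how such a reward signal drives learning, using tabular Q-learning in an invented one-dimensional corridor world (the world, constants, and action set are illustrative assumptions, not this book's implementation):

```python
import random

# +1 reward for moving forward, -1 for bumping into the wall at the end.
ACTIONS = ["forward", "back"]
CORRIDOR = 5  # positions 0..4; moving forward from position 4 hits a wall

def step(pos, action):
    if action == "forward":
        if pos + 1 >= CORRIDOR:
            return pos, -1.0          # bumped into an obstacle
        return pos + 1, 1.0           # moved forward successfully
    return max(pos - 1, 0), 0.0

q = {(s, a): 0.0 for s in range(CORRIDOR) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.1
random.seed(1)

for episode in range(200):
    pos = 0
    for _ in range(10):
        # Epsilon-greedy: mostly exploit the current estimates,
        # occasionally explore a random action.
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: q[(pos, act)])
        nxt, reward = step(pos, a)
        best_next = max(q[(nxt, b)] for b in ACTIONS)
        q[(pos, a)] += alpha * (reward + gamma * best_next - q[(pos, a)])
        pos = nxt
```

Note that the animat is never told which action is correct; it must discover that "forward" dominates in the interior states purely from the scalar reward.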
This problem is difficult for two reasons. First, it's difficult to find the best behavior without a lot of experimentation. Second, complex problems have complex reward signals that often must be interpreted and distributed among subcomponents (that is, modular rewards rather than a holistic one).
When using this approach in practice, it's important to realize how much time it takes to reach the right result. Many exploratory actions are required, and these can look particularly unrealistic in-game.
The evolutionary approach is based on a fitness function. This is conceptually similar to reinforcement problems, because a scalar value provides evaluative feedback on the quality of the behavior. The difference in the evolutionary approach is that fitness is generally computed on a per-trial basis, not for every action. Conceptually, this means n pairs of inputs and outputs correspond to one fitness value; formally speaking, it's the relation (X × Y)^n → R.
For example, we could evolve entire shooting behaviors using a set of trials on moving targets, and use genetic algorithms to mate the best shooters together.
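The per-trial nature of the fitness function can be sketched as follows. This is a deliberately simplified genetic algorithm where each genome is a single aim-offset parameter; the target model and every numeric constant are illustrative assumptions:

```python
import random

random.seed(0)
TRUE_LEAD = 0.7  # ideal aim offset for moving targets, unknown to the animat

def fitness(genome, shots=20):
    """Score one whole trial: count hits over a series of noisy shots."""
    hits = 0
    for _ in range(shots):
        noise = random.uniform(-0.1, 0.1)
        if abs(genome - (TRUE_LEAD + noise)) < 0.15:
            hits += 1
    return hits

population = [random.uniform(0.0, 1.0) for _ in range(20)]
for generation in range(30):
    scored = sorted(population, key=fitness, reverse=True)
    parents = scored[:5]                   # keep the best shooters
    population = parents[:]
    while len(population) < 20:
        a, b = random.sample(parents, 2)
        child = (a + b) / 2                # crossover: blend two parents
        child += random.gauss(0.0, 0.05)   # mutation
        population.append(child)

best = max(population, key=fitness)
```

Notice that `fitness` only returns one number per trial of twenty shots — the algorithm never learns which individual shot was good or bad, which is exactly the (X × Y)^n → R structure described above.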
Evolutionary approaches are extremely impractical for anything but offline learning, because lengthy trials are required to train the animats. These phases are particularly unrealistic to watch, albeit entertaining! That said, the results produced are generally extremely efficient.
Finally, there are specific unsupervised techniques—such as self-organizing feature maps—that are not covered directly in this book. The principle is to feed lots of data to the component, and let it automatically organize an internal representation that extracts patterns in the data. After this has been done, the internal representation can be analyzed for pattern details. Naturally, independent AI techniques are needed to interpret these patterns, so purely unsupervised techniques are not useful alone.
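The self-organizing principle can be sketched with a tiny one-dimensional feature map. The component is only fed raw data points, yet its nodes drift to cover the clusters in the data; the cluster positions and all constants here are invented for illustration:

```python
import random

random.seed(2)
# Two hypothetical clusters of sensory readings, around 0.2 and 0.8.
data = [random.gauss(0.2, 0.05) for _ in range(100)] + \
       [random.gauss(0.8, 0.05) for _ in range(100)]
random.shuffle(data)

nodes = [0.4, 0.5, 0.6]   # initial positions of the map's nodes
rate = 0.1

for x in data * 5:        # several passes over the unlabeled data
    # Find the best-matching node and pull it toward the sample;
    # its neighbors follow weakly, preserving the map's ordering.
    i = min(range(len(nodes)), key=lambda k: abs(nodes[k] - x))
    nodes[i] += rate * (x - nodes[i])
    for j in (i - 1, i + 1):
        if 0 <= j < len(nodes):
            nodes[j] += rate * 0.2 * (x - nodes[j])
```

After training, the outer nodes have settled near the two clusters — but as the text notes, nothing in this process says what the clusters *mean*; a separate component must interpret them.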
However, it's possible, for example, to combine a reinforcement learning technique with an internal evaluation mechanism (for instance, a reward script). As compound components, such approaches can be considered unsupervised (and are covered in Part VII).
As a general rule, supervised learning is the way to go. It's a safe bet because of the predictable results, and is undoubtedly the most efficient form of learning. This makes it ideally suited to game development.
Other techniques provide less control over the final behavior. Although there are many ways to get them to work in practice (as depicted in this book's examples and exercises), this generally incurs an overhead in both development and computation. Also, even though reinforcement and evolutionary components can achieve better behaviors, their learning phase is quite unrealistic (for instance, experimenting with actions or behaviors that are inappropriate).
In brief, supervised components are the best way for game developers to keep control. Components based on feedback may be tempting from a design point of view, but we should be aware that they require longer development times.