Complexity of the Solution
The solution to a problem can be considered as raw data describing the mapping from the input domain to the output codomain. This raw data enables us to solve the problem by using it as a lookup to find the correct response. This approach is not particularly elegant or convenient, but it reveals the true nature of the solution.
Before we even start discussing more sophisticated ways to express a solution, we can use information theory to understand how complex the solution needs to be. In Chapter 21, we already managed to get a feel for the complexity of the problem by analyzing it from an external point of view. This section considers it from the inside.
Information Theory and AI
Beyond the raw data that expresses the solution, we're interested in the information stored within. Information is the essence of the data, the parts that cannot be predicted. Removing the redundant data reveals the fundamental aspect of the solution itself—which we're interested in.
In this context, the term "information" has its meaning from information theory. Information theory is a branch of statistics and probability theory dealing with the study of data, ways to manipulate it (for instance, cryptography and compression) and communicate it (for instance, data transmissions and communication systems) [Wikipedia03a].
This information can be obtained by compressing the raw data in various ways (that is, expressing it in a more compact form). Again, we use the term "compression" in its generic meaning—not limited to encoding bitstreams in small chunks. Such compressing takes into account any kind of pattern in the data, and not just statistical relationships: Symmetry involving arbitrary input or output configurations, and different levels of covariance between variables (as introduced in Chapter 21).
Taking a look at the solution from the point of view of information theory enables us to understand the complexity of the input/output mapping itself. For example, a solution that maps all the inputs onto one single output configuration can be highly compressed, potentially stored as one output element only (for instance, the move forward behavior). This reveals that the AI technique does not need to be complex at all. In a similar fashion, if the solution is extremely flexible (that is, many legal configurations), it's not important which output is chosen. This extra flexibility can also be highly compressed into a concise solution. (For instance, obstacle avoidance does not require any specific action in open space.)
On the other hand, if the mapping from input to output is seemingly unpredictable, and without any flexibility in the solution, the compression ratio will be one. In this case, the AI technique needs to store a lot of complex information to solve the problem, almost reaching the theoretical size of the problem space. This level of complexity is quite rare in practice; even a difficult task such as playing a deathmatch game has many patterns (which we extracted by design). That said, larger problems are more expensive to compress, because patterns are tough to find.
In a nutshell, raw data can describe the solution. The more patterns there are in this data, the better the compression of the input/output mapping. The better the compression, the less information is needed to describe the solution, which in turn simplifies the role of the AI technique.
Not only does information theory enable us to estimate the complexity of a problem, it also provides us with a metric for identifying the best representation. The most compressed form of a data set is known as the minimum description length (MDL). There has been a lot of research on the subject over the past decades, combining the experience from many fields of science.
Essentially, the description length is measured by the number of bits needed to encode the patterns in the data, together with the bits required to express the divergences from these patterns. Generally, the patterns are encoded within the solution, and the exceptions to these patterns can be measured as the necessary adjustment to the output.
Ideally, we would like to find the best possible solution. It turns out Occam's razor, a principle for keeping everything as simple as possible, is an extremely useful bias in AI. Finding the MDL enables us to identify the simplest solution; the one that requires least information to express the entire data set as a collection of bits.
The MDL is a principle for trading off insignificant information in the data for the sake of finding better patterns in the data—and hence a better solution. Using the MDL, we can first identify the most suitable representation, as well as find its best instance.
Theory and Practice
It's important to understand the concepts of information, complexity in the data, and trading off generic-ness with precision. Entire algorithms have been developed around these principles, notably decision trees (covered in Chapter 26, "Classification and Regression Trees").
That said, when it comes to design issues and choices about representation, these insights are mainly useful as wisdom for the AI developer. Understanding the nature of the solution, different ways to express the mapping from input to output, and how to compare two different approaches enables us to design better systems. In the case of weapon selection, for example, it's possible to determine whether neural networks are more appropriate at predicting the weapon fitness than decision trees. By counting the amount of memory used by the representation and the number of bits to express the error of each sample, the developer can compare the lengths of the two descriptions and pick the minimal (neural network or decision tree).
Although such formal methods help, knowing which representation to use can be a matter of experience. This book has already covered AI techniques such as neural networks, rule-based systems, and decision trees, mentioning which types of solutions they provide best. Each provides minimal descriptions for certain problems. A summary of the elusive problem of choosing an AI technique is provided in Part VIII, after each technique has been explained. Until then, this lesson about the theoretical complexity of a mapping is a great guideline for understanding the difficultly of finding a solution.