Advanced Issues
Given the many different concepts involved in reinforcement learning, it's not surprising that there are many intricacies present in most algorithms.
Approximators
So far, we've assumed that the value estimates, for both states and actions, are stored in an array. This representation has the advantage of allowing most algorithms to converge to the optimal policy. However, the array needs one entry per state (or per state/action pair), so the memory it consumes makes the approach unsuitable for large problems.
The array can be considered as a function that maps each state or action to an estimate of its value. As such, it's possible to use a function approximator to compute a similar result with far less memory. Perceptrons and decision trees are commonly used for this purpose.
The most reliable solution to the problem is to learn the exact estimates using an array first, and then to train an approximator on the result. This is best suited to offline computation, because it requires two phases of preprocessing: learning the array, then approximating it. The advantage is that all the correct data is immediately available, allowing the approximator to achieve the best possible results. The approximator can then be used online as a compact representation of the large array.
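The two phases can be sketched as follows. This is only an illustration under stated assumptions, not a prescribed method: the ten-cell corridor, the reward of 1 for reaching the last cell, the discount GAMMA, the quadratic features, and all function names are hypothetical. Value iteration stands in for whichever tabular algorithm fills the array, and a single linear unit trained with the delta rule stands in for the perceptron.

```python
GAMMA = 0.9            # assumed discount factor
N_STATES = 10          # assumed corridor length; the last cell is the goal
GOAL = N_STATES - 1

def step(state, action):
    """Deterministic move; action is -1 (left) or +1 (right)."""
    nxt = min(max(state + action, 0), GOAL)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward

def learn_table(sweeps=100):
    """Phase 1: fill the exact array of state values (value iteration)."""
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(GOAL):          # the goal is terminal and stays at 0
            outcomes = [step(s, a) for a in (-1, 1)]
            values[s] = max(r + GAMMA * values[n] for n, r in outcomes)
    return values

def features(state):
    """Inputs of the linear unit: bias, scaled position, and its square."""
    x = state / (GOAL - 1)
    return (1.0, x, x * x)

def predict(weights, state):
    return sum(w * f for w, f in zip(weights, features(state)))

def fit(table, epochs=10000, rate=0.5):
    """Phase 2: batch delta-rule training against the finished array."""
    states = range(GOAL)               # non-terminal states only
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for s in states:
            error = table[s] - predict(weights, s)
            for i, f in enumerate(features(s)):
                grad[i] += error * f
        weights = [w + rate * g / len(states) for w, g in zip(weights, grad)]
    return weights
```

Calling fit(learn_table()) returns three weights that replace the array for the nine non-terminal cells. The saving is negligible at this size, but the number of weights stays fixed as the corridor grows.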
Alternatively, the approximator can be the base representation used for learning. Ideally, the technique should be suited to incremental learning, as perceptrons are. This approach is slightly trickier and requires more thought. The major problem is that the convergence proofs of most algorithms no longer apply once the exact array is replaced by an approximation. So obtaining the right results becomes a rather more empirical process.
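As a rough sketch of this second option, the code below applies a Q-learning style update directly to the weights of one linear unit per action, with no array involved. The corridor task, the two features, the learning rate, and every name here are assumptions made for illustration. In line with the caveat above, nothing guarantees that the weights settle on the right values, so the outcome has to be checked empirically.

```python
import random

GAMMA = 0.9            # assumed discount factor
ACTIONS = (-1, 1)      # move left or right along a corridor
GOAL = 9               # assumed index of the goal cell

def features(state):
    """Inputs of each linear unit: a bias and the scaled position."""
    return (1.0, state / GOAL)

def q_value(weights, state, action):
    return sum(w * f for w, f in zip(weights[action], features(state)))

def update(weights, state, action, reward, next_state, done, rate=0.1):
    """One incremental step: nudge the weights toward the TD target."""
    target = reward
    if not done:
        target += GAMMA * max(q_value(weights, next_state, a) for a in ACTIONS)
    td_error = target - q_value(weights, state, action)
    weights[action] = [w + rate * td_error * f
                       for w, f in zip(weights[action], features(state))]
    return td_error

def train(episodes=200, epsilon=0.2, seed=0):
    """Epsilon-greedy learning with the approximator as the only storage."""
    rng = random.Random(seed)
    weights = {a: [0.0, 0.0] for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):           # cap the length of an episode
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q_value(weights, state, a))
            next_state = min(max(state + action, 0), GOAL)
            done = next_state == GOAL
            update(weights, state, action, 1.0 if done else 0.0, next_state, done)
            state = next_state
            if done:
                break
    return weights
```

A practical check is to compare the greedy action in each cell against the known best move after training, and to rerun with different seeds and learning rates if they disagree.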
Hierarchies
Hierarchical solutions have also received a lot of attention recently. One benefit is that the problem can be divided into levels of subproblems. The smaller problems are solved first, and then more complex problems are solved using these building blocks. Like hierarchical finite-state machines, the idea is to abstract out common problems and reuse them in a modular fashion [Humphrys 97, Dietterich 98].
In most cases, experts handcraft the hierarchy. So it's the engineer who is responsible for simplifying the state/action space of each component, and assembling them together in a convenient fashion. Then, standard reinforcement learning techniques can be used to learn the state/action mapping.
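One way to picture such a handcrafted hierarchy is sketched below. The Component class, its fields, and the tabular update are hypothetical, not taken from the cited work. Each component sees only the simplified state its engineer chose, and its actions are either primitive actions or child components. As a simplifying assumption, a parent treats a child as if it were a single action, ignoring how long the child runs.

```python
from collections import defaultdict

class Component:
    """A learner restricted to a handcrafted slice of the state."""

    def __init__(self, name, actions, abstract, rate=0.5, gamma=0.9):
        self.name = name
        self.actions = actions        # primitive actions or child Components
        self.abstract = abstract      # engineer's state simplification
        self.rate = rate
        self.gamma = gamma
        self.q = defaultdict(float)   # (abstract state, action index) -> value

    def best(self, state):
        """Index of the highest-valued action in the simplified state."""
        s = self.abstract(state)
        return max(range(len(self.actions)), key=lambda i: self.q[(s, i)])

    def learn(self, state, index, reward, next_state):
        """Standard tabular Q-learning update on the small state space."""
        s, n = self.abstract(state), self.abstract(next_state)
        future = max(self.q[(n, i)] for i in range(len(self.actions)))
        target = reward + self.gamma * future
        self.q[(s, index)] += self.rate * (target - self.q[(s, index)])

    def act(self, state):
        """Resolve greedily down the hierarchy to a primitive action."""
        choice = self.actions[self.best(state)]
        return choice.act(state) if isinstance(choice, Component) else choice

# Hypothetical assembly: each component looks at one field of the state.
attack = Component("attack", ["aim", "fire"], lambda s: s["enemy_near"])
flee = Component("flee", ["run", "hide"], lambda s: s["cover_near"])
root = Component("root", [attack, flee], lambda s: s["health_low"])
```

With this layout, root never sees whether cover is nearby and flee never sees the enemy, so each table stays small and each component can be reused under a different parent.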
