Reinforcement learning (RL) has both benefits and pitfalls in the context of game AI development.


Reinforcement learning is based on an extremely simple concept: the reward signal. It usually takes very little effort to express a problem in terms of reward and punishment. After that, it is just a matter of modeling the states and actions (which is easily achieved in computer games), and learning can begin. This makes RL an extremely flexible technique.
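To illustrate how little is needed, here is a minimal sketch of a hypothetical combat bot expressed as states, actions, and a reward signal. The `State` fields, the `Action` names, the three-level health discretization, and the +1/-1 values are all illustrative assumptions, not taken from any particular game.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Action(Enum):
    # Hypothetical action set for a combat bot.
    ATTACK = 0
    RETREAT = 1
    HEAL = 2


@dataclass(frozen=True)
class State:
    own_health: int    # assumed discretization: 0 = dead, 1 = low, 2 = high
    enemy_health: int  # same assumed discretization for the opponent


def reward(state: State) -> float:
    """Reward/punishment signal: the only thing the designer must specify."""
    if state.own_health == 0:
        return -1.0  # punishment: the bot died
    if state.enemy_health == 0:
        return 1.0   # reward: the enemy was defeated
    return 0.0       # no feedback in all other states


# Enumerating the state space is easy because every variable is discrete.
ALL_STATES = [State(o, e) for o, e in product(range(3), range(3))]
```

With 9 states and 3 actions, a learner would only have 27 state/action pairs to estimate. Everything else (which action is good where) is left for the RL algorithm to discover.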

Given a representation that stores the estimates exactly rather than approximating them, most RL algorithms provably converge to the optimal policy. Naturally, the simulation takes time, and all the states and actions need to be explored thoroughly, but the quality of the results still converges asymptotically.
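The following is a minimal sketch of that setting, assuming a hypothetical six-cell corridor where reaching the last cell ends the episode with reward +1. It uses tabular Q-learning (one exact entry per state/action pair) with epsilon-greedy exploration so that every pair keeps being visited. The environment, `step`, and all parameter values are illustrative assumptions.

```python
import random

N_STATES = 6          # hypothetical corridor: cells 0..5
GOAL = N_STATES - 1   # entering cell 5 ends the episode with reward +1
ACTIONS = (-1, +1)    # index 0 = move left, index 1 = move right


def step(state, action):
    """Deterministic transition: returns (next_state, reward, done)."""
    nxt = min(max(state + action, 0), GOAL)
    if nxt == GOAL:
        return nxt, 1.0, True
    return nxt, 0.0, False


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # Exact tabular estimates: one entry per (state, action), nothing approximated.
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: a random action keeps every pair being visited.
                a = rng.randrange(len(ACTIONS))
            else:
                # Exploit: pick a best action, breaking ties at random.
                best = max(q[s])
                a = rng.choice([i for i, v in enumerate(q[s]) if v == best])
            s2, r, done = step(s, ACTIONS[a])
            # One-step Q-learning backup toward r + gamma * max_a' Q(s', a').
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q


def greedy_policy(q):
    """Best direction (-1 or +1) in every non-terminal cell."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[s][i])]
            for s in range(GOAL)]


print(greedy_policy(q_learning()))
```

With enough episodes the greedy policy settles on moving right in every cell. The guarantee relies on the table holding each estimate separately and on exploration continuing to visit every pair.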

There are many varieties of reinforcement learning algorithms. If necessary, they can work without a world model, learning the transition probabilities or expected rewards from experience. However, they can also make the most of a world model when one is available.
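When a world model is available, the expected values can be computed directly from the transition probabilities instead of being sampled. The sketch below uses value iteration on the same kind of hypothetical corridor, assuming a model in which a move fails with probability `SLIP` and the agent stays put. `model`, `SLIP`, and the corridor itself are illustrative assumptions.

```python
N_STATES = 6          # hypothetical corridor: cells 0..5, cell 5 is terminal
GOAL = N_STATES - 1
ACTIONS = (-1, +1)    # move left, move right
SLIP = 0.2            # assumed probability that a move fails (agent stays put)


def model(state, action):
    """Hypothetical world model: list of (probability, next_state, reward)."""
    intended = min(max(state + action, 0), GOAL)
    outcomes = [(1.0 - SLIP, intended), (SLIP, state)]
    return [(p, s2, 1.0 if s2 == GOAL else 0.0) for p, s2 in outcomes]


def expected_return(v, state, action, gamma):
    """Full backup: average over every outcome the model predicts."""
    return sum(p * (r + gamma * v[s2]) for p, s2, r in model(state, action))


def value_iteration(gamma=0.9, tol=1e-10):
    v = [0.0] * N_STATES  # v[GOAL] stays 0 because the goal is terminal
    while True:
        delta = 0.0
        for s in range(GOAL):  # sweep the non-terminal states
            best = max(expected_return(v, s, a, gamma) for a in ACTIONS)
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < tol:
            return v


def greedy_policy(v, gamma=0.9):
    """Pick, in each non-terminal cell, the action with the best expected return."""
    return [max(ACTIONS, key=lambda a: expected_return(v, s, a, gamma))
            for s in range(GOAL)]


v = value_iteration()
print(greedy_policy(v))
```

No episodes are simulated here. Every update averages over the model's predicted outcomes, which is what a model-free learner would otherwise have to estimate from experience.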


The naive approach of storing the action values in a matrix scales poorly in terms of memory, because it needs one entry for every combination of state and action. Algorithms using a backup depth of one also scale badly in terms of computation: by the very nature of a backup, numerous forward iterations are required before the reward eventually propagates to all the states. This makes such algorithms particularly unsuitable for dynamic environments.
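Both costs can be made concrete with a small sketch. `table_bytes` shows how a dense matrix grows with the product of the state variables. The state sizes used below (a 100×100 grid, 10 health levels, 8 actions, 4-byte floats) are hypothetical. `sweeps_until_start_sees_reward` counts how many sweeps of one-step backups it takes before the first cell of an assumed corridor receives any value, under the simplifying assumption that the agent always moves right.

```python
def table_bytes(n_states, n_actions, bytes_per_entry=4):
    """Memory for a dense action-value matrix (e.g. 32-bit floats)."""
    return n_states * n_actions * bytes_per_entry


def sweeps_until_start_sees_reward(n_states, gamma=0.9):
    """Count sweeps of one-step backups before state 0 gets a nonzero value.

    Assumed setting: a corridor where the agent always moves right and only
    the transition into the last cell is rewarded.
    """
    if n_states < 2:
        raise ValueError("need at least one non-terminal state")
    goal = n_states - 1
    v = [0.0] * n_states
    sweeps = 0
    while v[0] == 0.0:
        for s in range(goal):  # forward order, updated in place
            nxt = s + 1
            r = 1.0 if nxt == goal else 0.0
            # Backup depth of one: look only a single transition ahead.
            v[s] = r + gamma * v[nxt]
        sweeps += 1
    return sweeps


print(table_bytes(100 * 100 * 10, 8))          # 3200000 bytes
print(sweeps_until_start_sees_reward(6))       # 5
print(sweeps_until_start_sees_reward(1000))    # 999
```

Each sweep pushes the reward back by only one state, so the number of sweeps grows linearly with the distance to the reward. Every extra state variable multiplies the table size. If the rewards move, as they often do in a dynamic game world, the propagation has to start over.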

Using function approximators instead is a viable solution in many cases, but the proof of convergence no longer applies. This can lead to unpredictable behaviors and unexpected results, which take much experimentation to resolve.
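The sketch below shows why this happens, using one common choice: a linear approximator trained with a semi-gradient Q-learning update. The feature set in `features` is a hypothetical hand-picked example for a corridor position and a move of -1 or +1. A handful of weights replaces the whole table, but every weight is shared between many state/action pairs.

```python
def features(state, action, n_states):
    """Hypothetical features for a corridor cell and a move of -1 or +1."""
    x = state / (n_states - 1)  # position scaled to [0, 1]
    return [1.0, x, float(action), x * action]


def q_value(weights, feats):
    """Linear estimate: dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, feats))


def semi_gradient_q_update(weights, feats, reward, next_best_q, done,
                           alpha=0.1, gamma=0.9):
    """One Q-learning step on a linear approximator; returns new weights."""
    target = reward if done else reward + gamma * next_best_q
    error = target - q_value(weights, feats)
    # The gradient of a linear Q with respect to the weights is the feature vector.
    return [w + alpha * error * f for w, f in zip(weights, feats)]


w = [0.0] * 4
w = semi_gradient_q_update(w, features(2, +1, 5), reward=1.0,
                           next_best_q=0.0, done=True)
print(q_value(w, features(2, +1, 5)))  # the updated pair moved toward 1.0
print(q_value(w, features(4, +1, 5)))  # a pair never visited also changed
```

The second estimate changes even though that pair was never updated. This generalization is what saves memory, but when it is combined with bootstrapped, off-policy targets, the estimates can chase each other. Known counterexamples exist where such updates diverge.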

The reward/punishment paradigm can also be surprisingly difficult to work with. For example, humanlike behavior is not easily expressed with positive or negative rewards.
