Computing the Fitness
A minimal amount of code is required outside of the modules. Mainly, the artificial intelligence (AI) must compute the fitness of the behavior at regular intervals and pass the information to the genetic algorithm. Because we're interested in two different behaviors, there are two fitness functions to think about.
Ideally, we don't want to test for rocket jumping directly, instead letting evolution discover rockets as the best approach to jumping very high. (Double jumps or jump pads could be other solutions.) Therefore, we want to reward upward movement only. To prevent the animat from getting high fitness for running up stairs, upward movement is counted only while the animat is not touching the floor.
We also want to reward one very high jump more than many smaller ones. To achieve this, the reward can be the square of the distance traveled upward within each jump. Genetic algorithms are notorious for finding loopholes in fitness functions, but together these criteria seem to cover our bases.
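As a concrete sketch, the jump-fitness bookkeeping might look like the following. The `position_z` and `on_ground` values are hypothetical; substitute whatever the engine actually exposes for the animat's height and floor contact:

```python
class JumpFitness:
    """Accumulates reward for airborne upward movement, squared per jump."""

    def __init__(self):
        self.fitness = 0.0
        self._jump_height = 0.0  # upward distance of the jump in progress
        self._last_z = None

    def update(self, position_z, on_ground):
        """Call at regular intervals with the animat's current state."""
        if self._last_z is not None and not on_ground:
            # Only count upward movement while airborne, so running
            # up stairs earns nothing.
            delta = position_z - self._last_z
            if delta > 0.0:
                self._jump_height += delta
        if on_ground and self._jump_height > 0.0:
            # Square the total height on landing, so one big jump
            # outscores many small ones.
            self.fitness += self._jump_height ** 2
            self._jump_height = 0.0
        self._last_z = position_z
```

Squaring per jump gives the intended bias: a single 3-unit jump scores 9, whereas three separate 1-unit jumps score only 3.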
To rate skill at dodging incoming fire, we'll measure the distance from the animat to the point of explosion. If the rocket hits the animat full on, the distance, and therefore the fitness, will be near 0. On the other hand, if the animat successfully scampers far away from the collision point, it'll get a high fitness. Because all rockets explode quickly, no upper limit is imposed on the fitness. (Clamping the distance would discard information useful to the evolution.)
To prevent the animat from getting high fitness for standing still (but far away), the AI monitors the difference in distance between the start and end of the sequence, which begins when a rocket is fired. This is like measuring the average velocity away from the point of impact.
Although these hints to the genetic algorithm indicate that distance is important, the AI also needs to measure a crucial factor: damage. Any damage taken will be subtracted from the animat's fitness (decreasing the estimate of the capability to dodge fire).
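Putting the ingredients together, a sketch of the per-sequence dodge fitness might look like this. The positions are plain (x, y, z) tuples and the parameter names are illustrative, not taken from any particular engine:

```python
import math

def dodge_fitness(start_pos, end_pos, explosion_pos, damage_taken):
    """Fitness for one dodging sequence (rocket fired -> rocket exploded).

    Rewards the change in distance from the explosion point between the
    start and end of the sequence, so standing far away but motionless
    earns nothing. Any damage taken is subtracted at the end.
    """
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    moved_away = dist(end_pos, explosion_pos) - dist(start_pos, explosion_pos)
    return moved_away - damage_taken
```

Note that the result can go negative: an animat that takes damage while drifting toward the blast is rated worse than one that merely stands still.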
In the unlikely event that these two fitness components fail to guide the evolution toward desirable avoidance behaviors, a more continuous reward signal can be used instead of a single fitness value computed at the end of the sequence. With time treated as a fourth dimension, we find the 3D point where the two 4D trajectories (time plus 3D space) come closest to each other; any movement of the animat away from that point then earns positive fitness.
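Under the simplifying assumption that both trajectories are sampled at shared timestamps, this continuous-reward fallback could be sketched as follows. The track format, (t, x, y, z) tuples, is invented for the example:

```python
import math

def continuous_dodge_reward(animat_track, rocket_track):
    """Continuous reward for one dodging sequence.

    Both tracks are equal-length lists of (t, x, y, z) samples taken at
    the same timestamps. We first find the sample at which the rocket
    passes closest to the animat (the closest point of the two 4D
    trajectories), then reward every increase in the animat's distance
    from that 3D point.
    """
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    # Closest approach: minimize animat-rocket distance per sample time.
    closest_idx = min(
        range(len(animat_track)),
        key=lambda i: dist(animat_track[i][1:], rocket_track[i][1:]),
    )
    contact_point = rocket_track[closest_idx][1:]

    # Accumulate the positive increments of distance from that point.
    reward = 0.0
    prev = dist(animat_track[0][1:], contact_point)
    for sample in animat_track[1:]:
        cur = dist(sample[1:], contact_point)
        if cur > prev:
            reward += cur - prev
        prev = cur
    return reward
```

Because only increases in distance are accumulated, an animat that stands still, or circles the contact point at a fixed radius, collects no reward.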