## Optimization of Perceptron Weights

How can these techniques be applied to teach a perceptron, and specifically to train the weights to produce the right results? This is done on a case-by-case basis. Given specific inputs, we need the corresponding desired output.

In computer games, the desired output can be obtained in many ways. It can be designed by an expert (training), copied from human behavior (imitation), computed by another AI component (automated learning), or based on a previous result (bootstrapping). This process is discussed further in Chapter 35, "Designing Learning AI."

Optimization of perceptrons is also a best-guess process. In this case, the estimate is a set of weights. The output of the perceptron can be evaluated with these weights. Comparing the actual output to the desired output gives us an error. Then, given the current network, we compute a new estimate of the weights that generates a smaller error. Correcting the weights is done by the delta rule.

## The Delta Rule Explained

The delta rule is the method most often used to adjust the weights. The key observations are as follows:

- Each weight contributes a variable amount to the output.
- The scale of the contribution depends on the input.
- The error in the output can be "blamed" on the weights.
Reducing the error requires adjusting the output toward its ideal value. Because only the weights can change, we make multiple small adjustments to the weights instead. The adjustments can be distributed in proportion to the contribution of the weights; the bigger the contribution to an error, the bigger the adjustment!

With the intuitive explanation covered, let's move into a more practical mode. First, the relative error E is computed using the difference between the actual output y and the desired output t. This is known as an error function, which usually takes the form

$$E = \tfrac{1}{2}(t - y)^2$$
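As a minimal sketch of this first step, the snippet below evaluates a perceptron and its error for one training case. It assumes the usual squared form of the error function and an identity activation (so the output equals the net sum); the function names and the example values are hypothetical, chosen only for illustration.

```python
def perceptron_output(weights, inputs):
    """Weighted sum of the inputs; with an identity activation this is also y."""
    return sum(w * x for w, x in zip(weights, inputs))


def squared_error(target, output):
    """Error function E = 1/2 * (t - y)^2."""
    return 0.5 * (target - output) ** 2


# Hypothetical example values, chosen only for illustration.
weights = [0.5, -0.2]
inputs = [1.0, 2.0]
target = 1.0

y = perceptron_output(weights, inputs)  # 0.5*1.0 + (-0.2)*2.0, about 0.1
E = squared_error(target, y)            # 0.5 * (1.0 - 0.1)^2, about 0.405
print(f"y = {y:.3f}, E = {E:.3f}")
```

The factor of one half has no effect on where the minimum lies; it is there so that the 2 from the square cancels when the error is differentiated.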
It's now possible to compute the gradient of the error with respect to each of the weights. This is denoted $\partial E / \partial w_i$.

## Figure 17.11. Correcting the weights of the perceptron based on the output error, proportionally to the input values.
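Then, to adjust the weights, we can use any gradient method: steepest descent, for example. The step $\Delta w_i$, which is added to each weight, moves against the gradient of the error:

$$\Delta w_i = -\eta \frac{\partial E}{\partial w_i} = \eta \, x_i (t - y)$$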
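The following sketch applies one steepest-descent step of the delta rule to a single training case. It assumes an identity activation and the squared error function; `delta_rule_step`, the default learning rate of 0.1, and the example values are hypothetical choices for illustration, not taken from the text.

```python
def delta_rule_step(weights, inputs, target, learning_rate=0.1):
    """Return new weights after one delta-rule update for one training case."""
    # Forward pass: net sum, which is also the output for an identity activation.
    y = sum(w * x for w, x in zip(weights, inputs))
    # Difference between desired and actual output, shared by all weights.
    error_signal = target - y
    # Each weight moves in proportion to its own input.
    return [w + learning_rate * x * error_signal
            for w, x in zip(weights, inputs)]


# Hypothetical example values, chosen only for illustration.
weights = [0.5, -0.2]
inputs = [1.0, 2.0]
target = 1.0

new_weights = delta_rule_step(weights, inputs, target)
print(new_weights)  # roughly [0.59, -0.02]
```

Note that the weight attached to the larger input receives the larger correction. Repeating the step on the same case keeps shrinking the error, provided the learning rate is small enough for the size of the inputs.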
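Again, $\eta$ is a small constant known as the learning rate.

## Formal Proof

This part of the chapter proves how to compute the gradient $\partial E / \partial w_i$. Noting that the error depends only on the net sum z, the chain rule can be applied:

$$\frac{\partial E}{\partial w_i} = \frac{\partial E}{\partial z} \, \frac{\partial z}{\partial w_i}$$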
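This expresses the gradient of the error in each weight as the gradient of the error relative to the net sum, $\partial E / \partial z$, multiplied by the gradient of the net sum relative to the weight, $\partial z / \partial w_i$. The second term is straightforward: because the net sum is $z = \sum_i w_i x_i$, its gradient relative to a weight is the corresponding input:

$$\frac{\partial z}{\partial w_i} = x_i$$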
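Now for the first right-side term. The net sum is the output y before the activation function $\sigma$, that is, $y = \sigma(z)$, so the gradient can be re-expressed (using the derivative of the error function):

$$\frac{\partial E}{\partial z} = \frac{\partial E}{\partial y} \, \frac{\partial y}{\partial z} = -(t - y) \, \sigma'(z)$$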
Because most perceptrons since the Adaline use an identity activation function, $\sigma(z) = z$, it follows that $\sigma'(z) = 1$. This implies the following:

$$\frac{\partial E}{\partial z} = -(t - y)$$
There we have it. The gradient of the error for each weight is just the negated input multiplied by the relative error:

$$\frac{\partial E}{\partial w_i} = -x_i (t - y)$$

This result is surprisingly intuitive. When we want to adjust the output for a given input, all we have to play with is the weights. Each weight could be adjusted to contribute toward the target using (t − y). However, the contribution of the weight depends on the value of the input, so this step (t − y) can be scaled by $x_i$.
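One way to gain confidence in this result is to compare it against a numerical estimate. The sketch below, which assumes the squared error function and an identity activation, checks the negated input times (t − y) against a central finite difference of the error. All function names, the step size `eps`, and the example values are hypothetical and chosen only for illustration.

```python
def error(weights, inputs, target):
    """E = 1/2 * (t - y)^2 with y equal to the net sum (identity activation)."""
    y = sum(w * x for w, x in zip(weights, inputs))
    return 0.5 * (target - y) ** 2


def analytic_gradient(weights, inputs, target):
    """Gradient from the derivation: dE/dw_i = -x_i * (t - y)."""
    y = sum(w * x for w, x in zip(weights, inputs))
    return [-x * (target - y) for x in inputs]


def numerical_gradient(weights, inputs, target, eps=1e-6):
    """Central finite-difference estimate of dE/dw_i for each weight."""
    grads = []
    for i in range(len(weights)):
        up = list(weights)
        down = list(weights)
        up[i] += eps
        down[i] -= eps
        grads.append((error(up, inputs, target)
                      - error(down, inputs, target)) / (2 * eps))
    return grads


# Hypothetical example values, chosen only for illustration.
weights = [0.5, -0.2]
inputs = [1.0, 2.0]
target = 1.0

analytic = analytic_gradient(weights, inputs, target)    # roughly [-0.9, -1.8]
numerical = numerical_gradient(weights, inputs, target)
for a, n in zip(analytic, numerical):
    print(f"analytic = {a:.6f}, numerical = {n:.6f}")
```

The two estimates should agree to several decimal places. A mismatch would usually point to a sign error or to a missing activation derivative.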
