JavaScript EditorFree JavaScript Editor     Ajax Editor 

Main Page
  Previous Section Next Section


Implementing the interfaces in a generic way is actually slightly more challenging than one would expect, particularly dealing with the different possible types of the any values.

Data Structures

Three essential data structures are used in the implementation. The first is obviously a node in the tree, the other is a decision, and the third represents an attribute.

Tree Nodes

The nodes in the tree contain three different elements: children, a decision, and the response. The children are basically stored as an array of nodes. The response is an any variable that's most representative of the subset of data corresponding to this node, and the decision is a criterion used to evaluate each sample.


A decision is a base virtual class that needs to be inherited from. This allows any kind of decision to be implemented in the system, regardless of the number of attributes or their type.

Boolean decisions, for example, check the value of a particular predictor variable, and suggest one of the two branches. Numeric decisions split the attribute up into ranges, one for each branch. The most common decision is based on one single attribute, which is directly referenced.


Attributes represent both the predictor variables and the response variable. These store important information, such as the name, range, or type—allowing the learning algorithms to find out more about the attributes relatively easily.

Tree Induction Algorithm

The algorithm used for both incremental and batch learning is based on recursive partitioning, explained in Chapter 26, "Classification and Regression Trees." The algorithm is the same, but the online learning approach tries to minimize changes to the tree by only relearning it when necessary. We could implement other incremental learning algorithms interchangeably if this approach proved unsatisfactory.

The process is quite simple and follows the pseudo-code from Listing 26.2 closely. First, we compute the closest response value for the node and data set. If this is perfect, we return. Then, the attribute with the most information gain is identified, and a decision is created. This is used to split the data set into subsets. The same Build() procedure is applied recursively to the children, as discussed in Chapter 26.


The algorithm to simulate the tree is very efficient and extremely compact. Essentially, the current node is set to the root. The node's decision is called and returns the index of the child. The recursion continues with the child chosen. Otherwise, if there is no child, the response value of the current node is returned.

      Previous Section Next Section

    JavaScript EditorAjax Editor     JavaScript Editor