Machine learning (2): Decision tree
What is a decision tree?
A tree in which:
1. Each terminal node (leaf) is associated with a class.
2. Each non-terminal node is associated with one of the attributes that examples possess.
3. Each branch is associated with a particular value that the attribute of its parent node can take.
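To make this structure concrete, here is a minimal sketch in Python; the nested-dict encoding and the attribute and class names are illustrative assumptions, not part of the definition above.

```python
# A minimal sketch of the structure above (encoding and names are
# illustrative assumptions, not from the original notes).
leaf = "c1"  # (1) a terminal node is simply a class label

tree = {
    "attribute": "outlook",  # (2) a non-terminal node tests one attribute
    "branches": {            # (3) one branch per value the attribute can take
        "sunny": "c1",       # this branch leads to a leaf of class c1
        "rainy": "c2",       # this branch leads to a leaf of class c2
    },
}
```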
What is the purpose of the decision tree?
To classify: given a training set, a learning procedure attempts to build a decision tree that will correctly predict the class of any unclassified example.
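As a sketch of the prediction half of this procedure (assuming the same nested-dict encoding as above; the tree and example data are hypothetical), classifying an example is a walk from the root to a leaf:

```python
def classify(tree, example):
    """Walk from the root to a leaf, at each non-terminal node following
    the branch that matches the example's value for the tested attribute."""
    while isinstance(tree, dict):           # still at a non-terminal node
        value = example[tree["attribute"]]  # the example's value for the test
        tree = tree["branches"][value]      # follow the matching branch
    return tree                             # reached a leaf: the class label


# Hypothetical one-split tree and an unclassified example.
tree = {"attribute": "outlook",
        "branches": {"sunny": "c1", "rainy": "c2"}}
print(classify(tree, {"outlook": "sunny"}))  # prints: c1
```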
Why choose the best attribute to create a decision tree?
Because the tree is built top-down: an attribute is chosen for the root, the examples are partitioned by its values, and subtrees (eventually ending in leaves) are built recursively from the resulting subsets. Choosing the best attribute at each step keeps the tree small and discriminating.
What is the best attribute?
There are many possible definitions.
A reasonable answer would be the attribute that best discriminates the examples with respect to their classes.
What does "best discriminates" mean?
Again, there are many possible answers.
Many different criteria are in use; one of the most common is based on Shannon's function.
Shannon’s Function
For a set of examples with classes c1, …, ck, where p(ci) is the proportion of examples in class ci, the information (entropy) of the set is:
information = -p(c1) log2 p(c1) - … - p(ck) log2 p(ck)
Example:
We have a set of 100 examples belonging to two classes, c1 and c2:
70 are in c1
30 are in c2
We can calculate the information of these examples:
information = -0.70 log2(0.70) - 0.30 log2(0.30) ≈ 0.88
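As a quick check of this arithmetic, here is a short Python sketch (the helper name `information` is my own, not from the notes):

```python
from math import log2


def information(counts):
    """Shannon information of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


print(round(information([70, 30]), 2))  # 0.88
```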
Now suppose each example also has an attribute with two possible values, v1 and v2.
69 examples have value v1: 63 in c1 and 6 in c2.
So for this subset, p(c1) = 63/69 ≈ 0.913
p(c2) = 6/69 ≈ 0.087
So the information = -0.913 log2(0.913) - 0.087 log2(0.087) ≈ 0.43
The other 31 examples have value v2: 7 in c1 and 24 in c2.
So for this subset, p(c1) = 7/31 ≈ 0.226
p(c2) = 24/31 ≈ 0.774
So the information = -0.226 log2(0.226) - 0.774 log2(0.774) ≈ 0.77
So the information is 0.43 for the v1 subset and 0.77 for the v2 subset.
But 69% of the examples take value v1 and 31% take value v2, so the expected information after the split is the weighted average:
0.69 × 0.43 + 0.31 × 0.77 ≈ 0.54
The information gain = 0.88 - 0.54 = 0.34
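The whole worked example can be reproduced with the same `information` helper (a sketch; the names are mine). Note that the notes round each intermediate quantity to two decimal places; exact arithmetic gives an expected information of about 0.53 and a gain of about 0.35.

```python
from math import log2


def information(counts):
    """Shannon information of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


whole = round(information([70, 30]), 2)     # 0.88: all 100 examples
subset_v1 = round(information([63, 6]), 2)  # 0.43: the 69 examples with v1
subset_v2 = round(information([7, 24]), 2)  # 0.77: the 31 examples with v2

# Expected information after the split: subsets weighted by their share.
after = round(0.69 * subset_v1 + 0.31 * subset_v2, 2)  # 0.54
gain = round(whole - after, 2)                         # 0.34
print(whole, after, gain)  # 0.88 0.54 0.34
```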