Note: You may find it helpful to get the tree structure completely working
before coding up the entropy calculations.
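One way to follow this advice is to build and test the tree with a placeholder attribute chooser, then swap in the entropy-based chooser later. The sketch below is only an illustration: the names Node, choose_first, build, and classify are hypothetical (not part of the assignment), and the four rows are taken from the tennis data.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    attribute: Optional[int] = None   # column index tested here; None for a leaf
    label: Optional[str] = None       # predicted class; None for an internal node
    children: dict = field(default_factory=dict)  # attribute value -> child Node

def choose_first(rows, columns):
    """Placeholder chooser: always split on the first remaining column."""
    return columns[0]

def build(rows, columns, choose=choose_first):
    labels = [row[-1] for row in rows]
    # Stop when the node is pure or there is nothing left to split on.
    if len(set(labels)) == 1 or not columns:
        return Node(label=max(set(labels), key=labels.count))
    col = choose(rows, columns)
    node = Node(attribute=col)
    remaining = [c for c in columns if c != col]
    for value in {row[col] for row in rows}:
        subset = [row for row in rows if row[col] == value]
        node.children[value] = build(subset, remaining, choose)
    return node

def classify(node, row):
    # Walk down the tree until a leaf is reached.
    while node.label is None:
        node = node.children[row[node.attribute]]
    return node.label

rows = [
    ("sunny", "hot", "high", "weak", "no"),
    ("overcast", "hot", "high", "weak", "yes"),
    ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "strong", "no"),
]
tree = build(rows, [0, 1, 2, 3])
print([classify(tree, row) for row in rows])  # ['no', 'yes', 'yes', 'no']
```

Once this works, only the choose function needs to change to pick the attribute with the highest information gain.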

 

Calculating entropy for data set: tennis.arff

 

 

Start at the root node with all the instances:

 

outlook   temperature  humidity  wind    playTennis
sunny     hot          high      weak    no
sunny     hot          high      strong  no
overcast  hot          high      weak    yes
rain      mild         high      weak    yes
rain      cool         normal    weak    yes
rain      cool         normal    strong  no
overcast  cool         normal    strong  yes
sunny     mild         high      weak    no
sunny     cool         normal    weak    yes
rain      mild         normal    weak    yes
sunny     mild         normal    strong  yes
overcast  mild         high      strong  yes
overcast  hot          normal    weak    yes
rain      mild         high      strong  no

 

 

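(Note that this example uses log base 2. The decisions will be identical if you
use the natural log (base e) or any other base, but the entropy and gain values
will be scaled by a constant factor.)

In the output below, each pair such as ( 5/14 9/14 ) gives the fraction of
instances labeled no, followed by the fraction labeled yes.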
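The effect of the log base can be checked directly. The sketch below is an illustration only; the function name entropy and its base parameter are assumptions, not part of the assignment.

```python
import math

def entropy(counts, base=2.0):
    """Shannon entropy of a list of class counts, in the given log base."""
    total = sum(counts)
    result = 0.0
    for c in counts:
        if c > 0:  # treat 0 * log(0) as 0
            p = c / total
            result -= p * math.log(p, base)
    return result

root = [5, 9]  # 5 "no" and 9 "yes" instances at the root node
h2 = entropy(root, base=2.0)
he = entropy(root, base=math.e)
print(h2)       # about 0.940
print(he)       # about 0.652
print(he / h2)  # the constant ln(2), about 0.693
```

Because every entropy and gain value is multiplied by the same constant, the attribute with the largest gain does not change.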

Node: ( 5/14 9/14 ) Entropy=0.9402859586706309

    Attribute 0-Outlook:
        Value 0-Sunny: ( 3/5 2/5 ) Entropy=0.9709505944546686
        Value 1-Overcast: ( 0/4 4/4 ) Entropy=0.0
        Value 2-Rain: ( 2/5 3/5 ) Entropy=0.9709505944546686
        InfoGain=0.2467498197744391
    Attribute 1-Temperature:
        Value 0-Hot: ( 2/4 2/4 ) Entropy=1.0
        Value 1-Mild: ( 2/6 4/6 ) Entropy=0.9182958340544896
        Value 2-Cool: ( 1/4 3/4 ) Entropy=0.8112781244591328
        InfoGain=0.029222565658954647
    Attribute 2-Humidity:
        Value 0-High: ( 4/7 3/7 ) Entropy=0.9852281360342516
        Value 1-Normal: ( 1/7 6/7 ) Entropy=0.5916727785823275
        InfoGain=0.15183550136234136
    Attribute 3-Wind:
        Value 0-Weak: ( 2/8 6/8 ) Entropy=0.8112781244591328
        Value 1-Strong: ( 3/6 3/6 ) Entropy=1.0
        InfoGain=0.04812703040826932

 

Maximum InfoGain=0.247

Split on Attribute 0-Outlook
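The gain values above can be reproduced from the class counts alone. This is a minimal sketch, assuming information gain is the node entropy minus the size-weighted average entropy of its partitions; the helper names entropy and info_gain are hypothetical.

```python
import math

def entropy(counts):
    """Entropy (log base 2) of a tuple of class counts."""
    total = sum(counts)
    result = 0.0
    for c in counts:
        if c > 0:  # treat 0 * log(0) as 0
            result -= (c / total) * math.log2(c / total)
    return result

def info_gain(parent, partitions):
    """Parent entropy minus the weighted entropy of the partitions."""
    total = sum(parent)
    remainder = sum(sum(p) / total * entropy(p) for p in partitions)
    return entropy(parent) - remainder

root = (5, 9)  # (no, yes) counts at the root
splits = {     # (no, yes) counts per attribute value, copied from the output above
    "outlook": [(3, 2), (0, 4), (2, 3)],
    "temperature": [(2, 2), (2, 4), (1, 3)],
    "humidity": [(4, 3), (1, 6)],
    "wind": [(2, 6), (3, 3)],
}
gains = {name: info_gain(root, parts) for name, parts in splits.items()}
best = max(gains, key=gains.get)
for name, gain in gains.items():
    print(name, round(gain, 3))
print("split on", best)  # split on outlook
```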

 

Move on to the next node in the tree that can be expanded:

 

outlook   temperature  humidity  wind    playTennis
sunny     hot          high      weak    no
sunny     hot          high      strong  no
sunny     mild         high      weak    no
sunny     cool         normal    weak    yes
sunny     mild         normal    strong  yes

 

Node: ( 3/5 2/5 ) Entropy=0.9709505944546686

    Attribute 1-Temperature:
        Value 0-Hot: ( 2/2 0/2 ) Entropy=0.0
        Value 1-Mild: ( 1/2 1/2 ) Entropy=1.0
        Value 2-Cool: ( 0/1 1/1 ) Entropy=0.0
        InfoGain=0.5709505944546686
    Attribute 2-Humidity:
        Value 0-High: ( 3/3 0/3 ) Entropy=0.0
        Value 1-Normal: ( 0/2 2/2 ) Entropy=0.0
        InfoGain=0.9709505944546686
    Attribute 3-Wind:
        Value 0-Weak: ( 2/3 1/3 ) Entropy=0.9182958340544896
        Value 1-Strong: ( 1/2 1/2 ) Entropy=1.0
        InfoGain=0.01997309402197489

 

Maximum InfoGain=0.971

Split on Attribute 2-Humidity
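Expanding a child node means repeating the same calculation on only the instances that reach it. The sketch below shows one possible way to partition the instances and count classes per value; the names COLUMNS, ROWS, partition, and class_counts are hypothetical, and the data is typed in from the table at the root node.

```python
from collections import Counter, defaultdict

COLUMNS = ["outlook", "temperature", "humidity", "wind", "playTennis"]
ROWS = [line.split() for line in """
sunny hot high weak no
sunny hot high strong no
overcast hot high weak yes
rain mild high weak yes
rain cool normal weak yes
rain cool normal strong no
overcast cool normal strong yes
sunny mild high weak no
sunny cool normal weak yes
rain mild normal weak yes
sunny mild normal strong yes
overcast mild high strong yes
overcast hot normal weak yes
rain mild high strong no
""".strip().splitlines()]

def partition(rows, column):
    """Group rows by their value in the named column."""
    index = COLUMNS.index(column)
    groups = defaultdict(list)
    for row in rows:
        groups[row[index]].append(row)
    return dict(groups)

def class_counts(rows):
    """Return (no, yes) counts for a list of rows."""
    labels = Counter(row[-1] for row in rows)
    return labels["no"], labels["yes"]

by_outlook = partition(ROWS, "outlook")
sunny = by_outlook["sunny"]
print(len(sunny), class_counts(sunny))  # 5 (3, 2)
for value, subset in partition(sunny, "humidity").items():
    print(value, class_counts(subset))  # high (3, 0), then normal (0, 2)
```

Both humidity partitions of the sunny subset are pure, which is why the gain equals the full node entropy.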

 

Tree so far:

    outlook
      sunny    -> humidity
                    high   -> NO
                    normal -> YES
      overcast -> YES
      rain     -> (not yet expanded)

 

 

 

Move on to the next node in the tree that can be expanded:

 

outlook   temperature  humidity  wind    playTennis
rain      mild         high      weak    yes
rain      cool         normal    weak    yes
rain      cool         normal    strong  no
rain      mild         normal    weak    yes
rain      mild         high      strong  no

 

Node: ( 2/5 3/5 ) Entropy=0.9709505944546686

    Attribute 1-Temperature:
        Value 0-Hot: ( 0/0 0/0 ) Entropy=0.0
        Value 1-Mild: ( 1/3 2/3 ) Entropy=0.9182958340544896
        Value 2-Cool: ( 1/2 1/2 ) Entropy=1.0
        InfoGain=0.01997309402197489
    Attribute 2-Humidity:
        Value 0-High: ( 1/2 1/2 ) Entropy=1.0
        Value 1-Normal: ( 1/3 2/3 ) Entropy=0.9182958340544896
        InfoGain=0.01997309402197489
    Attribute 3-Wind:
        Value 0-Weak: ( 0/3 3/3 ) Entropy=0.0
        Value 1-Strong: ( 2/2 0/2 ) Entropy=0.0
        InfoGain=0.9709505944546686

 

Maximum InfoGain=0.971

Split on Attribute 3-Wind
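The Hot value under Temperature in this node has no instances at all, and the output reports its entropy as 0.0. A minimal sketch of one way to handle that case follows, assuming an empty partition contributes zero entropy and zero weight; the helper names are hypothetical.

```python
import math

def entropy(counts):
    """Entropy (log base 2) of class counts; an empty partition gives 0.0."""
    total = sum(counts)
    if total == 0:
        return 0.0  # no instances: avoid dividing by zero
    result = 0.0
    for c in counts:
        if c > 0:  # treat 0 * log(0) as 0
            result -= (c / total) * math.log2(c / total)
    return result

def info_gain(parent, partitions):
    total = sum(parent)
    remainder = sum(sum(p) / total * entropy(p) for p in partitions)
    return entropy(parent) - remainder

rain = (2, 3)  # (no, yes) counts at the rain node
temperature = {"hot": (0, 0), "mild": (1, 2), "cool": (1, 1)}
gain = info_gain(rain, temperature.values())
print(entropy(temperature["hot"]))  # 0.0
print(round(gain, 3))               # 0.02
```

Without the guard for an empty partition, the division by the partition size would raise ZeroDivisionError.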

Final tree:

    outlook
      sunny    -> humidity
                    high   -> NO
                    normal -> YES
      overcast -> YES
      rain     -> wind
                    weak   -> YES
                    strong -> NO
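Putting the steps of this walkthrough together, the whole procedure can be sketched as a short recursive function. This is an illustrative sketch only, not the required implementation: it stores the tree as nested dictionaries, creates branches only for values that actually occur at a node, and falls back to the majority class when no attributes remain. All names (ATTRS, ROWS, build, and so on) are hypothetical.

```python
import math
from collections import Counter

ATTRS = ["outlook", "temperature", "humidity", "wind"]
ROWS = [line.split() for line in """
sunny hot high weak no
sunny hot high strong no
overcast hot high weak yes
rain mild high weak yes
rain cool normal weak yes
rain cool normal strong no
overcast cool normal strong yes
sunny mild high weak no
sunny cool normal weak yes
rain mild normal weak yes
sunny mild normal strong yes
overcast mild high strong yes
overcast hot normal weak yes
rain mild high strong no
""".strip().splitlines()]

def entropy(labels):
    """Entropy (log base 2) of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())

def info_gain(rows, col):
    """Entropy of the node minus the weighted entropy after splitting on col."""
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - remainder

def build(rows, cols):
    labels = [row[-1] for row in rows]
    # Leaf: the node is pure, or there are no attributes left to test.
    if len(set(labels)) == 1 or not cols:
        return Counter(labels).most_common(1)[0][0]
    best = max(cols, key=lambda c: info_gain(rows, c))
    rest = [c for c in cols if c != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        branches[value] = build(subset, rest)
    return {ATTRS[best]: branches}

tree = build(ROWS, [0, 1, 2, 3])
print(tree)
```

On this data the result splits on outlook at the root, on humidity under sunny, and on wind under rain, matching the splits chosen above.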