Note: You may find it helpful to get the tree structure completely working
before coding up the entropy calculations.
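Below is a minimal sketch of that idea in Python. It assumes each instance is a tuple whose last field is the class label.

- The recursion, the leaf test and classification are written once.
- The attribute-selection step is passed in as a function.
- Here that function is a hypothetical stub, `first_attribute`, which splits on the first unused attribute. An entropy-based chooser can replace it later without touching the tree code.

The names `build_tree` and `classify`, and the (attribute, {value: subtree}) node layout, are illustrative choices rather than part of the assignment.

```python
from collections import Counter


def build_tree(rows, attrs, choose):
    """Recursively build a decision tree.

    rows   -- list of tuples; the last field is the class label
    attrs  -- indices of the attributes still available for splitting
    choose -- function(rows, attrs) returning the attribute index to split on

    A leaf is a class label (str); an internal node is a pair
    (attribute_index, {attribute_value: subtree}).
    """
    labels = [row[-1] for row in rows]
    # Stop when the node is pure or nothing is left to split on;
    # in the second case fall back to the most common label.
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    attr = choose(rows, attrs)
    branches = {}
    for row in rows:
        branches.setdefault(row[attr], []).append(row)
    remaining = [a for a in attrs if a != attr]
    return (attr, {value: build_tree(subset, remaining, choose)
                   for value, subset in branches.items()})


def classify(tree, row):
    """Follow the branches matching row's values down to a leaf.

    Raises KeyError for an attribute value never seen during training.
    """
    while isinstance(tree, tuple):
        attr, children = tree
        tree = children[row[attr]]
    return tree


def first_attribute(rows, attrs):
    """Hypothetical stub chooser: no entropy yet, take the first unused attribute."""
    return attrs[0]


# A tiny made-up data set: two attributes plus a class label.
toy = [("a", "x", "no"), ("a", "y", "yes"), ("b", "x", "yes")]
tree = build_tree(toy, [0, 1], first_attribute)
print(tree)  # (0, {'a': (1, {'x': 'no', 'y': 'yes'}), 'b': 'yes'})
```

Once `classify` returns the training labels correctly, only `first_attribute` needs to change.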

Calculating entropy for data set: tennis.arff

Start at the root node with all the instances:

outlook   temperature  humidity  wind    playTennis
sunny     hot          high      weak    no
sunny     hot          high      strong  no
overcast  hot          high      weak    yes
rain      mild         high      weak    yes
rain      cool         normal    weak    yes
rain      cool         normal    strong  no
overcast  cool         normal    strong  yes
sunny     mild         high      weak    no
sunny     cool         normal    weak    yes
rain      mild         normal    weak    yes
sunny     mild         normal    strong  yes
overcast  mild         high      strong  yes
overcast  hot          normal    weak    yes
rain      mild         high      strong  no
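For experimenting before ARFF loading is in place, the same fourteen instances can be hard-coded. This sketch assumes a list of tuples in the table's column order (outlook, temperature, humidity, wind, playTennis). `ATTRIBUTES`, `DATA` and `class_counts` are hypothetical names. Counting the class column reproduces the root-node distribution of 5 "no" and 9 "yes".

```python
from collections import Counter

ATTRIBUTES = ["outlook", "temperature", "humidity", "wind"]

# One tuple per instance: outlook, temperature, humidity, wind, playTennis
DATA = [
    ("sunny", "hot", "high", "weak", "no"),
    ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),
    ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),
    ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"),
    ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),
    ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),
    ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "weak", "yes"),
    ("rain", "mild", "high", "strong", "no"),
]


def class_counts(rows):
    """Count the rows in each class; the class label is the last field."""
    return Counter(row[-1] for row in rows)


counts = class_counts(DATA)
print(f"( {counts['no']}/{len(DATA)} {counts['yes']}/{len(DATA)} )")  # ( 5/14 9/14 )
```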

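(Note that this example uses log2. The decisions will be identical if you use log_e
or any other base, because changing the base multiplies every entropy and gain value
by the same constant; only the printed numbers differ. Class counts are listed in
the order ( no yes ): at the root, 5 of the 14 instances are "no" and 9 are "yes".)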
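The base-independence claim can be checked numerically. This sketch uses plain Python, and `entropy` is an illustrative helper that takes ( no yes ) counts. It computes entropy in bits and in nats. The ratio between the two is ln 2 (about 0.693) for every distribution, so every gain is scaled by the same factor and the attribute with the largest gain does not change.

```python
import math


def entropy(counts, base=2.0):
    """Entropy of a class distribution given as raw counts, e.g. (5, 9).

    Classes with a count of 0 are skipped (p * log p -> 0 as p -> 0), and an
    empty node is given entropy 0.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log(c / total, base) for c in counts if c > 0)


root = (5, 9)                        # ( no yes ) at the root node
bits = entropy(root)                 # log base 2, about 0.9403
nats = entropy(root, base=math.e)    # natural log, about 0.6518
print(bits, nats, nats / bits)

# The ratio is the same constant (ln 2) for any distribution,
# so comparisons between attributes come out the same in either base.
for dist in [(3, 2), (2, 6), (4, 3)]:
    print(dist, entropy(dist, base=math.e) / entropy(dist))
```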

Node: ( 5/14 9/14 ) Entropy=0.9402859586706309
    Attribute 0-Outlook:
        Value 0-Sunny: ( 3/5 2/5 ) Entropy=0.9709505944546686
        Value 1-Overcast: ( 0/4 4/4 ) Entropy=0.0
        Value 2-Rain: ( 2/5 3/5 ) Entropy=0.9709505944546686
        InfoGain=0.2467498197744391
    Attribute 1-Temperature:
        Value 0-Hot: ( 2/4 2/4 ) Entropy=1.0
        Value 1-Mild: ( 2/6 4/6 ) Entropy=0.9182958340544896
        Value 2-Cool: ( 1/4 3/4 ) Entropy=0.8112781244591328
        InfoGain=0.029222565658954647
    Attribute 2-Humidity:
        Value 0-High: ( 4/7 3/7 ) Entropy=0.9852281360342516
        Value 1-Normal: ( 1/7 6/7 ) Entropy=0.5916727785823275
        InfoGain=0.15183550136234136
    Attribute 3-Wind:
        Value 0-Weak: ( 2/8 6/8 ) Entropy=0.8112781244591328
        Value 1-Strong: ( 3/6 3/6 ) Entropy=1.0
        InfoGain=0.04812703040826932
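Each InfoGain line is the node's entropy minus the entropies of its value subsets, weighted by subset size:

Gain(S, A) = Entropy(S) - sum over v of |S_v|/|S| * Entropy(S_v)

For Outlook that is 0.9403 - (5/14 * 0.9710 + 4/14 * 0 + 5/14 * 0.9710), which is about 0.2467. The sketch below recomputes all four root-node gains. It again assumes the instances are hard-coded as tuples, and the helper names are hypothetical.

```python
import math
from collections import Counter

ATTRIBUTES = ["outlook", "temperature", "humidity", "wind"]

DATA = [
    ("sunny", "hot", "high", "weak", "no"),
    ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),
    ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),
    ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"),
    ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),
    ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),
    ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "weak", "yes"),
    ("rain", "mild", "high", "strong", "no"),
]


def entropy(rows):
    """Base-2 entropy of the class labels (last field) of rows."""
    total = len(rows)
    if total == 0:
        return 0.0
    counts = Counter(row[-1] for row in rows)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def info_gain(rows, attr):
    """Entropy(rows) minus the size-weighted entropy of each value's subset."""
    subsets = {}
    for row in rows:
        subsets.setdefault(row[attr], []).append(row)
    remainder = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy(rows) - remainder


gains = [info_gain(DATA, i) for i in range(len(ATTRIBUTES))]
for i, (name, gain) in enumerate(zip(ATTRIBUTES, gains)):
    print(f"Attribute {i}-{name}: InfoGain={gain}")
```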

Maximum InfoGain=0.247

Split on Attribute 0-Outlook
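Picking the split is an argmax over the gains. In this sketch, `best_attribute` is a hypothetical helper and the gains are copied from the root-node trace. Python's `max` keeps the first of several equal maxima, so ties go to the attribute listed first. Whether the assignment expects that tie-breaking rule is an assumption worth checking.

```python
def best_attribute(gains):
    """Return the attribute with the largest information gain.

    max() returns the first maximal key it sees, and dicts preserve insertion
    order, so ties are broken in favour of the earlier attribute.
    """
    return max(gains, key=gains.get)


root_gains = {
    "outlook": 0.2467498197744391,
    "temperature": 0.029222565658954647,
    "humidity": 0.15183550136234136,
    "wind": 0.04812703040826932,
}
best = best_attribute(root_gains)
print(f"Maximum InfoGain={root_gains[best]:.3f}")  # Maximum InfoGain=0.247
print(f"Split on {best}")                          # Split on outlook
```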

Move on to the next node in the tree that can be expanded (Outlook = sunny):

outlook   temperature  humidity  wind    playTennis
sunny     hot          high      weak    no
sunny     hot          high      strong  no
sunny     mild         high      weak    no
sunny     cool         normal    weak    yes
sunny     mild         normal    strong  yes
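Expanding a child node means keeping only the parent's instances that have that branch's value. This sketch uses the same hypothetical list-of-tuples layout. Partitioning on attribute 0 (outlook) yields the five sunny rows shown above, with 3 "no" and 2 "yes". Outlook is then removed from the candidate list, which is why the sunny node's trace considers only attributes 1 to 3.

```python
from collections import Counter

DATA = [
    ("sunny", "hot", "high", "weak", "no"),
    ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),
    ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),
    ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"),
    ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),
    ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),
    ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "weak", "yes"),
    ("rain", "mild", "high", "strong", "no"),
]


def partition(rows, attr):
    """Group rows by their value of one attribute, keeping the original order."""
    branches = {}
    for row in rows:
        branches.setdefault(row[attr], []).append(row)
    return branches


branches = partition(DATA, 0)      # attribute 0 = outlook
sunny = branches["sunny"]
for row in sunny:
    print(row)
print(Counter(row[-1] for row in sunny))

remaining = [a for a in [0, 1, 2, 3] if a != 0]  # outlook is used up
print(remaining)                                 # [1, 2, 3]
```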

Node: ( 3/5 2/5 ) Entropy=0.9709505944546686
    Attribute 1-Temperature:
        Value 0-Hot: ( 2/2 0/2 ) Entropy=0.0
        Value 1-Mild: ( 1/2 1/2 ) Entropy=1.0
        Value 2-Cool: ( 0/1 1/1 ) Entropy=0.0
        InfoGain=0.5709505944546686
    Attribute 2-Humidity:
        Value 0-High: ( 3/3 0/3 ) Entropy=0.0
        Value 1-Normal: ( 0/2 2/2 ) Entropy=0.0
        InfoGain=0.9709505944546686
    Attribute 3-Wind:
        Value 0-Weak: ( 2/3 1/3 ) Entropy=0.9182958340544896
        Value 1-Strong: ( 1/2 1/2 ) Entropy=1.0
        InfoGain=0.01997309402197489
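The sunny-node gains can be reproduced from the ( no yes ) counts alone, without revisiting the rows. In this sketch, `entropy` and `gain_from_counts` are hypothetical helpers. Each list holds one count pair per attribute value, in the order the trace prints them.

```python
import math


def entropy(counts):
    """Base-2 entropy of ( no yes ) counts; an empty node counts as 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)


def gain_from_counts(parent, children):
    """Parent entropy minus the size-weighted entropy of each child."""
    n = sum(parent)
    remainder = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - remainder


sunny = (3, 2)
gains = {
    "temperature": gain_from_counts(sunny, [(2, 0), (1, 1), (0, 1)]),
    "humidity": gain_from_counts(sunny, [(3, 0), (0, 2)]),
    "wind": gain_from_counts(sunny, [(2, 1), (1, 1)]),
}
for name, gain in gains.items():
    print(name, gain)
```

Humidity separates the sunny instances perfectly, so its gain equals the node's whole entropy, 0.971.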

Maximum InfoGain=0.971

Split on Attribute 2-Humidity
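After the Humidity split both sunny branches are pure, so they become leaves rather than further nodes. The overcast branch (0/4 4/4) became a leaf as soon as Outlook was chosen. Below is a sketch of the stopping test, where `make_leaf` is a hypothetical helper. It has two fallbacks that this data set never exercises, so treat both as assumptions:

- a majority vote when no attributes remain, which is a standard ID3 choice;
- a caller-supplied default for an empty subset.

```python
from collections import Counter


def make_leaf(labels, attributes_left, default=None):
    """Return a class label if this node should become a leaf, else None.

    labels          -- class labels of the instances at this node
    attributes_left -- attributes that could still be split on
    default         -- label for an empty node (assumed: the parent's majority)
    """
    if not labels:
        return default
    counts = Counter(labels)
    if len(counts) == 1:          # pure node: entropy is 0
        return labels[0]
    if not attributes_left:       # impure but nothing left to split on
        return counts.most_common(1)[0][0]
    return None                   # keep expanding


remaining = ["temperature", "wind"]
print(make_leaf(["no", "no", "no"], remaining))  # humidity = high   -> no
print(make_leaf(["yes", "yes"], remaining))      # humidity = normal -> yes
print(make_leaf(["no", "yes"], remaining))       # mixed -> None
```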

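[Figure: tree after this split. Outlook = sunny -> Humidity (high -> NO, normal -> YES); Outlook = overcast -> YES; Outlook = rain not yet expanded.]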

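Move on to the next node in the tree that can be expanded. The overcast branch is already pure ( 0/4 4/4 ) and is a YES leaf, so the next node is Outlook = rain: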

outlook   temperature  humidity  wind    playTennis
rain      mild         high      weak    yes
rain      cool         normal    weak    yes
rain      cool         normal    strong  no
rain      mild         normal    weak    yes
rain      mild         high      strong  no

Node: ( 2/5 3/5 ) Entropy=0.9709505944546686
    Attribute 1-Temperature:
        Value 0-Hot: ( 0/0 0/0 ) Entropy=0.0
        Value 1-Mild: ( 1/3 2/3 ) Entropy=0.9182958340544896
        Value 2-Cool: ( 1/2 1/2 ) Entropy=1.0
        InfoGain=0.01997309402197489
    Attribute 2-Humidity:
        Value 0-High: ( 1/2 1/2 ) Entropy=1.0
        Value 1-Normal: ( 1/3 2/3 ) Entropy=0.9182958340544896
        InfoGain=0.01997309402197489
    Attribute 3-Wind:
        Value 0-Weak: ( 0/3 3/3 ) Entropy=0.0
        Value 1-Strong: ( 2/2 0/2 ) Entropy=0.0
        InfoGain=0.9709505944546686
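The Hot line prints ( 0/0 0/0 ) because no rainy day is hot. The trace evidently loops over every declared value of the attribute, not just the values present at the node. An empty subset has weight 0/5 in the gain, so the convention Entropy=0.0 is harmless. This sketch assumes the declared values come from the ARFF header in the order the trace shows. The `VALUES` table and the helper names are hypothetical.

```python
import math

CLASSES = ["no", "yes"]
# (column index, declared values) per attribute, as an ARFF header would list them.
VALUES = {
    "temperature": (1, ["hot", "mild", "cool"]),
    "humidity": (2, ["high", "normal"]),
    "wind": (3, ["weak", "strong"]),
}

# The five Outlook = rain instances: outlook, temperature, humidity, wind, class
RAIN = [
    ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),
    ("rain", "cool", "normal", "strong", "no"),
    ("rain", "mild", "normal", "weak", "yes"),
    ("rain", "mild", "high", "strong", "no"),
]


def counts(rows):
    """( no yes ) class counts for a list of rows."""
    return tuple(sum(1 for row in rows if row[-1] == c) for c in CLASSES)


def entropy(cs):
    """Base-2 entropy of class counts; an empty subset is defined as 0."""
    total = sum(cs)
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log2(c / total) for c in cs if c > 0)


def report(rows, attr):
    """Print per-value counts and entropies, then return the information gain."""
    col, values = VALUES[attr]
    remainder = 0.0
    for i, value in enumerate(values):
        subset = [row for row in rows if row[col] == value]
        cs = counts(subset)
        n = len(subset)
        # An empty subset has weight 0, so its entropy never affects the gain.
        remainder += n / len(rows) * entropy(cs)
        print(f"    Value {i}-{value}: ( {cs[0]}/{n} {cs[1]}/{n} ) Entropy={entropy(cs)}")
    gain = entropy(counts(rows)) - remainder
    print(f"    InfoGain={gain}")
    return gain


gains = {attr: report(RAIN, attr) for attr in VALUES}
```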

Maximum InfoGain=0.971

Split on Attribute 3-Wind

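[Figure: final tree. Outlook = sunny -> Humidity (high -> NO, normal -> YES); Outlook = overcast -> YES; Outlook = rain -> Wind (weak -> YES, strong -> NO).]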
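Putting the pieces together gives a compact recursive sketch. It plugs an information-gain chooser into the tree builder and grows the whole tree from the hard-coded instances. As before, the list-of-tuples layout, the (attribute, {value: subtree}) node shape and all function names are illustrative assumptions. The result matches the final tree:

- Outlook at the root;
- Humidity under sunny and Wind under rain;
- a YES leaf for overcast.

```python
import math
from collections import Counter

ATTRIBUTES = ["outlook", "temperature", "humidity", "wind"]

DATA = [
    ("sunny", "hot", "high", "weak", "no"),
    ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),
    ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),
    ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"),
    ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),
    ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),
    ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "weak", "yes"),
    ("rain", "mild", "high", "strong", "no"),
]


def entropy(rows):
    """Base-2 entropy of the class labels (last field) of a non-empty list of rows."""
    n = len(rows)
    counts = Counter(row[-1] for row in rows)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def partition(rows, attr):
    """Group rows by their value of one attribute."""
    branches = {}
    for row in rows:
        branches.setdefault(row[attr], []).append(row)
    return branches


def info_gain(rows, attr):
    """Entropy of rows minus the size-weighted entropy of each subset."""
    remainder = sum(len(s) / len(rows) * entropy(s)
                    for s in partition(rows, attr).values())
    return entropy(rows) - remainder


def id3(rows, attrs):
    """Leaf = class label; internal node = (attr_index, {value: subtree})."""
    labels = [row[-1] for row in rows]
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, a))  # first wins on ties
    rest = [a for a in attrs if a != best]
    return (best, {value: id3(subset, rest)
                   for value, subset in partition(rows, best).items()})


def show(tree, depth=0):
    """Print one line per branch, indenting each level by four spaces."""
    if isinstance(tree, str):      # whole tree is a single leaf
        print(tree.upper())
        return
    attr, children = tree
    for value, sub in children.items():
        line = "    " * depth + f"{ATTRIBUTES[attr]} = {value}"
        if isinstance(sub, str):
            print(line + f": {sub.upper()}")
        else:
            print(line)
            show(sub, depth + 1)


tree = id3(DATA, list(range(len(ATTRIBUTES))))
show(tree)
```

`show` prints one line per branch, for example `outlook = overcast: YES`, with sub-branches indented beneath their parent.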