Decision Trees from Scratch Using ID3 Python: Coding It Up !!

update : We have introduced an interactive learning platform to learn machine learning / AI , check out this blog in interactive mode.

Please visit the previous article to get comfortable with the math behind decision tree ID3 algo

Import the required libraries

import numpy as npimport pandas as pdeps = np.finfo(float).epsfrom numpy import log2 as log

?eps? here is the smallest representable number. At times we get log(0) or 0 in the denominator, to avoid that we are going to use this.

Define the dataset:

Create pandas dataframe :

Image for post

Now lets try to remember the steps to create a decision tree?.

1.compute the entropy for data-set2.for every attribute/feature: 1.calculate entropy for all categorical values 2.take average information entropy for the current attribute 3.calculate gain for the current attribute3. pick the highest gain attribute.4. Repeat until we get the tree we desired

find the Entropy and then Information Gain for splitting the data set.

Image for post

We?ll define a function that takes in class (target variable vector) and finds the entropy of that class.

Here the fraction is ?pi?, it is the proportion of a number of elements in that split group to the number of elements in the group before splitting(parent group).

Image for post answer is same as we got in our previous article