Python – How to obtain information gain from a scikit-learn DecisionTreeClassifier

classification, machine-learning, python, scikit-learn

I see that DecisionTreeClassifier accepts criterion='entropy', which means that it must use information gain as the criterion for splitting the decision tree.
What I need is the information gain for each feature at the root level, at the moment it is about to split the root node.

Best Answer

You can only access the information gain (or Gini impurity) for a feature that has actually been used as a split node. The attribute DecisionTreeClassifier.tree_.best_error[i] holds the entropy of the i-th node split on feature DecisionTreeClassifier.tree_.feature[i]. If you want the entropy of all examples that reach the i-th node, look at DecisionTreeClassifier.tree_.init_error[i]. (Note that these attributes belong to the internal Cython tree of the scikit-learn version pinned in the source link below; newer releases restructure these internals.)
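In recent scikit-learn versions those internals look different: the fitted tree exposes arrays such as tree_.impurity (per-node entropy when criterion='entropy'), tree_.feature, tree_.children_left/children_right, and tree_.weighted_n_node_samples. A sketch of recovering the information gain of the root split from these public arrays, using iris purely as an example dataset:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Fit a tree with the entropy criterion (iris is just an example dataset)
X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

tree = clf.tree_
root = 0
left, right = tree.children_left[root], tree.children_right[root]

# Information gain of the root split: parent entropy minus the
# sample-weighted entropies of the two child nodes.
n = tree.weighted_n_node_samples
gain = tree.impurity[root] - (
    n[left] / n[root] * tree.impurity[left]
    + n[right] / n[root] * tree.impurity[right]
)
print("root splits on feature", tree.feature[root], "with gain", gain)
```

This only gives the gain for the feature the tree actually chose at the root, not for every candidate feature.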

For more information see the documentation here: https://github.com/scikit-learn/scikit-learn/blob/dacfd8bd5d943cb899ed8cd423aaf11b4f27c186/sklearn/tree/_tree.pyx#L64

If you want the entropy for each feature (at a given split node), you need to modify the function find_best_split: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/_tree.pyx#L713
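Alternatively, since the root node sees the full training set, the per-feature information gain can be computed directly in Python without touching the Cython internals: for each feature, try the midpoints between consecutive sorted values as candidate thresholds (the same axis-aligned binary splits the tree considers) and keep the best gain. A sketch, again using iris as an example dataset:

```python
import numpy as np
from sklearn.datasets import load_iris

def entropy(y):
    # Shannon entropy (base 2) of a vector of class labels
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def best_gain_for_feature(x, y):
    # Candidate thresholds: midpoints between consecutive unique values,
    # mirroring the binary "x <= t" splits a decision tree considers.
    parent = entropy(y)
    values = np.unique(x)
    best = 0.0
    for t in (values[:-1] + values[1:]) / 2:
        mask = x <= t
        n_left = mask.sum()
        gain = parent - (
            n_left / len(y) * entropy(y[mask])
            + (len(y) - n_left) / len(y) * entropy(y[~mask])
        )
        best = max(best, gain)
    return best

X, y = load_iris(return_X_y=True)
for i in range(X.shape[1]):
    print(f"feature {i}: best information gain {best_gain_for_feature(X[:, i], y):.3f}")
```

The feature with the largest value here is the one an entropy-criterion tree would pick at the root.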