Python – How to obtain information gain from a scikit-learn DecisionTreeClassifier

Tags: classification, machine-learning, python, scikit-learn

I see that DecisionTreeClassifier accepts criterion='entropy', which implies it uses information gain as the criterion for splitting the decision tree.
What I need is the information gain for each feature at the root level, when the root node is about to be split.
Best Answer
You can only access the information gain (or Gini impurity) for a feature that has actually been used as a split node. The attribute DecisionTreeClassifier.tree_.best_error[i] holds the entropy of the i-th node when splitting on feature DecisionTreeClassifier.tree_.feature[i]. If you want the entropy of all examples that reach the i-th node, look at DecisionTreeClassifier.tree_.init_error[i].
For more information see the documentation here: https://github.com/scikit-learn/scikit-learn/blob/dacfd8bd5d943cb899ed8cd423aaf11b4f27c186/sklearn/tree/_tree.pyx#L64
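Note that best_error and init_error come from an older internal layout of the tree; in more recent scikit-learn releases the per-node impurities are exposed through tree_.impurity instead. A hedged sketch, assuming a current scikit-learn version and the iris dataset for illustration, that recovers the information gain actually achieved at the root split:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

tree = clf.tree_
root = 0
left, right = tree.children_left[root], tree.children_right[root]
n = tree.weighted_n_node_samples

# Information gain achieved at the root split = parent entropy
# minus the sample-weighted average entropy of the two children.
gain = tree.impurity[root] - (
    n[left] / n[root] * tree.impurity[left]
    + n[right] / n[root] * tree.impurity[right]
)
print(tree.feature[root], gain)
```

This only reports the gain for the feature the tree actually chose at each node, not for every candidate feature.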
If you want to access the entropy for each feature (at a given split node), you need to modify the function find_best_split:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/_tree.pyx#L713
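Alternatively, if all you need is the root-level gain for every feature, you don't have to patch scikit-learn internals at all; you can compute it directly from the training data. A minimal sketch in plain NumPy (my own helper names; it evaluates binary thresholds at the midpoints between sorted feature values and uses base-2 entropy):

```python
import numpy as np

def entropy(y):
    """Shannon entropy (base 2) of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def best_gain_per_feature(X, y):
    """For each feature, the best information gain over all
    midpoint thresholds, evaluated at the root node."""
    parent = entropy(y)
    n = len(y)
    gains = []
    for j in range(X.shape[1]):
        col = X[:, j]
        values = np.unique(col)
        # Candidate thresholds: midpoints between consecutive values,
        # so both sides of every split are guaranteed non-empty.
        thresholds = (values[:-1] + values[1:]) / 2
        best = 0.0
        for t in thresholds:
            mask = col <= t
            child = (mask.sum() / n) * entropy(y[mask]) \
                  + ((~mask).sum() / n) * entropy(y[~mask])
            best = max(best, parent - child)
        gains.append(best)
    return gains
```

Calling best_gain_per_feature(X, y) on your training set gives exactly the per-feature root-level gains the question asks for, independent of which split the fitted tree ends up choosing.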