Machine Learning (Chapter 33): Decision Trees for Classification - Loss Functions

Introduction

Decision Trees are a powerful and widely used model for classification tasks in Machine Learning. They work by recursively partitioning the input space into smaller regions and assigning an output class to each resulting region. Each internal (decision) node tests a feature, its branches correspond to possible values or ranges of that feature, and the leaf nodes provide the final classification.

A critical component in the construction of decision trees is the selection of the optimal splits. This selection is guided by a loss function (also referred to as a cost function or impurity measure), which quantifies the quality of a split: the best split is the one that most reduces the weighted impurity of the resulting child nodes relative to the parent node. This chapter focuses on the loss functions used in decision trees for classification tasks.
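
To make split scoring concrete, here is a minimal sketch (the helper names are illustrative, not taken from any library) that scores a candidate binary split as the parent node's impurity minus the size-weighted impurity of its two children, using the Gini impurity defined in the next section as the impurity measure.

python:

import numpy as np

def gini(labels):
    """Gini impurity of a node, computed from its class labels: 1 - sum(p_i^2)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_quality(parent, left, right):
    """Impurity decrease of a candidate split: parent impurity minus
    the size-weighted impurity of the two child nodes (higher is better)."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted

# Toy node with 30% class 0 and 70% class 1, split into two pure children
parent = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1])
left, right = parent[:3], parent[3:]
print(split_quality(parent, left, right))  # 0.42 - 0.0 = 0.42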

Common Loss Functions

  1. Gini Impurity

    Gini Impurity measures the likelihood that a randomly chosen element would be misclassified if it were labeled at random according to the distribution of classes at the node (a short implementation of all three measures in this list is sketched after the list). It is calculated using the following formula:

    Gini(p) = \sum_{i=1}^{k} p_i (1 - p_i) = 1 - \sum_{i=1}^{k} p_i^2

    where:

    • p_i is the probability of an element belonging to class i,
    • k is the total number of classes.

    Example: Suppose a node contains the following class distribution:

    • Class 1: 0.3
    • Class 2: 0.7

    The Gini Impurity would be:

    Gini = 1 - (0.3^2 + 0.7^2) = 1 - (0.09 + 0.49) = 1 - 0.58 = 0.42
  2. Entropy (Information Gain)

    Entropy measures the amount of uncertainty or disorder in a set; the goal is to minimize entropy so that nodes become more homogeneous. The information gain of a split is the reduction in entropy from the parent node to the weighted average entropy of its children, and the split with the largest gain is preferred. Entropy is defined as:

    Entropy(p) = -\sum_{i=1}^{k} p_i \log_2(p_i)

    where:

    • p_i is the probability of an element belonging to class i,
    • k is the total number of classes.

    Example: For a binary classification:

    • Class 1: 0.4
    • Class 2: 0.6

    The entropy would be:

    Entropy = -(0.4 \log_2 0.4 + 0.6 \log_2 0.6) \approx 0.97095
  3. Misclassification Error

    Misclassification error is the simplest of the three measures: it considers only the most probable class and ignores the distribution of the remaining classes. It is defined as:

    Error(p) = 1 - \max_i(p_i)

    where:

    • p_i is the probability of an element belonging to class i.

    Example: Suppose the class probabilities are:

    • Class 1: 0.1
    • Class 2: 0.9

    The misclassification error would be:

    Error = 1 - 0.9 = 0.1
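
As promised above, the following sketch implements all three measures directly from a vector of class probabilities (the function names are illustrative, not taken from any library) and reproduces the three worked examples.

python:

import numpy as np

def gini(p):
    """Gini impurity: 1 - sum(p_i^2)."""
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

def entropy(p):
    """Entropy in bits: -sum(p_i * log2(p_i)), treating 0 * log2(0) as 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # skip zero-probability classes to avoid log2(0)
    return -np.sum(p * np.log2(p))

def misclassification_error(p):
    """Misclassification error: 1 - max(p_i)."""
    return 1.0 - np.max(np.asarray(p, dtype=float))

print(gini([0.3, 0.7]))                     # 0.42
print(entropy([0.4, 0.6]))                  # ~0.97095
print(misclassification_error([0.1, 0.9]))  # 0.1

All three measures are zero for a pure node and largest when the classes are evenly mixed; Gini and entropy change more smoothly with the class probabilities than misclassification error, which is one reason they are preferred for growing trees.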

Implementation in Python

Let's implement a decision tree classifier using the scikit-learn library in Python and explore how different loss functions affect the decision tree's behavior.

python:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load dataset
data = load_iris()
X = data.data
y = data.target

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train Decision Tree using Gini Impurity
clf_gini = DecisionTreeClassifier(criterion='gini', random_state=42)
clf_gini.fit(X_train, y_train)
y_pred_gini = clf_gini.predict(X_test)

# Train Decision Tree using Entropy
clf_entropy = DecisionTreeClassifier(criterion='entropy', random_state=42)
clf_entropy.fit(X_train, y_train)
y_pred_entropy = clf_entropy.predict(X_test)

# Evaluate models
accuracy_gini = accuracy_score(y_test, y_pred_gini)
accuracy_entropy = accuracy_score(y_test, y_pred_entropy)

print(f'Accuracy with Gini Impurity: {accuracy_gini:.2f}')
print(f'Accuracy with Entropy: {accuracy_entropy:.2f}')
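
Depending on your scikit-learn version, criterion='log_loss' may also be accepted as an equivalent alias for the entropy-based criterion; misclassification error is not exposed as a splitting criterion in DecisionTreeClassifier.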

Implementation in Java

Here's how you can train and evaluate a decision tree classifier using the Weka library in Java. Note that Weka's J48 is an implementation of C4.5, which splits using an entropy-based criterion (gain ratio) and does not offer a Gini option; the two trees below therefore differ in their split structure (the -B flag forces binary splits) rather than in their impurity measure:

java:

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.classifiers.trees.J48;
import weka.classifiers.Evaluation;
import java.util.Random;

public class DecisionTreeClassifier {
    public static void main(String[] args) throws Exception {
        // Load dataset
        DataSource source = new DataSource("iris.arff");
        Instances data = source.getDataSet();

        // Set class index to the last attribute
        data.setClassIndex(data.numAttributes() - 1);

        // Train a J48 (C4.5) tree with default multiway splits
        J48 treeDefault = new J48();
        treeDefault.setOptions(new String[]{"-C", "0.25", "-M", "2"});
        treeDefault.buildClassifier(data);

        // Train a J48 tree restricted to binary splits (-B)
        J48 treeBinary = new J48();
        treeBinary.setOptions(new String[]{"-C", "0.25", "-M", "2", "-B"});
        treeBinary.buildClassifier(data);

        // Evaluate both models with 10-fold cross-validation
        Evaluation evalDefault = new Evaluation(data);
        evalDefault.crossValidateModel(treeDefault, data, 10, new Random(1));

        Evaluation evalBinary = new Evaluation(data);
        evalBinary.crossValidateModel(treeBinary, data, 10, new Random(1));

        System.out.println("J48 (multiway splits) accuracy: " + evalDefault.pctCorrect());
        System.out.println("J48 (binary splits) accuracy: " + evalBinary.pctCorrect());
    }
}
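
Note that Evaluation.pctCorrect() reports accuracy as a percentage (0-100) rather than a fraction, so these numbers correspond to 100 times the accuracies printed by the Python example.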

Conclusion

Choosing the right loss function is crucial to the performance of decision trees for classification tasks. Gini Impurity and Entropy are the most commonly used, while Misclassification Error is applied less frequently because it is less sensitive to changes in the class probabilities. The choice between Gini and Entropy often depends on the specific dataset and the desired characteristics of the decision tree; in practice the two usually produce very similar trees.

In this chapter, we explored these loss functions, their mathematical formulations, and practical implementations in Python and Java. Understanding these fundamentals is essential for effectively using decision trees in various machine-learning applications.
