Machine Learning (Chapter 38): Evaluation Measures 1

In machine learning, evaluation measures are crucial for assessing a model's performance. This chapter focuses on key metrics for evaluating classification models, including accuracy, precision, recall, the F1 score, and the confusion matrix, along with the corresponding mathematical formulas and Python and Java examples.

1. Confusion Matrix

The confusion matrix is a tabular representation of actual vs. predicted classes. It helps to visualize the performance of a classification model by categorizing predictions into:

  • True Positive (TP): positive cases correctly predicted as positive.
  • True Negative (TN): negative cases correctly predicted as negative.
  • False Positive (FP): negative cases incorrectly predicted as positive (Type I error).
  • False Negative (FN): positive cases incorrectly predicted as negative (Type II error).
                     Predicted Positive      Predicted Negative
Actual Positive      True Positive (TP)      False Negative (FN)
Actual Negative      False Positive (FP)     True Negative (TN)

2. Accuracy

Accuracy is the ratio of correctly predicted observations to the total observations.

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}

Example in Python:

python
from sklearn.metrics import confusion_matrix, accuracy_score

# Example data
actual = [1, 0, 1, 1, 0, 1, 0, 0]
predicted = [1, 0, 1, 0, 0, 1, 1, 0]

# Confusion Matrix
conf_matrix = confusion_matrix(actual, predicted)
print(f"Confusion Matrix:\n{conf_matrix}")

# Accuracy
accuracy = accuracy_score(actual, predicted)
print(f"Accuracy: {accuracy}")
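
Note that scikit-learn's confusion_matrix arranges rows (actual) and columns (predicted) in ascending label order, so for binary 0/1 labels the output is [[TN, FP], [FN, TP]], the reverse orientation of the table in Section 1. For the example data it prints [[3 1], [1 3]], i.e. TN = 3, FP = 1, FN = 1, TP = 3.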

Example in Java:

java
public class EvaluationMetrics {
    public static void main(String[] args) {
        int[] actual = {1, 0, 1, 1, 0, 1, 0, 0};
        int[] predicted = {1, 0, 1, 0, 0, 1, 1, 0};

        int TP = 0, TN = 0, FP = 0, FN = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] == 1 && predicted[i] == 1) TP++;
            if (actual[i] == 0 && predicted[i] == 0) TN++;
            if (actual[i] == 0 && predicted[i] == 1) FP++;
            if (actual[i] == 1 && predicted[i] == 0) FN++;
        }

        double accuracy = (double) (TP + TN) / (TP + TN + FP + FN);

        System.out.println("Confusion Matrix:");
        System.out.println("TP: " + TP + "  FP: " + FP);
        System.out.println("FN: " + FN + "  TN: " + TN);
        System.out.println("Accuracy: " + accuracy);
    }
}
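
For the example data, TP = 3, TN = 3, FP = 1, and FN = 1, so both programs report

\text{Accuracy} = \frac{3 + 3}{3 + 3 + 1 + 1} = 0.75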

3. Precision

Precision measures the accuracy of the positive predictions. It is the ratio of true positives to all positive predictions.

\text{Precision} = \frac{TP}{TP + FP}

Example in Python:

python
from sklearn.metrics import precision_score

# Precision
precision = precision_score(actual, predicted)
print(f"Precision: {precision}")

Example in Java:

java
double precision = (double) TP / (TP + FP);
System.out.println("Precision: " + precision);
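
With the example data from Section 2 (TP = 3, FP = 1), this gives

\text{Precision} = \frac{3}{3 + 1} = 0.75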

4. Recall (Sensitivity)

Recall (or Sensitivity) measures how well the model identifies positive cases. It is the ratio of true positives to all actual positive cases.

\text{Recall} = \frac{TP}{TP + FN}

Example in Python:

python
from sklearn.metrics import recall_score

# Recall
recall = recall_score(actual, predicted)
print(f"Recall: {recall}")

Example in Java:

java
double recall = (double) TP / (TP + FN);
System.out.println("Recall: " + recall);
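
For the same data (TP = 3, FN = 1):

\text{Recall} = \frac{3}{3 + 1} = 0.75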

5. F1 Score

The F1 Score is the harmonic mean of precision and recall. It balances the two metrics and is high only when both are reasonably high, making it useful when precision and recall matter equally.

\text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}

Example in Python:

python
from sklearn.metrics import f1_score

# F1 Score
f1 = f1_score(actual, predicted)
print(f"F1 Score: {f1}")

Example in Java:

java
double f1 = 2 * (precision * recall) / (precision + recall);
System.out.println("F1 Score: " + f1);
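
Since precision and recall are both 0.75 on the example data, the F1 score is

\text{F1 Score} = 2 \times \frac{0.75 \times 0.75}{0.75 + 0.75} = 0.75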

6. Conclusion

These evaluation measures provide essential insights into how well your model is performing. While accuracy is the most basic metric, it may not always be the best measure, especially when the classes are imbalanced. Precision, recall, and the F1 score offer deeper insights into the model's behavior, particularly in critical applications like medical diagnoses or fraud detection.
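
To see why accuracy alone can mislead, here is a minimal sketch with made-up imbalanced data (90 negatives, 10 positives), reusing the scikit-learn functions from the sections above. A model that predicts the majority class almost everywhere still reaches 92% accuracy while catching only 2 of the 10 positives:

python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical imbalanced data: 90 negatives, 10 positives
actual = [0] * 90 + [1] * 10
# The model predicts the majority class almost everywhere,
# catching only 2 of the 10 actual positives
predicted = [0] * 90 + [0] * 8 + [1] * 2

print(f"Accuracy:  {accuracy_score(actual, predicted)}")    # 0.92 - looks strong
print(f"Recall:    {recall_score(actual, predicted)}")      # 0.2  - exposes the missed positives
print(f"Precision: {precision_score(actual, predicted)}")   # 1.0
print(f"F1 Score:  {f1_score(actual, predicted):.2f}")      # 0.33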

By calculating these metrics and analyzing the confusion matrix, we can better understand our model's strengths and areas for improvement.
