Machine Learning (Chapter 40): Class Evaluation Measures

In machine learning, particularly classification problems, evaluation metrics are critical for understanding how well a model performs. These metrics help in comparing models, tuning parameters, and determining which model fits best for a specific problem. In this chapter, we will focus on several widely-used class evaluation measures, along with the corresponding mathematical formulas and practical examples.

1. Accuracy

Accuracy measures the proportion of correctly predicted instances out of all instances in the dataset. It is simple but can be misleading for imbalanced datasets.

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}

where:

  • TP = True Positives
  • TN = True Negatives
  • FP = False Positives
  • FN = False Negatives

Example:

Let’s calculate accuracy in Python.

python
from sklearn.metrics import accuracy_score

# True labels
y_true = [1, 0, 1, 1, 0, 1, 0]
# Predicted labels
y_pred = [1, 0, 1, 0, 0, 1, 1]

# Calculate accuracy
accuracy = accuracy_score(y_true, y_pred)
print("Accuracy:", accuracy)

Output:

Accuracy: 0.7142857142857143
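
Accuracy looks reasonable here because the toy labels are fairly balanced. To see why it can mislead on imbalanced data, here is a small illustrative sketch (the labels below are made up for illustration): a classifier that always predicts the majority class scores 95% accuracy while never detecting a single positive.

python
from sklearn.metrics import accuracy_score

# Hypothetical imbalanced labels: 19 negatives, 1 positive
y_true_imb = [0] * 19 + [1]
# A "model" that always predicts the majority class
y_pred_imb = [0] * 20

# High accuracy (0.95) even though the single positive is completely missed
print("Accuracy:", accuracy_score(y_true_imb, y_pred_imb))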

2. Precision

Precision is the ratio of correctly predicted positive observations to the total predicted positives. High precision corresponds to a low false positive rate.

\text{Precision} = \frac{TP}{TP + FP}

Example:

Let’s calculate precision in Python.

python
from sklearn.metrics import precision_score

# Calculate precision
precision = precision_score(y_true, y_pred)
print("Precision:", precision)

Output:

Precision: 0.75

3. Recall (Sensitivity or True Positive Rate)

Recall is the ratio of correctly predicted positive observations to all observations in the actual class.

\text{Recall} = \frac{TP}{TP + FN}

Example:

Let’s calculate recall in Python.

python
from sklearn.metrics import recall_score

# Calculate recall
recall = recall_score(y_true, y_pred)
print("Recall:", recall)

Output:

Recall: 0.75

4. F1-Score

The F1-score is the harmonic mean of precision and recall, providing a balance between the two.

\text{F1-Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}

Example:

Let’s calculate the F1-Score in Python.

python
from sklearn.metrics import f1_score

# Calculate F1 score
f1 = f1_score(y_true, y_pred)
print("F1-Score:", f1)

Output:

F1-Score: 0.75
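
Because F1 is just the harmonic mean of the two earlier results, we can double-check the library value by plugging the precision and recall variables computed above directly into the formula:

python
# Harmonic mean of the precision and recall calculated earlier
f1_manual = 2 * precision * recall / (precision + recall)
print("F1-Score (manual):", f1_manual)  # 0.75, matching f1_score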

5. Confusion Matrix

The confusion matrix shows the actual versus predicted classifications and provides insights into how well the model differentiates between classes.

The confusion matrix for a binary classification problem is as follows:

Actual \ Predicted | Positive (1) | Negative (0)
Positive (1)       | TP           | FN
Negative (0)       | FP           | TN

Example:

Let’s generate a confusion matrix in Python.

python
from sklearn.metrics import confusion_matrix

# Generate confusion matrix
cm = confusion_matrix(y_true, y_pred)
print("Confusion Matrix:\n", cm)

Output:

Confusion Matrix:
 [[2 1]
 [1 3]]
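
Note that scikit-learn orders classes by ascending label, so the printed matrix is arranged as [[TN, FP], [FN, TP]], with the negative class in the first row rather than the layout shown in the table above. As a quick sanity check, the entries can be unpacked and the earlier formulas recomputed by hand:

python
# Unpack the 2x2 matrix returned above: [[TN, FP], [FN, TP]] for labels [0, 1]
tn, fp, fn, tp = cm.ravel()

print("Accuracy :", (tp + tn) / (tp + tn + fp + fn))  # 0.714...
print("Precision:", tp / (tp + fp))                   # 0.75
print("Recall   :", tp / (tp + fn))                   # 0.75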

6. Specificity (True Negative Rate)

Specificity measures the proportion of actual negatives that are correctly identified.

\text{Specificity} = \frac{TN}{TN + FP}

Example:

We can calculate specificity manually as it's not directly available in scikit-learn:

python
# Manually calculate specificity
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
specificity = tn / (tn + fp)
print("Specificity:", specificity)

Output:

Specificity: 0.6666666666666666

7. ROC-AUC (Receiver Operating Characteristic - Area Under Curve)

The ROC-AUC score summarizes the model's ability to distinguish between classes across different decision thresholds. The AUC (Area Under the Curve) ranges from 0 to 1, where 0.5 corresponds to random guessing and a value close to 1 indicates a good classifier.

The ROC curve plots the true positive rate (Recall) against the false positive rate (1 - Specificity).

Example:

Let’s compute ROC-AUC in Python.

python
from sklearn.metrics import roc_auc_score

# True labels (y_true) and predicted probabilities (y_pred_prob)
y_pred_prob = [0.9, 0.1, 0.8, 0.4, 0.2, 0.7, 0.6]

# Calculate ROC-AUC
roc_auc = roc_auc_score(y_true, y_pred_prob)
print("ROC-AUC:", roc_auc)

Output:

ROC-AUC: 0.9166666666666666
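
To see the curve itself rather than just the summary number, scikit-learn's roc_curve returns the false positive rate and true positive rate at each threshold. A minimal sketch using the same y_true and y_pred_prob as above:

python
from sklearn.metrics import roc_curve

# Points of the ROC curve: false positive rate, true positive rate, and thresholds
fpr, tpr, thresholds = roc_curve(y_true, y_pred_prob)
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")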

Java Implementation

Below is a simple, self-contained Java snippet that calculates accuracy, precision, and recall directly from integer label arrays:

java
public class ClassEvaluationMeasures {

    public static void main(String[] args) {
        // True and predicted labels
        int[] yTrue = {1, 0, 1, 1, 0, 1, 0};
        int[] yPred = {1, 0, 1, 0, 0, 1, 1};

        // Calculate Accuracy
        double accuracy = calculateAccuracy(yTrue, yPred);
        System.out.println("Accuracy: " + accuracy);

        // Calculate Precision and Recall
        double precision = calculatePrecision(yTrue, yPred);
        double recall = calculateRecall(yTrue, yPred);
        System.out.println("Precision: " + precision);
        System.out.println("Recall: " + recall);
    }

    // Accuracy = correct predictions / total predictions
    public static double calculateAccuracy(int[] yTrue, int[] yPred) {
        int correct = 0;
        for (int i = 0; i < yTrue.length; i++) {
            if (yTrue[i] == yPred[i]) {
                correct++;
            }
        }
        return (double) correct / yTrue.length;
    }

    // Precision = TP / (TP + FP)
    public static double calculatePrecision(int[] yTrue, int[] yPred) {
        int tp = 0, fp = 0;
        for (int i = 0; i < yTrue.length; i++) {
            if (yPred[i] == 1) {
                if (yTrue[i] == 1) {
                    tp++;
                } else {
                    fp++;
                }
            }
        }
        return tp / (double) (tp + fp);
    }

    // Recall = TP / (TP + FN)
    public static double calculateRecall(int[] yTrue, int[] yPred) {
        int tp = 0, fn = 0;
        for (int i = 0; i < yTrue.length; i++) {
            if (yTrue[i] == 1) {
                if (yPred[i] == 1) {
                    tp++;
                } else {
                    fn++;
                }
            }
        }
        return tp / (double) (tp + fn);
    }
}
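
Running this class prints the same values as the Python examples above: accuracy of about 0.714, precision of 0.75, and recall of 0.75.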

Conclusion

In this chapter, we covered the main class evaluation measures: accuracy, precision, recall, F1-score, the confusion matrix, specificity, and ROC-AUC, and showed how to calculate them both mathematically and programmatically in Python and Java. These metrics are crucial for judging the effectiveness of a classification model and for comparing different models.
