Machine Learning (Chapter 40): Class Evaluation Measures

In machine learning, particularly classification problems, evaluation metrics are critical for understanding how well a model performs. These metrics help in comparing models, tuning parameters, and determining which model fits best for a specific problem. In this chapter, we will focus on several widely-used class evaluation measures, along with the corresponding mathematical formulas and practical examples.

1. Accuracy

Accuracy measures the proportion of correctly predicted instances out of all instances in the dataset. It is simple but can be misleading for imbalanced datasets.

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}

where:

  • TP = True Positives
  • TN = True Negatives
  • FP = False Positives
  • FN = False Negatives

Example:

Let’s calculate accuracy in Python.

python
from sklearn.metrics import accuracy_score

# True labels
y_true = [1, 0, 1, 1, 0, 1, 0]
# Predicted labels
y_pred = [1, 0, 1, 0, 0, 1, 1]

# Calculate accuracy
accuracy = accuracy_score(y_true, y_pred)
print("Accuracy:", accuracy)

Output:

Accuracy: 0.7142857142857143
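
Accuracy looks reasonable here because the toy labels are fairly balanced. To see why it can mislead on imbalanced data, here is a small illustrative sketch (the labels below are made up for illustration): a classifier that always predicts the majority class scores 95% accuracy while never detecting a single positive.

python
from sklearn.metrics import accuracy_score

# Hypothetical imbalanced labels: 19 negatives, 1 positive
y_true_imb = [0] * 19 + [1]
# A "model" that always predicts the majority class
y_pred_imb = [0] * 20

# High accuracy (0.95) even though the single positive is completely missed
print("Accuracy:", accuracy_score(y_true_imb, y_pred_imb))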

2. Precision

Precision is the ratio of correctly predicted positive observations to the total predicted positives. High precision corresponds to a low false positive rate.

\text{Precision} = \frac{TP}{TP + FP}

Example:

Let’s calculate precision in Python.

python
from sklearn.metrics import precision_score

# Calculate precision
precision = precision_score(y_true, y_pred)
print("Precision:", precision)

Output:

Precision: 0.75

3. Recall (Sensitivity or True Positive Rate)

Recall is the ratio of correctly predicted positive observations to all observations in the actual class.

\text{Recall} = \frac{TP}{TP + FN}

Example:

Let’s calculate recall in Python.

python
from sklearn.metrics import recall_score

# Calculate recall
recall = recall_score(y_true, y_pred)
print("Recall:", recall)

Output:

Recall: 0.75

4. F1-Score

The F1-score is the harmonic mean of precision and recall, providing a balance between the two.

\text{F1-Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}

Example:

Let’s calculate the F1-Score in Python.

python
from sklearn.metrics import f1_score

# Calculate F1 score
f1 = f1_score(y_true, y_pred)
print("F1-Score:", f1)

Output:

F1-Score: 0.75
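
Because F1 is just the harmonic mean of the two earlier results, we can double-check the library value by plugging the precision and recall variables computed above directly into the formula:

python
# Harmonic mean of the precision and recall calculated earlier
f1_manual = 2 * precision * recall / (precision + recall)
print("F1-Score (manual):", f1_manual)  # 0.75, matching f1_score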

5. Confusion Matrix

The confusion matrix shows the actual versus predicted classifications and provides insights into how well the model differentiates between classes.

The confusion matrix for a binary classification problem is as follows:

Actual \ Predicted | Positive (1) | Negative (0)
Positive (1)       | TP           | FN
Negative (0)       | FP           | TN

Example:

Let’s generate a confusion matrix in Python.

python
from sklearn.metrics import confusion_matrix

# Generate confusion matrix
cm = confusion_matrix(y_true, y_pred)
print("Confusion Matrix:\n", cm)

Output:

Confusion Matrix:
 [[2 1]
 [1 3]]
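
Note that scikit-learn orders classes by ascending label, so the printed matrix is arranged as [[TN, FP], [FN, TP]], with the negative class in the first row rather than the layout shown in the table above. As a quick sanity check, the entries can be unpacked and the earlier formulas recomputed by hand:

python
# Unpack the 2x2 matrix returned above: [[TN, FP], [FN, TP]] for labels [0, 1]
tn, fp, fn, tp = cm.ravel()

print("Accuracy :", (tp + tn) / (tp + tn + fp + fn))  # 0.714...
print("Precision:", tp / (tp + fp))                   # 0.75
print("Recall   :", tp / (tp + fn))                   # 0.75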

6. Specificity (True Negative Rate)

Specificity measures the proportion of actual negatives that are correctly identified.

\text{Specificity} = \frac{TN}{TN + FP}

Example:

We can calculate specificity manually as it's not directly available in scikit-learn:

python
# Manually calculate specificity
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
specificity = tn / (tn + fp)
print("Specificity:", specificity)

Output:

Specificity: 0.6666666666666666

7. ROC-AUC (Receiver Operating Characteristic - Area Under Curve)

The ROC-AUC score summarizes the model's ability to distinguish between classes across different decision thresholds. The AUC (Area Under the Curve) ranges from 0 to 1, where 0.5 corresponds to random guessing and a value close to 1 indicates a good classifier.

The ROC curve plots the true positive rate (Recall) against the false positive rate (1 - Specificity).

Example:

Let’s compute ROC-AUC in Python.

python
from sklearn.metrics import roc_auc_score

# True labels (y_true) and predicted probabilities (y_pred_prob)
y_pred_prob = [0.9, 0.1, 0.8, 0.4, 0.2, 0.7, 0.6]

# Calculate ROC-AUC
roc_auc = roc_auc_score(y_true, y_pred_prob)
print("ROC-AUC:", roc_auc)

Output:

ROC-AUC: 0.9166666666666666
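
To see the curve itself rather than just the summary number, scikit-learn's roc_curve returns the false positive rate and true positive rate at each threshold. A minimal sketch using the same y_true and y_pred_prob as above:

python
from sklearn.metrics import roc_curve

# Points of the ROC curve: false positive rate, true positive rate, and thresholds
fpr, tpr, thresholds = roc_curve(y_true, y_pred_prob)
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")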

Java Implementation

Below is a simple, self-contained Java snippet that calculates accuracy, precision, and recall directly from integer label arrays:

java
public class ClassEvaluationMeasures {

    public static void main(String[] args) {
        // True and predicted labels
        int[] yTrue = {1, 0, 1, 1, 0, 1, 0};
        int[] yPred = {1, 0, 1, 0, 0, 1, 1};

        // Calculate Accuracy
        double accuracy = calculateAccuracy(yTrue, yPred);
        System.out.println("Accuracy: " + accuracy);

        // Calculate Precision and Recall
        double precision = calculatePrecision(yTrue, yPred);
        double recall = calculateRecall(yTrue, yPred);
        System.out.println("Precision: " + precision);
        System.out.println("Recall: " + recall);
    }

    // Accuracy = correct predictions / total predictions
    public static double calculateAccuracy(int[] yTrue, int[] yPred) {
        int correct = 0;
        for (int i = 0; i < yTrue.length; i++) {
            if (yTrue[i] == yPred[i]) {
                correct++;
            }
        }
        return (double) correct / yTrue.length;
    }

    // Precision = TP / (TP + FP)
    public static double calculatePrecision(int[] yTrue, int[] yPred) {
        int tp = 0, fp = 0;
        for (int i = 0; i < yTrue.length; i++) {
            if (yPred[i] == 1) {
                if (yTrue[i] == 1) {
                    tp++;
                } else {
                    fp++;
                }
            }
        }
        return tp / (double) (tp + fp);
    }

    // Recall = TP / (TP + FN)
    public static double calculateRecall(int[] yTrue, int[] yPred) {
        int tp = 0, fn = 0;
        for (int i = 0; i < yTrue.length; i++) {
            if (yTrue[i] == 1) {
                if (yPred[i] == 1) {
                    tp++;
                } else {
                    fn++;
                }
            }
        }
        return tp / (double) (tp + fn);
    }
}
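
Running this class prints the same values as the Python examples above: accuracy of about 0.714, precision of 0.75, and recall of 0.75.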

Conclusion

In this chapter, we covered the main class evaluation measures: accuracy, precision, recall, F1-score, the confusion matrix, specificity, and ROC-AUC, and showed how to calculate them both mathematically and programmatically in Python and Java. These metrics are crucial for judging the effectiveness of a classification model and for comparing different models.
