Machine Learning (Chapter 40): Class Evaluation Measures
In machine learning, particularly classification problems, evaluation metrics are critical for understanding how well a model performs. These metrics help in comparing models, tuning parameters, and determining which model fits best for a specific problem. In this chapter, we will focus on several widely-used class evaluation measures, along with the corresponding mathematical formulas and practical examples.
1. Accuracy
Accuracy measures the proportion of correctly predicted instances out of all instances in the dataset. It is simple but can be misleading for imbalanced datasets.
Accuracy = (TP + TN) / (TP + TN + FP + FN)

where:
- TP = True Positives
- TN = True Negatives
- FP = False Positives
- FN = False Negatives
Example:
Let’s calculate accuracy in Python.
```python
from sklearn.metrics import accuracy_score

# True labels
y_true = [1, 0, 1, 1, 0, 1, 0]

# Predicted labels
y_pred = [1, 0, 1, 0, 0, 1, 1]

# Calculate accuracy
accuracy = accuracy_score(y_true, y_pred)
print("Accuracy:", accuracy)
```
Output:
```
Accuracy: 0.7142857142857143
```
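As noted above, accuracy can be misleading when classes are imbalanced. Here is a minimal sketch with made-up labels (95 negatives, 5 positives): a "model" that always predicts the majority class still reaches 95% accuracy while finding none of the positives.

```python
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical imbalanced data: 95 negatives, 5 positives
y_true_imb = [0] * 95 + [1] * 5

# A "model" that always predicts the majority class (0)
y_pred_imb = [0] * 100

print("Accuracy:", accuracy_score(y_true_imb, y_pred_imb))  # 0.95
print("Recall:  ", recall_score(y_true_imb, y_pred_imb))    # 0.0 - no positives detected
```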
2. Precision
Precision is the ratio of correctly predicted positive observations to the total predicted positives. A high precision corresponds to a low false positive rate.

Precision = TP / (TP + FP)
Example:
Let’s calculate precision in Python.
```python
from sklearn.metrics import precision_score

# Calculate precision
precision = precision_score(y_true, y_pred)
print("Precision:", precision)
```
Output:
```
Precision: 0.75
```
3. Recall (Sensitivity or True Positive Rate)
Recall is the ratio of correctly predicted positive observations to all observations in the actual positive class.

Recall = TP / (TP + FN)
Example:
Let’s calculate recall in Python.
```python
from sklearn.metrics import recall_score

# Calculate recall
recall = recall_score(y_true, y_pred)
print("Recall:", recall)
```
Output:
```
Recall: 0.75
```
4. F1-Score
The F1-score is the harmonic mean of precision and recall, providing a balance between the two.

F1 = 2 × (Precision × Recall) / (Precision + Recall)
Example:
Let’s calculate the F1-Score in Python.
```python
from sklearn.metrics import f1_score

# Calculate F1 score
f1 = f1_score(y_true, y_pred)
print("F1-Score:", f1)
```
Output:
```
F1-Score: 0.75
```
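For this example the formula can be checked by hand: with Precision = Recall = 0.75, F1 = 2 × (0.75 × 0.75) / (0.75 + 0.75) = 0.75. A quick sketch confirming the harmonic-mean computation against scikit-learn:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1]

p = precision_score(y_true, y_pred)   # 0.75
r = recall_score(y_true, y_pred)      # 0.75

# F1 as the harmonic mean of precision and recall
f1_manual = 2 * (p * r) / (p + r)

print("Manual F1: ", f1_manual)                  # 0.75
print("sklearn F1:", f1_score(y_true, y_pred))   # 0.75
```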
5. Confusion Matrix
The confusion matrix shows the actual versus predicted classifications and provides insights into how well the model differentiates between classes.
The confusion matrix for a binary classification problem is as follows:
| Actual \ Predicted | Positive (1) | Negative (0) |
|---|---|---|
| Positive (1) | TP | FN |
| Negative (0) | FP | TN |
Example:
Let’s generate a confusion matrix in Python.
```python
from sklearn.metrics import confusion_matrix

# Generate confusion matrix
cm = confusion_matrix(y_true, y_pred)
print("Confusion Matrix:\n", cm)
```
Output:
```
Confusion Matrix:
 [[2 1]
 [1 3]]
```
Note that scikit-learn orders rows and columns by label value (0, then 1), so this output reads as [[TN, FP], [FN, TP]]: 2 true negatives, 1 false positive, 1 false negative, and 3 true positives.
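To make the printed matrix match the table layout shown above (positive class first), the label order can be passed explicitly; a small sketch:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1]

# Put the positive class (1) first so the matrix reads [[TP, FN], [FP, TN]]
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
print(cm)
# [[3 1]
#  [1 2]]
```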
6. Specificity (True Negative Rate)
Specificity measures the proportion of actual negatives that are correctly identified.

Specificity = TN / (TN + FP)
Example:
We can calculate specificity manually as it's not directly available in scikit-learn:
```python
# Manually calculate specificity
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
specificity = tn / (tn + fp)
print("Specificity:", specificity)
```
Output:
```
Specificity: 0.6666666666666666
```
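Alternatively, since specificity is simply the recall of the negative class, it can be computed directly with recall_score by setting pos_label=0; a small sketch:

```python
from sklearn.metrics import recall_score

y_true = [1, 0, 1, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1]

# Specificity = recall of the negative class (label 0)
specificity = recall_score(y_true, y_pred, pos_label=0)
print("Specificity:", specificity)  # 0.6666666666666666
```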
7. ROC-AUC (Receiver Operating Characteristic - Area Under Curve)
The ROC-AUC score summarizes the model's ability to distinguish between classes across all classification thresholds. The AUC (Area Under the Curve) ranges from 0 to 1; a value of 0.5 corresponds to random guessing, while a value close to 1 indicates a good classifier.
The ROC curve plots the true positive rate (Recall) against the false positive rate (1 - Specificity).
Example:
Let’s compute ROC-AUC in Python.
```python
from sklearn.metrics import roc_auc_score

# True labels (y_true) as above; predicted probabilities for the positive class
y_pred_prob = [0.9, 0.1, 0.8, 0.4, 0.2, 0.7, 0.6]

# Calculate ROC-AUC
roc_auc = roc_auc_score(y_true, y_pred_prob)
print("ROC-AUC:", roc_auc)
```
Output:
```
ROC-AUC: 0.9166666666666666
```
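To inspect the individual points of the ROC curve described above (false positive rate versus true positive rate at each candidate threshold), scikit-learn's roc_curve can be used; a short sketch:

```python
from sklearn.metrics import roc_curve

y_true = [1, 0, 1, 1, 0, 1, 0]
y_pred_prob = [0.9, 0.1, 0.8, 0.4, 0.2, 0.7, 0.6]

# FPR and TPR at each threshold derived from the predicted probabilities
fpr, tpr, thresholds = roc_curve(y_true, y_pred_prob)
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.1f}  FPR={f:.2f}  TPR={t:.2f}")
```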
Java Implementation
Below is a simple Java code snippet for calculating accuracy, precision, and recall in plain Java (no external libraries are required):

```java
public class ClassEvaluationMeasures {

    public static void main(String[] args) {
        // True and predicted labels
        int[] yTrue = {1, 0, 1, 1, 0, 1, 0};
        int[] yPred = {1, 0, 1, 0, 0, 1, 1};

        // Calculate Accuracy
        double accuracy = calculateAccuracy(yTrue, yPred);
        System.out.println("Accuracy: " + accuracy);

        // Calculate Precision and Recall
        double precision = calculatePrecision(yTrue, yPred);
        double recall = calculateRecall(yTrue, yPred);
        System.out.println("Precision: " + precision);
        System.out.println("Recall: " + recall);
    }

    // Accuracy
    public static double calculateAccuracy(int[] yTrue, int[] yPred) {
        int correct = 0;
        for (int i = 0; i < yTrue.length; i++) {
            if (yTrue[i] == yPred[i]) {
                correct++;
            }
        }
        return (double) correct / yTrue.length;
    }

    // Precision
    public static double calculatePrecision(int[] yTrue, int[] yPred) {
        int tp = 0, fp = 0;
        for (int i = 0; i < yTrue.length; i++) {
            if (yPred[i] == 1) {
                if (yTrue[i] == 1) {
                    tp++;
                } else {
                    fp++;
                }
            }
        }
        return tp / (double) (tp + fp);
    }

    // Recall
    public static double calculateRecall(int[] yTrue, int[] yPred) {
        int tp = 0, fn = 0;
        for (int i = 0; i < yTrue.length; i++) {
            if (yTrue[i] == 1) {
                if (yPred[i] == 1) {
                    tp++;
                } else {
                    fn++;
                }
            }
        }
        return tp / (double) (tp + fn);
    }
}
```
Conclusion
In this chapter, we covered several class evaluation measures, including accuracy, precision, recall, F1-score, the confusion matrix, specificity, and ROC-AUC, and showed how to calculate them both mathematically and programmatically in Python and Java. These metrics are crucial for judging the effectiveness of a classification model and for comparing different models.