Posts

Machine Learning (Chapter 41): The ROC Curve

  Machine Learning (Chapter 41): The ROC Curve Introduction to ROC Curve The ROC (Receiver Operating Characteristic) curve is a graphical representation used in binary classification problems to evaluate the performance of a classifier. It plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at various threshold levels. The curve illustrates the trade-off between sensitivity (recall) and specificity, helping to visualize a classifier's capability. A perfect classifier would have a curve that reaches the top-left corner of the plot, whereas a random classifier would result in a diagonal line from (0, 0) to (1, 1). Key Definitions Before diving into the mathematics, let's define the key metrics: True Positive (TP): correctly predicted positives. False Positive (FP): incorrectly predicted positives. True Negative (TN): correctly predicted negatives. False Negative (FN): incorrectly predicted negatives. From these, we calculate: True Positive Rate (TPR), also kn...
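
To make the threshold sweep concrete, here is a minimal sketch using scikit-learn's `roc_curve` and `roc_auc_score`; the labels and scores below are illustrative, not taken from the post:

```python
# A minimal ROC-curve sketch; the labels and scores are illustrative.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0, 1, 0])            # ground-truth labels
y_score = np.array([0.1, 0.35, 0.8, 0.65, 0.2, 0.9,
                    0.55, 0.4, 0.7, 0.3])                     # classifier scores

# roc_curve sweeps the decision threshold and returns the FPR/TPR at each one
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

plt.plot(fpr, tpr, label=f"classifier (AUC = {auc:.2f})")
plt.plot([0, 1], [0, 1], linestyle="--", label="random classifier")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
```

The dashed diagonal is the random-classifier baseline described above; the closer the solid curve hugs the top-left corner, the better the classifier.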

Machine Learning (Chapter 40): Class Evaluation Measures

  Machine Learning (Chapter 40): Class Evaluation Measures In machine learning, particularly in classification problems, evaluation metrics are critical for understanding how well a model performs. These metrics help in comparing models, tuning parameters, and determining which model fits a specific problem best. In this chapter, we will focus on several widely used class evaluation measures, along with the corresponding mathematical formulas and practical examples. 1. Accuracy Accuracy measures the proportion of correctly predicted instances out of all instances in the dataset. It is simple but can be misleading for imbalanced datasets. $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$, where $TP$ = True Positives, $TN$ = True Negatives, $FP$ = False Positives, $FN$ = False Negatives. Example: Let's calculate accuracy in Python. from sklearn.metrics import ac...
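
As a quick sketch of the accuracy formula above, the counts from a confusion matrix can be plugged in directly and checked against scikit-learn's `accuracy_score`; the labels here are made up for illustration:

```python
# A minimal accuracy sketch; the labels below are illustrative, not from the post.
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# confusion_matrix returns [[TN, FP], [FN, TP]] for binary labels
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
manual_accuracy = (tp + tn) / (tp + tn + fp + fn)

print(manual_accuracy)                 # 0.75
print(accuracy_score(y_true, y_pred))  # 0.75, same result
```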

Machine Learning (Chapter 39): Bootstrapping & Cross-Validation

  Machine Learning (Chapter 39): Bootstrapping & Cross-Validation Introduction In the context of machine learning, evaluating the performance of models is a critical step. Two essential techniques widely used for model evaluation and resampling are Bootstrapping and Cross-Validation. Both help estimate the performance of a model by making efficient use of available data, reducing overfitting, and improving generalization. 1. Bootstrapping Bootstrapping is a resampling method where subsets of data are sampled with replacement. The goal is to create multiple training datasets, then build models for each dataset, and assess model variability and performance. Key Concept: When we sample with replacement, some observations may appear multiple times in the bootstrap sample, while others may not appear at all. Mathematical Foundation Given a dataset $D = \{x_1, x_2, \dots, x_n\}$, a bootstrap sample $D_b$ is crea...
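
As a rough sketch of sampling with replacement, the loop below draws a few bootstrap samples from a tiny stand-in dataset and reports which observations were left out of each sample; the dataset and number of samples are illustrative assumptions:

```python
# A minimal bootstrapping sketch; the dataset and sample count are illustrative.
import numpy as np

rng = np.random.default_rng(0)
data = np.arange(10)   # stand-in for D = {x_1, ..., x_n}
n = len(data)

for b in range(5):
    # Draw n indices with replacement: some points repeat, others are omitted
    idx = rng.integers(0, n, size=n)
    sample = data[idx]
    out_of_bag = np.setdiff1d(data, sample)  # observations not in this sample
    print(f"bootstrap sample {b}: {sample}, out-of-bag: {out_of_bag}")
```

On average, roughly 36.8% of observations (about $e^{-1}$ of the dataset) are left out of any given bootstrap sample; these out-of-bag points are commonly used to assess model performance.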