Hands-On Machine Learning with Microsoft Excel 2019

Calculating the Area Under Curve (AUC)

The AUC of a classification model is defined as the probability that the model will rank a random positive example above a random negative example.
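The ranking definition above can be checked directly on a small data set. The following is a minimal sketch, not a definitive implementation: the function name `auc_by_ranking`, the toy labels and scores, and the convention of counting tied scores as half a win are all assumptions made for illustration.

```python
from itertools import product


def auc_by_ranking(labels, scores):
    """Estimate AUC as the fraction of (positive, negative) pairs
    in which the positive example receives the higher score."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0   # positive ranked above negative
        elif p == n:
            wins += 0.5   # assumption: a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = YES (positive), 0 = NO (negative).
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = auc_by_ranking(labels, scores)
print(auc)
```

Here 7 of the 9 positive/negative pairs are ordered correctly, so the estimate is 7/9. Comparing every pair is quadratic in the number of examples, which is fine for illustration but slow on large data sets.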

Using the confusion matrix, we can define other quantities as follows:

The True Positive Rate (TPR), or sensitivity, is the ratio of data points correctly predicted as positive to all data points whose true value is YES: TPR = TP / (TP + FN), where TP is the number of true positives and FN is the number of false negatives.

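The False Positive Rate (FPR), which equals 1 - specificity, is the ratio of NO data points incorrectly predicted as YES to all NO data points: FPR = FP / (FP + TN), where FP is the number of false positives and TN is the number of true negatives.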
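As a quick illustration of the two rates, the sketch below computes them from the four cells of a confusion matrix. The function name `rates_from_counts` and the counts passed to it are hypothetical values chosen for the example, not figures from the book.

```python
def rates_from_counts(tp, fn, fp, tn):
    """Return (TPR, FPR) from the four confusion-matrix counts."""
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one actual YES and one actual NO")
    tpr = tp / (tp + fn)  # correctly predicted YES / all actual YES
    fpr = fp / (fp + tn)  # NO wrongly predicted as YES / all actual NO
    return tpr, fpr


# Hypothetical counts: 50 actual YES and 50 actual NO data points.
tpr, fpr = rates_from_counts(tp=40, fn=10, fp=5, tn=45)
specificity = 1 - fpr  # specificity is the complement of the FPR
print(tpr, fpr, specificity)
```

With these counts, 40 of the 50 actual YES points are found and 5 of the 50 actual NO points are wrongly flagged.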

Both quantities take values in the [0, 1] range. TPR and FPR are computed at a series of classification thresholds, and TPR is plotted against FPR. The resulting curve is known as the Receiver Operating Characteristic (ROC) curve; the AUC is the area under that curve, as shown in the accompanying figure.
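The threshold sweep described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the helper names `roc_points` and `trapezoid_area`, the toy data, the choice of using each distinct score as a threshold, and the use of the trapezoidal rule to approximate the area are all assumptions, not the book's own procedure.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points, starting at (0, 0), with one point
    per distinct score used as a threshold (highest threshold first)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # Predict YES whenever the score reaches the threshold.
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points):
    """Approximate the area under a piecewise-linear curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical data: 1 = YES (positive), 0 = NO (negative).
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
roc_auc = trapezoid_area(curve)
print(curve)
print(roc_auc)
```

Lowering the threshold moves the curve from (0, 0) toward (1, 1), because more data points are predicted as YES. For this toy data the area is 7/9, the fraction of positive/negative pairs that the scores order correctly.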

If we instead want to evaluate regression models, we can use the following techniques.