ROC Analysis and Performance Curves
For binary scoring classifiers a threshold value (in the following also called cutoff) controls how predicted posterior probabilities are turned into class labels. ROC curves and other performance plots serve to visualize and analyse the relationship between one or two performance measures and the threshold.
This section is mainly devoted to receiver operating characteristic (ROC) curves that plot the true positive rate (sensitivity) on the vertical axis against the false positive rate (1 - specificity, fall-out) on the horizontal axis for all possible threshold values. Creating other performance plots like lift charts or precision/recall graphs works analogously and is only shown briefly.
In addition to performance visualization ROC curves are helpful in
- determining an optimal decision threshold for given class prior probabilities and misclassification costs (for alternatives see also the sections about cost-sensitive classification and imbalanced classification problems in this tutorial),
- identifying regions where one classifier outperforms another and building suitable multi-classifier systems,
- obtaining calibrated estimates of the posterior probabilities.
For more information see the tutorials and introductory papers by Fawcett (2004), Fawcett (2006) as well as Flach (ICML 2004).
In many applications, such as diagnostic tests or spam detection, there is uncertainty about the class priors or the misclassification costs at prediction time, for example because it is hard to quantify the costs or because costs and class priors vary over time. Under these circumstances the classifier is expected to work well over a whole range of decision thresholds, and the area under the ROC curve (AUC) provides a scalar performance measure for comparing and selecting classifiers. mlr provides the AUC for binary classification (auc, based on package ROCR) and also a generalization of the AUC to the multi-class case (multiclass.auc, based on package pROC).
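As a small illustration of the multi-class case, here is a minimal sketch using mlr's built-in iris.task (any learner that can predict probabilities will do):
## Sketch: multi-class AUC for lda on iris.task (predictions on the training data only)
lrn = makeLearner("classif.lda", predict.type = "prob")
mod = train(lrn, iris.task)
pred = predict(mod, task = iris.task)
performance(pred, measures = multiclass.auc)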
mlr offers three ways to plot ROC and other performance curves.
- mlr's function generateROCRCurvesData is a convenient interface to ROCR's performance methods, with an associated plotting function, plotROCRCurves, that uses ggplot2.
- The mlr function asROCRPrediction converts mlr Prediction objects to objects of ROCR's class prediction. Then, ROCR's functionality can be used to further analyse the results and generate performance plots.
- mlr's function plotViperCharts provides an interface to ViperCharts.
Let's have a look at some examples demonstrating the three possible methods.
Note that the learners have to be capable of predicting probabilities. Have a look at the table of learners or run listLearners(prob = TRUE) to get a list of all learners that support this.
Performance plots with generateROCRCurvesData and plotROCRCurves
As mentioned above generateROCRCurvesData is an interface to ROCR's performance methods. It provides S3 methods for objects of class Prediction, ResampleResult and BenchmarkResult (resulting from calling predict, resample or benchmark). plotROCRCurves plots output from generateROCRCurvesData using ggplot2.
Example 1: Single predictions
We consider the Sonar data set from package mlbench, which poses a binary classification problem (sonar.task), and apply linear discriminant analysis.
n = getTaskSize(sonar.task)
train.set = sample(n, size = round(2/3 * n))
test.set = setdiff(seq_len(n), train.set)
lrn1 = makeLearner("classif.lda", predict.type = "prob")
mod1 = train(lrn1, sonar.task, subset = train.set)
pred1 = predict(mod1, task = sonar.task, subset = test.set)
roc_data = generateROCRCurvesData(pred1)
roc_data
#> learner False positive rate True positive rate Cutoff
#> 1 prediction 0 0.00000000 1.0147059
#> 2 prediction 0 0.02702703 1.0000000
#> 3 prediction 0 0.05405405 0.9999999
#> 4 prediction 0 0.08108108 0.9999999
#> 5 prediction 0 0.10810811 0.9999999
generateROCRCurvesData returns an object of class "ROCRCurvesData" which contains the results from ROCR's performance method (depending on the arguments meas1 and meas2). The data can be extracted by accessing the data element of the object. The object also contains information about the input arguments to generateROCRCurvesData, which may be useful.
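For example, a quick way to inspect what is stored (a sketch; element names besides data are best checked via str):
## Peek at the plotting data and the overall structure of the object
head(roc_data$data)
str(roc_data, max.level = 1)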
By default, plotROCRCurves draws a ROC curve and optionally adds a diagonal line that represents the performance of a random classifier.
droc = generateROCRCurvesData(pred1)
plotROCRCurves(droc, diagonal = TRUE)
There is also an experimental plotting function plotROCRCurvesGGVIS which uses ggvis to create similar figures, with the addition of optional interactive tooltips (displayed on hover) that show the threshold at that point of the curve.
plotROCRCurvesGGVIS(droc, cutoffs = TRUE)
The corresponding area under the curve (auc) can be calculated as usual by calling performance.
performance(pred1, auc)
#> auc
#> 0.847973
In addition to linear discriminant analysis we try a support vector machine with RBF kernel (ksvm).
lrn2 = makeLearner("classif.ksvm", predict.type = "prob")
mod2 = train(lrn2, sonar.task, subset = train.set)
pred2 = predict(mod2, task = sonar.task, subset = test.set)
In order to compare the performance of the two learners you might want to display the two corresponding ROC curves in one plot. For this purpose just pass a named list of Predictions to generateROCRCurvesData and plot the result.
plotROCRCurves(generateROCRCurvesData(list(lda = pred1, ksvm = pred2)))
It's clear from the plot above that ksvm has a higher AUC than lda.
performance(pred2, auc)
#> auc
#> 0.9214527
It is easily possible to generate other performance plots by passing the appropriate performance measures to generateROCRCurvesData and plotting the result with plotROCRCurves. Note that arguments meas1 and meas2 do not refer to mlr's performance measures, but to measures provided by ROCR and listed here.
Below is code for a lift chart which shows the lift value ("lift") versus the rate of positive predictions ("rpp").
out = generateROCRCurvesData(list(lda = pred1, ksvm = pred2), meas1 = "lift", meas2 = "rpp")
plotROCRCurves(out)
A plot of a single performance measure (accuracy in the example code below) versus the threshold can be generated by setting meas2 = "cutoff".
out = generateROCRCurvesData(list(lda = pred1, ksvm = pred2), meas1 = "acc", meas2 = "cutoff")
plotROCRCurves(out)
As you may recall, an alternative function for plotting performance values versus the decision threshold is plotThreshVsPerf. While plotThreshVsPerf allows plotting several performance measures at once, plotROCRCurves makes it easy to superimpose the performance curves of multiple learners.
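For reference, a minimal sketch of the plotThreshVsPerf route for the lda prediction from above (assuming the companion function generateThreshVsPerfData and mlr's fpr, tpr and mmce measures):
## Threshold versus several mlr performance measures in one plot
perf.data = generateThreshVsPerfData(pred1, measures = list(fpr, tpr, mmce))
plotThreshVsPerf(perf.data)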
Example 2: Benchmark experiment
The analysis in the example above can be improved in several regards. We only assessed the performance on a single random train/test split and, ideally, the support vector machine should have been tuned. Moreover, we wrote individual code for training and prediction of each learner, which can become tedious very quickly. A more practical way to apply several learners to a Task and compare their performance is provided by function benchmark (see also Benchmark Experiments).
We again consider the Sonar data set and apply lda as well as ksvm. We first generate a tuning wrapper for ksvm. The cost parameter C is tuned on a parameter grid that is kept small for demonstration purposes. We assume that we are interested in good performance over the complete threshold range and therefore tune with regard to the auc. The error rate (mmce) for a threshold of 0.5 is reported as well.
## Tune wrapper for ksvm
rdesc.inner = makeResampleDesc("Holdout")
ms = list(auc, mmce)
ps = makeParamSet(
makeDiscreteParam("C", 2^(-1:1))
)
ctrl = makeTuneControlGrid()
lrn2 = makeTuneWrapper(lrn2, rdesc.inner, ms, ps, ctrl, show.info = FALSE)
Below the actual benchmark experiment is conducted. As resampling strategy we use 5-fold cross-validation and again calculate the auc as well as the error rate (for a threshold/cutoff value of 0.5).
## Benchmark experiment
lrns = list(lrn1, lrn2)
rdesc.outer = makeResampleDesc("CV", iters = 5)
res = benchmark(lrns, tasks = sonar.task, resampling = rdesc.outer, measures = ms, show.info = FALSE)
res
#> task.id learner.id auc.test.mean mmce.test.mean
#> 1 Sonar-example classif.lda 0.7835442 0.2592334
#> 2 Sonar-example classif.ksvm.tuned 0.9454418 0.1390244
Calling generateROCRCurvesData and plotROCRCurves on the result of the benchmark experiment produces a plot with ROC curves for all learners in the experiment.
plotROCRCurves(generateROCRCurvesData(res))
By default, threshold-averaged ROC curves are shown. Since we used 5-fold cross-validation we have predictions on 5 test data sets and therefore 5 ROC curves per classifier. For each threshold value the means of the corresponding 5 false and true positive rates are calculated and plotted against each other.
If you want to plot the individual ROC curves for each resampling iteration set avg = "none". Other available options are avg = "horizontal" and avg = "vertical".
plotROCRCurves(generateROCRCurvesData(res, avg = "none"))
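For comparison, a vertically averaged version of the same plot can be sketched analogously:
plotROCRCurves(generateROCRCurvesData(res, avg = "vertical"))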
An alternative to averaging is to just merge the 5 test folds and draw a single ROC curve. Merging can be achieved by manually changing the class attribute of the prediction objects from ResamplePrediction to Prediction.
Below, the predictions are extracted from the BenchmarkResult via function getBMRPredictions, the class is changed and the ROC curves are created.
Averaging methods are usually preferred (cf. Fawcett, 2006), as they allow assessing the variability, which is needed to properly compare classifier performance.
## Extract predictions
preds = getBMRPredictions(res)[[1]]
## Change the class attribute
preds2 = lapply(preds, function(x) {class(x) = "Prediction"; return(x)})
## Draw ROC curves
plotROCRCurves(generateROCRCurvesData(preds2, avg = "none"))
Again, you can easily create other standard evaluation plots by calling generateROCRCurvesData on the BenchmarkResult with the appropriate performance measures (see ROCR::performance) and plotting the result.
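For example, a precision/recall graph for the benchmark result could be sketched like this (using ROCR's "prec" and "rec" measure names):
out = generateROCRCurvesData(res, meas1 = "prec", meas2 = "rec")
plotROCRCurves(out)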
Performance plots with asROCRPrediction
Drawing performance plots with package ROCR works through three basic commands:
- ROCR::prediction: Create a ROCR prediction object.
- ROCR::performance: Calculate one or more performance measures for the given prediction object.
- ROCR::plot: Draw the performance plot.
mlr's function asROCRPrediction converts an mlr Prediction object to a ROCR prediction object. In order to create performance plots steps 2. and 3. have to be run by the user.
This is obviously less convenient than calling plotROCRCurves (which extracts predictions, calls asROCRPrediction and executes steps 2. and 3. internally). On the other hand, this way provides more control over the generated plots, e.g., via graphical parameters that are not (yet) accessible through plotROCRCurves. Moreover, you can directly benefit from any enhancements in ROCR, use your own ROCR-based code or other packages that depend on ROCR, and use ROCR's plot. For more details see the ROCR documentation and demo(ROCR).
An additional alternative is to call plotROCRCurves, extract the data from the resulting ggplot2 object via its data element, e.g., obj$data, and then plot the data using whatever method you prefer.
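A short sketch of this route, reusing pred1 from Example 1 (inspect the column names of the extracted data.frame before replotting, as they depend on the chosen measures):
## Extract the data behind the ggplot2 object
p = plotROCRCurves(generateROCRCurvesData(pred1))
df = p$data
head(df)
## From here you can replot df with base graphics, lattice, etc.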
Example 1: Single predictions (continued)
We go back to our first example where we trained and predicted lda on the Sonar classification task.
n = getTaskSize(sonar.task)
train.set = sample(n, size = round(2/3 * n))
test.set = setdiff(seq_len(n), train.set)
## Train and predict linear discriminant analysis
lrn1 = makeLearner("classif.lda", predict.type = "prob")
mod1 = train(lrn1, sonar.task, subset = train.set)
pred1 = predict(mod1, task = sonar.task, subset = test.set)
Below we use asROCRPrediction to convert the lda prediction, let ROCR calculate the true and false positive rate and plot the ROC curve.
## Convert prediction
ROCRpred1 = asROCRPrediction(pred1)
## Calculate true and false positive rate
ROCRperf1 = ROCR::performance(ROCRpred1, "tpr", "fpr")
## Draw ROC curve
ROCR::plot(ROCRperf1)
Below is the same ROC curve, but we make use of some more graphical parameters: The ROC curve is color-coded by the threshold and selected threshold values are printed on the curve. Additionally, the convex hull (black broken line) of the ROC curve is drawn.
## Draw ROC curve
ROCR::plot(ROCRperf1, colorize = TRUE, print.cutoffs.at = seq(0.1, 0.9, 0.1), lwd = 2)
## Draw convex hull of ROC curve
ch = ROCR::performance(ROCRpred1, "rch")
ROCR::plot(ch, add = TRUE, lty = 2)
Example 2: Benchmark experiments (continued)
We again consider the benchmark experiment conducted earlier. We first extract the predictions by getBMRPredictions and then convert them via function asROCRPrediction.
## Extract predictions
preds = getBMRPredictions(res)[[1]]
## Convert predictions
ROCRpreds = lapply(preds, asROCRPrediction)
## Calculate true and false positive rate
ROCRperfs = lapply(ROCRpreds, function(x) ROCR::performance(x, "tpr", "fpr"))
We draw the horizontally averaged ROC curves (solid lines) as well as the ROC curves for the individual resampling iterations (broken lines). Moreover, standard error bars are plotted for selected true positive rates (0.1, 0.2, ..., 0.9 for lda and 0.4, ..., 0.9 for ksvm). See ROCR's plot function for details.
## lda average ROC curve
plot(ROCRperfs[[1]], col = "blue", avg = "horizontal", spread.estimate = "stderror",
show.spread.at = seq(0.1, 0.9, 0.1), plotCI.col = "blue", plotCI.lwd = 2, lwd = 2)
## lda individual ROC curves
plot(ROCRperfs[[1]], col = "blue", lty = 2, lwd = 0.25, add = TRUE)
## ksvm average ROC curve
plot(ROCRperfs[[2]], col = "red", avg = "horizontal", spread.estimate = "stderror",
show.spread.at = seq(0.4, 0.9, 0.1), plotCI.col = "red", plotCI.lwd = 2, lwd = 2, add = TRUE)
## ksvm individual ROC curves
plot(ROCRperfs[[2]], col = "red", lty = 2, lwd = 0.25, add = TRUE)
legend("bottomright", legend = getBMRLearnerIds(res), lty = 1, lwd = 2, col = c("blue", "red"))
In order to create other evaluation plots like precision/recall graphs you just have to change the performance measures when calling ROCR::performance.
## Extract and convert predictions
preds = getBMRPredictions(res)[[1]]
ROCRpreds = lapply(preds, asROCRPrediction)
## Calculate precision and recall
ROCRperfs = lapply(ROCRpreds, function(x) ROCR::performance(x, "prec", "rec"))
## Draw performance plot
plot(ROCRperfs[[1]], col = "blue", avg = "threshold")
plot(ROCRperfs[[2]], col = "red", avg = "threshold", add = TRUE)
legend("bottomleft", legend = getBMRLearnerIds(res), lty = 1, col = c("blue", "red"))
If you want to plot a performance measure versus the threshold, specify only one measure when calling ROCR::performance. Below the average accuracy over the 5 cross-validation iterations is plotted against the threshold. Moreover, boxplots for certain threshold values (0.1, 0.2, ..., 0.9) are drawn.
## Extract and convert predictions
preds = getBMRPredictions(res)[[1]]
ROCRpreds = lapply(preds, asROCRPrediction)
## Calculate accuracy
ROCRperfs = lapply(ROCRpreds, function(x) ROCR::performance(x, "acc"))
## Plot accuracy versus threshold
plot(ROCRperfs[[1]], avg = "vertical", spread.estimate = "boxplot", lwd = 2, col = "blue",
show.spread.at = seq(0.1, 0.9, 0.1), ylim = c(0,1), xlab = "Threshold")
Viper charts
mlr also supports ViperCharts for plotting ROC and other performance curves. Like generateROCRCurvesData, plotViperCharts has S3 methods for objects of class Prediction, ResampleResult and BenchmarkResult. Below, plots for the benchmark experiment (Example 2) are generated.
z = plotViperCharts(res, chart = "rocc", browse = FALSE)
You can see the plot created this way here. Note that besides ROC curves you get several other plots like lift charts or cost curves. For details, see plotViperCharts.
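plotViperCharts can be used in the same way on a single Prediction object, e.g., a sketch reusing pred1 from Example 1:
z = plotViperCharts(pred1, chart = "rocc", browse = FALSE)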