Implemented Performance Measures
This page lists, in alphabetical order, the performance measures available for the different types of learning problems as well as general performance measures. (See also the documentation on measures and makeMeasure for available measures and their properties.)
If a measure you need is missing, you can either open an issue or try to implement the measure yourself.
Column Minim. indicates whether the measure is minimized during, e.g., tuning or feature selection. Best and Worst show the best and worst values the performance measure can attain. For classification, column Multi indicates whether a measure is suitable for multi-class problems. If not, the measure can only be used for binary classification problems.
The next six columns refer to the information required to calculate the performance measure.
- Pred.: The Prediction object.
- Truth: The true values of the response variable(s) (for supervised learning).
- Probs: The predicted probabilities (may be needed for classification).
- Model: The WrappedModel (e.g., for calculating the training time).
- Task: The Task (relevant for cost-sensitive classification).
- Feats: The feature data (relevant for clustering).
Aggr. shows the default aggregation method tied to the measure.
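All of this information can also be queried from within R: each measure is an ordinary R object whose slots mirror the table columns, and listMeasures shows which measures are applicable to a task or problem type. A minimal sketch, assuming the mlr package is attached and using the built-in iris.task:

```r
library(mlr)

## Measures applicable to a problem type or to a concrete task
listMeasures("classif", properties = "classif.multi")
listMeasures(iris.task)

## A measure object carries the information shown in the tables below,
## here for the mean misclassification error (mmce)
mmce$minimize    ## TRUE: the measure is minimized
mmce$best        ## 0
mmce$worst       ## 1
mmce$properties  ## required information, e.g. "req.pred", "req.truth"
mmce$aggr        ## default aggregation, here test.mean
```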
Classification
ID / Name | Minim. | Best | Worst | Multi | Pred. | Truth | Probs | Model | Task | Feats | Aggr. | Note
---|---|---|---|---|---|---|---|---|---|---|---|---
acc Accuracy | | 1 | 0 | X | X | X | | | | | test.mean |
auc Area under the curve | | 1 | 0 | | X | X | X | | | | test.mean |
bac Balanced accuracy | | 1 | 0 | | X | X | | | | | test.mean | Mean of true positive rate and true negative rate.
ber Balanced error rate | X | 0 | 1 | X | X | X | | | | | test.mean | Mean of misclassification error rates on all individual classes.
brier Brier score | X | 0 | 1 | | X | X | X | | | | test.mean |
f1 F1 measure | | 1 | 0 | | X | X | | | | | test.mean |
fdr False discovery rate | X | 0 | 1 | | X | X | | | | | test.mean |
fn False negatives | X | 0 | Inf | | X | X | | | | | test.mean | Also called misses.
fnr False negative rate | X | 0 | 1 | | X | X | | | | | test.mean |
fp False positives | X | 0 | Inf | | X | X | | | | | test.mean | Also called false alarms.
fpr False positive rate | X | 0 | 1 | | X | X | | | | | test.mean | Also called false alarm rate or fall-out.
gmean G-mean | | 1 | 0 | | X | X | | | | | test.mean | Geometric mean of recall and specificity.
gpr Geometric mean of precision and recall | | 1 | 0 | | X | X | | | | | test.mean |
mcc Matthews correlation coefficient | | 1 | -1 | | X | X | | | | | test.mean |
mmce Mean misclassification error | X | 0 | 1 | X | X | X | | | | | test.mean |
multiclass.auc Multiclass area under the curve | | 1 | 0 | X | X | X | X | | | | test.mean | Calls pROC::multiclass.roc.
npv Negative predictive value | | 1 | 0 | | X | X | | | | | test.mean |
ppv Positive predictive value | | 1 | 0 | | X | X | | | | | test.mean | Also called precision.
tn True negatives | | Inf | 0 | | X | X | | | | | test.mean | Also called correct rejections.
tnr True negative rate | | 1 | 0 | | X | X | | | | | test.mean | Also called specificity.
tp True positives | | Inf | 0 | | X | X | | | | | test.mean |
tpr True positive rate | | 1 | 0 | | X | X | | | | | test.mean | Also called hit rate or recall.
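As a usage sketch, classification measures are passed to performance as a list; probabilities are only needed for measures such as auc. This assumes mlr is attached and uses the built-in sonar.task:

```r
library(mlr)

## Train a classifier that also predicts probabilities (required for auc)
lrn = makeLearner("classif.rpart", predict.type = "prob")
mod = train(lrn, sonar.task)
pred = predict(mod, task = sonar.task)

## Evaluate several measures from the table above at once
performance(pred, measures = list(acc, mmce, auc, tpr, fpr))
```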
Regression
ID / Name | Minim. | Best | Worst | Pred. | Truth | Probs | Model | Task | Feats | Aggr. | Note
---|---|---|---|---|---|---|---|---|---|---|---
adjrsq Adjusted coefficient of determination | | 1 | 0 | X | X | | | | | test.mean | Adjusted R-squared is only defined for normal linear regression.
expvar Explained variance | | 1 | 0 | X | X | | | | | test.mean | Similar to measure rsq (R-squared). Defined as explained_sum_of_squares / total_sum_of_squares.
mae Mean of absolute errors | X | 0 | Inf | X | X | | | | | test.mean |
medae Median of absolute errors | X | 0 | Inf | X | X | | | | | test.mean |
medse Median of squared errors | X | 0 | Inf | X | X | | | | | test.mean |
mse Mean of squared errors | X | 0 | Inf | X | X | | | | | test.mean |
rmse Root mean square error | X | 0 | Inf | X | X | | | | | test.rmse | The RMSE is aggregated as sqrt(mean(rmse.vals.on.test.sets^2)). If you do not want this, you can use test.mean instead.
rsq Coefficient of determination | | 1 | -Inf | X | X | | | | | test.mean | Also called R-squared, which is 1 - residual_sum_of_squares / total_sum_of_squares.
sae Sum of absolute errors | X | 0 | Inf | X | X | | | | | test.mean |
sse Sum of squared errors | X | 0 | Inf | X | X | | | | | test.mean |
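If you prefer to average the per-fold RMSE values instead of using the default test.rmse aggregation, you can change the aggregation scheme with setAggregation. A sketch using the built-in bh.task (Boston housing) and a plain linear model:

```r
library(mlr)

## rmse is aggregated as sqrt(mean(rmse^2)) over the test sets by default;
## switch to the plain mean of the per-fold values if desired
rmse.test.mean = setAggregation(rmse, test.mean)

rdesc = makeResampleDesc("CV", iters = 3)
resample("regr.lm", bh.task, rdesc, measures = list(rmse, rmse.test.mean))
```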
Survival analysis
ID / Name | Minim. | Best | Worst | Pred. | Truth | Probs | Model | Task | Feats | Aggr. | Note
---|---|---|---|---|---|---|---|---|---|---|---
cindex Concordance index | | 1 | 0 | X | X | | | | | test.mean |
Cluster analysis
ID / Name | Minim. | Best | Worst | Pred. | Truth | Probs | Model | Task | Feats | Aggr. | Note
---|---|---|---|---|---|---|---|---|---|---|---
db Davies-Bouldin cluster separation measure | X | 0 | Inf | X | | | | | X | test.mean | See ?clusterSim::index.DB.
dunn Dunn index | | Inf | 0 | X | | | | | X | test.mean | See ?clValid::dunn.
G1 Calinski-Harabasz pseudo F statistic | | Inf | 0 | X | | | | | X | test.mean | See ?clusterSim::index.G1.
G2 Baker and Hubert adaptation of Goodman-Kruskal's gamma statistic | | Inf | 0 | X | | | | | X | test.mean | See ?clusterSim::index.G2.
silhouette Rousseeuw's silhouette internal cluster quality index | | Inf | 0 | X | | | | | X | test.mean | See ?clusterSim::index.S.
Cost-sensitive classification
ID / Name | Minim. | Best | Worst | Pred. | Truth | Probs | Model | Task | Feats | Aggr. | Note
---|---|---|---|---|---|---|---|---|---|---|---
mcp Misclassification penalty | X | 0 | Inf | X | | | | X | | test.mean | Average difference between costs of oracle and model prediction.
meancosts Mean costs of the predicted choices | X | 0 | Inf | X | | | | X | | test.mean |
Note that in the case of ordinary misclassification costs you can also generate performance measures from cost matrices with the function makeCostMeasure. For details see the section on cost-sensitive classification.
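For illustration, a hedged sketch of makeCostMeasure on a hypothetical 2x2 cost matrix (rows: true classes, columns: predicted classes) for the built-in sonar.task; the exact argument set may differ slightly between mlr versions:

```r
library(mlr)

## Hypothetical cost matrix: correct predictions cost 0, one type of error
## costs 5, the other costs 1
costs = matrix(c(0, 5, 1, 0), nrow = 2)
rownames(costs) = colnames(costs) = getTaskClassLevels(sonar.task)

## Turn the cost matrix into a measure usable like any other measure
sonar.costs = makeCostMeasure(id = "sonar.costs", name = "Sonar costs",
  costs = costs, best = 0, worst = 5)
```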
Multilabel classification
ID / Name | Minim. | Best | Worst | Pred. | Truth | Probs | Model | Task | Feats | Aggr. | Note
---|---|---|---|---|---|---|---|---|---|---|---
hamloss Hamming loss | X | 0 | 1 | X | X | | | | | test.mean |
General performance measures
ID / Name | Minim. | Best | Worst | Pred. | Truth | Probs | Model | Task | Feats | Aggr. | Note
---|---|---|---|---|---|---|---|---|---|---|---
featperc Percentage of original features used for model | X | 0 | 1 | X | | | X | | | test.mean | Useful for feature selection.
timeboth timetrain + timepredict | X | 0 | Inf | X | | | X | | | test.mean |
timepredict Time of predicting test set | X | 0 | Inf | X | | | | | | test.mean |
timetrain Time of fitting the model | X | 0 | Inf | | | | X | | | test.mean |
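The general measures can be mixed freely with task-specific ones. A small sketch, assuming mlr is attached, that tracks runtime and the fraction of features actually used by the fitted model during cross-validation:

```r
library(mlr)

## Monitor error, feature usage and runtime in one resampling run
rdesc = makeResampleDesc("CV", iters = 3)
resample("classif.rpart", iris.task, rdesc,
  measures = list(mmce, featperc, timetrain, timepredict))
```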