Integrated Learners
This page lists the learning methods already integrated in mlr.
Columns Num., Fac., Ord., NAs, and Weights indicate whether a method can cope with numerical, factor, and ordered factor predictors, whether it can deal with missing values in a meaningful way (other than simply removing observations with missing values), and whether observation weights are supported.
Column Props shows further properties of the learning methods specific to the type of learning task. See also RLearner for details.
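The properties shown in the tables can also be queried from within R. The following is a minimal sketch, assuming a reasonably recent mlr version in which getLearnerProperties is available; the choice of classif.rpart is only for illustration.

```r
library(mlr)

# Create a learner by the class name given in the first table column
lrn = makeLearner("classif.rpart")

# Character vector of all properties of this learner
# (feature types, missing value and weight support, task-specific props)
props = getLearnerProperties(lrn)
print(props)

# Check for individual properties, e.g. the NAs and Weights columns
"missings" %in% props
"weights" %in% props
```

In mlr the table columns Num., Fac., Ord., NAs and Weights correspond to the property names "numerics", "factors", "ordered", "missings" and "weights".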
Classification (72)
For classification the following additional learner properties are relevant and shown in column Props:
- prob: The method can predict probabilities,
- oneclass, twoclass, multiclass: One-class, two-class (binary) or multi-class classification problems can be handled,
- class.weights: Class weights can be handled.
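As a rough sketch of how these properties are used in practice, the snippet below filters the available classification learners by property and then requests probability predictions from a learner that has the prob property. The return type of listLearners differs between mlr versions, so the result is only printed here; iris.task is the example task shipped with mlr.

```r
library(mlr)

# Classification learners that support probabilities and multi-class problems
lrns = listLearners("classif", properties = c("prob", "multiclass"),
  warn.missing.packages = FALSE)
print(lrns)

# Ask a learner with the "prob" property for probability predictions
lrn = makeLearner("classif.rpart", predict.type = "prob")
mod = train(lrn, iris.task)
pred = predict(mod, iris.task)

# One probability column per class
probs = getPredictionProbabilities(pred)
head(probs)
```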
Class / Short Name / Name | Packages | Num. | Fac. | Ord. | NAs | Weights | Props | Note |
---|---|---|---|---|---|---|---|---|
classif.ada ada ada Boosting |
ada | X | X | X | prob twoclass |
xval has been set to 0 by default for speed. |
||
classif.avNNet avNNet Neural Network |
nnet | X | X | X | prob twoclass multiclass |
size has been set to 3 by default. The nnet models are trained with bagging if bag = TRUE . |
||
classif.bartMachine bartmachine Bayesian Additive Regression Trees |
bartMachine | X | X | X | prob twoclass |
use_missing_data has been set to TRUE by default to allow missing data support. |
||
classif.bdk bdk Bi-Directional Kohonen map |
kohonen | X | prob twoclass multiclass |
|||||
classif.binomial binomial Binomial Regression |
stats | X | X | X | prob twoclass |
Delegates to glm with freely choosable binomial link function via learner parameter link . |
||
classif.blackboost blackbst Gradient Boosting With Regression Trees |
mboost party |
X | X | X | X | prob twoclass |
See ?ctree_control for possible breakage for nominal features with missingness. |
|
classif.boosting adabag Adabag Boosting |
adabag rpart |
X | X | X | prob twoclass multiclass |
xval has been set to 0 by default for speed. |
||
classif.bst bst Gradient Boosting |
bst | X | twoclass | Renamed parameter learner to Learner due to nameclash with setHyperPars . Default changes: Learner = "ls" , xval = 0 , and maxdepth = 1 . |
||||
classif.cforest cforest Random forest based on conditional inference trees |
party | X | X | X | X | X | prob twoclass multiclass |
See ?ctree_control for possible breakage for nominal features with missingness. |
classif.clusterSVM clusterSVM Clustered Support Vector Machines |
SwarmSVM LiblineaR |
X | twoclass | centers set to 2 by default. |
||||
classif.ctree ctree Conditional Inference Trees |
party | X | X | X | X | X | prob twoclass multiclass |
See ?ctree_control for possible breakage for nominal features with missingness. |
classif.cvglmnet cvglmnet GLM with Lasso or Elasticnet Regularization (Cross Validated Lambda) |
glmnet | X | X | X | prob twoclass multiclass |
The family parameter is set to binomial for two-class problems and to multinomial otherwise. Factors automatically get converted to dummy columns, ordered factors to integer. |
||
classif.dbnDNN dbn.dnn Deep neural network with weights initialized by DBN |
deepnet | X | prob twoclass multiclass |
output set to "softmax" by default. |
||||
classif.dcSVM dcSVM Divided-Conquer Support Vector Machines |
SwarmSVM | X | twoclass | |||||
classif.extraTrees extraTrees Extremely Randomized Trees |
extraTrees | X | X | prob twoclass multiclass |
||||
classif.fnn fnn Fast k-Nearest Neighbour |
FNN | X | twoclass multiclass |
|||||
classif.gaterSVM gaterSVM Mixture of SVMs with Neural Network Gater Function |
SwarmSVM e1071 |
X | twoclass | m set to 3 and max.iter set to 1 by default. |
||||
classif.gbm gbm Gradient Boosting Machine |
gbm | X | X | X | X | prob twoclass multiclass |
Note on param 'distribution': gbm will select 'bernoulli' by default for 2 classes, and 'multinomial' for multiclass problems. The latter is the only setting that works for > 2 classes. | |
classif.geoDA geoda Geometric Predictive Discriminant Analysis |
DiscriMiner | X | twoclass multiclass |
|||||
classif.glmboost glmbst Boosting for GLMs |
mboost | X | X | X | prob twoclass |
family has been set to Binomial() by default. Maximum number of boosting iterations is set via mstop , the actual number used for prediction is controlled by m . |
||
classif.glmnet glmnet GLM with Lasso or Elasticnet Regularization |
glmnet | X | X | X | prob twoclass multiclass |
The family parameter is set to binomial for two-class problems and to multinomial otherwise. Factors automatically get converted to dummy columns, ordered factors to integer. |
||
classif.hdrda hdrda High-Dimensional Regularized Discriminant Analysis |
sparsediscrim | X | prob twoclass |
|||||
classif.IBk ibk k-Nearest Neighbours |
RWeka | X | X | prob twoclass multiclass |
||||
classif.J48 j48 J48 Decision Trees |
RWeka | X | X | X | prob twoclass multiclass |
NAs are directly passed to WEKA with na.action = na.pass . |
||
classif.JRip jrip Propositional Rule Learner |
RWeka | X | X | X | prob twoclass multiclass |
NAs are directly passed to WEKA with na.action = na.pass . |
||
classif.kknn kknn k-Nearest Neighbor |
kknn | X | X | prob twoclass multiclass |
||||
classif.knn knn k-Nearest Neighbor |
class | X | twoclass multiclass |
|||||
classif.ksvm ksvm Support Vector Machines |
kernlab | X | X | prob twoclass multiclass class.weights |
Kernel parameters have to be passed directly and not by using the kpar list in ksvm . Note that fit has been set to FALSE by default for speed. |
|||
classif.lda lda Linear Discriminant Analysis |
MASS | X | X | prob twoclass multiclass |
Learner parameter predict.method maps to method in predict.lda . |
|||
classif.LiblineaRL1L2SVC liblinl1l2svc L1-Regularized L2-Loss Support Vector Classification |
LiblineaR | X | twoclass multiclass class.weights |
|||||
classif.LiblineaRL1LogReg liblinl1logreg L1-Regularized Logistic Regression |
LiblineaR | X | prob twoclass multiclass class.weights |
|||||
classif.LiblineaRL2L1SVC liblinl2l1svc L2-Regularized L1-Loss Support Vector Classification |
LiblineaR | X | twoclass multiclass class.weights |
|||||
classif.LiblineaRL2LogReg liblinl2logreg L2-Regularized Logistic Regression |
LiblineaR | X | prob twoclass multiclass class.weights |
type = 0 (the default) is the primal and type = 7 is the dual problem. |
||||
classif.LiblineaRL2SVC liblinl2svc L2-Regularized L2-Loss Support Vector Classification |
LiblineaR | X | twoclass multiclass class.weights |
type = 2 (the default) is the primal and type = 1 is the dual problem. |
||||
classif.LiblineaRMultiClassSVC liblinmulticlasssvc Support Vector Classification by Crammer and Singer |
LiblineaR | X | twoclass multiclass class.weights |
|||||
classif.linDA linda Linear Discriminant Analysis |
DiscriMiner | X | twoclass multiclass |
Set validation = NULL by default to disable internal test set validation. |
||||
classif.logreg logreg Logistic Regression |
stats | X | X | X | prob twoclass |
Delegates to glm with family = binomial(link = "logit") . |
||
classif.lqa lqa Fitting penalized Generalized Linear Models with the LQA algorithm |
lqa | X | X | prob twoclass |
penalty has been set to "lasso" and lambda to 0.1 by default. |
|||
classif.lssvm lssvm Least Squares Support Vector Machine |
kernlab | X | X | twoclass multiclass |
fitted has been set to FALSE by default for speed. |
|||
classif.lvq1 lvq1 Learning Vector Quantization |
class | X | twoclass multiclass |
|||||
classif.mda mda Mixture Discriminant Analysis |
mda | X | X | prob twoclass multiclass |
keep.fitted has been set to FALSE by default for speed and we use start.method = "lvq" for more robust behavior and fewer technical crashes. |
|||
classif.mlp mlp Multi-Layer Perceptron |
RSNNS | X | prob twoclass multiclass |
|||||
classif.multinom multinom Multinomial Regression |
nnet | X | X | X | prob twoclass multiclass |
|||
classif.naiveBayes nbayes Naive Bayes |
e1071 | X | X | X | prob twoclass multiclass |
|||
classif.neuralnet neuralnet Neural Network from neuralnet |
neuralnet | X | prob twoclass |
err.fct has been set to ce to do classification. |
||||
classif.nnet nnet Neural Network |
nnet | X | X | X | prob twoclass multiclass |
size has been set to 3 by default. |
||
classif.nnTrain nn.train Training Neural Network by Backpropagation |
deepnet | X | prob twoclass multiclass |
output set to "softmax" by default. |
||||
classif.nodeHarvest nodeHarvest Node Harvest |
nodeHarvest | X | X | prob twoclass |
||||
classif.OneR oner 1-R Classifier |
RWeka | X | X | X | prob twoclass multiclass |
NAs are directly passed to WEKA with na.action = na.pass . |
||
classif.pamr pamr Nearest shrunken centroid |
pamr | X | prob twoclass |
Threshold for prediction (threshold.predict ) has been set to 1 by default. |
||||
classif.PART part PART Decision Lists |
RWeka | X | X | X | prob twoclass multiclass |
NAs are directly passed to WEKA with na.action = na.pass . |
||
classif.plr plr Logistic Regression with a L2 Penalty |
stepPlr | X | X | X | prob twoclass |
AIC and BIC penalty types can be selected via the new parameter cp.type . |
||
classif.plsdaCaret plsdacaret Partial Least Squares (PLS) Discriminant Analysis |
caret | X | prob twoclass |
|||||
classif.probit probit Probit Regression |
stats | X | X | X | prob twoclass |
Delegates to glm with family = binomial(link = "probit") . |
||
classif.qda qda Quadratic Discriminant Analysis |
MASS | X | X | prob twoclass multiclass |
Learner parameter predict.method maps to method in predict.qda . |
|||
classif.quaDA quada Quadratic Discriminant Analysis |
DiscriMiner | X | twoclass multiclass |
|||||
classif.randomForest rf Random Forest |
randomForest | X | X | X | prob twoclass multiclass class.weights |
Note that rf can freeze the R process if trained on a task with a single feature that is constant. This can happen during forward feature selection or as a result of resampling; remove such features with removeConstantFeatures. | ||
classif.randomForestSRC rfsrc Random Forest |
randomForestSRC | X | X | X | prob twoclass multiclass |
na.action has been set to na.impute by default to allow missing data support. |
||
classif.randomForestSRCSyn rfsrcSyn Synthetic Random Forest |
randomForestSRC | X | X | X | prob twoclass multiclass |
na.action has been set to na.impute by default to allow missing data support. | ||
classif.ranger ranger Random Forests |
ranger | X | X | prob twoclass multiclass |
By default, internal parallelization is switched off (num.threads = 1 ) and verbose output is disabled. Both settings are changeable. |
|||
classif.rda rda Regularized Discriminant Analysis |
klaR | X | X | prob twoclass multiclass |
estimate.error has been set to FALSE by default for speed. |
|||
classif.rFerns rFerns Random ferns |
rFerns | X | X | X | twoclass multiclass |
|||
classif.rknn rknn Random k-Nearest-Neighbors |
rknn | X | X | twoclass multiclass |
||||
classif.rotationForest rotationForest Rotation Forest |
rotationForest | X | X | X | prob twoclass |
|||
classif.rpart rpart Decision Tree |
rpart | X | X | X | X | X | prob twoclass multiclass |
xval has been set to 0 by default for speed. |
classif.rrlda rrlda Robust Regularized Linear Discriminant Analysis |
rrlda | X | twoclass multiclass |
|||||
classif.saeDNN sae.dnn Deep neural network with weights initialized by Stacked AutoEncoder |
deepnet | X | prob twoclass multiclass |
output set to "softmax" by default. |
||||
classif.sda sda Shrinkage Discriminant Analysis |
sda | X | prob twoclass multiclass |
|||||
classif.sparseLDA sparseLDA Sparse Discriminant Analysis |
sparseLDA MASS elasticnet |
X | prob twoclass multiclass |
Arguments Q and stop are not yet provided as they depend on the task. |
||||
classif.svm svm Support Vector Machines (libsvm) |
e1071 | X | X | prob twoclass multiclass class.weights |
||||
classif.xgboost xgboost eXtreme Gradient Boosting |
xgboost | X | X | X | prob twoclass multiclass |
All settings are passed directly, rather than through xgboost 's params argument. nrounds has been set to 1 by default. |
||
classif.xyf xyf X-Y fused self-organising maps |
kohonen | X | prob twoclass multiclass |
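The defaults mentioned in the Note column are mlr-specific and can be overridden like any other hyperparameter. A minimal sketch, using classif.rpart and its xval setting as the example; the value 10 is arbitrary. It also shows removeConstantFeatures, which the randomForest note recommends applying before training.

```r
library(mlr)

# xval is set to 0 by mlr for speed (see the Note column)
lrn = makeLearner("classif.rpart")
print(getHyperPars(lrn))

# Override the mlr default; 10 is an arbitrary illustrative value
lrn = setHyperPars(lrn, xval = 10)
print(getHyperPars(lrn))

# Drop constant features from a task before training
task = removeConstantFeatures(iris.task)
mod = train(lrn, task)
```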
Regression (53)
Additional learner properties:
- se: Standard errors can be predicted.
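A short sketch of how the se property is used: a regression learner that has it can be created with predict.type = "se". The example assumes regr.lm and the bh.task example task shipped with mlr.

```r
library(mlr)

# regr.lm has the "se" property, so standard errors can be requested
lrn = makeLearner("regr.lm", predict.type = "se")
mod = train(lrn, bh.task)
pred = predict(mod, bh.task)

# Standard errors of the predictions, one per observation
se = getPredictionSE(pred)
head(se)
```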
Class / Short Name / Name | Packages | Num. | Fac. | Ord. | NAs | Weights | Props | Note |
---|---|---|---|---|---|---|---|---|
regr.avNNet avNNet Neural Network |
nnet | X | X | X | size has been set to 3 by default. |
|||
regr.bartMachine bartmachine Bayesian Additive Regression Trees |
bartMachine | X | X | X | use_missing_data has been set to TRUE by default to allow missing data support. |
|||
regr.bcart bcart Bayesian CART |
tgp | X | X | se | ||||
regr.bdk bdk Bi-Directional Kohonen map |
kohonen | X | ||||||
regr.bgp bgp Bayesian Gaussian Process |
tgp | X | se | |||||
regr.bgpllm bgpllm Bayesian Gaussian Process with jumps to the Limiting Linear Model |
tgp | X | se | |||||
regr.blackboost blackbst Gradient Boosting with Regression Trees |
mboost party |
X | X | X | X | See ?ctree_control for possible breakage for nominal features with missingness. |
||
regr.blm blm Bayesian Linear Model |
tgp | X | se | |||||
regr.brnn brnn Bayesian regularization for feed-forward neural networks |
brnn | X | X | |||||
regr.bst bst Gradient Boosting |
bst | X | Renamed parameter learner to Learner due to nameclash with setHyperPars . Default changes: Learner = "ls" , xval = 0 , and maxdepth = 1 . |
|||||
regr.btgp btgp Bayesian Treed Gaussian Process |
tgp | X | X | se | ||||
regr.btgpllm btgpllm Bayesian Treed Gaussian Process with jumps to the Limiting Linear Model |
tgp | X | X | se | ||||
regr.btlm btlm Bayesian Treed Linear Model |
tgp | X | X | se | ||||
regr.cforest cforest Random Forest Based on Conditional Inference Trees |
party | X | X | X | X | X | See ?ctree_control for possible breakage for nominal features with missingness. |
|
regr.crs crs Regression Splines |
crs | X | X | X | se | |||
regr.ctree ctree Conditional Inference Trees |
party | X | X | X | X | X | See ?ctree_control for possible breakage for nominal features with missingness. |
|
regr.cubist cubist Cubist |
Cubist | X | X | X | ||||
regr.earth earth Multivariate Adaptive Regression Splines |
earth | X | X | |||||
regr.elmNN elmNN Extreme Learning Machine for Single Hidden Layer Feedforward Neural Networks |
elmNN | X | nhid has been set to 1 and actfun has been set to "sig" by default. |
|||||
regr.extraTrees extraTrees Extremely Randomized Trees |
extraTrees | X | X | |||||
regr.fnn fnn Fast k-Nearest Neighbor |
FNN | X | ||||||
regr.frbs frbs Fuzzy Rule-based Systems |
frbs | X | ||||||
regr.gbm gbm Gradient Boosting Machine |
gbm | X | X | X | X | distribution has been set to "gaussian" by default. |
||
regr.glmboost glmboost Boosting for GLMs |
mboost | X | X | X | Maximum number of boosting iterations is set via mstop , the actual number used is controlled by m . |
|||
regr.glmnet glmnet GLM with Lasso or Elasticnet Regularization |
glmnet | X | X | X | X | Factors automatically get converted to dummy columns, ordered factors to integer. | ||
regr.IBk ibk K-Nearest Neighbours |
RWeka | X | X | |||||
regr.kknn kknn K-Nearest-Neighbor regression |
kknn | X | X | |||||
regr.km km Kriging |
DiceKriging | X | se | In predict, we currently always use type = "SK" . The extra parameter jitter (default is FALSE ) enables adding a very small jitter (order 1e-12) to the x-values before prediction, as predict.km reproduces the exact y-values of the training data points, when you pass them in, even if the nugget effect is turned on. |
||||
regr.ksvm ksvm Support Vector Machines |
kernlab | X | X | Kernel parameters have to be passed directly and not by using the kpar list in ksvm . Note that fit has been set to FALSE by default for speed. |
||||
regr.laGP laGP Local Approximate Gaussian Process |
laGP | X | se | |||||
regr.LiblineaRL2L1SVR liblinl2l1svr L2-Regularized L1-Loss Support Vector Regression |
LiblineaR | X | Parameter svr_eps has been set to 0.1 by default. |
|||||
regr.LiblineaRL2L2SVR liblinl2l2svr L2-Regularized L2-Loss Support Vector Regression |
LiblineaR | X | type = 11 (the default) is the primal and type = 12 is the dual problem. Parameter svr_eps has been set to 0.1 by default. |
|||||
regr.lm lm Simple Linear Regression |
stats | X | X | X | se | |||
regr.mars mars Multivariate Adaptive Regression Splines |
mda | X | ||||||
regr.mob mob Model-based Recursive Partitioning Yielding a Tree with Fitted Models Associated with each Terminal Node |
party | X | X | X | ||||
regr.nnet nnet Neural Network |
nnet | X | X | X | size has been set to 3 by default. |
|||
regr.nodeHarvest nodeHarvest Node Harvest |
nodeHarvest | X | X | |||||
regr.pcr pcr Principal Component Regression |
pls | X | X | |||||
regr.penalized.lasso lasso Lasso Regression |
penalized | X | X | |||||
regr.penalized.ridge ridge Penalized Ridge Regression |
penalized | X | X | |||||
regr.plsr plsr Partial Least Squares Regression |
pls | X | X | |||||
regr.randomForest rf Random Forest |
randomForest | X | X | X | se | See ?regr.randomForest for information about se estimation. Note that rf can freeze the R process if trained on a task with a single feature that is constant. This can happen during forward feature selection or as a result of resampling; remove such features with removeConstantFeatures. |
||
regr.randomForestSRC rfsrc Random Forest |
randomForestSRC | X | X | X | na.action has been set to na.impute by default to allow missing data support. |
|||
regr.randomForestSRCSyn rfsrcSyn Synthetic Random Forest |
randomForestSRC | X | X | X | na.action has been set to na.impute by default to allow missing data support. | |||
regr.ranger ranger Random Forests |
ranger | X | X | By default, internal parallelization is switched off (num.threads = 1 ) and verbose output is disabled. Both settings are changeable. |
||||
regr.rknn rknn Random k-Nearest-Neighbors |
rknn | X | X | |||||
regr.rpart rpart Decision Tree |
rpart | X | X | X | X | X | xval has been set to 0 by default for speed. |
|
regr.rsm rsm Response Surface Regression |
rsm | X | You select the order of the regression by using modelfun = "FO" (first order), "TWI" (two-way interactions, including first-order terms) and "SO" (full second order). |
|||||
regr.rvm rvm Relevance Vector Machine |
kernlab | X | X | Kernel parameters have to be passed directly and not by using the kpar list in rvm . Note that fit has been set to FALSE by default for speed. |
||||
regr.slim slim Sparse Linear Regression using Nonsmooth Loss Functions and L1 Regularization |
flare | X | lambda.idx has been set to 3 by default. |
|||||
regr.svm svm Support Vector Machines (libsvm) |
e1071 | X | X | |||||
regr.xgboost xgboost eXtreme Gradient Boosting |
xgboost | X | X | X | All settings are passed directly, rather than through xgboost 's params argument. nrounds has been set to 1 by default. |
|||
regr.xyf xyf X-Y fused self-organising maps |
kohonen | X |
Survival analysis (11)
Additional learner properties:
- prob: Probabilities can be predicted,
- rcens, lcens, icens: The learner can handle right, left and/or interval censored data.
Class / Short Name / Name | Packages | Num. | Fac. | Ord. | NAs | Weights | Props | Note |
---|---|---|---|---|---|---|---|---|
surv.cforest crf Random Forest based on Conditional Inference Trees |
party survival |
X | X | X | X | X | rcens | See ?ctree_control for possible breakage for nominal features with missingness. |
surv.CoxBoost coxboost Cox Proportional Hazards Model with Componentwise Likelihood based Boosting |
CoxBoost | X | X | X | X | rcens | Factors automatically get converted to dummy columns, ordered factors to integer. | |
surv.coxph coxph Cox Proportional Hazard Model |
survival | X | X | X | X | prob rcens |
||
surv.cvglmnet cvglmnet GLM with Regularization (Cross Validated Lambda) |
glmnet | X | X | X | X | rcens | Factors automatically get converted to dummy columns, ordered factors to integer. | |
surv.glmboost glmboost Gradient Boosting with Componentwise Linear Models |
survival mboost |
X | X | X | X | rcens | family has been set to CoxPH() by default. Maximum number of boosting iterations is set via mstop , the actual number used for prediction is controlled by m . |
|
surv.glmnet glmnet GLM with Regularization |
glmnet | X | X | X | X | rcens | Factors automatically get converted to dummy columns, ordered factors to integer. | |
surv.optimCoxBoostPenalty optimCoxBoostPenalty Cox Proportional Hazards Model with Componentwise Likelihood based Boosting, automatic tuning enabled |
CoxBoost | X | X | X | rcens | Factors automatically get converted to dummy columns, ordered factors to integer. | ||
surv.penalized penalized Penalized Regression |
penalized | X | X | X | rcens | Factors automatically get converted to dummy columns, ordered factors to integer. | ||
surv.randomForestSRC rfsrc Random Forests for Survival |
survival randomForestSRC |
X | X | X | X | rcens | na.action has been set to na.impute by default to allow missing data support. | |
surv.ranger ranger Random Forests |
ranger | X | X | prob rcens |
By default, internal parallelization is switched off (num.threads = 1 ) and verbose output is disabled. Both settings are changeable. |
|||
surv.rpart rpart Survival Tree |
rpart | X | X | X | X | X | rcens | xval has been set to 0 by default for speed. |
Cluster analysis (8)
Additional learner properties:
- prob: Probabilities can be predicted.
Class / Short Name / Name | Packages | Num. | Fac. | Ord. | NAs | Weights | Props | Note |
---|---|---|---|---|---|---|---|---|
cluster.cmeans cmeans Fuzzy C-Means Clustering |
e1071 clue |
X | prob | The predict method uses cl_predict from the clue package to compute the cluster memberships for new data. The default centers = 2 is added so the method runs without setting parameters, but in practice the user should choose this value. |
||||
cluster.Cobweb cobweb Cobweb Clustering Algorithm |
RWeka | X | ||||||
cluster.dbscan dbscan DBScan Clustering |
fpc | X | A cluster index of NA indicates noise points. Specify method = "dist" if the data should be interpreted as a dissimilarity matrix or object. Otherwise Euclidean distances will be used. |
|||||
cluster.EM em Expectation-Maximization Clustering |
RWeka | X | ||||||
cluster.FarthestFirst farthestfirst FarthestFirst Clustering Algorithm |
RWeka | X | ||||||
cluster.kmeans kmeans K-Means |
stats clue |
X | prob | The predict method uses cl_predict from the clue package to compute the cluster memberships for new data. The default centers = 2 is added so the method runs without setting parameters, but in practice the user should choose this value. |
||||
cluster.SimpleKMeans simplekmeans K-Means Clustering |
RWeka | X | ||||||
cluster.XMeans xmeans XMeans (k-means with automatic determination of k) |
RWeka | X | You may have to install the XMeans Weka package: WPM('install-package', 'XMeans') . |
Cost-sensitive classification
For ordinary misclassification costs you can use all the standard classification methods listed above.
For example-dependent costs there are several ways to generate cost-sensitive learners from ordinary regression and classification learners. See the section on cost-sensitive classification and the documentation of makeCostSensClassifWrapper, makeCostSensRegrWrapper and makeCostSensWeightedPairsWrapper for details.
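As an illustration of the wrapper approach, the sketch below turns an ordinary regression learner into a cost-sensitive learner with makeCostSensRegrWrapper. The cost matrix is made up for this example (zero cost for the true class, random costs otherwise), and the task id and the choice of regr.rpart are arbitrary.

```r
library(mlr)
set.seed(1)

# Features only; the cost-sensitive task has no target column
feats = iris[, 1:4]

# Hypothetical example-dependent costs: one row per observation,
# one column per class, zero cost for the true class
cost = matrix(runif(150 * 3, 0, 10), nrow = 150)
cost[cbind(1:150, as.integer(iris$Species))] = 0
colnames(cost) = levels(iris$Species)

task = makeCostSensTask(id = "iris.costs", data = feats, cost = cost)

# Wrap an ordinary regression learner to obtain a cost-sensitive learner
lrn = makeCostSensRegrWrapper(makeLearner("regr.rpart"))
mod = train(lrn, task)
pred = predict(mod, task)
head(getPredictionResponse(pred))
```

makeCostSensClassifWrapper and makeCostSensWeightedPairsWrapper are used in the same way but expect a classification learner.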
Multilabel classification (1)
Class / Short Name / Name | Packages | Num. | Fac. | Ord. | NAs | Weights | Props | Note |
---|---|---|---|---|---|---|---|---|
multilabel.rFerns rFerns Random ferns |
rFerns | X | X | X |
Moreover, you can use the binary relevance method to apply ordinary classification learners to the multilabel problem. See the documentation of function makeMultilabelBinaryRelevanceWrapper and the tutorial section on multilabel classification for details.
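A minimal sketch of the binary relevance approach on a small made-up data set; the column names x1, x2, a and b are hypothetical. Multilabel targets are given as logical columns, and classif.rpart is used only as an example base learner.

```r
library(mlr)
set.seed(1)

# Toy data with two logical label columns (hypothetical names)
df = data.frame(x1 = rnorm(60), x2 = rnorm(60))
df$a = df$x1 > 0
df$b = df$x2 > 0

task = makeMultilabelTask(data = df, target = c("a", "b"))

# Fit one binary classifier per label using an ordinary classification learner
lrn = makeMultilabelBinaryRelevanceWrapper(makeLearner("classif.rpart"))
mod = train(lrn, task)
pred = predict(mod, task)
head(as.data.frame(pred))
```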