Integrated Learners

This page lists the learning methods already integrated into mlr.

The columns Num., Fac., NAs, and Weights indicate whether a method can handle numerical predictors, can handle factor predictors, allows NAs in the data, and supports observation weights, respectively.

The column Props shows further properties of the learning methods. ordered indicates that a method can deal with ordered factor features. For classification, it shows whether binary (twoclass) and/or multi-class (multiclass) problems are supported. For survival analysis, it shows the censoring type; for example, rcens means that the learning method can deal with right-censored data. The type of prediction is also displayed: prob indicates that probabilities can be predicted and, for regression, se means that standard errors can be predicted in addition to the mean response.
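The properties described above can also be queried from R. The following is a minimal sketch, assuming a recent mlr version in which getLearnerProperties, hasLearnerProperties and listLearners are available with the arguments shown, and assuming the rpart package is installed. The internal property names used here (for example missings for NA support) are those of such a version and may differ in older releases.

```r
library(mlr)

# Create a learner by its ID from the tables below and
# request probability predictions (property "prob").
lrn <- makeLearner("classif.rpart", predict.type = "prob")

# All properties of this learner as a character vector.
props <- getLearnerProperties(lrn)
print(props)

# Check individual properties: NA support and observation weights.
# Returns one logical value per requested property.
has_na_and_weights <- hasLearnerProperties(lrn, c("missings", "weights"))
print(has_na_and_weights)

# List all classification learners that support multi-class problems
# and can predict probabilities.
multiclass_prob <- listLearners("classif",
  properties = c("multiclass", "prob"),
  warn.missing.packages = FALSE)
```

The result of listLearners depends on the installed mlr version, so the tables on this page and the output of the call may not match exactly.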

Classification (54)

ID / Short Name Name Packages Num. Fac. NAs Weights Props Note
classif.ada
ada
ada Boosting ada X X X prob
twoclass
classif.bartMachine
bartmachine
Bayesian Additive Regression Trees bartMachine X X X prob
twoclass
'use_missing_data' has been set to TRUE by default to allow missing data support
classif.bdk
bdk
Bi-Directional Kohonen map kohonen X multiclass
prob
twoclass
classif.binomial
binomial
Binomial Regression stats X X X prob
twoclass
Delegates to glm with freely choosable binomial link function via learner param 'link'.
classif.blackboost
blackbst
Gradient Boosting With Regression Trees mboost
party
X X X X prob
twoclass
see ?ctree_control for possible breakage for nominal features with missingness
classif.boosting
adabag
Adabag Boosting adabag
rpart
X X X multiclass
prob
twoclass
xval has been set to 0 by default for speed.
classif.bst
bst
Gradient Boosting bst X twoclass The argument learner has been renamed to Learner due to a name conflict with setHyperPars. Learner has been set to lm by default.
classif.cforest
cforest
Random forest based on conditional inference trees party X X X X multiclass
ordered
prob
twoclass
see ?ctree_control for possible breakage for nominal features with missingness
classif.ctree
ctree
Conditional Inference Trees party X X X X multiclass
ordered
prob
twoclass
see ?ctree_control for possible breakage for nominal features with missingness
classif.extraTrees
extraTrees
Extremely Randomized Trees extraTrees X X multiclass
prob
twoclass
classif.fnn
fnn
Fast k-Nearest Neighbour FNN X multiclass
twoclass
classif.gbm
gbm
Gradient Boosting Machine gbm X X X X multiclass
prob
twoclass
classif.geoDA
geoda
Geometric Predictive Discriminant Analysis DiscriMiner X multiclass
twoclass
classif.glmboost
glmbst
Boosting for GLMs mboost X X X prob
twoclass
family has been set to Binomial() by default. Maximum number of boosting iterations is set via 'mstop', the actual number used for prediction is controlled by 'm'.
classif.glmnet
glmnet
GLM with Lasso or Elasticnet Regularization glmnet X X X multiclass
prob
twoclass
Factors automatically get converted to dummy columns, ordered factors to integer
classif.hdrda
hdrda
High-Dimensional Regularized Discriminant Analysis sparsediscrim X prob
twoclass
classif.IBk
ibk
k-Nearest Neighbours RWeka X X multiclass
prob
twoclass
classif.J48
j48
J48 Decision Trees RWeka X X X multiclass
prob
twoclass
NAs are directly passed to WEKA with na.action = na.pass
classif.JRip
jrip
Propositional Rule Learner RWeka X X X multiclass
prob
twoclass
NAs are directly passed to WEKA with na.action = na.pass
classif.kknn
kknn
k-Nearest Neighbor kknn X X multiclass
prob
twoclass
classif.knn
knn
k-Nearest Neighbor class X multiclass
twoclass
classif.ksvm
ksvm
Support Vector Machines kernlab X X multiclass
prob
twoclass
Kernel parameters have to be passed directly and not by using the kpar list in ksvm. Note that fit has been set to FALSE by default for speed.
classif.lda
lda
Linear Discriminant Analysis MASS X X multiclass
prob
twoclass
Learner param 'predict.method' maps to 'method' in predict.lda.
classif.LiblineaRBinary
liblinearbinary
Regularized Binary Linear Predictive Models Estimation LiblineaR X twoclass This model subsumes the types 1, 2, 3, 5.
classif.LiblineaRLogReg
reglreg
Regularized Logistic Regression LiblineaR X prob
twoclass
This model subsumes the types 0, 6, 7.
classif.LiblineaRMultiClass
mcsvc
Multi-class Support Vector Classification by Crammer and Singer LiblineaR X multiclass
twoclass
This model is type 4.
classif.linDA
linda
Linear Discriminant Analysis DiscriMiner X multiclass
twoclass
classif.logreg
logreg
Logistic Regression stats X X X prob
twoclass
Delegates to glm with family binomial/logit.
classif.lqa
lqa
Fitting penalized Generalized Linear Models with the LQA algorithm lqa X X prob
twoclass
penalty has been set to lasso and lambda to 0.1 by default.
classif.lssvm
lssvm
Least Squares Support Vector Machine kernlab X X multiclass
twoclass
fitted has been set to FALSE by default for speed.
classif.lvq1
lvq1
Learning Vector Quantization class X multiclass
twoclass
classif.mda
mda
Mixture Discriminant Analysis mda X X multiclass
prob
twoclass
keep.fitted has been set to FALSE by default for speed, and start.method has been set to 'lvq' for more robust behavior and fewer technical crashes.
classif.multinom
multinom
Multinomial Regression nnet X X X multiclass
prob
twoclass
classif.naiveBayes
nbayes
Naive Bayes e1071 X X X multiclass
prob
twoclass
classif.nnet
nnet
Neural Network nnet X X X multiclass
prob
twoclass
size has been set to 3 by default.
classif.nodeHarvest
nodeHarvest
Node Harvest nodeHarvest X X prob
twoclass
classif.OneR
oner
1-R Classifier RWeka X X X multiclass
prob
twoclass
NAs are directly passed to WEKA with na.action = na.pass
classif.pamr
pamr
Nearest shrunken centroid pamr X prob
twoclass
threshold for prediction (threshold.predict) has been set to 1 by default
classif.PART
part
PART Decision Lists RWeka X X X multiclass
prob
twoclass
NAs are directly passed to WEKA with na.action = na.pass
classif.plr
plr
Logistic Regression with a L2 Penalty stepPlr X X X prob
twoclass
AIC and BIC penalty types can be selected via the new parameter cp.type
classif.plsdaCaret
plsdacaret
Partial Least Squares (PLS) Discriminant Analysis caret X prob
twoclass
classif.probit
probit
Probit Regression stats X X X prob
twoclass
Delegates to glm with family binomial/probit.
classif.qda
qda
Quadratic Discriminant Analysis MASS X X multiclass
prob
twoclass
Learner param 'predict.method' maps to 'method' in predict.qda.
classif.quaDA
quada
Quadratic Discriminant Analysis DiscriMiner X multiclass
twoclass
classif.randomForest
rf
Random Forest randomForest X X multiclass
ordered
prob
twoclass
classif.randomForestSRC
rfsrc
Random Forest randomForestSRC X X X multiclass
prob
twoclass
'na.action' has been set to 'na.impute' by default to allow missing data support
classif.rda
rda
Regularized Discriminant Analysis klaR X X multiclass
prob
twoclass
estimate.error has been set to FALSE by default for speed.
classif.rFerns
rFerns
Random ferns rFerns X X multiclass
ordered
twoclass
classif.rpart
rpart
Decision Tree rpart X X X X multiclass
ordered
prob
twoclass
xval has been set to 0 by default for speed.
classif.rrlda
rrlda
Robust Regularized Linear Discriminant Analysis rrlda X multiclass
twoclass
classif.sda
sda
Shrinkage Discriminant Analysis sda X multiclass
prob
twoclass
classif.sparseLDA
sparseLDA
Sparse Discriminant Analysis sparseLDA
MASS
elasticnet
X multiclass
prob
twoclass
Arguments Q and stop are not yet provided as they depend on the task.
classif.svm
svm
Support Vector Machines (libsvm) e1071 X X multiclass
prob
twoclass
classif.xyf
xyf
X-Y fused self-organising maps kohonen X multiclass
prob
twoclass
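Several notes in the table above say that a learner delegates to glm (classif.logreg, classif.probit, classif.binomial). The following sketch shows what such a delegation amounts to in plain R, using only the stats package and the built-in mtcars data. It illustrates the underlying glm calls and is not mlr's internal code; the choice of data and variables is an assumption made for illustration.

```r
# Toy two-class problem from the built-in mtcars data:
# predict the transmission type (am: 0/1) from the car weight (wt).
dat <- mtcars[, c("am", "wt")]

# Binomial family with logit link, as described for classif.logreg.
fit_logit <- glm(am ~ wt, data = dat, family = binomial(link = "logit"))

# Binomial family with probit link, as described for classif.probit.
fit_probit <- glm(am ~ wt, data = dat, family = binomial(link = "probit"))

# Predicted probabilities of the positive class (am == 1).
p_logit <- predict(fit_logit, newdata = dat, type = "response")
p_probit <- predict(fit_probit, newdata = dat, type = "response")

print(head(cbind(logit = p_logit, probit = p_probit)))
```

For classif.binomial, the note states that the link function is chosen via the learner param 'link', which corresponds to the link argument of binomial() in this sketch.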

Regression (45)

ID / Short Name Name Packages Num. Fac. NAs Weights Props Note
regr.bartMachine
bartmachine
Bayesian Additive Regression Trees bartMachine X X X 'use_missing_data' has been set to TRUE by default to allow missing data support
regr.bcart
bcart
Bayesian CART tgp X X se
regr.bdk
bdk
Bi-Directional Kohonen map kohonen X
regr.bgp
bgp
Bayesian Gaussian Process tgp X se
regr.bgpllm
bgpllm
Bayesian Gaussian Process with jumps to the Limiting Linear Model tgp X se
regr.blackboost
blackbst
Gradient Boosting with Regression Trees mboost
party
X X X X see ?ctree_control for possible breakage for nominal features with missingness
regr.blm
blm
Bayesian Linear Model tgp X se
regr.brnn
brnn
Bayesian regularization for feed-forward neural networks brnn X X
regr.bst
bst
Gradient Boosting bst X The argument learner has been renamed to Learner due to a name conflict with setHyperPars
regr.btgp
btgp
Bayesian Treed Gaussian Process tgp X X se
regr.btgpllm
btgpllm
Bayesian Treed Gaussian Process with jumps to the Limiting Linear Model tgp X X se
regr.btlm
btlm
Bayesian Treed Linear Model tgp X X se
regr.cforest
cforest
Random Forest Based on Conditional Inference Trees party X X X X ordered see ?ctree_control for possible breakage for nominal features with missingness
regr.crs
crs
Regression Splines crs X X X se
regr.ctree
ctree
Conditional Inference Trees party X X X X ordered see ?ctree_control for possible breakage for nominal features with missingness
regr.cubist
cubist
Cubist Cubist X X X
regr.earth
earth
Multivariate Adaptive Regression Splines earth X X
regr.elmNN
elmNN
Extreme Learning Machine for Single Hidden Layer Feedforward Neural Networks elmNN X nhid has been set to 1 and actfun has been set to "sig" by default
regr.extraTrees
extraTrees
Extremely Randomized Trees extraTrees X X
regr.fnn
fnn
Fast k-Nearest Neighbor FNN X
regr.frbs
frbs
Fuzzy Rule-based Systems frbs X
regr.gbm
gbm
Gradient Boosting Machine gbm X X X X distribution has been set to gaussian by default.
regr.glmnet
glmnet
GLM with Lasso or Elasticnet Regularization glmnet X X X ordered Factors automatically get converted to dummy columns, ordered factors to integer
regr.IBk
ibk
K-Nearest Neighbours RWeka X X
regr.kknn
kknn
K-Nearest-Neighbor regression kknn X X
regr.km
km
Kriging DiceKriging X se In predict, we currently always use type = 'SK'. The extra param 'jitter' (default is FALSE) enables adding a very small jitter (order 1e-12) to the x-values before prediction, as predict.km reproduces the exact y-values of the training data points, when you pass them in, even if the nugget effect is turned on.
regr.ksvm
ksvm
Support Vector Machines kernlab X X Kernel parameters have to be passed directly and not by using the kpar list in ksvm. Note that fit has been set to FALSE by default for speed.
regr.laGP
laGP
Local Approximate Gaussian Process laGP X se
regr.lm
lm
Simple Linear Regression stats X X X se
regr.mars
mars
Multivariate Adaptive Regression Splines mda X
regr.mob
mob
Model-based Recursive Partitioning Yielding a Tree with Fitted Models Associated with each Terminal Node party X X X
regr.nnet
nnet
Neural Network nnet X X X size has been set to 3 by default.
regr.nodeHarvest
nodeHarvest
Node Harvest nodeHarvest X X
regr.pcr
pcr
Principal Component Regression pls X X model has been set to FALSE by default for speed.
regr.penalized.lasso
lasso
Lasso Regression penalized X X
regr.penalized.ridge
ridge
Penalized Ridge Regression penalized X X
regr.plsr
plsr
Partial Least Squares Regression pls X X
regr.randomForest
rf
Random Forest randomForest X X ordered
se
regr.randomForestSRC
rfsrc
Random Forest randomForestSRC X X X 'na.action' has been set to 'na.impute' by default to allow missing data support
regr.rpart
rpart
Decision Tree rpart X X X X ordered xval has been set to 0 by default for speed.
regr.rsm
rsm
Response Surface Regression rsm X You select the order of the regression by using modelfun = "FO" (first order), "TWI" (two-way interactions, which includes the first-order terms) or "SO" (full second order)
regr.rvm
rvm
Relevance Vector Machine kernlab X X Kernel parameters have to be passed directly and not by using the kpar list in rvm. Note that fit has been set to FALSE by default for speed.
regr.slim
slim
Sparse Linear Regression using Nonsmooth Loss Functions and L1 Regularization flare X lambda.idx has been set to 3 by default
regr.svm
svm
Support Vector Machines (libsvm) e1071 X X
regr.xyf
xyf
X-Y fused self-organising maps kohonen X
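For learners with the se property in the table above, standard errors can be requested alongside the mean response. A minimal sketch, assuming a recent mlr version and using regr.lm together with bh.task, an example task that ships with mlr; predicting on the training task here is only for brevity.

```r
library(mlr)

# regr.lm has the "se" property, so predict.type = "se" is allowed.
lrn <- makeLearner("regr.lm", predict.type = "se")

# Fit on the Boston Housing example task shipped with mlr.
mod <- train(lrn, bh.task)

# Predict on the same task (for illustration only).
pred <- predict(mod, task = bh.task)

# The prediction data frame holds the mean response and its standard error.
pred_df <- as.data.frame(pred)
print(head(pred_df[, c("response", "se")]))
```

Requesting predict.type = "se" for a learner without the se property is expected to fail when the learner is created.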

Survival analysis (10)

ID / Short Name Name Packages Num. Fac. NAs Weights Props Note
surv.cforest
crf
Random Forest based on Conditional Inference Trees party
survival
X X X X ordered
rcens
see ?ctree_control for possible breakage for nominal features with missingness
surv.CoxBoost
coxboost
Cox Proportional Hazards Model with Componentwise Likelihood based Boosting CoxBoost X X X ordered
rcens
Factors automatically get converted to dummy columns, ordered factors to integer
surv.coxph
coxph
Cox Proportional Hazard Model survival X X X X prob
rcens
surv.cvglmnet
cvglmnet
GLM with Regularization (Cross Validated Lambda) glmnet X X X ordered
rcens
Factors automatically get converted to dummy columns, ordered factors to integer
surv.glmboost
glmboost
Gradient Boosting with Componentwise Linear Models survival
mboost
X X X ordered
rcens
family has been set to CoxPH() by default. Maximum number of boosting iterations is set via 'mstop', the actual number used for prediction is controlled by 'm'.
surv.glmnet
glmnet
GLM with Regularization glmnet X X X ordered
rcens
Factors automatically get converted to dummy columns, ordered factors to integer
surv.optimCoxBoostPenalty
optimCoxBoostPenalty
Cox Proportional Hazards Model with Componentwise Likelihood based Boosting, automatic tuning enabled CoxBoost X X X rcens Factors automatically get converted to dummy columns, ordered factors to integer
surv.penalized
penalized
Penalized Regression penalized X X ordered
rcens
Factors automatically get converted to dummy columns, ordered factors to integer
surv.randomForestSRC
rfsrc
Random Forests for Survival survival
randomForestSRC
X X X ordered
rcens
'na.action' has been set to 'na.impute' by default to allow missing data support
surv.rpart
rpart
Survival Tree rpart X X X X ordered
rcens
xval has been set to 0 by default for speed.

Cluster analysis (6)

ID / Short Name Name Packages Num. Fac. NAs Weights Props Note
cluster.cmeans
cmeans
Fuzzy C-Means Clustering e1071
clue
X prob The 'predict' method uses 'cl_predict' from the 'clue' package to compute the cluster memberships for new data. The default 'centers=2' is set so that the method runs without any parameters, but in practice the user should choose this value.
cluster.EM
em
Expectation-Maximization Clustering RWeka X
cluster.FarthestFirst
farthestfirst
FarthestFirst Clustering Algorithm RWeka X
cluster.kmeans
kmeans
K-Means stats
clue
X The 'predict' method uses 'cl_predict' from the 'clue' package to compute the cluster memberships for new data. The default 'centers=2' is set so that the method runs without any parameters, but in practice the user should choose this value.
cluster.SimpleKMeans
simplekmeans
K-Means Clustering RWeka X
cluster.XMeans
xmeans
XMeans (k-means with automatic determination of k) RWeka X You may have to install the XMeans Weka package: WPM('install-package', 'XMeans').
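The notes for cluster.cmeans and cluster.kmeans point out that centers=2 is only a placeholder default. The sketch below uses stats::kmeans directly (not mlr) on the numeric columns of the built-in iris data to show the effect of setting the number of centers explicitly; the data set, the seed and nstart = 10 are assumptions made for illustration. In mlr, the value would be passed as a learner parameter instead, for example makeLearner("cluster.kmeans", centers = 3), which additionally requires the clue package.

```r
# Numeric features only; the Species column is dropped.
x <- iris[, 1:4]

set.seed(1)

# Placeholder default mentioned in the notes: two centers.
fit2 <- kmeans(x, centers = 2, nstart = 10)

# Explicitly chosen number of centers.
fit3 <- kmeans(x, centers = 3, nstart = 10)

# Cluster sizes and total within-cluster sum of squares for both fits.
print(fit2$size)
print(fit3$size)
print(c(k2 = fit2$tot.withinss, k3 = fit3$tot.withinss))
```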