Integrated Learners
Here, the learning methods already integrated into mlr are listed.
Columns Num., Fac., NAs, and Weights indicate whether a method can cope with numerical and factor predictors, whether NAs in the data are allowed, and whether observation weights are supported, respectively.
Column Props shows further properties of the learning methods. ordered indicates that a method can deal with ordered factor features. For classification, you can see whether binary (twoclass) and/or multi-class (multiclass) problems are supported. For survival analysis, the censoring type is shown; for example, rcens means that the learning method can deal with right-censored data. Moreover, the type of prediction is displayed, where prob indicates that probabilities can be predicted. For regression, se means that standard errors can be predicted in addition to the mean response.
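One way to read these columns is as a capability record per learner that can be filtered against the requirements of a task. The Python sketch below is an illustration only, not mlr's R interface: it transcribes five rows of the classification table into a hypothetical `Learner` record and selects the learners that offer every requested capability. The names `Learner`, `LEARNERS` and `find_learners` are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Learner:
    """One row of the capability table (hypothetical representation)."""
    id: str
    numerics: bool = False   # "Num." column
    factors: bool = False    # "Fac." column
    missings: bool = False   # "NAs" column
    weights: bool = False    # "Weights" column
    props: frozenset = frozenset()  # entries of the "Props" column


# Five rows transcribed from the classification table.
LEARNERS = [
    Learner("classif.rpart", True, True, True, True,
            frozenset({"multiclass", "ordered", "prob", "twoclass"})),
    Learner("classif.J48", True, True, True, False,
            frozenset({"multiclass", "prob", "twoclass"})),
    Learner("classif.logreg", True, True, False, True,
            frozenset({"prob", "twoclass"})),
    Learner("classif.lda", True, True, False, False,
            frozenset({"multiclass", "prob", "twoclass"})),
    Learner("classif.knn", True, False, False, False,
            frozenset({"multiclass", "twoclass"})),
]


def find_learners(learners, *, factors=False, missings=False,
                  weights=False, props=()):
    """Return the IDs of all learners that offer every requested capability."""
    wanted = set(props)
    return [
        lrn.id
        for lrn in learners
        if (lrn.factors or not factors)
        and (lrn.missings or not missings)
        and (lrn.weights or not weights)
        and wanted <= lrn.props  # all requested Props must be present
    ]
```

For example, `find_learners(LEARNERS, missings=True, props=["prob", "twoclass"])` returns `["classif.rpart", "classif.J48"]`: both accept NAs and predict probabilities for binary problems, whereas classif.logreg, classif.lda and classif.knn do not accept NAs.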
Classification (54)
ID / Short Name | Name | Packages | Num. | Fac. | NAs | Weights | Props | Note |
---|---|---|---|---|---|---|---|---|
classif.ada / ada | ada Boosting | ada | X | X | X | | prob twoclass | |
classif.bartMachine / bartmachine | Bayesian Additive Regression Trees | bartMachine | X | X | X | | prob twoclass | 'use_missing_data' has been set to TRUE by default to allow missing data support. |
classif.bdk / bdk | Bi-Directional Kohonen map | kohonen | X | | | | multiclass prob twoclass | |
classif.binomial / binomial | Binomial Regression | stats | X | X | | X | prob twoclass | Delegates to glm with freely choosable binomial link function via learner param 'link'. |
classif.blackboost / blackbst | Gradient Boosting With Regression Trees | mboost party | X | X | X | X | prob twoclass | See ?ctree_control for possible breakage for nominal features with missingness. |
classif.boosting / adabag | Adabag Boosting | adabag rpart | X | X | X | | multiclass prob twoclass | xval has been set to 0 by default for speed. |
classif.bst / bst | Gradient Boosting | bst | X | | | | twoclass | The argument learner has been renamed to Learner due to a name conflict with setHyperPars. Learner has been set to lm by default. |
classif.cforest / cforest | Random forest based on conditional inference trees | party | X | X | X | X | multiclass ordered prob twoclass | See ?ctree_control for possible breakage for nominal features with missingness. |
classif.ctree / ctree | Conditional Inference Trees | party | X | X | X | X | multiclass ordered prob twoclass | See ?ctree_control for possible breakage for nominal features with missingness. |
classif.extraTrees / extraTrees | Extremely Randomized Trees | extraTrees | X | | | X | multiclass prob twoclass | |
classif.fnn / fnn | Fast k-Nearest Neighbour | FNN | X | | | | multiclass twoclass | |
classif.gbm / gbm | Gradient Boosting Machine | gbm | X | X | X | X | multiclass prob twoclass | |
classif.geoDA / geoda | Geometric Predictive Discriminant Analysis | DiscriMiner | X | | | | multiclass twoclass | |
classif.glmboost / glmbst | Boosting for GLMs | mboost | X | X | | X | prob twoclass | family has been set to Binomial() by default. The maximum number of boosting iterations is set via 'mstop'; the actual number used for prediction is controlled by 'm'. |
classif.glmnet / glmnet | GLM with Lasso or Elasticnet Regularization | glmnet | X | X | | X | multiclass prob twoclass | Factors automatically get converted to dummy columns, ordered factors to integer. |
classif.hdrda / hdrda | High-Dimensional Regularized Discriminant Analysis | sparsediscrim | X | | | | prob twoclass | |
classif.IBk / ibk | k-Nearest Neighbours | RWeka | X | X | | | multiclass prob twoclass | |
classif.J48 / j48 | J48 Decision Trees | RWeka | X | X | X | | multiclass prob twoclass | NAs are directly passed to WEKA with na.action = na.pass. |
classif.JRip / jrip | Propositional Rule Learner | RWeka | X | X | X | | multiclass prob twoclass | NAs are directly passed to WEKA with na.action = na.pass. |
classif.kknn / kknn | k-Nearest Neighbor | kknn | X | X | | | multiclass prob twoclass | |
classif.knn / knn | k-Nearest Neighbor | class | X | | | | multiclass twoclass | |
classif.ksvm / ksvm | Support Vector Machines | kernlab | X | X | | | multiclass prob twoclass | Kernel parameters have to be passed directly and not by using the kpar list in ksvm. Note that fit has been set to FALSE by default for speed. |
classif.lda / lda | Linear Discriminant Analysis | MASS | X | X | | | multiclass prob twoclass | Learner param 'predict.method' maps to 'method' in predict.lda. |
classif.LiblineaRBinary / liblinearbinary | Regularized Binary Linear Predictive Models Estimation | LiblineaR | X | | | | twoclass | This model subsumes types 1, 2, 3 and 5. |
classif.LiblineaRLogReg / reglreg | Regularized Logistic Regression | LiblineaR | X | | | | prob twoclass | This model subsumes types 0, 6 and 7. |
classif.LiblineaRMultiClass / mcsvc | Multi-class Support Vector Classification by Crammer and Singer | LiblineaR | X | | | | multiclass twoclass | This model is type 4. |
classif.linDA / linda | Linear Discriminant Analysis | DiscriMiner | X | | | | multiclass twoclass | |
classif.logreg / logreg | Logistic Regression | stats | X | X | | X | prob twoclass | Delegates to glm with family binomial/logit. |
classif.lqa / lqa | Fitting penalized Generalized Linear Models with the LQA algorithm | lqa | X | | | X | prob twoclass | penalty has been set to lasso and lambda to 0.1 by default. |
classif.lssvm / lssvm | Least Squares Support Vector Machine | kernlab | X | X | | | multiclass twoclass | fitted has been set to FALSE by default for speed. |
classif.lvq1 / lvq1 | Learning Vector Quantization | class | X | | | | multiclass twoclass | |
classif.mda / mda | Mixture Discriminant Analysis | mda | X | X | | | multiclass prob twoclass | keep.fitted has been set to FALSE by default for speed, and start.method='lvq' is used for more robust behavior (fewer technical crashes). |
classif.multinom / multinom | Multinomial Regression | nnet | X | X | | X | multiclass prob twoclass | |
classif.naiveBayes / nbayes | Naive Bayes | e1071 | X | X | X | | multiclass prob twoclass | |
classif.nnet / nnet | Neural Network | nnet | X | X | | X | multiclass prob twoclass | size has been set to 3 by default. |
classif.nodeHarvest / nodeHarvest | Node Harvest | nodeHarvest | X | X | | | prob twoclass | |
classif.OneR / oner | 1-R Classifier | RWeka | X | X | X | | multiclass prob twoclass | NAs are directly passed to WEKA with na.action = na.pass. |
classif.pamr / pamr | Nearest shrunken centroid | pamr | X | | | | prob twoclass | The threshold for prediction (threshold.predict) has been set to 1 by default. |
classif.PART / part | PART Decision Lists | RWeka | X | X | X | | multiclass prob twoclass | NAs are directly passed to WEKA with na.action = na.pass. |
classif.plr / plr | Logistic Regression with an L2 Penalty | stepPlr | X | X | | X | prob twoclass | AIC and BIC penalty types can be selected via the new parameter cp.type. |
classif.plsdaCaret / plsdacaret | Partial Least Squares (PLS) Discriminant Analysis | caret | X | | | | prob twoclass | |
classif.probit / probit | Probit Regression | stats | X | X | | X | prob twoclass | Delegates to glm with family binomial/probit. |
classif.qda / qda | Quadratic Discriminant Analysis | MASS | X | X | | | multiclass prob twoclass | Learner param 'predict.method' maps to 'method' in predict.qda. |
classif.quaDA / quada | Quadratic Discriminant Analysis | DiscriMiner | X | | | | multiclass twoclass | |
classif.randomForest / rf | Random Forest | randomForest | X | X | | | multiclass ordered prob twoclass | |
classif.randomForestSRC / rfsrc | Random Forest | randomForestSRC | X | X | X | | multiclass prob twoclass | 'na.action' has been set to 'na.impute' by default to allow missing data support. |
classif.rda / rda | Regularized Discriminant Analysis | klaR | X | X | | | multiclass prob twoclass | estimate.error has been set to FALSE by default for speed. |
classif.rFerns / rFerns | Random ferns | rFerns | X | X | | | multiclass ordered twoclass | |
classif.rpart / rpart | Decision Tree | rpart | X | X | X | X | multiclass ordered prob twoclass | xval has been set to 0 by default for speed. |
classif.rrlda / rrlda | Robust Regularized Linear Discriminant Analysis | rrlda | X | | | | multiclass twoclass | |
classif.sda / sda | Shrinkage Discriminant Analysis | sda | X | | | | multiclass prob twoclass | |
classif.sparseLDA / sparseLDA | Sparse Discriminant Analysis | sparseLDA MASS elasticnet | X | | | | multiclass prob twoclass | Arguments Q and stop are not yet provided as they depend on the task. |
classif.svm / svm | Support Vector Machines (libsvm) | e1071 | X | X | | | multiclass prob twoclass | |
classif.xyf / xyf | X-Y fused self-organising maps | kohonen | X | | | | multiclass prob twoclass | |
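The glmboost notes distinguish between 'mstop', the number of boosting iterations that are fitted, and 'm', the number actually used for prediction. The Python sketch below illustrates that split with componentwise L2 boosting on linear base learners: each iteration fits every feature to the current residuals, keeps the best one and adds a shrunken step. It assumes squared-error loss (not the Binomial() or CoxPH() families that mlr sets by default), roughly centered features and a plain integer `m`; how mlr's 'm' parameter determines that number is not shown. The function names are hypothetical.

```python
def fit_componentwise_boost(X, y, mstop, nu=0.1):
    """Fit mstop iterations of componentwise L2 boosting.

    X is a list of rows, y a list of responses. Returns the offset (mean of y)
    and the path of (feature_index, step) pairs, one per fitted iteration.
    """
    n, p = len(X), len(X[0])
    offset = sum(y) / n
    fitted = [offset] * n
    path = []
    for _ in range(mstop):
        resid = [yi - fi for yi, fi in zip(y, fitted)]
        best = None  # (sse, feature_index, beta)
        for j in range(p):
            col = [row[j] for row in X]
            sxx = sum(c * c for c in col)
            if sxx == 0:
                continue  # an all-zero feature cannot reduce the loss
            beta = sum(c * r for c, r in zip(col, resid)) / sxx
            sse = sum((r - beta * c) ** 2 for c, r in zip(col, resid))
            if best is None or sse < best[0]:
                best = (sse, j, beta)
        if best is None:
            break
        _, j, beta = best
        step = nu * beta  # shrunken least-squares coefficient
        path.append((j, step))
        fitted = [f + step * row[j] for f, row in zip(fitted, X)]
    return offset, path


def predict_boost(offset, path, x, m):
    """Predict for one row x using only the first m fitted iterations."""
    if not 0 <= m <= len(path):
        raise ValueError("m must lie between 0 and the number of fitted iterations")
    return offset + sum(step * x[j] for j, step in path[:m])
```

Because this sketch stores the whole path of mstop iterations, predicting with a smaller m only truncates the sum and needs no refit, which is what makes it possible to control the two numbers separately.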
Regression (45)
ID / Short Name | Name | Packages | Num. | Fac. | NAs | Weights | Props | Note |
---|---|---|---|---|---|---|---|---|
regr.bartMachine / bartmachine | Bayesian Additive Regression Trees | bartMachine | X | X | X | | | 'use_missing_data' has been set to TRUE by default to allow missing data support. |
regr.bcart / bcart | Bayesian CART | tgp | X | X | | | se | |
regr.bdk / bdk | Bi-Directional Kohonen map | kohonen | X | | | | | |
regr.bgp / bgp | Bayesian Gaussian Process | tgp | X | | | | se | |
regr.bgpllm / bgpllm | Bayesian Gaussian Process with jumps to the Limiting Linear Model | tgp | X | | | | se | |
regr.blackboost / blackbst | Gradient Boosting with Regression Trees | mboost party | X | X | X | X | | See ?ctree_control for possible breakage for nominal features with missingness. |
regr.blm / blm | Bayesian Linear Model | tgp | X | | | | se | |
regr.brnn / brnn | Bayesian regularization for feed-forward neural networks | brnn | X | X | | | | |
regr.bst / bst | Gradient Boosting | bst | X | | | | | The argument learner has been renamed to Learner due to a name conflict with setHyperPars. |
regr.btgp / btgp | Bayesian Treed Gaussian Process | tgp | X | X | | | se | |
regr.btgpllm / btgpllm | Bayesian Treed Gaussian Process with jumps to the Limiting Linear Model | tgp | X | X | | | se | |
regr.btlm / btlm | Bayesian Treed Linear Model | tgp | X | X | | | se | |
regr.cforest / cforest | Random Forest Based on Conditional Inference Trees | party | X | X | X | X | ordered | See ?ctree_control for possible breakage for nominal features with missingness. |
regr.crs / crs | Regression Splines | crs | X | X | | X | se | |
regr.ctree / ctree | Conditional Inference Trees | party | X | X | X | X | ordered | See ?ctree_control for possible breakage for nominal features with missingness. |
regr.cubist / cubist | Cubist | Cubist | X | X | X | | | |
regr.earth / earth | Multivariate Adaptive Regression Splines | earth | X | X | | | | |
regr.elmNN / elmNN | Extreme Learning Machine for Single Hidden Layer Feedforward Neural Networks | elmNN | X | | | | | nhid has been set to 1 and actfun has been set to "sig" by default. |
regr.extraTrees / extraTrees | Extremely Randomized Trees | extraTrees | X | | | X | | |
regr.fnn / fnn | Fast k-Nearest Neighbor | FNN | X | | | | | |
regr.frbs / frbs | Fuzzy Rule-based Systems | frbs | X | | | | | |
regr.gbm / gbm | Gradient Boosting Machine | gbm | X | X | X | X | | distribution has been set to gaussian by default. |
regr.glmnet / glmnet | GLM with Lasso or Elasticnet Regularization | glmnet | X | X | | X | ordered | Factors automatically get converted to dummy columns, ordered factors to integer. |
regr.IBk / ibk | K-Nearest Neighbours | RWeka | X | X | | | | |
regr.kknn / kknn | K-Nearest-Neighbor regression | kknn | X | X | | | | |
regr.km / km | Kriging | DiceKriging | X | | | | se | In predict, we currently always use type = 'SK'. The extra param 'jitter' (default is FALSE) enables adding a very small jitter (order 1e-12) to the x-values before prediction, as predict.km reproduces the exact y-values of the training data points when you pass them in, even if the nugget effect is turned on. |
regr.ksvm / ksvm | Support Vector Machines | kernlab | X | X | | | | Kernel parameters have to be passed directly and not by using the kpar list in ksvm. Note that fit has been set to FALSE by default for speed. |
regr.laGP / laGP | Local Approximate Gaussian Process | laGP | X | | | | se | |
regr.lm / lm | Simple Linear Regression | stats | X | X | | X | se | |
regr.mars / mars | Multivariate Adaptive Regression Splines | mda | X | | | | | |
regr.mob / mob | Model-based Recursive Partitioning Yielding a Tree with Fitted Models Associated with each Terminal Node | party | X | X | | X | | |
regr.nnet / nnet | Neural Network | nnet | X | X | | X | | size has been set to 3 by default. |
regr.nodeHarvest / nodeHarvest | Node Harvest | nodeHarvest | X | X | | | | |
regr.pcr / pcr | Principal Component Regression | pls | X | X | | | | model has been set to FALSE by default for speed. |
regr.penalized.lasso / lasso | Lasso Regression | penalized | X | X | | | | |
regr.penalized.ridge / ridge | Penalized Ridge Regression | penalized | X | X | | | | |
regr.plsr / plsr | Partial Least Squares Regression | pls | X | X | | | | |
regr.randomForest / rf | Random Forest | randomForest | X | X | | | ordered se | |
regr.randomForestSRC / rfsrc | Random Forest | randomForestSRC | X | X | X | | | 'na.action' has been set to 'na.impute' by default to allow missing data support. |
regr.rpart / rpart | Decision Tree | rpart | X | X | X | X | ordered | xval has been set to 0 by default for speed. |
regr.rsm / rsm | Response Surface Regression | rsm | X | | | | | Select the order of the regression via modelfun: "FO" (first order), "TWI" (two-way interactions, including the first-order terms) or "SO" (full second order). |
regr.rvm / rvm | Relevance Vector Machine | kernlab | X | X | | | | Kernel parameters have to be passed directly and not by using the kpar list in rvm. Note that fit has been set to FALSE by default for speed. |
regr.slim / slim | Sparse Linear Regression using Nonsmooth Loss Functions and L1 Regularization | flare | X | | | | | lambda.idx has been set to 3 by default. |
regr.svm / svm | Support Vector Machines (libsvm) | e1071 | X | X | | | | |
regr.xyf / xyf | X-Y fused self-organising maps | kohonen | X | | | | | |
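To make the three regr.rsm options concrete, the Python sketch below only lists the model terms each choice implies for a given set of inputs, with the intercept omitted; it fits nothing and is not code from the rsm package. The function name `rsm_terms` and the `a:b` / `v^2` term notation are assumptions for this example.

```python
from itertools import combinations


def rsm_terms(variables, modelfun):
    """List the model terms implied by a response-surface order.

    FO  = first-order terms only,
    TWI = first-order terms plus all two-way interactions,
    SO  = TWI plus pure quadratic terms (full second order).
    """
    if modelfun not in ("FO", "TWI", "SO"):
        raise ValueError(f"unknown modelfun: {modelfun!r}")
    terms = list(variables)  # first-order terms are always included
    if modelfun in ("TWI", "SO"):
        terms += [f"{a}:{b}" for a, b in combinations(variables, 2)]
    if modelfun == "SO":
        terms += [f"{v}^2" for v in variables]
    return terms
```

For three inputs, "FO" gives 3 terms, "TWI" adds the 3 pairwise interactions (6 terms), and "SO" further adds the 3 squared terms (9 terms); in general a full second-order model in k inputs has 2k + k(k-1)/2 terms besides the intercept.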
Survival analysis (10)
ID / Short Name | Name | Packages | Num. | Fac. | NAs | Weights | Props | Note |
---|---|---|---|---|---|---|---|---|
surv.cforest / crf | Random Forest based on Conditional Inference Trees | party survival | X | X | X | X | ordered rcens | See ?ctree_control for possible breakage for nominal features with missingness. |
surv.CoxBoost / coxboost | Cox Proportional Hazards Model with Componentwise Likelihood based Boosting | CoxBoost | X | X | | X | ordered rcens | Factors automatically get converted to dummy columns, ordered factors to integer. |
surv.coxph / coxph | Cox Proportional Hazard Model | survival | X | X | X | X | prob rcens | |
surv.cvglmnet / cvglmnet | GLM with Regularization (Cross Validated Lambda) | glmnet | X | X | | X | ordered rcens | Factors automatically get converted to dummy columns, ordered factors to integer. |
surv.glmboost / glmboost | Gradient Boosting with Componentwise Linear Models | survival mboost | X | X | | X | ordered rcens | family has been set to CoxPH() by default. The maximum number of boosting iterations is set via 'mstop'; the actual number used for prediction is controlled by 'm'. |
surv.glmnet / glmnet | GLM with Regularization | glmnet | X | X | | X | ordered rcens | Factors automatically get converted to dummy columns, ordered factors to integer. |
surv.optimCoxBoostPenalty / optimCoxBoostPenalty | Cox Proportional Hazards Model with Componentwise Likelihood based Boosting, automatic tuning enabled | CoxBoost | X | X | | X | rcens | Factors automatically get converted to dummy columns, ordered factors to integer. |
surv.penalized / penalized | Penalized Regression | penalized | X | X | | | ordered rcens | Factors automatically get converted to dummy columns, ordered factors to integer. |
surv.randomForestSRC / rfsrc | Random Forests for Survival | survival randomForestSRC | X | X | X | | ordered rcens | 'na.action' has been set to 'na.impute' by default to allow missing data support. |
surv.rpart / rpart | Survival Tree | rpart | X | X | X | X | ordered rcens | xval has been set to 0 by default for speed. |
Cluster analysis (6)
ID / Short Name | Name | Packages | Num. | Fac. | NAs | Weights | Props | Note |
---|---|---|---|---|---|---|---|---|
cluster.cmeans / cmeans | Fuzzy C-Means Clustering | e1071 clue | X | | | | prob | The 'predict' method uses 'cl_predict' from the 'clue' package to compute the cluster memberships for new data. The default 'centers=2' is added so that the method runs without setting params; in practice, the user must change it. |
cluster.EM / em | Expectation-Maximization Clustering | RWeka | X | | | | | |
cluster.FarthestFirst / farthestfirst | FarthestFirst Clustering Algorithm | RWeka | X | | | | | |
cluster.kmeans / kmeans | K-Means | stats clue | X | | | | | The 'predict' method uses 'cl_predict' from the 'clue' package to compute the cluster memberships for new data. The default 'centers=2' is added so that the method runs without setting params; in practice, the user must change it. |
cluster.SimpleKMeans / simplekmeans | K-Means Clustering | RWeka | X | | | | | |
cluster.XMeans / xmeans | XMeans (k-means with automatic determination of k) | RWeka | X | | | | | You may have to install the XMeans Weka package: WPM('install-package', 'XMeans'). |
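The cluster.cmeans and cluster.kmeans notes say that memberships for new data come from cl_predict in the clue package. As a rough illustration of what such a prediction involves, and not clue's implementation, the Python sketch below assigns each new point to its nearest center (hard k-means memberships) and computes the standard fuzzy c-means memberships with fuzzifier m. Euclidean distance, the default fuzzifier m = 2 and both function names are assumptions for this example.

```python
import math


def predict_hard(centers, points):
    """k-means style prediction: index of the nearest center for each point."""
    return [
        min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
        for p in points
    ]


def predict_fuzzy(centers, point, m=2.0):
    """Fuzzy c-means memberships of one point; they sum to 1.

    u_k = 1 / sum_j (d_k / d_j) ** (2 / (m - 1)), with d_k the distance to center k.
    """
    d = [math.dist(point, c) for c in centers]
    zeros = d.count(0.0)
    if zeros:  # the point coincides with one or more centers
        return [1.0 / zeros if dk == 0.0 else 0.0 for dk in d]
    e = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** e for dj in d) for dk in d]
```

With centers at (0, 0) and (10, 0), the point (1, 0) is assigned to cluster 0 by `predict_hard`, while `predict_fuzzy` gives it memberships 81/82 and 1/82, so the fuzzy version keeps information about how clear-cut the assignment is.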