Choice Between Models
Suppose several models have been learnt for a given dataset, either with different algorithms or with different parameter settings of the same algorithm. If one of these has to be selected, which model should it be?
This question can be answered on the basis of a criterion that may be called ‘model complexity’. Model complexity can be loosely defined in terms of algorithm complexity, the number of descriptors used in the model, the computation time required for training, model interpretability, etc.
The assumption here is that the models give similar, if not identical, performance on the dataset on which they were learnt. If any one model outperforms the others by a fair margin, the question hardly arises and one should in all probability select the top performer. Among models that do perform similarly, one that uses fewer descriptors or is easier to interpret should be preferred.
Say that, for a given classification dataset, we have learnt a Decision-Tree (DT), a Decision-Forest (DF), a Naive-Bayes (NB), a Support-Vector-Machine (SVM), and a Neural-Network (NN) model. If the performances of these models are similar, the question is which single model to choose.
On the basis of ‘algorithm complexity’, a recommended order of preference for the classification scenario above could be: DT < DF < NB < SVM < NN (in increasing order of complexity).
For regression models, some of the available algorithms are Multivariate Linear Regression (MLR), Support-Vector-Machines Regression (SVMR), Neural-Network Regression (NNR), Regression Forest (RF), and Partial Least Squares (PLS). Here, a recommended order of preference could be: MLR < PLS < RF < SVMR < NNR (in increasing order of complexity).
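The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name choose_model, the tolerance of 0.02 used to decide what counts as "similar performance", and the example scores are all hypothetical and not part of the original article. Only the two complexity orderings are taken from the text.

```python
# Complexity orderings taken from the text (simplest first).
CLASSIFICATION_ORDER = ["DT", "DF", "NB", "SVM", "NN"]
REGRESSION_ORDER = ["MLR", "PLS", "RF", "SVMR", "NNR"]


def choose_model(scores, order, tolerance=0.02):
    """Return the simplest model whose score is within `tolerance` of the best.

    scores:    dict mapping model abbreviation -> performance (higher is better)
    order:     list of abbreviations, simplest first
    tolerance: assumed threshold for "similar performance" (hypothetical value)
    """
    if not scores:
        raise ValueError("no models to choose from")
    best = max(scores.values())
    # Walk from the simplest to the most complex model and stop at the
    # first one that performs about as well as the top performer.
    for name in order:
        if name in scores and best - scores[name] <= tolerance:
            return name
    raise ValueError("none of the scored models appears in the ordering")


# Hypothetical scores: all models perform similarly, so the simplest wins.
similar = {"DT": 0.84, "DF": 0.85, "NB": 0.83, "SVM": 0.85, "NN": 0.85}
print(choose_model(similar, CLASSIFICATION_ORDER))

# Hypothetical scores: one model is clearly better, so it is selected.
clear_winner = {"DT": 0.70, "DF": 0.72, "NB": 0.69, "SVM": 0.90, "NN": 0.73}
print(choose_model(clear_winner, CLASSIFICATION_ORDER))
```

With the first set of scores the sketch returns DT, and with the second it returns SVM, which mirrors the two cases discussed earlier: similar performers are separated by complexity, while a clear top performer is selected regardless of complexity.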
Performance on a ‘test set’ or ‘hold-out set’ (one that has not been used during modeling) can be another basis for making this selection. If one of the given models performs better on such a set, that is possibly the one that should make the cut.
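A comparison on a hold-out set could look like the following minimal sketch. It assumes each model's predictions on the held-out samples are already available. The helper names accuracy and rank_on_holdout, the use of accuracy as the performance measure, and the labels and predictions shown are all hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of held-out samples that were predicted correctly."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("hold-out set is empty")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def rank_on_holdout(y_true, predictions):
    """Return (model, accuracy) pairs sorted from best to worst."""
    scored = [(name, accuracy(y_true, y_pred))
              for name, y_pred in predictions.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical hold-out labels and per-model predictions (illustrative only).
y_holdout = [1, 0, 1, 1, 0, 0, 1, 0]
holdout_predictions = {
    "DT":  [1, 0, 1, 0, 0, 1, 1, 0],
    "NB":  [1, 0, 0, 0, 0, 1, 1, 1],
    "SVM": [1, 0, 1, 1, 0, 0, 1, 1],
}

ranking = rank_on_holdout(y_holdout, holdout_predictions)
for name, score in ranking:
    print(name, score)
```

The hold-out samples must be kept out of training and parameter tuning for every model being compared. Otherwise the ranking reflects fit to known data, not performance on unseen data.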
See Also:
Model Applicability
Cite This As:
Dogra, Shaillay K., "Choice Between Models" From QSARWorld--A Strand Life Sciences Web Resource.
http://www.qsarworld.com/qsar-ml-choice-between-models.php