If the number of descriptors is very large in comparison to the number of compounds, a learning algorithm is faced with the problem of selecting a relevant subset of features (or descriptors). ‘Feature selection’ is the process of choosing a subset of original features so that the feature space is optimally reduced according to some evaluation criterion.
Feature selection algorithms broadly fall into two categories: the ‘filter’ model and the ‘wrapper’ model. The ‘filter’ model relies on general characteristics of the training data to select features without involving any learning algorithm; it therefore does not inherit the bias of any particular learner. When the number of features is very large, the ‘filter’ model is usually the method of choice because of its computational efficiency. The ‘wrapper’ model, by contrast, wraps feature selection around a pre-determined learning algorithm and uses that algorithm’s performance to evaluate and decide which features are selected. The ‘wrapper’ model tends to give superior predictive performance, since it finds features better suited to the chosen learning algorithm, but it is also computationally more expensive.
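A minimal sketch of the ‘filter’ idea, not taken from the article: each feature (column of a descriptor matrix) is scored independently of any learner, here by its absolute Pearson correlation with the target, and the top-k features are kept. All function and variable names are illustrative.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def filter_select(X, y, k):
    """Rank features (columns of X) by |correlation| with y; keep the top k."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: -scores[j])[:k]

# Toy descriptor matrix: column 0 tracks y, column 1 is anti-correlated
# with y, column 2 is irrelevant noise.
X = [[1.0, 9.0, 3.0],
     [2.0, 8.0, 1.0],
     [3.0, 7.0, 4.0],
     [4.0, 6.0, 1.0]]
y = [1.0, 2.0, 3.0, 4.0]
print(filter_select(X, y, 2))  # → [0, 1]
```

Because no learning algorithm is consulted, this kind of scoring is cheap even for thousands of descriptors, which is exactly why the filter model scales to very large feature sets.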
Commonly used search strategies in feature selection include Forward Selection, Backward Elimination, and Genetic Algorithms.
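Forward Selection in the ‘wrapper’ style can be sketched as follows: starting from the empty subset, greedily add the feature that most improves the learner's evaluation score, and stop when no addition helps. This is an illustrative sketch, not code from the article; the leave-one-out 1-nearest-neighbour “learner” and all names are assumptions chosen to keep the example self-contained.

```python
def loo_accuracy_1nn(X, y, cols):
    """Leave-one-out accuracy of 1-NN restricted to the feature subset `cols`."""
    if not cols:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best_j, best_d = None, float("inf")
        for j in range(len(X)):
            if j == i:
                continue
            d = sum((X[i][c] - X[j][c]) ** 2 for c in cols)
            if d < best_d:
                best_j, best_d = j, d
        correct += (y[best_j] == y[i])
    return correct / len(X)

def forward_select(X, y, score=loo_accuracy_1nn):
    """Greedy Forward Selection: grow the subset while the score improves."""
    selected, best = [], 0.0
    remaining = set(range(len(X[0])))
    while remaining:
        cand, cand_score = max(
            ((c, score(X, y, selected + [c])) for c in remaining),
            key=lambda t: t[1])
        if cand_score <= best:   # no candidate improves the score: stop
            break
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected, best

# Toy data: feature 0 separates the two classes; features 1-2 are noise.
X = [[0.0, 5.0, 1.0],
     [0.1, 1.0, 9.0],
     [1.0, 5.2, 1.1],
     [1.1, 0.9, 8.8]]
y = [0, 0, 1, 1]
cols, acc = forward_select(X, y)
print(cols, acc)  # → [0] 1.0
```

Note that the learner is evaluated once per candidate feature in every round, which is why wrapper methods are costlier than filters; Backward Elimination is the mirror image (start with all features, greedily remove), and Genetic Algorithms search the space of subsets stochastically.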
Cite This As:
Gupta, N., "Feature Selection" From QSARWorld--A Strand Life Sciences Web Resource.