If the number of descriptors is very large in comparison to the number of compounds, a learning algorithm faces the problem of selecting a relevant subset of features (or descriptors). One way to select the features most relevant to the property of interest is ‘Backward Elimination’.
In Backward Elimination, we start with the full feature-subset, i.e., the feature-subset at the outset contains all N descriptors. Each descriptor is then dropped in turn, and N models are learnt, each containing N-1 descriptors. This requires pre-setting a learning algorithm (and its associated parameters for model training). We obtain N model statistics at this point, and the model that performs best is chosen. Since each of these models contains N-1 descriptors, choosing the best model selects the best feature-subset with N-1 descriptors and thus, in effect, eliminates the worst descriptor (out of N) for modeling the given property. A minimal sketch of one such round follows.
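For concreteness, a single elimination round might look as follows in Python; the helper name eliminate_one, the scikit-learn-style estimator, and cross-validated scoring are illustrative assumptions, not part of the method's definition.

from sklearn.model_selection import cross_val_score

def eliminate_one(X, y, features, estimator, cv=5):
    """Drop each descriptor in turn, learn one model per reduced subset,
    and return the best-scoring subset (one descriptor smaller)."""
    best_score, best_subset = -float("inf"), None
    for f in features:
        subset = [g for g in features if g != f]  # all descriptors except f
        score = cross_val_score(estimator, X[:, subset], y, cv=cv).mean()
        if score > best_score:
            best_score, best_subset = score, subset
    return best_subset, best_score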
The feature-subset now contains N-1 descriptors (as chosen in the previous step). Next, N-1 feature-subsets containing N-2 descriptors each are formed by dropping each remaining descriptor in turn. Again, N-1 models are learnt and their statistics compared to select the best-performing model. As before, selecting the best model in effect eliminates the worst-performing descriptor from the feature-subset.
These iterations continue until either a pre-specified target size (the desired number of descriptors) is reached or the desired performance statistics (classification accuracy or regression fit) are obtained. The full loop is sketched below.
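Under the same assumptions, the full procedure is a loop over the round sketched earlier; backward_elimination, target_size, and target_score are hypothetical names standing in for the two stopping criteria just described.

from sklearn.model_selection import cross_val_score

def backward_elimination(X, y, estimator, target_size=1, target_score=None, cv=5):
    """Repeat eliminate_one (defined above) until the feature-subset
    shrinks to target_size or the score reaches target_score."""
    features = list(range(X.shape[1]))  # start with the full feature-subset
    score = cross_val_score(estimator, X, y, cv=cv).mean()
    while len(features) > target_size:
        if target_score is not None and score >= target_score:
            break  # desired performance statistics obtained
        features, score = eliminate_one(X, y, features, estimator, cv=cv)
    return features, score

With ridge regression as the pre-set learner, for instance, backward_elimination(X, y, Ridge(), target_size=10) would return the ten surviving descriptor indices along with their cross-validated score.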
A point to keep in mind is that it may not be possible to use Backward Elimination under all conditions. This depends on the learning algorithm being used and the starting number of descriptors, N, from which the eliminations are to be made. For example, if the number of descriptors is greater than the number of compounds, an algorithm like Multivariate Linear Regression (MLR) cannot operate (the regression problem is underdetermined), and thus Backward Elimination cannot be performed using MLR. Similarly, with too high a starting N, algorithms like Neural Networks (NN) or Support Vector Machines (SVM) may prove computationally expensive, and in such a scenario it may not be feasible to run Backward Elimination using NN or SVM. (Forward Selection can be used in such cases. However, it should be noted that the regions of feature-subset space sampled by the two methods, Forward Selection and Backward Elimination, differ.)
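If a ready-made implementation is acceptable, scikit-learn's SequentialFeatureSelector supports both directions; the sketch below pairs it with ridge regression (an assumed substitute for plain MLR, since ridge remains usable when descriptors outnumber compounds) and a hypothetical target of ten descriptors.

from sklearn.linear_model import Ridge
from sklearn.feature_selection import SequentialFeatureSelector

# Ridge (an assumption) stands in for MLR so the learner still operates
# when the number of descriptors exceeds the number of compounds.
backward = SequentialFeatureSelector(
    Ridge(), n_features_to_select=10, direction="backward", cv=5)
forward = SequentialFeatureSelector(
    Ridge(), n_features_to_select=10, direction="forward", cv=5)

# After backward.fit(X, y) and forward.fit(X, y), the masks returned by
# get_support() may differ, since the two directions sample different
# regions of the feature-subset space.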
Related Terms: Feature Selection, Forward Selection, Genetic Algorithms
Reference: R. Kohavi and G. John. Wrappers for feature subset selection. Artificial Intelligence, 97(1-2): 273-324, 1997.
Cite This As:
Dogra, Shaillay K., "Backward Elimination" From QSARWorld--A Strand Life Sciences Web Resource.