QSAR WORLD

Forward Selection

When the number of descriptors is very large compared with the number of compounds, a learning algorithm faces the problem of selecting a relevant subset of features (descriptors). One way to select the features most relevant to the property of interest is ‘Forward Selection’.

In Forward Selection, we start with an empty feature subset, i.e., at the outset no features are selected. For N candidate descriptors (from which the relevant subset is to be chosen), N models are then trained, each containing a single descriptor. This requires fixing a learning algorithm (and its training parameters) in advance. We thus obtain N sets of model statistics, and the best-performing model is chosen. Since each of these models contains only one descriptor, this in effect selects the best single descriptor (out of N) for modeling the given property.
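As a minimal sketch of this first step, the Python code below assumes the pre-set learning algorithm is simple linear regression (y = a + b*x) scored by R², which for a one-descriptor linear model equals the squared Pearson correlation. The function names and the toy data are hypothetical and serve only to illustrate the procedure.

```python
def r_squared_1d(x, y):
    """R^2 of the simple linear regression y = a + b*x (equals Pearson r squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant descriptor or response: no explanatory power
    return sxy * sxy / (sxx * syy)


def best_single_descriptor(X, y):
    """Train one model per descriptor (N models) and return the best one's index."""
    n_desc = len(X[0])
    scores = []
    for j in range(n_desc):
        column = [row[j] for row in X]  # values of descriptor j for every compound
        scores.append(r_squared_1d(column, y))
    best = max(range(n_desc), key=lambda j: scores[j])
    return best, scores


# Hypothetical toy data: 5 compounds (rows) x 3 descriptors (columns)
X = [[1.0, 5.0, 2.0],
     [2.0, 3.0, 1.0],
     [3.0, 4.0, 4.0],
     [4.0, 1.0, 3.0],
     [5.0, 2.0, 5.0]]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

best, scores = best_single_descriptor(X, y)
print(best)  # 0
```

Any other learning algorithm or statistic (for example a cross-validated error) could be substituted for `r_squared_1d` without changing the selection logic.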

The feature subset now contains 1 descriptor (chosen in the previous step). Next, N-1 feature subsets are formed by pairing this descriptor with each of the remaining N-1 descriptors in turn. Again, N-1 models are trained and their statistics compared to select the best-performing model; in effect, this chooses the best-performing pair of descriptors. Note, however, that not all possible pairs of descriptors are evaluated here. Given N descriptors, an exhaustive search would have to evaluate N(N-1)/2 distinct pairs. Because the first descriptor is already fixed and only its pairings with the remaining N-1 descriptors are formed, just N-1 pairs are tested (N-1 hypotheses evaluated, or N-1 models trained), so the pair found is not guaranteed to be the best of all possible pairs.
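The difference in the number of candidate pairs can be checked with a short sketch. The helper `forward_candidates` and the choice of descriptor 2 as the first pick are hypothetical; the code simply enumerates the subsets that forward selection would form at the second step and compares them with all possible pairs.

```python
from itertools import combinations
from math import comb


def forward_candidates(selected, n_desc):
    """Subsets formed by adding each not-yet-selected descriptor to `selected`."""
    return [selected + [j] for j in range(n_desc) if j not in selected]


N = 6
first = [2]  # hypothetical: descriptor 2 won the single-descriptor round
fwd = forward_candidates(first, N)            # pairs forward selection evaluates
exhaustive = list(combinations(range(N), 2))  # every distinct pair of descriptors

print(len(fwd), len(exhaustive), comb(N, 2))  # 5 15 15
```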

In the next step, feature subsets of 3 descriptors are formed by adding each of the remaining N-2 descriptors, one at a time, to the selected pair. As before, the best-performing model (now containing 3 descriptors) is carried forward to the next step. These iterations continue until either a pre-specified target size (the desired number of descriptors) is reached or the desired performance statistic (classification accuracy or regression fit) is attained.
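The whole procedure, including both stopping rules, can be sketched as below. This is a minimal illustration under stated assumptions, not a reference implementation: the learning algorithm is assumed to be ordinary least squares with an intercept, solved through the normal equations, and the statistic is assumed to be training-set R². Because training R² never decreases when a descriptor is added to a least-squares model, a real QSAR study would more likely score subsets with a cross-validated statistic; the `score` parameter allows such a swap. All names and data are hypothetical.

```python
def ols_r2(X, y, cols):
    """Training-set R^2 of least squares y ~ 1 + descriptors in `cols`."""
    rows = [[1.0] + [r[j] for j in cols] for r in X]  # design matrix with intercept
    p = len(rows[0])
    # Augmented normal equations [A^T A | A^T y]
    M = [[sum(r[i] * r[k] for r in rows) for k in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda i: abs(M[i][c]))
        if abs(M[piv][c]) < 1e-12:
            return float("-inf")  # singular (e.g. collinear descriptors): reject subset
        M[c], M[piv] = M[piv], M[c]
        for i in range(p):
            if i != c:
                f = M[i][c] / M[c][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    beta = [M[i][p] / M[i][i] for i in range(p)]
    pred = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    my = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def forward_selection(X, y, max_features, target_score=None, score=ols_r2):
    """Greedy forward selection; stops at max_features or when target_score is met."""
    selected, best_score = [], float("-inf")
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        # One model per remaining descriptor, each added to the current subset
        best_score, cand = max((score(X, y, selected + [j]), j) for j in remaining)
        selected.append(cand)
        remaining.remove(cand)
        if target_score is not None and best_score >= target_score:
            break
    return selected, best_score


# Hypothetical data: y = 3*d0 + 2*d1 exactly; d2 is unrelated noise
X = [[1, 2, 5], [2, 1, 3], [3, 4, 6], [4, 3, 1], [5, 6, 2], [6, 5, 4]]
y = [3 * a + 2 * b for a, b, _ in X]

selected, r2 = forward_selection(X, y, max_features=3, target_score=0.999)
print(selected)  # [0, 1]
```

For k selected descriptors, this loop trains N + (N-1) + ... + (N-k+1) models, whereas an exhaustive search over all non-empty subsets would require 2^N - 1.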

See Also:

Feature Selection, Backward Elimination, Genetic Algorithms


References:

R. Kohavi and G. H. John. Wrappers for feature subset selection. Artificial Intelligence, 97(1-2), 273-324, 1997.


Cite This As:

Dogra, Shaillay K., "Forward Selection" From QSARWorld--A Strand Life Sciences Web Resource.
http://www.qsarworld.com/qsar-ml-forward-selection.php


Facilitated by Strand Life Sciences Pvt. Ltd.