Most computed descriptors lie on very different scales, so they may need to be normalized before further statistical analysis. Whether this is necessary depends mostly on the Machine Learning algorithm that will subsequently be run on the data.
Algorithms such as Decision Trees, Regression Forests, Decision Forests and Naïve Bayes do not require normalized data as input. For Linear Regression, normalization is a recommended step. For Neural Networks and Support Vector Machines, whether used for classification or regression, normalization of the data is required.
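In the context of cheminformatics, a standard way to normalize data is mean shifting followed by auto-scaling: the column mean is subtracted from each descriptor value, and the result is divided by the column's standard deviation. Each descriptor column transformed this way has a mean of 0 and a standard deviation of 1.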
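As a rough illustration, mean shifting and auto-scaling of descriptor columns could be sketched in Python as follows. This is a minimal sketch using only the standard library, not a reference implementation: the function name `autoscale`, the descriptor names, and the numeric values are all hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which variant is intended.

```python
from statistics import mean, stdev


def autoscale(column):
    """Mean-shift a descriptor column, then divide by its standard deviation."""
    mu = mean(column)
    # Assumption: sample standard deviation (n - 1 denominator).
    sigma = stdev(column)
    if sigma == 0:
        # A constant column has no spread to scale; mean shifting leaves zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Hypothetical descriptor values for four molecules (illustrative numbers only).
mol_weight = [180.2, 342.4, 94.1, 256.3]
log_p = [1.5, 3.1, -0.3, 2.2]

scaled_weight = autoscale(mol_weight)
scaled_log_p = autoscale(log_p)

# Each transformed column now has mean 0 and standard deviation 1
# (up to floating-point rounding), regardless of its original scale.
print(round(mean(scaled_weight), 6), round(stdev(scaled_weight), 6))
print(round(mean(scaled_log_p), 6), round(stdev(scaled_log_p), 6))
```

Each column is scaled independently, so a descriptor with values in the hundreds and one with values near zero end up on a comparable scale.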
mean shifting, autoscaling, standard score
Cite This As:
Dogra, Shaillay K., "Normalization." From QSARWorld--A Strand Life Sciences Web Resource.