QSAR WORLD

Accuracy of Prediction

Kubinyi offers another simple explanation of the prediction paradox [1]. Even in the absence of real outliers, external prediction will be worse than the fit, because the model tries to 'fit the errors' in the training data and attempts to explain them. Accordingly, external predictions contain both the model error and the experimental error. When variable selection is carried out, it is not repeated independently within each cross-validation run; consequently, variables that were included to 'explain the error' remain in the model and cause wrong predictions [1]. The higher the number of descriptors relative to the number of compounds, the greater the chance of selecting descriptors that give high q2 values [8]. Other reasons for overestimating q2 are redundancy in the training set or, in the case of non-linear methods, the existence of multiple minima [8].
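The effect of skipping variable selection inside the cross-validation runs can be illustrated with a small numerical sketch. It is not taken from the cited papers: the data are pure random noise, the descriptors are picked by a simple correlation ranking, the model is ordinary least squares, and q2 is computed as 1 - PRESS/SS about the overall mean. The function names (fit_predict, top_k, loo_q2) and all sizes are hypothetical choices made for illustration.

```python
import numpy as np

def fit_predict(X_train, y_train, X_test):
    """Ordinary least squares with an intercept, solved by numpy.linalg.lstsq."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    B = np.column_stack([np.ones(len(X_test)), X_test])
    return B @ coef

def top_k(X, y, k):
    """Indices of the k descriptors with the largest absolute correlation to y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(corr))[:k]

def loo_q2(X, y, k, select_inside):
    """Leave-one-out q2 = 1 - PRESS / SS (SS taken about the overall mean of y).

    select_inside=False: descriptors are chosen once, using all compounds.
    select_inside=True:  descriptors are re-chosen in every run, without
                         looking at the left-out compound.
    """
    n = len(y)
    fixed = None if select_inside else top_k(X, y, k)
    press = 0.0
    for i in range(n):
        train = np.arange(n) != i
        cols = top_k(X[train], y[train], k) if select_inside else fixed
        pred = fit_predict(X[train][:, cols], y[train], X[i:i + 1, cols])
        press += (y[i] - pred[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Assumed toy setting: 30 compounds, 200 random descriptors, random activities.
# There is nothing to learn, so any apparent predictivity is selection bias.
outside, inside = [], []
for seed in range(5):
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(30, 200))
    y = rng.normal(size=30)
    outside.append(loo_q2(X, y, k=3, select_inside=False))
    inside.append(loo_q2(X, y, k=3, select_inside=True))

mean_q2_outside = float(np.mean(outside))
mean_q2_inside = float(np.mean(inside))
print("selection outside CV:", round(mean_q2_outside, 2))
print("selection inside CV: ", round(mean_q2_inside, 2))
```

Because the activities here are noise, the gap between the two averages comes entirely from how the descriptors were selected, which is the mechanism described above. Other q2 conventions, such as using the training-fold mean in SS, change the numbers but not the direction of the comparison.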

Arthur Doweyko has also published on the elusive nature of 3D QSAR predictions [10], but concludes: "Predictions can be enhanced when the test set is bounded by the descriptor space represented in the training set. Interpretation of significant interaction regions becomes more meaningful when alignment is constrained by a binding site."
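One simple way to read "bounded by the descriptor space represented in the training set" is a per-descriptor range check. The sketch below assumes that interpretation; Doweyko does not prescribe this particular test, and the function name inside_training_range and the toy values are hypothetical.

```python
import numpy as np

def inside_training_range(X_train, X_test):
    """Flag test compounds whose descriptors all lie within the training min/max."""
    lo = X_train.min(axis=0)
    hi = X_train.max(axis=0)
    return np.all((X_test >= lo) & (X_test <= hi), axis=1)

# Toy data: rows are compounds, columns are two made-up descriptors.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 15.0]])
X_test = np.array([[2.5, 12.0],   # inside both ranges
                   [4.0, 12.0],   # first descriptor above the training maximum
                   [1.5, 25.0]])  # second descriptor above the training maximum

mask = inside_training_range(X_train, X_test)
print(mask)
```

A bounding box is the crudest such check: it ignores correlations between descriptors, so distance-based or leverage-based measures are commonly used when a tighter definition is needed.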

At a workshop held in Setubal, Portugal, in 2002, a set of principles was proposed to define the validity and applicability domain of QSAR models. These evolved into the OECD principles in 2004 [11]. Paola Gramatica discusses three of these principles in a recent publication [12] and, in particular, emphasizes the need for external validation using at least 20% of the data. Gramatica, Tropsha and others believe that validation is absolutely essential for the successful application and interpretation of QSAR models [3, 13].
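Holding out 20% of the data for external validation can be sketched as follows. The random split and the R2 formula used here are assumptions made for illustration; the text above does not say how the external set should be chosen or which statistic should be reported, and several definitions of external R2 are in use. The names external_split and external_r2 are hypothetical.

```python
import numpy as np

def external_split(n_compounds, test_fraction=0.2, seed=0):
    """Randomly reserve a fraction of the compounds as an external test set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_compounds)
    n_test = int(np.ceil(test_fraction * n_compounds))
    return order[n_test:], order[:n_test]  # (training indices, test indices)

def external_r2(y_true, y_pred):
    """1 - SS_res / SS_tot on the external set (SS_tot about the test-set mean)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

train_idx, test_idx = external_split(50)
print(len(train_idx), "training compounds,", len(test_idx), "external compounds")
```

The model, including any variable selection, would be built on train_idx only, and external_r2 would then be evaluated on predictions for test_idx.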

The necessity for validation has been accepted by leading journals. The policy of J. Chem. Inf. Model. on QSAR manuscripts [14] has been adopted by other journals such as J. Med. Chem. [15] and ChemMedChem. In part, it states: "If a new method/theory is being reported in the paper, it should be compared and 'validated' against at least one other common data set for which a published study exists, using at least one other method/approach and preferably a method/approach that has been widely used in the field. The data set should not be small... Evidence that any reported QSAR/QSPR model has been properly validated using data not in the training set must be provided."


Facilitated by Strand Life Sciences Pvt. Ltd