Expectations of a chemist from a ‘good’ QSAR Model
This issue must be considered in the context of where and how the model is being applied. QSAR models may be used either for hit finding (virtual screening) or, more classically, for lead optimisation (analogue prediction). QSPR models for ADMET predictions may be used as a filter in library profiling or in lead optimisation. The significance of caveats relating to model quality differs between these two scenarios and requires some clarification. A chemist is less likely to be troubled by inaccuracies during hit finding than during lead optimisation.
To exemplify this, imagine a screening program against a given target in which a limited library of compounds, selected from an in-house database of several million, will be purchased for testing based on model predictions. ADMET predictive models and activity predictions are used to filter the compounds, so millions of predictions may be made at this stage. In this scenario, the hit rate from the QSAR and ADMET models can be relatively low, maybe only 30%, and still yield a number of interesting compounds. This saves considerable resource compared to HTS testing of all of the compounds and represents a significant enrichment. In hit finding, the accuracy of the model is competing with random selection. This contrasts with the lead optimisation scenario, in which fewer predictions are made and the cost of inaccuracy is much greater, as it leads to both unnecessary compound synthesis and longer lead optimisation timeframes.
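The comparison with random selection can be made concrete with a small calculation. The sketch below is illustrative only: the 30% model hit rate comes from the scenario above, while the 1% baseline hit rate for random selection and the purchase of 500 compounds are assumptions chosen for the example, not figures from any particular screen.

```python
def enrichment_factor(model_hit_rate, baseline_hit_rate):
    """Ratio of the hit rate among model-selected compounds to the
    hit rate expected from random selection."""
    if not 0 < baseline_hit_rate <= 1:
        raise ValueError("baseline_hit_rate must be in (0, 1]")
    if not 0 <= model_hit_rate <= 1:
        raise ValueError("model_hit_rate must be in [0, 1]")
    return model_hit_rate / baseline_hit_rate


def expected_hits(n_tested, hit_rate):
    """Expected number of hits when n_tested compounds are assayed."""
    return n_tested * hit_rate


# Assumed values for illustration only.
MODEL_HIT_RATE = 0.30      # from the scenario in the text
BASELINE_HIT_RATE = 0.01   # hypothetical random-selection hit rate
N_PURCHASED = 500          # hypothetical size of the purchased library

if __name__ == "__main__":
    ef = enrichment_factor(MODEL_HIT_RATE, BASELINE_HIT_RATE)
    print("Enrichment over random selection:", round(ef, 1))
    print("Expected hits (model):", expected_hits(N_PURCHASED, MODEL_HIT_RATE))
    print("Expected hits (random):", expected_hits(N_PURCHASED, BASELINE_HIT_RATE))
```

Under these assumed numbers, a model that is wrong more often than it is right still delivers many more hits per compound tested than random selection would, which is why modest accuracy is tolerable at the hit-finding stage.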
A QSAR model in this phase must be at least as predictive as a chemist’s intuition. Since many analogues around a hit are likely to show some activity, the objective value of the QSAR model may be hard to determine. The crunch point occurs when the model predicts that a set of compounds which a chemist would otherwise favour will be inactive. A model without confirmed external predictive capability is usually assessed by making and testing exactly such compounds in order to gain validation.
ADMET QSPR models have utility in lead optimisation: the decision not to make compounds that are predicted to exhibit properties outside of a desired range is significant, especially when the cost of testing compounds for those properties is lower than synthesis costs.
The role of QSAR in lead optimisation may be greeted with healthy scepticism. A chemist should question whether the model adheres to the rules of Topliss & Costello [1] and Unger & Hansch [2]. Unfortunately, for a number of reasons (including the ease of generating such models), some do not. A model should not be relied upon completely until at least one round of external testing has provided validation.
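The concern about chance correlations can be illustrated with a short simulation. The sketch below is not the procedure used in the cited papers; it is a minimal demonstration, under assumed dataset sizes, that screening many random descriptors against a small set of random "activities" routinely produces an apparently respectable correlation. All function names and parameter values are hypothetical.

```python
import random


def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def best_chance_r2(n_compounds, n_descriptors, rng):
    """Best r^2 found when screening purely random descriptors
    against purely random activities (no real relationship exists)."""
    activity = [rng.gauss(0, 1) for _ in range(n_compounds)]
    best = 0.0
    for _ in range(n_descriptors):
        descriptor = [rng.gauss(0, 1) for _ in range(n_compounds)]
        best = max(best, pearson_r2(descriptor, activity))
    return best


def mean_best_chance_r2(n_compounds, n_descriptors, trials=200, seed=0):
    """Average of best_chance_r2 over repeated random datasets."""
    rng = random.Random(seed)
    total = sum(best_chance_r2(n_compounds, n_descriptors, rng)
                for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    # Assumed sizes: 15 compounds, screening 1 versus 40 descriptors.
    print("1 descriptor screened:  ", round(mean_best_chance_r2(15, 1), 2))
    print("40 descriptors screened:", round(mean_best_chance_r2(15, 40), 2))
```

The more descriptors are screened relative to the number of compounds, the higher the best correlation found by chance alone, which is why a good fit to the training data is not in itself evidence of a predictive model.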
Further meaningful predictions may be expected as long as the compounds being predicted occupy chemical space similar to that of the compounds seen in training. Model originators should be aware of, and inform the chemist, when predictions are being made for structures that may lie outside the scope of the model. Finally, the value of the prediction should be balanced against the cost of compound synthesis and testing.
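One simple way to flag predictions that may fall outside the scope of a model is to compare each query structure with its nearest neighbour in the training set. The sketch below assumes that each compound is represented by a set of integer feature identifiers (a stand-in for a structural fingerprint) and uses Tanimoto similarity with a cut-off of 0.3. The representation, the function names and the threshold are all assumptions made for illustration; a suitable cut-off depends on the fingerprint and the dataset.

```python
def tanimoto(features_a, features_b):
    """Tanimoto similarity between two sets of feature identifiers."""
    a, b = set(features_a), set(features_b)
    union = len(a | b)
    if union == 0:
        return 0.0
    return len(a & b) / union


def nearest_training_similarity(query, training_set):
    """Highest Tanimoto similarity between the query and any training compound."""
    return max((tanimoto(query, t) for t in training_set), default=0.0)


def in_domain(query, training_set, threshold=0.3):
    """Return (flag, similarity); flag is True when the query is at least
    as similar as `threshold` to some training compound.
    The default threshold is an arbitrary illustrative value."""
    similarity = nearest_training_similarity(query, training_set)
    return similarity >= threshold, similarity


if __name__ == "__main__":
    # Hypothetical fingerprints expressed as sets of feature identifiers.
    training = [{1, 2, 3, 4}, {2, 3, 5}]
    for name, query in [("close analogue", {1, 2, 3}), ("novel scaffold", {9, 10})]:
        ok, sim = in_domain(query, training)
        status = "inside" if ok else "outside"
        print(name, "is", status, "the assumed domain; nearest similarity =", round(sim, 2))
```

A flag of this kind does not make a prediction right or wrong; it only tells the chemist how much weight the prediction deserves when set against the cost of synthesis and testing.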
QSAR remains a very powerful technique likely to play an increasingly important role in compound design and selection. It is a highly efficient method of maximising the value of experimental data; its place in the medicinal chemist’s tool kit looks assured.
References:
1. Topliss JG, Costello RJ (1972) Chance correlations in structure-activity studies using multiple regression analysis. J. Med. Chem. 15:1066-1068
2. Unger SH, Hansch C (1973) On model building in structure-activity relationships. A re-examination of adrenergic blocking activity of beta-halo-beta-arylalkylamines. J. Med. Chem. 16:745-749
3. van Drie JH (2003) Pharmacophore discovery - lessons learned. Curr. Pharm. Des. 9:1649-1664