A Primer on Molecular Similarity in QSAR and Virtual Screening
Part I - Descriptor Choice
3. Summary and Conclusions
While a discussion of molecular descriptors for QSAR and virtual screening studies could fill several volumes, I decided here to focus on three distinct but crucial aspects of selecting molecular descriptors for these tasks. Firstly, a small number of meaningful descriptors not only yields models that are easier to interpret; such models are also more likely to be statistically significant. If you are planning to develop QSAR models, it therefore pays to decide beforehand which variables to include in the analysis, since this saves disappointment later. Secondly, it has often been claimed that 3D approaches are better able than 2D approaches to discover novel scaffolds with similar properties. While this is certainly true for some particular descriptors, it is still open to discussion whether this is an inherent property of the descriptor's dimensionality or a consequence of the particular descriptor definition. However, since the (global) spatial arrangement of atoms is to a large extent determined by the (local) connectivity information, the intrinsic difference between 2D and 3D descriptors may well be smaller than is often assumed. Finally, while virtual screening and QSAR seem to work in many cases, they are not perfect descriptions of the system we are dealing with. Rather, they capture statistical correlations, which we then try to interpret as causal relationships and use to guide structural modifications. Many cases have shown that this approach has some validity, but current descriptors still leave room for improvement (which bright current and future graduate students will certainly deliver).
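The first point, that piling on descriptors undermines statistical significance, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a method from this primer: it assumes NumPy, 20 hypothetical "compounds" whose "activities" and "descriptors" are pure random noise, and an ordinary least-squares model with an intercept. The helper `training_r2` is a hypothetical name introduced only for this illustration.

```python
import numpy as np

def training_r2(X, y):
    """Fit ordinary least squares with an intercept; return the training-set R^2."""
    A = np.column_stack([np.ones(len(y)), X])   # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
n_compounds = 20
y = rng.normal(size=n_compounds)              # random "activities" (assumption: pure noise)
X_all = rng.normal(size=(n_compounds, 19))    # random "descriptors", unrelated to y

# Nested models: each uses the first p columns of the same descriptor matrix,
# so the least-squares training R^2 can never decrease as p grows.
r2_by_p = {p: training_r2(X_all[:, :p], y) for p in (1, 2, 5, 10, 15, 19)}
for p, r2 in r2_by_p.items():
    print(f"{p:2d} random descriptors -> training R^2 = {r2:.2f}")
```

With 19 descriptors plus the intercept there are as many parameters as compounds, so the noise is fitted essentially perfectly (R^2 close to 1) even though no descriptor carries any information. A high training R^2 alone therefore says little; cross-validated or external test-set statistics are needed to judge whether a model is meaningful.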
(To be followed by
Part II - How reliable are experimental measurements (endpoints) in QSAR studies? and
Part III - Connecting descriptors and experimental measurements: model generation.)