A Primer on Molecular Similarity in QSAR and Virtual Screening
Part II - How reliable are experimental measurements (endpoints) in QSAR studies?
Andreas Bender, PhD, an editorial advisor and columnist of QSAR World, continues his insights into the art and science of QSAR modeling and virtual screening with Part II of the primer series.
In the first part of this three-part primer on how to construct "better" QSAR models, we saw that more complex descriptors are not always the better choice. Indeed, in most cases the simplest descriptors that are still able to describe a given phenomenon should be used to build a model. The reasons include the risk of overfitting when too many variables are present, and the difficulty of interpreting more complex descriptors, which makes rational improvement of a compound harder.
In this article, we will investigate the reliability of the data we are attempting to model. Naively, one might simply take the numbers from experimental measurements and apply some statistical or machine learning approach to model them. In the following paragraphs we will see that in the real world, life is not so easy. Experimental data ("endpoints") depend heavily on the conditions under which they were measured, which makes both comparing data from different sources and building models on such pooled data difficult. We will discuss two endpoints often used for quantitative modeling, namely solubility and bioactivity. The aim here is not to provide a comprehensive guide to experimental measurements, but to sharpen your eye for the problems associated with any kind of experimental data used for modeling.
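Because an endpoint value only makes sense together with the conditions under which it was measured, one practical consequence for anyone assembling a modeling set from several sources is to store those conditions alongside each value and to pool only measurements that match. The following is a minimal Python sketch of that idea. The record fields, the matching rule (same method, pH rounded to 0.1, temperature rounded to 1 °C) and the example data are all hypothetical choices for illustration, not an established standard.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class SolubilityRecord:
    compound_id: str
    log_s: float                    # log10 of solubility in mol/L
    method: str                     # e.g. "thermodynamic" or "kinetic"
    ph: Optional[float]             # None if the source does not report it
    temperature_c: Optional[float]  # None if the source does not report it
    source: str


def pool_by_conditions(records):
    """Average log S values only across records measured under matching conditions.

    Records lacking pH or temperature are returned separately instead of being
    merged with documented measurements.
    """
    pooled = defaultdict(list)
    undocumented = []
    for rec in records:
        if rec.ph is None or rec.temperature_c is None:
            undocumented.append(rec)
            continue
        # Hypothetical matching rule: same compound and method, pH to 0.1, temperature to 1 °C
        key = (rec.compound_id, rec.method, round(rec.ph, 1), round(rec.temperature_c))
        pooled[key].append(rec.log_s)
    averaged = {key: mean(values) for key, values in pooled.items()}
    return averaged, undocumented


# Hypothetical example data, not taken from any real source
records = [
    SolubilityRecord("cmpd-1", -4.1, "thermodynamic", 7.4, 25.0, "lab A"),
    SolubilityRecord("cmpd-1", -4.3, "thermodynamic", 7.4, 25.0, "lab B"),
    SolubilityRecord("cmpd-1", -3.2, "kinetic", 7.4, 25.0, "lab C"),
    SolubilityRecord("cmpd-1", -5.0, "thermodynamic", None, 25.0, "lab D"),
]
averaged, undocumented = pool_by_conditions(records)
for key, value in averaged.items():
    print(key, round(value, 2))
print("set aside:", [r.source for r in undocumented])
```

Here the thermodynamic and kinetic values for the same compound stay in separate groups, and the record without a reported pH is set aside rather than silently averaged in.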
2. Endpoint Measurements
While solubility seems to be a trivial property of matter, disillusionment can quickly set in once one attempts to produce reliable (and reproducible!) measurements1. Even for marketed drugs such as diclofenac, published solubilities vary by a factor of around 100 (!)2. Two types of measurement are commonly used, called thermodynamic and kinetic solubility. Both names are somewhat misleading. "Thermodynamic" measurements imply waiting until equilibrium between the solid and the dissolved phase has been reached, but in practice the experiment is stopped after a fixed time (often 24 hours), whether or not equilibrium has actually been attained. "Kinetic" measurements are performed by adding a DMSO solution of the compound to the solvent of interest. Although the rate of dissolution plays a role in these "kinetic" measurements, what they actually measure is precipitation (rather than solubility) under particular conditions (see below)1.
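For modeling, solubilities are usually expressed as log S (the base-10 logarithm of the molar solubility), so it helps to see what a 100-fold discrepancy means on that scale: it spans 2 log units, a spread that can easily swamp the differences between compounds a model is meant to learn. The short Python sketch below converts values reported in µg/mL into log S. The input values are hypothetical; only the molar mass of diclofenac (free acid) is a real quantity.

```python
import math


def log_molar_solubility(s_ug_per_ml: float, mol_weight_g_per_mol: float) -> float:
    """Convert a solubility in µg/mL into log10 of mol/L (log S)."""
    # 1 µg/mL equals 1 mg/L, i.e. 1e-3 g/L; dividing by the molar mass gives mol/L
    s_mol_per_l = s_ug_per_ml * 1e-3 / mol_weight_g_per_mol
    return math.log10(s_mol_per_l)


MW_DICLOFENAC = 296.15  # g/mol, free acid (C14H11Cl2NO2)

# Hypothetical reported values in µg/mL spanning a 100-fold range;
# these are NOT the measurements discussed in reference 2
reported_ug_per_ml = [2.0, 20.0, 200.0]
log_s_values = [log_molar_solubility(s, MW_DICLOFENAC) for s in reported_ug_per_ml]

fold_range = max(reported_ug_per_ml) / min(reported_ug_per_ml)
log_range = max(log_s_values) - min(log_s_values)
print(f"fold range: {fold_range:.0f}, range in log S units: {log_range:.2f}")
```

Note that the range in log units does not depend on the molar mass; the conversion only matters when values reported in different units have to be placed on one common scale.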
Which problems do we face when measuring "thermodynamic" solubilities? Firstly, the crystal form of the molecule has a huge influence on solubility, as shown recently for diclofenac, the example mentioned above2. During the measurement, crystals may not convert into the most stable form, so that one obtains the apparent solubility of a particular solid form (for example a salt or a metastable polymorph) instead of the true solubility of the most stable form. Also, different ionization states of a molecule change its solubility considerably: a solubility value is not meaningful without knowing the pKa of an acid or base, as well as the protonation state and the pH at equilibrium. In particular, attention should be paid to regions where solubility is strongly pH-dependent. Buffers are often used in solubility measurements, either to keep conditions such as pH constant or to mimic particular environments such as the gut. In that case the buffer, as well as the final concentrations of the solid and dissolved forms, need to be documented properly; otherwise the experimental conditions are not completely defined.
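The pH dependence can be made concrete with the textbook Henderson–Hasselbalch description of a monoprotic compound: for an acid, total solubility S = S0 · (1 + 10^(pH - pKa)); for a base, S = S0 · (1 + 10^(pKa - pH)), where S0 is the intrinsic solubility of the neutral form. The Python sketch below is an idealization under those assumptions only: it ignores the point at which the salt itself precipitates, ionic strength and specific buffer effects, and the compound parameters in the example are hypothetical. The slope of log S against pH shows where solubility is most sensitive: close to zero on one side of the pKa, 0.5 at the pKa, and approaching one log unit per pH unit on the other side.

```python
import math


def total_solubility(s0: float, pka: float, ph: float, kind: str = "acid") -> float:
    """Henderson-Hasselbalch estimate of the total solubility of a monoprotic compound.

    s0 is the intrinsic solubility of the neutral form (any concentration unit).
    Salt precipitation, ionic strength and buffer effects are ignored.
    """
    if kind == "acid":
        return s0 * (1 + 10 ** (ph - pka))
    if kind == "base":
        return s0 * (1 + 10 ** (pka - ph))
    raise ValueError("kind must be 'acid' or 'base'")


def log_s_slope(s0: float, pka: float, ph: float, kind: str = "acid", h: float = 0.01) -> float:
    """Central finite-difference estimate of d(log10 S)/d(pH)."""
    up = math.log10(total_solubility(s0, pka, ph + h, kind))
    down = math.log10(total_solubility(s0, pka, ph - h, kind))
    return (up - down) / (2 * h)


# Hypothetical weak acid: intrinsic solubility 1 µM, pKa 4.0
for ph in (2.0, 4.0, 6.0, 7.4):
    s = total_solubility(1.0, 4.0, ph)
    print(f"pH {ph}: S = {s:.3g} µM, slope of log S = {log_s_slope(1.0, 4.0, ph):.2f}")
```

For a base the slope has the opposite sign, since solubility rises as the pH drops below the pKa.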
Even if all the above points are taken into account, in practice there is often not enough crystalline material available for thermodynamic solubility measurements (which usually require on the order of 1 mg of compound). In such cases, "kinetic" solubility measurements, which are based on the precipitation of compounds, are often used instead. The substance is first dissolved in DMSO, a near-perfect solvent, and this stock solution is added to the solvent of interest (e.g. water) in which solubility is to be determined. As more and more compound is added, precipitation eventually occurs, and the concentration at which it occurs is taken as the solubility of the compound. One problem is that kinetic solubility is often determined on compounds that have not been sufficiently purified, which (in line with basic thermodynamics) frequently leads to higher solubilities than those of the pure substance. Also, supersaturation may occur because the compound was previously dissolved in a "perfect" solvent, potentially leading to overoptimistic solubility estimates.
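The read-out step of such a precipitation assay can be sketched as follows, assuming the solubility is taken as the first concentration at which a turbidity signal (for example from nephelometry) rises above a fixed threshold. The function name, the threshold and the readings below are hypothetical; real assays define their cutoff in their own ways, for instance relative to a blank.

```python
from typing import Optional, Sequence


def precipitation_point(
    concentrations_um: Sequence[float],
    turbidity: Sequence[float],
    threshold: float,
) -> Optional[float]:
    """Return the first concentration whose turbidity reading exceeds the threshold.

    concentrations_um: cumulative nominal concentrations after each addition of the
    DMSO stock, in increasing order. turbidity: the matching readings in arbitrary units.
    Returns None if no precipitation is detected in the tested range.
    """
    if len(concentrations_um) != len(turbidity):
        raise ValueError("each concentration needs exactly one turbidity reading")
    for conc, signal in zip(concentrations_um, turbidity):
        if signal > threshold:
            return conc
    return None


# Hypothetical titration: the stock is added stepwise and turbidity is read after each step
concentrations = [10.0, 20.0, 40.0, 80.0, 160.0]  # µM
readings = [0.01, 0.01, 0.02, 0.35, 0.80]         # arbitrary units
print(precipitation_point(concentrations, readings, threshold=0.1))
```

The true precipitation point lies somewhere between the last clear and the first turbid step, so the step size limits the resolution. Given the impurity and supersaturation effects just described, a value obtained this way may also overestimate the solubility of the pure, crystalline compound.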