QSAR for Decision Support
Most often, the models that scientists build are
shaped largely by the type and quality of the data available.
For example, given permeability values across a membrane, one can
either build a regression model to directly predict the value or build
a classifier that predicts the value-bin a compound falls in (such as
high or low). Conventionally, a model is judged good on the basis of
statistical metrics such as R2 or Q2 for regression models, or
precision and recall for classifiers. In reality, the only metric
that a model needs to satisfy, and indeed the only one that counts, is
whether the model is useful. This leads us to the question: how
does one define usefulness?
To answer this, let us step back and understand the need for models in the
first place. Models are mathematical representations of
biological/chemical hypotheses that need to be confirmed by
experiments. By creating a QSAR model, we are implicitly asserting the
following:
1) There exists a relationship between structure and activity.
2) This relationship is complex and not easily discernible upon inspection.
3) However, this complex relationship can be captured mathematically.
4) This mathematical relationship will enable the chemist to pick/design the next set of molecules to experiment upon.
What is implicit is that a useful model will allow me to make a
different decision from the one I would make in its absence, i.e. the
presence of the model predictions changes my next experimental step.
Indeed, if I make the same decision whether or not I have access to a
model, why do I need it? Once we accept that the primary purpose of
a model is to be used, we are immediately faced with the next
question: what is the decision that my model is going to impact?
In fact, depending upon where one is in the pharma pipeline, one faces
different challenges, different model needs and hence different metrics
for model “goodness”. Is the model being used as a screen?
Do you want to select leads or candidates from a pool of possibilities?
Is the model an aid for design? The answers to these questions lead us
to the interesting and perhaps not so intuitive notion that factors
external to the data-type or quality have a huge impact on the kind of
model that needs to be built. I would like to illustrate, using a real
case, how this notion of model utility shaped the kind of model that
was eventually deployed successfully within a pharmaceutical
organization.
The problem: A mid-size pharma company faced the challenge of
identifying leads that could penetrate the
blood-brain barrier (BBB) and bind to a particular target localized in
the brain. The 3-D structure of that target was such that good binders
inevitably ended up being rather large molecules that did not easily
penetrate the BBB. Molecules that seemed to satisfy permeability and
binding criteria in vitro were injected into rats and the brain-plasma
ratio (B/P) was measured. Normally, molecules with B/P > 2 are
called penetrants and those with B/P < 0.5 non-penetrants, while all
others fall into an ambiguous category. Measuring B/P in animals
requires sacrifices at multiple time points to construct the brain and
blood AUCs (areas under the concentration-time curve), which makes it
expensive and time-consuming. What the
organization required was a model that would predict the B/P of their
leads so that the animal experiments could be prioritized. The twist to
this problem is that, since it is difficult to design good binders that
can enter the brain, it is critical to avoid false negatives,
i.e. the model must flag all penetrants correctly even if it
misclassifies some non-penetrants.
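The decision rule described above can be sketched in a few lines of Python. This is only an illustration: the cut-offs (B/P > 2 and B/P < 0.5) come from the text, but the function names `bp_class` and `missed_penetrants` are hypothetical, and counting an "ambiguous" prediction for a true penetrant as a miss is an assumption made here, not something stated in the study.

```python
from typing import Sequence

PENETRANT_MIN = 2.0      # B/P > 2 is called a penetrant (cut-off from the text)
NON_PENETRANT_MAX = 0.5  # B/P < 0.5 is called a non-penetrant (cut-off from the text)


def bp_class(bp: float) -> str:
    """Bin a brain-plasma ratio into one of the three categories."""
    if bp > PENETRANT_MIN:
        return "penetrant"
    if bp < NON_PENETRANT_MAX:
        return "non-penetrant"
    return "ambiguous"


def missed_penetrants(measured: Sequence[float], predicted: Sequence[float]) -> int:
    """Count false negatives: measured penetrants not predicted as penetrants.

    Assumption: an 'ambiguous' prediction for a true penetrant counts as a miss.
    """
    return sum(
        1
        for m, p in zip(measured, predicted)
        if bp_class(m) == "penetrant" and bp_class(p) != "penetrant"
    )
```

Under the requirement stated above, a model would be judged first on how close `missed_penetrants` is to zero, and only afterwards on how many non-penetrants it wrongly lets through.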
The Models: Using the data provided, we first built two
regression models: a 20-feature model (R2 = 0.97, Q2 = 0.895) and
a 9-feature model (R2 = 0.87, Q2 = 0.744). The statistical metrics
looked good; y-randomization also suggested that the models had
captured real signal in the data. To test whether these models would be
good for decision-making, we used the cross-validation predictions on
the training set to understand the discriminatory power of the models in
identifying penetrants. The results are shown in the table below.
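To make the distinction between a statistical metric and a decision metric concrete, here is a minimal sketch of how cross-validation predictions might be scored both ways. It assumes one common definition of Q2 (1 - PRESS / total sum of squares about the overall mean) and reuses the B/P cut-offs given earlier; the function names and the idea of summarizing utility as recall on penetrants are illustrative assumptions, not the study's actual procedure or data.

```python
from collections import Counter
from typing import Sequence


def q2(measured: Sequence[float], cv_predicted: Sequence[float]) -> float:
    """Cross-validated Q2 = 1 - PRESS / SS_tot (one common definition)."""
    mean = sum(measured) / len(measured)
    press = sum((m - p) ** 2 for m, p in zip(measured, cv_predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - press / ss_tot


def label(bp: float) -> str:
    """Bin B/P using the cut-offs from the text (> 2 and < 0.5)."""
    if bp > 2.0:
        return "penetrant"
    if bp < 0.5:
        return "non-penetrant"
    return "ambiguous"


def confusion(measured: Sequence[float], cv_predicted: Sequence[float]) -> Counter:
    """Tally (measured class, predicted class) pairs."""
    return Counter((label(m), label(p)) for m, p in zip(measured, cv_predicted))


def penetrant_recall(measured: Sequence[float], cv_predicted: Sequence[float]) -> float:
    """Fraction of measured penetrants that the model also calls penetrants."""
    table = confusion(measured, cv_predicted)
    actual = sum(n for (m_cls, _), n in table.items() if m_cls == "penetrant")
    hits = table[("penetrant", "penetrant")]
    return hits / actual if actual else float("nan")
```

The point of scoring the same predictions twice is that the two numbers need not agree: a model can have a high Q2 and still push a true penetrant with B/P just above 2 into the ambiguous bin, which is exactly the kind of error the organization could not afford.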
Conclusions:
What I hope I have convinced you of is that a model's primary
measure of goodness is its utility rather than its statistical metrics. In
this study we used the cross-validation predictions as a measure of
model utility, and this proved to be a reasonable approach. Even after one
has constructed the best possible model, it is rather dangerous
to use it as a filter, especially in the later stages of the pipeline.
The right approach is to use models to prioritize one's next
experiments, in the belief that a “good” model will allow one
to perform the right confirmatory experiment earlier. This was indeed
true for the model described above: cost savings of $425,000 per 10
leads selected were estimated.