The last day for accepting challenge entries has been extended to 31 January 2009, on request, to accommodate the year-end vacation!
This year’s challenge is to build a quantitative (regression) QSAR model that predicts human oral bioavailability from the training set data. The winner will be chosen on the basis of the root-mean-square error (RMSE) on the test set.
‘Human oral bioavailability’ has already left an indelible mark on the world of cheminformatics. Lipinski’s ‘Rule of 5’ [1], based on oral absorption data as a surrogate for oral bioavailability, immediately comes to mind. Unfortunately, bioavailability is far too complex a phenomenon to be boxed in by this simple, yet intuitive, set of rules [2]. Besides absorption, an important factor influencing oral bioavailability is the liver first-pass effect (metabolism).
Probably because of this complexity, there are fewer QSAR models that predict human oral bioavailability than there are for other ADME properties. The literature shows more attempts at building qualitative (binned) models, notably the classification model of Yoshida and Topliss [3]. Other approaches include genetic programming and adaptive fuzzy partitioning [4,5]. A regression model using 85 structural descriptors was shown to produce fewer false positive predictions than Lipinski’s rule of 5, but its mean error on the experimental data used to build it was ~12% [6].
As with most other properties, the availability of reliable data is a big issue for bioavailability too. In this respect, Dr. Tingjun Hou and his group [7] have made a noteworthy effort by compiling ADME databases and making them public. One of these contains human oral bioavailability data on 805 compounds, along with their sources. We gratefully acknowledge the ADME team behind these databases for giving us permission to make this data available for the modeling challenge.
Despite the concern that bioavailability is too complex a phenomenon to be captured by a QSAR model, we have still chosen it for the modeling challenge, for several reasons. The last decade has seen a lot of research on molecular descriptors that can encode complex molecular processes [8].
Machine learning methods are becoming less and less of a black box, and interpreting QSAR models has become as important as building them. With all these advances, it is worthwhile to revisit this phenomenon to understand what can be achieved or, more importantly, to gain some insight into what we need to reach our goal of predicting human oral bioavailability.
With this aim, we are throwing open our challenge: build a quantitative (regression) QSAR model to predict human oral bioavailability, given a set of training compounds.
The best model will be judged on its RMSE on the test set. Participants are requested to download the test set data, run their model on it and send us the predictions. We will compute the RMSE of these predictions against the actual values and adjudge the best model.
The best model builder will be awarded a $200 gift voucher from www.amazon.com, which can be used to buy any of their products online. The next two best entries will receive token gifts from Strand Life Sciences.
Participants are free to use any software, platform or tool for generating descriptors and building models, provided the approach is scientifically valid. They can also download a limited, free-license version of Sarchitect Designer™, a model-building tool by Strand Life Sciences Pvt. Ltd.