Thank you all for your participation in the modeling challenge.
The winner is the two-member team of Dr Damjan Krstajic and Prof David E Leahy, with the lowest RMSE of 29.9962.
The runner-up is Dr Anthony Klon with a very close RMSE of 30.9716.
Congratulations!
The winner will receive a certificate and a gift voucher from QSARworld
towards the purchase of $200 worth of books at amazon.com. The
runner-up will receive mementoes from QSARworld in addition to a
certificate. All participants will receive a participation
certificate from qsarworld.com.
As we warned when announcing the challenge, building a satisfactory
QSAR model for bioavailability was not an easy job. For the sake
of comparison, the lowest RMSE in QSARworld’s 2007 modeling challenge,
which was on aqueous solubility, was around 24, compared to
around 30 for bioavailability! There are several reasons; to
quote from the recent review “Structure-ADME relationship: still
a long way to go?”: “For the ADME properties involving
complex phenomena, such as bioavailability, the in silico models
usually cannot give satisfactory predictions. Moreover, the lack of
large and high-quality data sets also greatly hinder the reliability of
ADME predictions.” A netscience article gives some insight
into these complex phenomena and some references on how they can be
captured. Still, the critical issue of data remains!
Details of the first two winning entries
Winner:
Damjan Krstajic, Director, Research Centre for Cheminformatics, Belgrade, Serbia
and
Prof. David E Leahy, Northern Institute for Cancer Research, Newcastle, UK
Details:
Methodology
We used Discovery Bus (an automated system for QSAR modeling) to find
the best QSAR model for the bioavailability data. Unfortunately, for
the supplied dataset we were not able to find any good model. Our best
model so far is a neural net using over 160 descriptors (logD, MW, PSA,
TPSA, etc., together with E-state descriptors). More detailed
information about the descriptors can be supplied upon request. There were
6 compounds that were "problematic", i.e. we had difficulty
calculating the descriptors for them. They were molecules 394, 623, 390,
307, 209 and 508. We still had to give predictions for them, so we used the
midpoint of the range from 0 to 100, i.e. 50.
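
To illustrate the fallback described above, here is a minimal Python sketch. It is not the Discovery Bus code; the function name `fill_missing` and the model outputs for molecules 1 and 2 are made up for illustration. Only the six molecule numbers and the fallback value of 50 come from the text.

```python
# Illustrative sketch only; not the winning team's actual code.

# Molecule numbers reported as problematic in the methodology above.
PROBLEM_IDS = {394, 623, 390, 307, 209, 508}

# Midpoint of the 0-100 range, used when no descriptors could be computed.
FALLBACK = (0 + 100) / 2


def fill_missing(model_predictions, all_ids, fallback=FALLBACK):
    """Return one prediction per id, using the fallback where the model has none."""
    return {mol_id: model_predictions.get(mol_id, fallback) for mol_id in all_ids}


# Hypothetical model outputs for two compounds whose descriptors were available.
model_predictions = {1: 62.5, 2: 18.0}
all_ids = [1, 2] + sorted(PROBLEM_IDS)

filled = fill_missing(model_predictions, all_ids)
```

Every problematic molecule ends up with a prediction of 50, while compounds with a model output keep it unchanged.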
From QSARworld:
The test set predictions for this model give an RMSE of 29.9962, the lowest among all the entries.
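
For readers unfamiliar with the metric, the sketch below shows how a root-mean-square error of this kind can be computed. It assumes the standard definition (the square root of the mean squared difference between observed and predicted values); the three data points are invented and are not taken from the challenge test set.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences of numbers."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Invented bioavailability values (in percent), for illustration only.
observed = [10.0, 50.0, 90.0]
predicted = [20.0, 50.0, 70.0]

score = rmse(observed, predicted)
print(f"RMSE = {score:.4f}")
```

Because the errors are squared before averaging, a few large misses raise the score more than many small ones.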