QSAR WORLD

Information : Modeling Challenge 2008

The last day for accepting challenge entries has been extended to 31st January 2009, on request, to accommodate the year-end vacation!

This year’s challenge is to build a quantitative (regression) QSAR model to predict human oral bioavailability from the supplied training set data. The winner will be chosen on the basis of the root-mean-square error (RMSE) on the test set data.

Contents:
  1. Why ‘Human Oral Bioavailability’?
  2. Who wins what?
  3. Data
  4. Software and Tool
  5. What to send back to us and when
  6. Rules
  7. Some tips
  8. References

Why ‘Human Oral Bioavailability’?

‘Human Oral Bioavailability’ has already left an indelible mark on the world of cheminformatics. Lipinski’s ‘Rule of 5’ [1], based on oral absorption data, a surrogate for oral bioavailability, immediately comes to mind. Unfortunately, bioavailability is far too complex a phenomenon to be boxed in by this simple, yet intuitive, set of rules [2]. An important factor influencing oral bioavailability, in addition to absorption, is the liver first-pass effect (metabolism).

Probably due to the complexity of the phenomenon, QSAR models to predict human oral bioavailability are fewer than what one might see for other ADME properties. The literature shows more attempts at building qualitative (binned) models, noteworthy among them being Yoshida and Topliss’ classification model [3]. Other attempts include genetic programming and adaptive fuzzy partitioning [4,5]. A regression model using 85 structural descriptors was shown to outperform Lipinski’s Rule of 5 with respect to false positive predictions, but has a mean error of ~12% on the experimental data used to build the model [6].

As with most other properties, the availability of reliable data is a big issue in bioavailability too. In this respect, Dr. Tingjun Hou and his group [7] have made a noteworthy effort by compiling ADME databases and making them public. One among these is human oral bioavailability data on 805 compounds, along with their sources. We gratefully acknowledge the ADME team behind these databases for giving us permission to make this data available for the modeling challenge.

Despite the concern that bioavailability is too complex a phenomenon to be captured by a QSAR model, we have chosen it for the modeling challenge for several reasons. The last decade has seen a lot of research on molecular descriptors that can encode complex molecular processes [8]. Machine learning methods are less and less black-box approaches, and interpreting QSAR models has become as important as building them. With all these advances, it is worthwhile to take a fresh look at this phenomenon to understand what can be achieved or, more importantly, to gain some insight into what we still need in order to reach our goal of predicting human oral bioavailability. With this aim, we are throwing open our challenge to build a quantitative (regression) QSAR model to predict human oral bioavailability, given a set of training compounds.


Who wins what?

The best model is judged on the basis of its RMSE on the test set data. Participants are requested to download the test set data, run their model on it, and send us the predictions. We will compute the RMSE of those predictions against the actual values and adjudge the best model.
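For participants who want to check their own models the same way on a held-out part of the training set, the RMSE can be sketched as below. This is a minimal illustration, not the organizers' scoring script: the function name `rmse` and the three bioavailability values are made up for the example.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences of numbers."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one value is required")
    # Square each residual, average the squares, then take the square root.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical bioavailability values in percent, for illustration only.
actual = [20.0, 50.0, 80.0]
predicted = [30.0, 50.0, 70.0]
score = rmse(actual, predicted)
print(f"RMSE = {score:.2f}")
```

Because the residuals are squared before averaging, a few large misses raise the score more than many small ones, so a lower RMSE means predictions that stay close to the measured values across the whole test set.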

The best model builder will be awarded a gift voucher worth $200 from www.amazon.com, which can be used to buy any of their products online. The next two best entries will receive token gifts from Strand Life Sciences.


Data

  • Training set data as SDF, along with the bioavailability values, is available here >> Download
  • Test set data as SDF is available here >> Download
  • Training set data as a SMILES file is available here >> Download
  • Test set data as a SMILES file is available here >> Download
  • The bioavailability values alone, along with the identifiers, are available as a text file for the training set here >> Download
  • Chemical descriptors for the training set, computed by Sarchitect Designer™, are available as a text file here >> Download
  • Chemical descriptors for the test set, computed by Sarchitect Designer™, are available as a text file here >> Download
  • Project files for the training set, created using Sarchitect Designer™, are available here >> Download
  • Project files for the test set, created using Sarchitect Designer™, are available here >> Download


Software and Tool


Users are free to use any software / platform / tool for generating descriptors and building models, provided the approach is scientifically valid. Users can also download a limited free license version of Sarchitect Designer™, a model building tool by Strand Life Sciences Pvt. Ltd.

Read about Sarchitect Designer™ >>


Facilitated by Strand Life Sciences Pvt. Ltd.