A Report on the Fourth Joint Sheffield Conference on
Chemoinformatics, June 18-20, University of Sheffield, UK
Johann Gasteiger of the University of Erlangen-Nurnberg
showed how modeling of chemical reactions can help in drug discovery.
For example, in lead discovery and lead optimization, an estimate of
synthetic accessibility can be useful. Gasteiger's team has devised a
scoring method that rapidly evaluates synthetic accessibility of
structures based on structural complexity, similarity to available
starting materials, and assessment of strategic bonds where a structure
can be decomposed to obtain simpler fragments [5]. These
individual components are combined to give an overall score of
synthetic accessibility by an additive scheme. The system is called SYLVIA.
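The additive combination described above can be sketched as a weighted sum. This is only an illustration: the component names, the [0, 1] scaling, and the weights are assumptions made for the example and are not taken from SYLVIA.

```python
from dataclasses import dataclass

# Hypothetical weights; the report does not give the values used in SYLVIA.
WEIGHTS = {
    "complexity": 0.4,
    "starting_material_distance": 0.3,
    "strategic_bond_penalty": 0.3,
}


@dataclass
class ComponentScores:
    """Component scores, each assumed to be pre-scaled to [0, 1] (1 = harder)."""
    complexity: float                  # structural complexity
    starting_material_distance: float  # 1 - similarity to available starting materials
    strategic_bond_penalty: float      # high when few useful disconnections exist


def accessibility_score(scores, weights=WEIGHTS):
    """Combine the components additively into one score in [0, 1]."""
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[name] * getattr(scores, name) for name in weights)
    return weighted_sum / total_weight


# Made-up component values for two imaginary structures.
easy = ComponentScores(0.1, 0.2, 0.0)
hard = ComponentScores(0.9, 0.8, 1.0)
print(accessibility_score(easy), accessibility_score(hard))
```

An additive scheme keeps each component's contribution visible, so a poor overall score can be traced back to the term that caused it.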
Modeling metabolism is also possible. To
this end, XENIA, an in-house CYP450 database, has been developed at the
University of Erlangen-Nurnberg. MetaboGen systematically generates all
metabolites of a drug, applying a set of the most important phase I
reactions. ISOCYP is a web service for predicting the predominant P450 isoform [6]. Molecular Networks also supplies the biochemical pathways database, BioPath.
Markus Wagener of Organon presented a novel,
rule-based method, SyGMa (Systematic Generation of Metabolites), that
predicts potential metabolites of a given parent structure. The method
is based on reaction rules derived from metabolic reactions that occur
in humans, as reported in Elsevier MDL's
Metabolite database. The database was filtered (to remove assumed
metabolites, incomplete structures, large structures, etc.) to give 7,307
biotransformations as a training set. Reaction templates were encoded
as SMIRKS and reaction probabilities were calculated based on training
set statistics. The predicted metabolites are ranked according to the
empirical probability score. Evaluation of the method demonstrated a
significant enrichment of true metabolites at the top of the ranking
list. The current rule set covers about 70% of the human in vivo
data of the Metabolite database. To gain an understanding of the nature
of the reactions, a similarity analysis of the reaction types was
performed using difference fingerprints [7], calculated by subtracting fingerprints generated from atom environments [8]. SPE [9]
was used to project the reaction space. Wagener gave some examples of
SyGMa, including the pathway for buspirone. Predictions from SyGMa are
used at Organon to plan metabolite identification experiments
and to suggest labile sites amenable to optimization by
medicinal chemistry.
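The ranking step can be illustrated with a minimal sketch. It assumes that a rule's empirical probability is the fraction of its training-set matches that produced a reported metabolite, and that a metabolite reachable by several rules keeps its highest score. The report says only that probabilities come from training set statistics, so both choices are assumptions. The rule names and counts are invented, and the rules are plain labels rather than real SMIRKS patterns.

```python
from typing import NamedTuple


class RuleStats(NamedTuple):
    name: str
    n_matches: int    # times the rule matched a training substrate
    n_observed: int   # matches whose product was a reported metabolite


def rule_probability(stats):
    """Empirical success rate of a rule on the training set (assumed definition)."""
    if stats.n_matches == 0:
        return 0.0
    return stats.n_observed / stats.n_matches


def rank_metabolites(candidates, probabilities):
    """Rank (metabolite, rule) candidates by score, best first.

    A metabolite produced by several rules keeps its highest score.
    """
    best = {}
    for metabolite, rule in candidates:
        p = probabilities.get(rule, 0.0)
        if p > best.get(metabolite, -1.0):
            best[metabolite] = p
    # Sort by descending score, then by label for a stable order.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


# Invented training statistics for three hypothetical rules.
RULES = [
    RuleStats("aromatic_hydroxylation", 1000, 120),
    RuleStats("N_dealkylation", 400, 200),
    RuleStats("O_glucuronidation", 50, 0),
]
probabilities = {rule.name: rule_probability(rule) for rule in RULES}

# Invented candidates: which rule produced which metabolite label.
candidates = [
    ("M1", "aromatic_hydroxylation"),
    ("M2", "N_dealkylation"),
    ("M1", "N_dealkylation"),
    ("M3", "O_glucuronidation"),
]
ranked = rank_metabolites(candidates, probabilities)
print(ranked)
```

In a real system the candidates would come from applying the SMIRKS templates to the parent structure with a cheminformatics toolkit; that step is omitted here.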
Metabolism was also the topic of a paper by Anton Schwaighofer of Fraunhofer FIRST. His team, idalab of Berlin, and Bayer Schering Pharma (BSP)
have jointly developed machine learning tools to predict the metabolic
stability of compounds from drug discovery projects at BSP. They used
experimental metabolic stability data from four different in vitro
assays. They compared a variety of machine learning approaches in terms
of performance, difficulty of the model selection procedures,
interpretability, and how the "domain of applicability" can be checked.
They concluded that Gaussian Process
classification has specific benefits. The effort required for model
selection is minimal, so fully automatic re-training is possible. Also,
the probabilistic output is easy to interpret and shows almost ideal
properties. Competing methods achieve similar performance, but need
more careful tuning by an expert. The models developed were validated
on recent project data at BSP: the best models are highly accurate and
are able to identify the domain of applicability correctly. These
models are fully integrated in the working environment at BSP and a
tool for automatic regular retraining of the models is currently being
implemented. A paper has been submitted to J. Chem. Inf. Model.
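One common way to check whether a classifier's probabilistic output is well behaved is a reliability table, which compares predicted probabilities with observed outcome frequencies. The sketch below assumes that this kind of calibration is what the "almost ideal properties" refers to; the report does not say so explicitly. The function names and the data are invented for the illustration.

```python
def reliability_table(probs, labels, n_bins=4):
    """Group predictions into equal-width probability bins.

    Returns one row per non-empty bin:
    (bin_low, bin_high, count, mean_predicted_probability, observed_fraction).
    """
    bins = [[0, 0.0, 0] for _ in range(n_bins)]  # count, sum of p, positives
    for p, y in zip(probs, labels):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index][0] += 1
        bins[index][1] += p
        bins[index][2] += y
    rows = []
    for index, (count, sum_p, positives) in enumerate(bins):
        if count:
            rows.append((index / n_bins, (index + 1) / n_bins,
                         count, sum_p / count, positives / count))
    return rows


def expected_calibration_error(probs, labels, n_bins=4):
    """Count-weighted mean gap between predicted and observed frequencies."""
    total = len(probs)
    if total == 0:
        return 0.0
    return sum(count / total * abs(mean_p - observed)
               for _, _, count, mean_p, observed
               in reliability_table(probs, labels, n_bins))


# Invented predictions: 1 = metabolically stable, 0 = unstable.
probs = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]
labels = [1, 0, 0, 0, 1, 1, 1, 0]
for row in reliability_table(probs, labels):
    print(row)
print(expected_calibration_error(probs, labels))
```

For a well-calibrated model the mean predicted probability in each bin is close to the observed fraction, so the error is near zero.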