Fourth Joint Sheffield Conference
on Chemoinformatics - a Report
June 18-20, University of Sheffield, UK
Dr. Wendy Warr, Editorial Advisor and columnist of QSAR World, reports on the recently concluded Fourth Joint Sheffield Conference on Chemoinformatics, held at the University of Sheffield, UK. Over to Wendy...
The conference, sponsored by the Chemical Structure Association Trust and the Molecular Graphics and Modelling Society and organized by the University of Sheffield chemoinformatics research group, takes place every three years in the year preceding the Noordwijkerhout International Chemical Structures Conference. This fourth conference was attended by 230 delegates, the number being set by the size of the dining facilities at Chatsworth House, the site of the superb conference outing where we were shown some of the state rooms and enjoyed an excellent dinner. It is clear that the meeting has become very popular since all places were taken by the end of the early registration period, with delegates coming from Australia, Austria, Belgium, Cyprus, Denmark, France, Germany, Hungary, India, Iran, Ireland, Italy, Netherlands, Poland, Serbia, Spain, Sweden, Switzerland, the Ukraine, the United Kingdom and the United States.
Twenty-four papers were presented, in sessions entitled structure-based design, new algorithms and techniques, deriving structure-activity relationships, clustering, and QSAR and ADMET. More than sixty posters were presented. In this report I am summarizing only the QSAR-related papers, which means I am obliged to omit some of the material that I myself found most interesting. It is a shame to have to ignore, for example, the excellent paper by Andy Good on the defects of enrichment studies in the comparison of virtual screening (i.e. docking) tools. It is unfortunate that I have to gloss over controversial comments from Anthony Nicholls ("docking sucks", and "you cannot calculate binding energy") and his attack on Richards and Ballester’s Ultrafast Shape Recognition. Indeed, Nicholls’ own paper was controversial in itself.
Nicola Richmond of GlaxoSmithKline presented a fast, novel graph-matching algorithm based on the comparison of distance degree sequences. The algorithm matches pairs of nodes, one from each graph, by solving the linear assignment problem. The graph similarity is then given by the minimum cost associated with the optimal set of matching pairs of nodes. By representing molecules as 2D topological pharmacophores, Richmond has adapted the algorithm to rank a corporate collection against a query molecule of interest, and to cluster the ranked list into groups of compounds that have identical chemical graphs. The clustering component has a useful visualization facility. The highest ranked compounds correspond to the analogues of the query; families of "lead hops" follow. This unsupervised approach is not a substitute for substructure search, but it is fast and it may produce a new template around which a chemist can search. It can follow GSK’s automated high throughput screening (HTS) process to recover not only families of compounds on which to build structure-activity relationships, but also hits missed by HTS.
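To give a feel for the idea, here is a minimal Python sketch of matching two small graphs by their distance degree sequences. It is not Richmond's implementation, whose details were not given in the talk summary above. The names `distance_degree_sequence`, `node_cost` and `match_graphs` are hypothetical, the cost between two nodes (the sum of absolute differences between their zero-padded sequences) is an assumption, and the linear assignment problem is solved by brute force over all mappings, which is only practical for toy graphs. A production version would use a proper assignment solver.

```python
from collections import deque
from itertools import permutations


def distance_degree_sequence(adj, start):
    """Return [n0, n1, ...], where nk is the number of nodes k bonds from start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:  # breadth-first search gives shortest-path distances
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    counts = [0] * (max(dist.values()) + 1)
    for d in dist.values():
        counts[d] += 1
    return counts


def node_cost(seq_a, seq_b):
    """Assumed cost: sum of absolute differences of zero-padded sequences."""
    width = max(len(seq_a), len(seq_b))
    a = seq_a + [0] * (width - len(seq_a))
    b = seq_b + [0] * (width - len(seq_b))
    return sum(abs(x - y) for x, y in zip(a, b))


def match_graphs(adj_a, adj_b):
    """Return (minimum total cost, matched node pairs) for two small graphs."""
    swapped = len(adj_a) > len(adj_b)
    small, large = (adj_b, adj_a) if swapped else (adj_a, adj_b)
    small_nodes, large_nodes = sorted(small), sorted(large)
    cost = [[node_cost(distance_degree_sequence(small, u),
                       distance_degree_sequence(large, v))
             for v in large_nodes] for u in small_nodes]
    best_cost, best_perm = None, None
    # Brute-force linear assignment: try every one-to-one mapping of the
    # smaller graph's nodes onto distinct nodes of the larger graph.
    for perm in permutations(range(len(large_nodes)), len(small_nodes)):
        total = sum(cost[i][j] for i, j in enumerate(perm))
        if best_cost is None or total < best_cost:
            best_cost, best_perm = total, perm
    pairs = [(small_nodes[i], large_nodes[j]) for i, j in enumerate(best_perm)]
    if swapped:
        pairs = [(b, a) for a, b in pairs]
    return best_cost, pairs


# Toy graphs (made up for illustration): a three-atom chain and a three-membered ring.
chain = {0: [1], 1: [0, 2], 2: [1]}
ring = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(match_graphs(chain, chain)[0])  # cost of matching a graph with itself
print(match_graphs(chain, ring)[0])   # cost of matching chain against ring
```

A lower minimum cost means more similar graphs, so ranking a collection against a query amounts to sorting by this cost. Real molecules would first be reduced to topological pharmacophore graphs, a step this sketch omits.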
Enriched scaffolds in HTS data sets can be identified by clustering on substructure and then extracting the maximal common substructure (MCS) for each cluster. However, if clustering is performed without reference to the assay data, the resulting scaffolds are unlikely to show optimal enrichment for the assay in question. Martin Packer of AstraZeneca has developed a method for locating scaffolds with high enrichment factors, using a hierarchical search strategy. Molecules encoded by substructure are partitioned into N clusters and, for each cluster, M hierarchical clusters are generated. The MCS is extracted from each cluster and its enrichment factor is calculated. The procedure is iterated by setting M := M-1. The method was applied to a 540,000-compound in-house kinase data set, and 6,737 actives were partitioned into 200 clusters. The AstraZeneca collection contains many kinase series, so a Bonferroni correction was applied to reduce the chance of generating a spurious result. The hierarchical nature of the search means that structure-activity relationships emerge for the most enriched scaffolds. Emergent SAR was found for a quinazoline scaffold: substitution at the 7-position enhanced enrichment.
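The arithmetic behind an enrichment factor and a Bonferroni-corrected threshold can be sketched as follows. This is an illustration under stated assumptions, not Packer's method: the enrichment factor is taken in its usual form (hit rate among compounds containing the scaffold divided by the hit rate of the whole data set), the significance test is assumed to be a one-sided hypergeometric test, and the scaffold counts (50 compounds, 25 actives) and the significance level of 0.05 are invented for the example. Only the data set size, the number of actives and the number of clusters come from the talk.

```python
from math import comb


def enrichment_factor(actives_in_set, set_size, total_actives, total_compounds):
    """Hit rate within the scaffold set divided by the overall hit rate."""
    return (actives_in_set / set_size) / (total_actives / total_compounds)


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n compounds are drawn from N, of which K are active."""
    numerator = sum(comb(K, i) * comb(N - K, n - i)
                    for i in range(k, min(n, K) + 1))
    return numerator / comb(N, n)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


# Figures reported in the talk.
TOTAL_COMPOUNDS = 540_000
TOTAL_ACTIVES = 6_737
N_CLUSTERS = 200

# Hypothetical scaffold: 50 compounds contain it and 25 of them are active.
ef = enrichment_factor(25, 50, TOTAL_ACTIVES, TOTAL_COMPOUNDS)
p_value = hypergeom_tail(25, 50, TOTAL_ACTIVES, TOTAL_COMPOUNDS)
# Assumption: one test per cluster, at an overall significance level of 0.05.
threshold = bonferroni_threshold(0.05, N_CLUSTERS)

print(f"enrichment factor: {ef:.1f}")
print(f"significant after correction: {p_value < threshold}")
```

In a hierarchical search the number of scaffolds tested would exceed the number of top-level clusters, so the divisor used here is a lower bound on the correction that would be needed in practice.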