Workflow and Pipelining in Cheminformatics
KNIME
An interesting new entrant to the market is KNIME [12] (the "K" is silent), a modular data exploration platform that lets the user create data flows visually, execute analysis steps, and later investigate the results through interactive views on data and models. It was developed by Michael Berthold's group at the University of Konstanz, Germany. The standard release already incorporates several hundred nodes (for "nodes" you may care to read "modules" or "components") for data processing, modeling, analysis, and mining, as well as various interactive views, such as scatter plots and parallel coordinates. KNIME supports the integration of different databases such as SQL, Oracle, and DB2. Thus, data from different database sources can be joined, manipulated, partitioned, and transformed in KNIME and finally stored again in databases.
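The join, transform, partition, and store sequence described above can be pictured with a small sketch. It is not KNIME itself and does not use any KNIME API: it is plain Python, with two in-memory SQLite databases standing in for separate sources. The function names (`read_node`, `join_node`, and so on), the table layouts, and the toy rows are all assumptions made up for illustration.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an in-memory database that stands in for one external source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    return conn


# Toy data, invented for this sketch only.
compounds_db = make_source(
    "CREATE TABLE compounds (id INTEGER PRIMARY KEY, name TEXT)",
    "INSERT INTO compounds VALUES (?, ?)",
    [(1, "cmpd_a"), (2, "cmpd_b"), (3, "cmpd_c")],
)
assays_db = make_source(
    "CREATE TABLE assays (compound_id INTEGER, activity REAL)",
    "INSERT INTO assays VALUES (?, ?)",
    [(1, 5.2), (2, 7.9), (3, 6.1)],
)


def read_node(conn, query):
    """Read a query result as a list of dicts (one dict per row)."""
    cur = conn.execute(query)
    cols = [d[0] for d in cur.description]
    return [dict(zip(cols, row)) for row in cur.fetchall()]


def join_node(left, right, left_key, right_key):
    """Inner-join two row lists on the given key columns."""
    index = {r[right_key]: r for r in right}
    return [{**l, **index[l[left_key]]} for l in left if l[left_key] in index]


def transform_node(rows):
    """Example manipulation step: upper-case the compound name."""
    return [{**r, "name": r["name"].upper()} for r in rows]


def partition_node(rows, threshold):
    """Split rows into two sets by an activity threshold."""
    active = [r for r in rows if r["activity"] >= threshold]
    inactive = [r for r in rows if r["activity"] < threshold]
    return active, inactive


def write_node(conn, rows):
    """Store the result rows in a target database."""
    conn.execute("CREATE TABLE active_compounds (name TEXT, activity REAL)")
    conn.executemany(
        "INSERT INTO active_compounds VALUES (?, ?)",
        [(r["name"], r["activity"]) for r in rows],
    )
    conn.commit()


# Wire the steps together in the order a visual data flow would run them.
compounds = read_node(compounds_db, "SELECT id, name FROM compounds")
assays = read_node(assays_db, "SELECT compound_id, activity FROM assays")
joined = join_node(compounds, assays, "id", "compound_id")
active, inactive = partition_node(transform_node(joined), threshold=6.0)

results_db = sqlite3.connect(":memory:")
write_node(results_db, active)
stored = results_db.execute(
    "SELECT name FROM active_compounds ORDER BY name"
).fetchall()
```

Each function plays the role of one node, and the output of one step is the input of the next. In a visual workflow tool the same wiring is drawn as connections between nodes rather than written as code.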
KNIME is based on the Eclipse [13] open source platform and is extensible through its modular Application Programming Interface (API): custom nodes and types can be integrated. KNIME is licensed under the Aladdin Free Public License and, in essence, is available free of charge for use in both non-profit and for-profit organizations. A research group in a pharmaceutical company, for example, can download and use KNIME freely for internal data analysis, but an organization that wants to make money from distributing KNIME will be expected to share some of that benefit with the KNIME developers. For-profit partners such as Tripos [10] and Schrödinger [14] are implementing the technology and also provide KNIME support options. Commercial support is seen as a vital factor in the general adoption of open source tools in cheminformatics. The KNIME developers also provide support through a KNIME community that contains useful information and discussion forums.
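The idea of extending a workflow platform with custom nodes, mentioned above, can be sketched in a few lines. This is a conceptual illustration only: the real KNIME API is Java-based and is not reproduced here. The `Node`, `RowFilter`, `Log10Column`, and `Workflow` classes and the `conc_nM` column are hypothetical names invented for this sketch.

```python
import math
from abc import ABC, abstractmethod


class Node(ABC):
    """Hypothetical extension point: one step that maps a table to a table."""

    @abstractmethod
    def execute(self, table):
        """Take a list of row dicts and return a new list of row dicts."""


class RowFilter(Node):
    """A 'built-in' style node: keep rows that satisfy a predicate."""

    def __init__(self, predicate):
        self.predicate = predicate

    def execute(self, table):
        return [row for row in table if self.predicate(row)]


class Log10Column(Node):
    """A 'custom' node added by a third party: append a log10 column."""

    def __init__(self, source, target):
        self.source = source
        self.target = target

    def execute(self, table):
        return [
            {**row, self.target: math.log10(row[self.source])} for row in table
        ]


class Workflow:
    """Runs nodes in sequence, passing each output to the next node."""

    def __init__(self):
        self.nodes = []

    def add(self, node):
        self.nodes.append(node)
        return self  # allow chained add() calls

    def run(self, table):
        for node in self.nodes:
            table = node.execute(table)
        return table


# Toy input rows, invented for this sketch.
rows = [
    {"id": 1, "conc_nM": 100.0},
    {"id": 2, "conc_nM": 0.0},  # filtered out before the log step
    {"id": 3, "conc_nM": 1000.0},
]

workflow = (
    Workflow()
    .add(RowFilter(lambda r: r["conc_nM"] > 0))
    .add(Log10Column("conc_nM", "log_conc"))
)
result = workflow.run(rows)
```

The platform only needs to know the shared interface, so a vendor or an open source project can supply new node types without changing the workflow engine.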
The Schrödinger KNIME Extensions (currently in a beta version) have more than 100 nodes covering a range of computational tools for workflows focused on both ligand- and structure-based applications. Ligand-based tools include nodes for property and fingerprint generation, similarity searching, diversity analysis, clustering, conformation generation, common pharmacophore perception, database searching, and shape-based screening. For structure-based work there are nodes for docking, homology model building, structural refinement, and binding free energy estimation, as well as many tools for general molecular modeling and structural manipulation. Schrödinger provides support for both the Schrödinger KNIME Extensions and the KNIME platform itself.
The Tripos Chemistry Extensions for KNIME package is an initial offering that provides researchers with functionality to manipulate, analyze, and visualize chemical data. Benchware 3D Explorer, Benchware DataMiner, Concord, Confort, DBTranslate, and AUSPYX can be licensed for access through the extensions package. In future, Tripos will release further nodes that allow access to all Tripos science through the KNIME platform, and will continue to deliver its scientific capabilities through KNIME as well as through its traditional interfaces. The company will concentrate on the usability of the nodes rather than just on wrapping of functionality. Commercial support for KNIME is available through Tripos. (Note that some Tripos tools are also accessible as components from SciTegic’s Pipeline Pilot.)
More recent node additions to KNIME are the cheminformatics capability provided by ChemAxon [6] (implemented by Infocom, an IT business company of the Teijin Group), and the THINK modeling suite (Treweren Consultants) [15]. Symyx/MDL recently demonstrated prototypes of enumeration and structure searching nodes. Steinbeck’s team is helping the University of Konstanz to write CDK nodes. The Nodes4KNIME [16] project is a new, open source initiative to develop independent nodes for use with KNIME.