Workflow and Pipelining in Cheminformatics
Wendy Warr, editorial advisor of QSARWorld, writes on the workflow paradigm as a mechanism to integrate different data resources, software, algorithms, Web services, and more.
The workflow paradigm is a generic mechanism to integrate different data resources, software applications and algorithms, Web services and shared expertise. Such technologies allow a form of integration and data analysis that is not limited by the restrictive tables of a conventional database system. They enable scientists to construct their own research data processing networks (sometimes called "protocols") for scientific analytics and decision making by connecting various information resources and software applications together in an intuitive manner, without any programming. Therefore, software from a variety of vendors can be assembled into something that is the ideal workflow for the end user. These are, purportedly, easy-to-use systems for controlling the flow and analysis of data. In practice, certainly in chemical analyses, they are not generally used by novices: best use of the system can be made by allowing a computational chemist to set up the steps in a protocol then "publish" it for use by other scientists on the Web.
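The idea of a protocol assembled from independent steps and then "published" for reuse can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any commercial workflow product is implemented: the `Protocol` class, the step functions, and the compound records (including their molecular weights) are all hypothetical and invented for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Step = Callable[[List[Record]], List[Record]]


@dataclass
class Protocol:
    """A named, ordered chain of steps (hypothetical, for illustration only)."""
    name: str
    steps: List[Step] = field(default_factory=list)

    def add(self, step: Step) -> "Protocol":
        # Return self so steps can be chained when the protocol is assembled.
        self.steps.append(step)
        return self

    def run(self, records: List[Record]) -> List[Record]:
        # Each step receives the full output of the previous step.
        for step in self.steps:
            records = step(records)
        return records


def load_compounds(_: List[Record]) -> List[Record]:
    # Stand-in for a data-retrieval step; the values are made up.
    return [
        {"id": "cpd-1", "mol_weight": 180.2},
        {"id": "cpd-2", "mol_weight": 612.7},
        {"id": "cpd-3", "mol_weight": 342.3},
    ]


def keep_light(records: List[Record]) -> List[Record]:
    # Stand-in for a filtering step; the 500.0 cutoff is an arbitrary example.
    return [r for r in records if r["mol_weight"] <= 500.0]


def sort_by_weight(records: List[Record]) -> List[Record]:
    # Stand-in for a manipulation step.
    return sorted(records, key=lambda r: r["mol_weight"])


# An expert assembles the protocol once...
protocol = Protocol("light-compounds").add(load_compounds).add(keep_light).add(sort_by_weight)

# ...and other users simply run it without seeing the individual steps.
result = protocol.run([])
print([r["id"] for r in result])
```

In this sketch the end user only calls `protocol.run`, which mirrors the division of labor described above: one person configures the steps, and others execute the finished protocol.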
Until recently, in the cheminformatics field, only two solutions were in common use for capturing and executing such multi-step procedures, processing entire data sets in real time through "pipelines" or "workflows". The two technologies, both of them commercial solutions, are from InforSense,1 which uses a workflow paradigm in its InforSense platform, and SciTegic2 (now part of Accelrys) which uses data pipelining in Pipeline Pilot. New entrants to the market now open up many more options.
In Pipeline Pilot, users can graphically compose protocols, using hundreds of different configurable components for operations such as data retrieval, manipulation, computational filtering, and display. There are three options for the interface: the Professional Client, the Lite Client, and the Web Port Client. SciTegic offers collections of components covering chemistry, ADME/Tox, chemically-intelligent text mining, decision trees, gene expression, materials, modeling, R statistics, reporting, imaging, sequence analysis, and text analytics, as well as the software packages Catalyst and CHARMm.
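To make the notion of configurable components concrete, the following Python sketch streams records one at a time through a chain of small components for retrieval, filtering, and manipulation. It is a generic generator-based illustration and does not reproduce Pipeline Pilot's actual components or interfaces; the function names (`reader`, `property_filter`, `tag`, `compose`), the record fields, and the threshold values are all assumptions made for the example.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Record = Dict[str, Any]
Component = Callable[[Iterable[Record]], Iterator[Record]]


def reader(rows: List[Record]) -> Component:
    """Hypothetical retrieval component: emits one copy of each stored row."""
    def component(_upstream: Iterable[Record]) -> Iterator[Record]:
        for row in rows:
            yield dict(row)
    return component


def property_filter(key: str, maximum: float) -> Component:
    """Hypothetical filter component, configured with a property and a cutoff."""
    def component(upstream: Iterable[Record]) -> Iterator[Record]:
        for rec in upstream:
            if rec[key] <= maximum:
                yield rec
    return component


def tag(key: str, value: Any) -> Component:
    """Hypothetical manipulation component: annotates each record."""
    def component(upstream: Iterable[Record]) -> Iterator[Record]:
        for rec in upstream:
            rec[key] = value
            yield rec
    return component


def compose(*components: Component) -> Callable[[], Iterator[Record]]:
    """Connect components in order; records flow through lazily, one by one."""
    def pipeline() -> Iterator[Record]:
        stream: Iterable[Record] = iter(())
        for comp in components:
            stream = comp(stream)
        return iter(stream)
    return pipeline


# Illustrative records only; these are not real measurements.
rows = [
    {"id": "a", "logp": 2.1, "hbd": 1},
    {"id": "b", "logp": 6.3, "hbd": 2},
    {"id": "c", "logp": 4.9, "hbd": 7},
]

pipeline = compose(
    reader(rows),
    property_filter("logp", 5.0),
    property_filter("hbd", 5),
    tag("status", "passed"),
)

passed = list(pipeline())
print([r["id"] for r in passed])
```

Because each component is an ordinary function configured by its arguments, swapping a filter or changing a cutoff does not require touching the rest of the chain, which is the property the graphical component model is meant to provide.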
Spotfire3 and SciTegic have coupled Spotfire DecisionSite's interactive visual analytics with Pipeline Pilot's data processing protocols. Researchers can embed Pipeline Pilot computations in DecisionSite (without any scripting or programming) and deploy these throughout the enterprise. DecisionSite users can run analyses in Pipeline Pilot without leaving the DecisionSite environment. Pipeline Pilot is supported on Linux and Windows and is used by over 200 pharmaceutical, biotechnology, and chemicals companies. Applications have been reported in the cheminformatics literature.4,5 SciTegic has recently announced a free academic version of Pipeline Pilot to facilitate dissemination of scientific innovations to industry.