QSAR WORLD

The IUPAC International Chemical Identifier

Take, for example, naphthalene:

  

InChI=1/C10H8/c1-2-6-10-8-4-3-7-9(10)5-1/h1-8H

In the InChI, the first “1” is the version of the identifier. (Note that this will actually be “1S” in the “standard InChI” to be released soon with version 1.02 of the software.) The next segment of the string, C10H8, provides the molecular formula. The third segment, prefixed with “c”, is the connection table, which indicates how the atoms are connected. The last segment, prefixed with “h”, gives the placement of the hydrogen atoms. Note that the identifier does not contain any information on the double bond positions.
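The segment structure described above can be illustrated with a short Python sketch that splits an InChI string on its “/” separators. This is only an illustration, not the official InChI software: the function name `split_inchi` is hypothetical, and the sketch assumes that the formula is present as the first segment after the version and that each later layer begins with a distinct single-letter prefix, which need not hold for every InChI.

```python
def split_inchi(inchi: str) -> dict:
    """Split an InChI string into version, formula, and prefixed layers.

    Illustrative only: assumes the formula is the first segment after the
    version and that each later layer has a distinct one-letter prefix.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string: missing 'InChI=' prefix")

    # Segments are separated by "/"; the first one is the version.
    version, *segments = inchi[len(prefix):].split("/")

    result = {"version": version, "formula": None, "layers": {}}
    if segments:
        result["formula"] = segments[0]
    for segment in segments[1:]:
        if segment:
            # First character is the layer prefix, e.g. "c" or "h".
            result["layers"][segment[0]] = segment[1:]
    return result


naphthalene = split_inchi("InChI=1/C10H8/c1-2-6-10-8-4-3-7-9(10)5-1/h1-8H")
print(naphthalene["version"])      # version segment
print(naphthalene["formula"])      # molecular formula
print(naphthalene["layers"]["c"])  # connection table
print(naphthalene["layers"]["h"])  # hydrogen placement
```

For the naphthalene example, the four printed values correspond to the four segments discussed in the text.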

Where relevant, stereochemical sublayers cover double bond (sp2) stereochemistry and tetrahedral (sp3) stereochemistry. Relative, absolute, and racemic stereoisomers are distinguished. Stereochemistry can also be entered as “unknown” or as “unspecified”. Tautomers are dealt with by hydrogen atom migration between 1,3 heteroatoms.

Extension

Currently, the InChI algorithm can handle neutral and ionic organic molecules, radicals, and inorganic, organometallic, and coordination compounds. Since InChI is composed of hierarchical layers, new layers could be added to extend the scope of the identifier. Work is currently underway to extend InChI to include polymer representation. Extensions may be considered later for other forms of stereochemistry; complex organometallics (including coordinate bonds); other compound classes such as Markush structures, macromolecules, and conformations; and other attributes such as phases and excited states. A project to allow this work to be carried out in an open source context has been registered with SourceForge.net [15]. Users are encouraged to report their experiences and any problems through the SourceForge website.

InChIKey

A beta release of InChI version 1.02 was issued in September 2007. The principal new feature of this version was the introduction of a fixed-length (25-character) condensed digital representation of the identifier, known as InChIKey [16]. This key will facilitate web searching, previously complicated by the unpredictable breaking of InChI character strings by search engines. It will also allow development of a web-based InChI lookup service, permit an InChI representation to be stored in fixed-length fields, and make chemical structure database indexing easier.

In the formal release of version 1.02, due very soon, the InChIKey will be slightly modified and will be 27 characters long. The first block is 14 characters long and encodes the molecular skeleton (connectivity). After a hyphen comes a second block of 10 characters, the first eight of which encode stereochemistry and isotopes. The first 23 characters (including the hyphen) are the same in both versions of the InChIKey. In the post-beta version, the 10-character block ends with a flag character indicating that this is a standard InChIKey (derived from the standard InChI) and a version character indicating the InChI version. The key ends with a hyphen followed by a character indicating the [de]protonation state.
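The 27-character layout described above can be sketched as a small Python check that splits a key into its blocks. This is a minimal illustration under stated assumptions, not part of the InChI software: the function name `split_inchikey` and the dictionary field names are hypothetical, the pattern assumes every character other than the hyphens is an uppercase letter, and the example key is the standard InChIKey commonly listed for naphthalene.

```python
import re

# Assumed layout: 14 letters, hyphen, 8 letters + flag + version, hyphen, 1 letter.
_KEY_PATTERN = re.compile(r"([A-Z]{14})-([A-Z]{8})([A-Z])([A-Z])-([A-Z])")


def split_inchikey(key: str) -> dict:
    """Split a 27-character InChIKey into the blocks described in the text."""
    match = _KEY_PATTERN.fullmatch(key)
    if match is None:
        raise ValueError("string does not have the 27-character InChIKey layout")
    skeleton, stereo_isotopes, flag, version, protonation = match.groups()
    return {
        "skeleton": skeleton,                # 14 characters: connectivity
        "stereo_isotopes": stereo_isotopes,  # 8 characters: stereo and isotopes
        "standard_flag": flag,               # flag character (standard or not)
        "version": version,                  # InChI version character
        "protonation": protonation,          # [de]protonation state character
    }


# Example key: the standard InChIKey commonly listed for naphthalene.
parts = split_inchikey("UFWIBTONFRDIAS-UHFFFAOYSA-N")
for name, value in parts.items():
    print(f"{name}: {value}")
```

A fixed pattern like this only checks the shape of the key. It cannot confirm that the key was actually generated from a given structure, since the key is a condensed representation of the full InChI.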

