The IUPAC International Chemical Identifier
Take, for example, naphthalene:

InChI=1/C10H8/c1-2-6-10-8-4-3-7-9(10)5-1/h1-8H
In the InChI, the first “1” refers to the version of the
InChI software. (Note that this will actually be “1S” in
the “standard InChI” version to be released soon with
version 1.02.) The next segment of the string, C10H8, provides the
molecular formula. The third segment, prefixed with “c”, is the connection
table, which indicates how the atoms are connected. The last segment,
prefixed with “h”, gives the placement of hydrogen atoms. Note that the
identifier does not contain any information on the double bond
positions.
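The segment structure described above can be illustrated with a short Python sketch. It is only a minimal illustration, not the InChI software's own parser: the function name split_inchi is hypothetical, and the sketch assumes a simple identifier in which the version and formula come first and every later segment begins with a single-letter tag.

```python
def split_inchi(inchi: str) -> dict:
    """Split a simple InChI string into its slash-separated segments.

    Assumes the layout seen in the naphthalene example: version, then
    molecular formula, then layers that each start with a one-letter tag.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    version, formula, *layers = inchi[len(prefix):].split("/")
    result = {"version": version, "formula": formula}
    for layer in layers:
        if layer:
            # The first character is the layer tag, e.g. "c" or "h".
            result[layer[0]] = layer[1:]
    return result


parts = split_inchi("InChI=1/C10H8/c1-2-6-10-8-4-3-7-9(10)5-1/h1-8H")
print(parts["version"])  # version segment
print(parts["formula"])  # molecular formula
print(parts["c"])        # connection table
print(parts["h"])        # hydrogen placement
```

For identifiers that repeat a tag in later layers or omit the formula, this naive split would not be sufficient.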
Where relevant, stereochemical sublayers cover double bond (sp2)
stereochemistry and tetrahedral (sp3) stereochemistry. Relative,
absolute and racemic stereoisomers are distinguished. Stereochemistry
can also be entered as “unknown” or as
“unspecified”. Tautomers are dealt with by allowing hydrogen atoms to
migrate between heteroatoms in a 1,3 relationship.
Extension
Currently, the InChI algorithm can handle neutral and ionic organic
molecules, radicals, and inorganic, organometallic, and coordination
compounds. Since InChI is composed of hierarchical layers, new layers
could be added to extend the scope of the identifier. Work is currently
underway to extend InChI to include polymer representation. Extensions
for other forms of stereochemistry, complex organometallics (including
coordinate bonds), other compound classes such as Markush structures,
macromolecules, and conformations, and other attributes such as phases
and excited states may be considered later. A project to allow this
work to be carried out in an open source context has been registered
with SourceForge.net [15]. Users are encouraged to report their
experiences and any problems through the SourceForge website.
InChIKey
A beta-release of InChI version 1.02 was issued in September 2007. The
principal new feature of this version was the introduction of a
fixed-length (25-character) condensed digital representation of the
identifier known as InChIKey [16]. This key will facilitate web searching,
previously complicated by unpredictable breaking of InChI character
strings by search engines. It will also allow development of a
web-based InChI lookup service; permit an InChI representation to be
stored in fixed-length fields; and make chemical structure database
indexing easier.
In the formal release of version 1.02, due very soon, the InChIKey will
be slightly modified and will actually be 27 characters long. The first
part is 14 characters long and encodes the molecular skeleton
(connectivity). After a hyphen, there is a second string of 10
characters, the first eight of which encode stereochemistry and
isotopes. The first 23 characters of both versions of InChIKey are the
same. In the post-beta version of the InChIKey, the 10-character block
ends with a flag character indicating that this is a standard InChIKey
(produced from a standard InChI) and a version character indicating the
version number of InChI. The key ends with a hyphen followed by a
character indicating [de]protonation state.
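The 27-character layout just described (14 characters, a hyphen, 10 characters, a hyphen, 1 character) can be checked and taken apart with a small Python sketch. This is an illustration only: the function name split_inchikey is hypothetical, the restriction to uppercase letters is an assumption not stated above, and the sample key is the one commonly listed for naphthalene rather than a value taken from this text.

```python
import re

# 14 letters, hyphen, 8 + 1 + 1 letters, hyphen, 1 letter.
# Assumption: every character of the key is an uppercase letter A-Z.
KEY_PATTERN = re.compile(r"^([A-Z]{14})-([A-Z]{8})([A-Z])([A-Z])-([A-Z])$")


def split_inchikey(key: str) -> dict:
    """Split a 27-character InChIKey into the fields described in the text."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError("not a 27-character InChIKey")
    connectivity, stereo_isotopes, flag, version, protonation = match.groups()
    return {
        "connectivity": connectivity,        # molecular skeleton block
        "stereo_isotopes": stereo_isotopes,  # first eight of the 10-character block
        "standard_flag": flag,               # standard InChIKey flag character
        "version": version,                  # InChI version character
        "protonation": protonation,          # [de]protonation state character
    }


key = "UFWIBTONFRDIAS-UHFFFAOYSA-N"  # sample key, assumed to be naphthalene's
fields = split_inchikey(key)
print(len(key))
print(fields)
```

Because every field has a fixed width, the same split could also be done with plain string slicing; the regular expression is used here only so that malformed keys are rejected.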