The IUPAC International Chemical Identifier
Both parts of the InChIKey are based on a truncated SHA-256 hash [17] of
the corresponding InChI layers. The data are encoded using only
uppercase ASCII letters, which ensures that indexing engines will not
split the key and also avoids case-sensitivity problems. There is a
finite but extremely small probability of finding two structures with
the same InChIKey.
An example will make the structure of the key clearer. The “standard
InChI” and InChIKey for caffeine are shown below. The first block of 14
letters (RYYVLZVUVIJVGH) encodes the molecular skeleton (connectivity).
The first eight letters of the second block (UHFFFAOY) encode
stereochemistry and isotopes. After that, “S” indicates that the key was
produced from standard InChI and “A” indicates that version 1 of InChI
was used. The final character, “N”, means “neutral”.

InChI=1S/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3
InChIKey=RYYVLZVUVIJVGH-UHFFFAOYSA-N
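The layout of the key described above can be illustrated with a short
Python sketch that splits the caffeine InChIKey into its parts. This is
only an illustration of the format as described in the text; the names
InChIKeyParts and split_inchikey are hypothetical and are not part of
any InChI software.

```python
import string
from typing import NamedTuple


class InChIKeyParts(NamedTuple):
    """Hypothetical container for the parts of a standard InChIKey."""
    connectivity: str    # first block: 14 letters (molecular skeleton)
    stereo_isotope: str  # first 8 letters of the second block
    standard_flag: str   # "S" = produced from standard InChI
    version_flag: str    # "A" = InChI version 1
    protonation: str     # final character, "N" = neutral


def split_inchikey(key: str) -> InChIKeyParts:
    """Split a key of the form XXXXXXXXXXXXXX-YYYYYYYYFV-P into its parts."""
    blocks = key.strip().split("-")
    if len(blocks) != 3:
        raise ValueError("expected three hyphen-separated blocks")
    first, second, third = blocks
    if (len(first), len(second), len(third)) != (14, 10, 1):
        raise ValueError("expected block lengths 14, 10 and 1")
    # Only uppercase ASCII letters are allowed in the key.
    if any(c not in string.ascii_uppercase for c in first + second + third):
        raise ValueError("key must contain only uppercase ASCII letters")
    return InChIKeyParts(first, second[:8], second[8], second[9], third)


caffeine = split_inchikey("RYYVLZVUVIJVGH-UHFFFAOYSA-N")
print(caffeine)
```

The sketch checks only the shape of the key; it does not verify that the
key was actually derived from a valid InChI.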
Use of InChIKey allows searches based solely on atom connectivity (the
first 14 characters). For example, the stereoisomers D-fructose and
L-fructose both have the same first block of 14 characters,
BJHIKXHVCXFQLS.
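A connectivity-only search of this kind amounts to grouping keys by
their first block. The Python sketch below shows the idea under stated
assumptions: the function name group_by_connectivity is hypothetical,
and the second blocks of the two fructose entries are made-up
placeholders (only the first block, BJHIKXHVCXFQLS, is taken from the
text), so they are not real InChIKeys.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_connectivity(keys: Iterable[str]) -> Dict[str, List[str]]:
    """Group InChIKeys by their first block (the first 14 characters)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        first_block = key.split("-", 1)[0]
        groups[first_block].append(key)
    return dict(groups)


example_keys = [
    # Placeholder second blocks: NOT the real keys of D- and L-fructose.
    "BJHIKXHVCXFQLS-AAAAAAAASA-N",
    "BJHIKXHVCXFQLS-BBBBBBBBSA-N",
    # Caffeine key as given in the text.
    "RYYVLZVUVIJVGH-UHFFFAOYSA-N",
]

groups = group_by_connectivity(example_keys)
for first_block, members in groups.items():
    print(first_block, len(members))
```

Stereoisomers fall into the same group because they share the first
block, while their full keys remain distinct.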
Generating InChI
The PubChem Server Side Structure Editor v1.8 includes a facility for
generating InChIs as the user draws the structure [18].
ACD/Labs’ freely available structure-drawing program
ChemSketch [19] includes the facility to generate InChIs from drawn
structures. Other structure-drawing packages (MDL Draw, BKChem,
ChemDraw, and Marvin) also allow an input chemical structure to be cut
and pasted into the InChI Generator. ChemSpider provides methods to
manipulate InChI strings and InChIKeys, including conversion to and
from the molfile format, checking validity of the InChI identifiers,
and searching ChemSpider using an input InChI [20].
Some Other Identifiers
Readers will no doubt be familiar with CAS Registry Numbers [21]. InChI
is not a registry system; it does not depend on the existence of a
database of unique substance records to establish the next available
sequence number for any new chemical substance being assigned an InChI.
Registry systems that index the literature are complementary to any
InChI databases that anyone creates. The Simplified Molecular Input
Line Entry System (SMILES) language [22] is another well-known way of
representing a chemical structure by a string of characters. Like
InChI, SMILES allows canonicalization of a structure, but SMILES is
proprietary and not an open project. This has led to the use of
different generation algorithms, and thus different SMILES strings for
the same compound have been found [10].