CINF 84
We give an overview of our recent work on our Chemical Structure Lookup Service (CSLS). At the time of writing, this service comprises a collection of approximately 80 chemical structure databases from commercial and public sources, indexes approximately 40 million molecules representing approximately 27 million unique chemical structures, and continues to grow. We focus on our procedure for normalizing chemical structures, a crucial step in processing chemical databases that come from different sources. Normalization is needed to arrive at a canonical representation of a chemical, so that matches are not missed because the same compound is encoded differently, whether owing to certain chemical features (e.g., different tautomers or different resonance structures) or to ill-defined parts of the structure (e.g., misdrawn functional groups, missing hydrogen atoms, missing charges, or incorrect valences). This structure normalization is performed on every incoming structure set, whether it is to be registered in CSLS or used as a search query. We also discuss our structure-based hashcode identifiers, which can be calculated for any small molecule. They are specifically designed to enable fine-tunable yet rapid compound identification even in very large datasets. They can be set to be sensitive to a variety of chemical features, such as tautomerism, different resonance structures drawn for a charged species, and the presence or absence of certain fragments such as counterions. One such identifier, called FICuS, is one of the crucial mechanisms for identifying and looking up chemicals in CSLS, enabling CSLS to function essentially as an “address book” for any small molecule. FICuS and the other identifiers do not, however, depend on the infrastructure of this service. CSLS is freely available at http://cactus.nci.nih.gov/lookup. The service recognizes over 20 chemical structure representation formats as input, including SD files, SMILES strings, InChI identifiers, and FICuS hashcodes.
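The idea of a hashcode identifier whose sensitivity can be tuned, and of using it as an "address book" key across databases, can be illustrated with a minimal Python sketch. This is not the FICuS algorithm, which the abstract does not specify. The function names, the `COUNTERIONS` set, the sample records, and the use of a truncated SHA-1 digest are all assumptions made for illustration, and the input SMILES fragments are assumed to have been normalized and canonicalized already.

```python
import hashlib
from collections import defaultdict

# Hypothetical counterion list; a real system would use chemistry-aware rules.
COUNTERIONS = {"[Na+]", "[K+]", "[Cl-]", "[Br-]"}


def structure_hash(smiles, ignore_counterions=True):
    """Return a 16-hex-digit identifier for a dot-separated SMILES string.

    Assumes each fragment is already in a canonical, normalized form.
    """
    fragments = smiles.split(".")
    if ignore_counterions:
        kept = [f for f in fragments if f not in COUNTERIONS]
        fragments = kept or fragments  # never drop every fragment
    # Sorting makes the identifier independent of fragment order.
    key = ".".join(sorted(fragments))
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:16].upper()


def build_index(records, ignore_counterions=True):
    """Map each identifier to the (database, record_id) pairs that share it."""
    index = defaultdict(list)
    for database, record_id, smiles in records:
        code = structure_hash(smiles, ignore_counterions)
        index[code].append((database, record_id))
    return index


# Made-up records: sodium acetate drawn two ways, plus the bare acetate anion.
records = [
    ("db_a", "A-1", "CC(=O)[O-].[Na+]"),
    ("db_b", "B-7", "[Na+].CC(=O)[O-]"),
    ("db_c", "C-3", "CC(=O)[O-]"),
]

loose = build_index(records, ignore_counterions=True)    # counterions ignored
strict = build_index(records, ignore_counterions=False)  # counterions matter
```

With counterions ignored, all three records fall under a single identifier. With counterions retained, the two sodium acetate records still match each other but no longer match the bare anion. Switching a feature on or off changes which records count as the same compound, while the lookup itself remains a single dictionary access.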
Sci-Mix
8:00 PM-10:00 PM, Monday, August 20, 2007 BCEC -- Exhibit Hall - B2, Sci-Mix
General Papers
Division of Chemical Information