COMP 45
Structures from WOMBAT (WOrld of Molecular BioAcTivity) [1] were investigated with several descriptor systems. For 79,483 unique non-stereoisomeric compounds, we found multiple “confused” instances: for 2D descriptors, 314 duplicates (0.4%) across 80 descriptors (487 pairs); for MESA-implemented [2] MDL keys, 4,391 duplicates (5.5%) across 320 keys; for Daylight fingerprints [3], 7,166 duplicates (9.0%) for 512 keys, 5,010 duplicates (6.3%) for 1024 keys, and 4,092 duplicates (5.1%) for 2048 keys. The WOMBAT-derived set of 512 keys had 6,202 duplicates (7.8%). Our results indicate that, for several chemical descriptor systems, it is not always possible to provide a 1:1 map between chemical structure and chemical description. This implies that we can devise an information-rich, yet “confused” descriptor system, i.e., a chemical information exchange tool allowing for chemical structure ambiguity.

[1] WOMBAT is available from http://www.sunsetmolecular.com
[2] The MDL 320 keys fingerprinter is available from http://www.mesaac.com
[3] The Daylight fingerprinter is available from http://www.daylight.com
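The abstract does not state how "duplicates" and "pairs" were counted. As a minimal sketch, assume a duplicate is any compound whose descriptor vector is identical to that of at least one other compound, and that pairs are all unordered pairs within each group of identical vectors. The function name `count_collisions` and the toy data are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from math import comb


def count_collisions(descriptors):
    """Count compounds that share a descriptor vector with another compound.

    descriptors: dict mapping a compound id to its descriptor vector
    (any sequence of hashable values, e.g. fingerprint bits).
    Returns (colliding compounds, colliding pairs, fraction of all compounds).
    The counting convention is an assumption, not the authors' stated method.
    """
    groups = defaultdict(list)
    for compound_id, vector in descriptors.items():
        groups[tuple(vector)].append(compound_id)

    # Keep only groups where two or more structures map to one description.
    collided = [g for g in groups.values() if len(g) > 1]
    n_compounds = sum(len(g) for g in collided)
    n_pairs = sum(comb(len(g), 2) for g in collided)
    fraction = n_compounds / len(descriptors) if descriptors else 0.0
    return n_compounds, n_pairs, fraction


# Toy data (hypothetical): A, B and D share one 4-bit description.
toy = {
    "A": (1, 0, 1, 1),
    "B": (1, 0, 1, 1),
    "C": (0, 1, 0, 0),
    "D": (1, 0, 1, 1),
    "E": (0, 0, 0, 1),
}
result = count_collisions(toy)
```

Under this convention a group of k identical vectors contributes k compounds and k(k-1)/2 pairs, which is one way a pair count (487) could exceed the compound count (314) reported for the 2D descriptors.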
Safe Exchange of Chemical Information: Can Relevant Chemical Information be Exchanged Without Disclosing Chemical Structures?
1:20 PM-5:20 PM, Sunday, 13 March 2005 Convention Center -- Room 4, Oral
Division of Computers in Chemistry