Probing information content in QSAR analyses using the signature molecular descriptor

COMP 52

Jean-Loup Faulon1, Shawn Martin1, Donald P Visco2, and Archana Kotu2. (1) Computational Biology, Sandia National Laboratories, P.O. Box 969, MS 9951, Livermore, CA 94551, (2) Department of Chemical Engineering, Tennessee Tech. University, Box 5013, Cookeville, TN, 38505
The signature molecular descriptor, recently introduced (JMGM 2002 and JCICS 2003), is based on extended valence sequences and belongs to the class of fragmental descriptors that includes holograms, molecular subgraphs, and tree fingerprints. Like other fragmental descriptors, signature performs well in QSAR analyses. However, signature appears to be the only descriptor from which molecular structures can be reconstructed. Unlike with other 2D descriptors, we find that degeneracy can be fully controlled with signature; in other words, the number of molecular structures matching a given descriptor value can be limited by varying the signature height. Signature is therefore particularly well suited to studying information content in QSAR analyses. We present the effect of increasing the signature information content, by increasing the signature height, in two series of QSAR analyses: one for log P prediction and a second for calculating the binding affinities of HIV-1 protease inhibitors.
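The following minimal Python sketch shows how raising the height reduces degeneracy. It rests on simplifying assumptions. Molecules are hydrogen-suppressed graphs with bond orders ignored. Each atomic signature is approximated by a canonical, depth-limited, non-backtracking neighborhood tree, not by the published construction, which treats rings and bonds differently. The graph format and the function names (`atomic_signature`, `molecular_signature`) are hypothetical. In the example, two methylpentane isomers share the same molecular signature at heights 0 and 1 but are distinguished at height 2.

```python
from collections import Counter


def adjacency(labels, bonds):
    """Undirected adjacency lists for a hydrogen-suppressed graph (bond orders ignored)."""
    adj = {i: [] for i in range(len(labels))}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def atomic_signature(labels, adj, root, height):
    """Canonical string for the depth-`height` tree rooted at `root`.

    Simplification (assumption): children are all neighbors except the atom we
    came from, and sorting child strings makes the result order-independent.
    """
    def expand(atom, parent, depth):
        if depth == 0:
            return labels[atom]
        kids = sorted(expand(n, atom, depth - 1) for n in adj[atom] if n != parent)
        return labels[atom] + ("(" + ",".join(kids) + ")" if kids else "")

    return expand(root, None, height)


def molecular_signature(mol, height):
    """Molecular signature: multiset (counts) of atomic signatures over all atoms."""
    labels, bonds = mol
    adj = adjacency(labels, bonds)
    return Counter(atomic_signature(labels, adj, i, height) for i in range(len(labels)))


# Carbon skeletons of 2-methylpentane and 3-methylpentane.
pentane_2me = (["C"] * 6, [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5)])
pentane_3me = (["C"] * 6, [(0, 1), (1, 2), (2, 3), (3, 4), (2, 5)])

for h in range(3):
    same = molecular_signature(pentane_2me, h) == molecular_signature(pentane_3me, h)
    print(h, same)  # prints: 0 True, 1 True, 2 False (one pair per line)
```

At height 1 both isomers have three terminal carbons, two carbons with two neighbors, and one branch carbon, so the descriptor is degenerate. At height 2 the branch carbon's tree differs: it carries two terminal neighbors in 2-methylpentane and only one in 3-methylpentane.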