Mutagen/nonmutagen classification of diverse and structurally homogeneous chemicals using calculated molecular descriptors: A hierarchical approach

TOXI 94

Denise Mills, dmills@nrri.umn.edu (1), Subhash C. Basak, sbasak@nrri.umn.edu (1), Douglas M. Hawkins, dhawkins@umn.edu (2), and Brian D. Gute, bgute@nrri.umn.edu (1). (1) Center for Water and the Environment, Natural Resources Research Institute, University of Minnesota, 5013 Miller Trunk Hwy, Duluth, MN 55811, (2) School of Statistics, University of Minnesota, 313 Ford Hall, 224 Church Street SE, Minneapolis, MN 55455

Ridge linear discriminant analysis was used to classify a diverse set of 508 mutagens/non-mutagens, as well as three structurally homogeneous subsets, viz., 260 monocyclic carbocycles and heterocycles, 192 polycyclic carbocycles and heterocycles, and 124 aliphatic alkanes, alkenes, and alkynes. Software programs including POLLY, Triplet, Molconn-Z, Sybyl, and MOPAC were used to calculate a large and diverse set of theoretical molecular descriptors. The descriptors were then divided into hierarchical classes according to their complexity and computational cost. Results indicate that including the more complex descriptors does not significantly improve model quality. In addition, correct classification rates for the relatively homogeneous subsets are comparable to those obtained for the entire set of 508 diverse compounds, indicating that the diverse set of theoretical descriptors can represent the range of structural features present in the data set.