Virtual screening of chemical libraries based on predictive QSAR methods

COMP 203

Mei Wang (1), meiwang@email.unc.edu; Shuxing Zhang (1); Scott Oloff (1); Alexander Golbraikh (1), golbraik@email.unc.edu; Harold Kohn (2); and Alexander Tropsha (1), alex_tropsha@unc.edu. (1) Laboratory for Molecular Modeling, School of Pharmacy, University of North Carolina at Chapel Hill, CB # 7360, Beard Hall, School of Pharmacy, Chapel Hill, NC 27599-7360; (2) Division of Medicinal Chemistry and Natural Products, University of North Carolina at Chapel Hill, Chapel Hill, NC 66047
QSAR models have been developed for 91 structurally diverse anticonvulsant compounds using two approaches: the variable-selection k-nearest-neighbor (kNN) method and a new statistical modeling methodology based on lazy learning theory (termed ALL-QSAR). All models have been extensively validated by splitting the dataset into multiple training, test, and independent validation sets and demonstrating that the models built on the training sets have high predictive power for both the test and the validation sets. The predictive models have been applied to virtual screening of a combined chemical database of over 3 million compounds. Five new structural classes of compounds not represented in the training set have been identified. The list of computational hits also included structures that were very similar or identical to known anticonvulsants not included in the training set. The results of this study suggest that virtual screening of chemical databases or molecular libraries using validated QSAR models is a powerful new paradigm for computational drug discovery.
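
The workflow described above (build a kNN model on a training set, check it on held-out compounds, then screen a library) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a fixed, hand-picked descriptor subset instead of an automated variable-selection search, an unweighted mean over the k nearest neighbors, and a simple nearest-neighbor distance cutoff as a stand-in for an applicability domain. All function names, descriptor values, activities, and cutoffs are hypothetical toy values, not data from the study.

```python
import math


def distance(a, b, selected):
    """Euclidean distance using only the selected descriptor indices."""
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in selected))


def knn_predict(query, training, selected, k=2):
    """Predict activity as the mean activity of the k nearest training compounds.

    training is a list of (descriptor_tuple, activity) pairs.
    Returns (predicted_activity, distance_to_nearest_training_compound).
    """
    ranked = sorted((distance(query, desc, selected), act) for desc, act in training)
    nearest = ranked[:k]
    prediction = sum(act for _, act in nearest) / len(nearest)
    return prediction, nearest[0][0]


def r_squared(observed, predicted):
    """Coefficient of determination between observed and predicted activities."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def screen(library, training, selected, k, activity_cutoff, distance_cutoff):
    """Keep library compounds that are predicted active AND lie close to the training set."""
    hits = []
    for name, desc in library:
        prediction, nearest = knn_predict(desc, training, selected, k)
        if nearest <= distance_cutoff and prediction >= activity_cutoff:
            hits.append((name, prediction))
    return hits


# Hypothetical toy data: three descriptors per compound; the third is noise
# and is excluded by the (assumed) descriptor selection.
training_set = [
    ((0.0, 0.0, 5.0), 1.0),
    ((0.1, 0.0, 1.0), 1.2),
    ((1.0, 1.0, 3.0), 3.0),
    ((1.1, 0.9, 9.0), 3.2),
]
test_set = [((0.05, 0.0, 7.0), 1.1), ((1.05, 0.95, 2.0), 3.1)]
selected_descriptors = (0, 1)

# External check: predict the held-out test compounds and score the agreement.
test_predictions = [
    knn_predict(desc, training_set, selected_descriptors)[0] for desc, _ in test_set
]
test_r2 = r_squared([act for _, act in test_set], test_predictions)

# Screening: cmpd_C is far from every training compound, so it is rejected
# by the distance cutoff even though its predicted activity is high.
library = [
    ("cmpd_A", (1.0, 0.95, 4.0)),
    ("cmpd_B", (0.0, 0.05, 4.0)),
    ("cmpd_C", (5.0, 5.0, 0.0)),
]
hits = screen(library, training_set, selected_descriptors, k=2,
              activity_cutoff=2.5, distance_cutoff=0.5)
print(test_r2, hits)
```

The distance cutoff is a design choice of this sketch: a kNN prediction for a compound far from every training compound is an extrapolation, so the sketch discards such compounds instead of reporting them as hits.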