Development of fragment-based chemical descriptors using novel frequent common subgraph mining approach and their application in QSAR modeling

COMP 177

Raed Khashan (1), raed_khashan@unc.edu, Weifan Zheng (2), Jun Huan (3), Wei Wang (3), and Alexander Tropsha (4), tropsha@email.unc.edu. (1) Laboratory for Molecular Modeling, Department of Medicinal Chemistry and Natural Products, School of Pharmacy, University of North Carolina at Chapel Hill, 301 Beard Hall (CB# 7360), Chapel Hill, NC 27599, (2) Laboratory for Molecular Modeling, School of Pharmacy, University of North Carolina at Chapel Hill, CB# 7360, Beard Hall, Chapel Hill, NC 27599-7360, (3) Department of Computer Science, University of North Carolina at Chapel Hill, 329 Sitterson Hall, Chapel Hill, NC 27599, (4) Laboratory of Molecular Modeling, School of Pharmacy, The University of North Carolina at Chapel Hill, 301 Beard Hall, CB# 7360, UNC-CH, Chapel Hill, NC 27599
We present a novel approach to generating fragment-based molecular descriptors. Molecules are represented as labeled chemical graphs, and the Fast Frequent Subgraph Mining (FFSM) method developed in this group is used to find chemical fragments that occur in at least a specified fraction of the molecules in a dataset. The occurrence counts of each frequent fragment are then used as descriptors in variable selection k Nearest Neighbor (kNN) QSAR modeling. This approach has been applied to the Maximum Recommended Therapeutic Dose (MRTD) and Salmonella mutagenicity datasets. We followed established protocols for model validation, i.e., randomization of the target property and splitting of the datasets into training, test, and validation sets. Highly predictive models have been generated, with the external R² exceeding 0.70 for both the test and validation sets. Frequent subgraphs implicated in the validated models afford mechanistic interpretation of the results in terms of the essential pharmacophoric elements responsible for compound activity.
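
The descriptor-generation and prediction steps described in the abstract can be illustrated with a minimal Python sketch. It is not the FFSM implementation or the authors' kNN QSAR code: it assumes fragment occurrences per molecule have already been mined, omits variable selection, and uses an unweighted average of the k nearest neighbors. The fragment labels, molecule names, activity values, and all function names are hypothetical toy choices made for illustration.

```python
from math import sqrt

# Hypothetical fragment counts per molecule (toy data, not from the study).
# In the described workflow these would come from frequent subgraph mining.
fragment_counts = {
    "mol_a": {"C-C": 3, "C-O": 1},
    "mol_b": {"C-C": 2, "C-N": 1},
    "mol_c": {"C-C": 4, "C-O": 2},
    "mol_d": {"C-N": 2},
}


def frequent_fragments(counts, min_support):
    """Return fragments present in at least min_support (a fraction) of molecules."""
    n_molecules = len(counts)
    support = {}
    for fragments in counts.values():
        for fragment in fragments:
            support[fragment] = support.get(fragment, 0) + 1
    return sorted(f for f, s in support.items() if s / n_molecules >= min_support)


def descriptor_matrix(counts, fragments):
    """Map each molecule to a vector of occurrence counts, one entry per fragment."""
    return {mol: [frags.get(f, 0) for f in fragments] for mol, frags in counts.items()}


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(query, train_x, train_y, k):
    """Predict activity as the plain mean over the k nearest training molecules."""
    ranked = sorted((euclidean(query, x), y) for x, y in zip(train_x, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    fragments = frequent_fragments(fragment_counts, min_support=0.5)
    matrix = descriptor_matrix(fragment_counts, fragments)
    # Hypothetical activities for three training molecules.
    train_x = [matrix["mol_a"], matrix["mol_b"], matrix["mol_c"]]
    train_y = [1.0, 2.0, 3.0]
    print(fragments)
    print(knn_predict(matrix["mol_d"], train_x, train_y, k=2))
```

In a setting like the one described, the external R² would be computed with `r_squared` on molecules held out of training, and the same procedure would be repeated with shuffled activity values to check that randomized models perform poorly.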