Development of scoring functions for protein-ligand binding based on frequent geometric and chemical patterns of inter-atomic interactions at their interfaces

COMP 352

Raed Khashan, raed_khashan@unc.edu, Laboratory for Molecular Modeling, Department of Medicinal Chemistry and Natural Products, School of Pharmacy, University of North Carolina at Chapel Hill, 301 Beard Hall (CB# 7360), Chapel Hill, NC 27599, Weifan Zheng, wzheng@nccu.edu, Department of Pharmaceutical Sciences, Biomanufacturing Research Institute and Technology Enterprise (BRITE), North Carolina Central University, Carolina Exploratory Center for Cheminformatics Research (CECCR), 1801 Fayetteville Street, Mary M. Townes Science Complex, Room 1256, Durham, NC 27707, and Alexander Tropsha, alex_tropsha@unc.edu, Laboratory for Molecular Modeling, School of Pharmacy, University of North Carolina at Chapel Hill, CB# 7360, Beard Hall, Chapel Hill, NC 27599-7360.
Developing a scoring function that identifies the correct docking pose is important for understanding how a ligand binds to its receptor and, consequently, for designing new lead compounds. In this study, we present a novel knowledge-based scoring function derived from the frequent geometric and chemical patterns of inter-atomic interactions at the interfaces of a representative dataset of X-ray characterized protein-ligand complexes. The approach comprises three steps. First, the protein-ligand interface of each complex in the training set is represented as a labeled chemical graph in which nodes are atoms and edges connect protein and ligand atoms that lie within a certain distance of each other. Second, subgraph mining techniques are used to find frequent subgraphs, i.e., those occurring in at least a given percentage of the training complexes; these frequent subgraphs define the interaction patterns used by the scoring function. Third, each test protein-ligand complex is scored by the similarity between the interaction patterns identified at its interface and those found frequently in the training set of X-ray characterized complexes. We tested the scoring function's ability to distinguish the native pose of a ligand in the X-ray crystal structure of each protein-ligand complex from non-native poses produced by computational docking. The new scoring function achieved higher scoring accuracy than five commonly used scoring functions, and than their consensus, as provided by commercial docking software.
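The three steps of the approach can be illustrated with a deliberately simplified Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the 4.0 Å contact cutoff (`CUTOFF`), plain element symbols as node labels, patterns restricted to single labeled protein-ligand edges (one-edge subgraphs) instead of arbitrary connected subgraphs, and a pose score defined as the summed support of the frequent patterns it contains. The function names `interface_patterns`, `frequent_patterns`, and `score_pose`, as well as the toy coordinates, are hypothetical.

```python
import math
from collections import Counter

# Hypothetical atom representation: (label, (x, y, z)). Labels are plain element
# symbols here; a real scoring function would likely use richer atom types.
Atom = tuple[str, tuple[float, float, float]]
Pattern = tuple[str, str]  # (protein atom label, ligand atom label)

CUTOFF = 4.0  # assumed protein-ligand contact cutoff in angstroms


def interface_patterns(protein: list[Atom], ligand: list[Atom],
                       cutoff: float = CUTOFF) -> set[Pattern]:
    """Build the bipartite interface graph and return its distinct labeled edges.

    An edge joins a protein atom and a ligand atom whose distance is at most
    `cutoff`; each edge is reduced to its label pair, which serves as the
    one-edge subgraph pattern in this simplified sketch.
    """
    patterns = set()
    for p_label, p_xyz in protein:
        for l_label, l_xyz in ligand:
            if math.dist(p_xyz, l_xyz) <= cutoff:
                patterns.add((p_label, l_label))
    return patterns


def frequent_patterns(training: list[set[Pattern]],
                      min_support: float) -> dict[Pattern, float]:
    """Keep patterns occurring in at least `min_support` of the training complexes."""
    counts = Counter()
    for patterns in training:
        counts.update(patterns)  # a set, so each complex counts a pattern once
    n = len(training)
    return {p: c / n for p, c in counts.items() if c / n >= min_support}


def score_pose(pose: set[Pattern], frequent: dict[Pattern, float]) -> float:
    """Hypothetical score: sum of supports of the frequent patterns present in a pose."""
    return sum(frequent.get(p, 0.0) for p in pose)


# Toy "training set" of three complexes (coordinates are invented).
training = [
    interface_patterns([("N", (0, 0, 0)), ("C", (10, 0, 0))], [("O", (3, 0, 0))]),
    interface_patterns([("N", (0, 0, 0)), ("C", (0, 5, 0))],
                       [("O", (0, 3.5, 0)), ("C", (0, 8, 0))]),
    interface_patterns([("O", (0, 0, 0))], [("N", (2.9, 0, 0))]),
]
# Only the ("N", "O") contact occurs in at least half of the complexes (2 of 3).
frequent = frequent_patterns(training, min_support=0.5)

# Rank two candidate poses of a ligand against the same toy protein.
protein = [("N", (0, 0, 0)), ("C", (6, 0, 0))]
poses = {"pose_a": [("O", (3, 0, 0))], "pose_b": [("O", (9, 0, 0))]}
scores = {name: score_pose(interface_patterns(protein, lig), frequent)
          for name, lig in poses.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Here `pose_a` is ranked first because it reproduces the frequent N-O contact, while `pose_b` forms only the infrequent C-O contact. Restricting patterns to single edges reduces mining to simple counting; mining larger connected subgraphs, as the abstract describes, would additionally require enumerating candidate subgraphs and testing them for subgraph isomorphism.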