Similarity-based assessment of model applicability domain and quantitative evaluation of the reliability of the prediction

COMP 271

Pranas Japertas, jurgutis@ap-algorithms.com (1), Andrius Sazonovas, andrius@pharma-algorithms.com (1), Remigijus Didziapetris, remis@pharma-algorithms.com (2), and Alanas Petrauskas, jurgutis@ap-algorithms.com (3). (1) Faculty of Chemistry, Vilnius University, Naugarduko g. 24, Vilnius, LT-03225, Lithuania, (2) Pharma Algorithms, Inc, A.Mickeviciaus g. 29, LT-08117 Vilnius, Lithuania, (3) Pharma Algorithms Inc, 2700-161 Bay St. TD Tower, Toronto, ON M5J 2S1, Canada
A methodology for evaluating the Model Applicability Domain, based on similarity analysis of the compounds in the training set, is presented. The novel approach rests on the premise that any empirical in silico model works reliably only for compounds similar to those it was trained on. Two factors are pivotal: the availability of similar compounds in the training set and the consistency of the experimental data for those compounds. This information is condensed into a Reliability Index (RI), which takes values from 0 (not reliable) to 1 (very reliable) and assists in interpreting the results. The methodology is illustrated by estimating the Model Applicability Domain of models for logP, logD, solubility and toxicity. The RI is shown to be closely related to the overall quality of a given prediction, as reflected in a clear correlation between RI and RMSE values.
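The abstract does not state how the RI is computed. The Python sketch below is therefore only an illustration, not the authors' formula. It shows one possible way to combine the two ingredients named above (similarity of the query to training compounds, and consistency of those compounds' experimental data) into a value between 0 and 1. Every specific here is an assumption: fingerprints are represented as sets of "on" bit indices, similarity is Tanimoto, "consistency" is taken to mean agreement between the neighbours' experimental values and the model's prediction, and the parameters `k` and `value_scale`, the exponential mapping, the product rule, and the bin edges are all arbitrary choices. The function names `tanimoto`, `reliability_index` and `rmse_by_ri_bin` are hypothetical.

```python
from math import exp, sqrt
from typing import Dict, FrozenSet, Sequence, Tuple

# Hypothetical fingerprint representation: the set of "on" bit indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two bit-set fingerprints, in [0, 1]."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def reliability_index(
    query: Fingerprint,
    train_fps: Sequence[Fingerprint],
    train_values: Sequence[float],
    predicted: float,
    k: int = 5,             # assumed number of nearest neighbours
    value_scale: float = 1.0,  # assumed tolerance in property units (e.g. log units)
) -> float:
    """Illustrative RI in [0, 1]; an assumption, NOT the published formula."""
    # Rank training compounds by similarity to the query and keep the k nearest.
    neighbours = sorted(
        ((tanimoto(query, fp), y) for fp, y in zip(train_fps, train_values)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total_sim = sum(s for s, _ in neighbours)
    if total_sim <= 0.0:
        return 0.0  # no similar compounds at all: outside the applicability domain
    # (a) Similarity term: mean similarity of the retained neighbours.
    similarity = total_sim / len(neighbours)
    # (b) Consistency term (assumed definition): similarity-weighted mean absolute
    # deviation of the neighbours' experimental values from the prediction,
    # mapped into (0, 1] so that perfect agreement gives 1.
    deviation = sum(s * abs(y - predicted) for s, y in neighbours) / total_sim
    consistency = exp(-deviation / value_scale)
    return similarity * consistency


def rmse_by_ri_bin(
    ri: Sequence[float],
    y_true: Sequence[float],
    y_pred: Sequence[float],
    edges: Sequence[float] = (0.0, 0.3, 0.5, 0.75, 1.0),  # arbitrary bin edges
) -> Dict[Tuple[float, float], float]:
    """RMSE of predictions grouped into RI bins [lo, hi); the last bin includes 1.0."""
    result: Dict[Tuple[float, float], float] = {}
    for lo, hi in zip(edges, edges[1:]):
        last = hi == edges[-1]
        sq_errs = [
            (t - p) ** 2
            for r, t, p in zip(ri, y_true, y_pred)
            if lo <= r < hi or (last and r == hi)
        ]
        if sq_errs:  # skip empty bins
            result[(lo, hi)] = sqrt(sum(sq_errs) / len(sq_errs))
    return result
```

With a validation set, one would compute an RI for each prediction and pass the RIs, experimental values and predictions to `rmse_by_ri_bin`. If the index behaves as the abstract describes, RMSE should tend to be lower in the higher-RI bins. The product of the two terms was chosen so that the index is low whenever either factor is weak: a query with no close analogues, or one whose close analogues disagree with the prediction.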