Assessing the effect of library design choices on model performance

COMP 50

Kiko Aumond1, Hans Wolters1, and Jennifer L. Miller2. (1) CADD, Signature BioScience, 475 Brannan Street, San Francisco, CA 94107, (2) Signature Biosciences, 475 Brannan St., San Francisco, CA 94107
The development of a productive virtual assay depends critically on the data points used to train the model. Holding the type of compound descriptor fixed (pharmacophores), we examine two aspects of the "library design" problem: the number of compounds selected and the objective function used to select them. Specifically, we will characterize the generalization ability of a classifier as a function of the number of compounds picked by two different library design techniques, informative and diverse.
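The abstract does not say how either selection technique is implemented. As one common way to realize a "diverse" pick, here is a minimal sketch of greedy MaxMin selection. It assumes that each compound's pharmacophore descriptor is a binary fingerprint, stored as a set of on-bit indices, and that Tanimoto distance (1 minus Tanimoto similarity) is the dissimilarity measure. The function names `tanimoto` and `maxmin_pick` and the `seed_index` parameter are hypothetical illustrations, not the authors' method. No "informative" selection is sketched, because the abstract does not define its objective.

```python
from typing import List, Sequence, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two binary fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints are identical
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def maxmin_pick(fps: Sequence[Set[int]], n_pick: int, seed_index: int = 0) -> List[int]:
    """Greedy MaxMin diversity selection (an assumed stand-in for 'diverse' design).

    Starting from a seed compound, repeatedly add the unpicked compound whose
    nearest already-picked neighbour is farthest away (distance = 1 - Tanimoto).
    Ties are broken by the lowest index, so the result is deterministic.
    """
    if not 0 <= n_pick <= len(fps):
        raise ValueError("n_pick must be between 0 and the number of compounds")
    if n_pick == 0:
        return []
    picked = [seed_index]
    # Distance from every compound to its nearest picked compound so far.
    min_dist = [1.0 - tanimoto(fp, fps[seed_index]) for fp in fps]
    while len(picked) < n_pick:
        candidates = (i for i in range(len(fps)) if i not in picked)
        nxt = max(candidates, key=lambda i: min_dist[i])
        picked.append(nxt)
        # Shrink nearest-picked distances using the newly added compound.
        for i, fp in enumerate(fps):
            d = 1.0 - tanimoto(fp, fps[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return picked


if __name__ == "__main__":
    toy = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}]
    print(maxmin_pick(toy, 3))  # [0, 2, 1]
```

Because the greedy picks are made one at a time from a fixed seed, the first k indices of a larger pick equal the k-compound pick. Under these assumptions, a learning curve of generalization versus training-set size can therefore be traced from nested training sets by training on prefixes of a single `maxmin_pick` result and scoring each classifier on a held-out set.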