Building predictive models from high-throughput screening data


Paul E. Blower, Kevin P. Cross, Glenn J. Myatt, and Chihae Yang. Leadscope, Inc., Columbus, OH 43215
Predictive models derived from high-throughput screening (HTS) data can be useful for prioritizing compounds for further testing. However, HTS data is typically of poor quality, with many values out of range and wide variability among replicate test results. Structure-based clustering often reveals an irregular landscape, both in the compound classes represented and in the distribution of active compounds across structural classes. Large regions of the chemical space are devoid of activity. Indeed, most compounds are neither active nor similar to active compounds and thus are of marginal value for modeling activity. Even among active classes, the within-class active/inactive ratio may be very unbalanced, the range of response values too narrow, or the classes too small to derive accurate models. This presentation will survey the problems of building predictive models from large, heterogeneous screening sets and describe methods for addressing them.
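
As an illustration of the class-level problems described above (classes with no actives, classes that are too small, and classes with a very unbalanced active/inactive ratio), the following is a minimal sketch of how structural classes might be triaged before modeling. It is not the authors' method. The function name `summarize_classes`, the status labels, and the thresholds `min_size` and `min_active_fraction` are hypothetical, chosen only for illustration, and the class assignments are assumed to come from some prior structure-based clustering step.

```python
from collections import defaultdict


def summarize_classes(records, min_size=10, min_active_fraction=0.1):
    """Summarize structural classes from (class_id, is_active) pairs.

    The thresholds are illustrative assumptions, not values from the source.
    """
    # class_id -> [total compounds, active compounds]
    counts = defaultdict(lambda: [0, 0])
    for class_id, is_active in records:
        counts[class_id][0] += 1
        if is_active:
            counts[class_id][1] += 1

    summary = {}
    for class_id, (n_total, n_active) in counts.items():
        fraction = n_active / n_total
        if n_active == 0:
            status = "inactive"    # region devoid of activity
        elif n_total < min_size:
            status = "too_small"   # too few compounds to fit a model
        elif fraction < min_active_fraction:
            status = "unbalanced"  # actives are a small minority
        else:
            status = "candidate"   # plausible basis for a local model
        summary[class_id] = {
            "size": n_total,
            "active": n_active,
            "active_fraction": fraction,
            "status": status,
        }
    return summary


# Hypothetical screening results: four structural classes A to D.
records = (
    [("A", True)] * 4 + [("A", False)] * 8
    + [("B", False)] * 20
    + [("C", True)] * 1 + [("C", False)] * 30
    + [("D", True)] * 2 + [("D", False)] * 1
)
for class_id, info in sorted(summarize_classes(records).items()):
    print(class_id, info["size"], info["active"], info["status"])
```

With this made-up data, class A is kept as a modeling candidate, B contains no actives, C is unbalanced, and D is too small. Suitable thresholds would depend on the assay and on the modeling method.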