COMP 404 |
| In Quantitative Structure-Activity Relationship (QSAR) modeling, limited data availability is a persistent problem: the modeler must train, validate, and test the models on the given data. Ideally, final testing of the models would use as large an external test set as possible, but in practice this is the aspect of QSAR modeling that gets compromised. Here, we present a methodology in which we first use permutation tests to determine the ‘signal’ and ‘noise’ levels for the given data, by learning QSAR models on both ‘true’ and ‘randomized’ data. Having determined the ‘separation’ that exists between ‘signal’ and ‘noise’ for the given data, we then try to achieve a similarly significant separation with as little data as possible. We illustrate our empirical approach on cheminformatics datasets, demonstrating that equally accurate models can be learned from minimal data, with the advantage that model testing is then performed on a larger set. |
|
Drug Discovery
1:00 PM-3:55 PM, Wednesday, August 22, 2007 BCEC -- 161, Oral
Division of Computers in Chemistry |
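
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes that ‘randomized’ data means the training responses are permuted, that ‘separation’ can be summarized as a z-score of the true-data test R² against the permuted-data R² values, and that ridge regression can stand in for the QSAR learner. The function names, the synthetic data, and the `tolerance` parameter are all hypothetical and are not taken from the presented work.

```python
import numpy as np


def fit_predict(X_train, y_train, X_test, ridge=1.0):
    """Ridge regression via the normal equations (stand-in for any QSAR learner)."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mean
    A = Xc.T @ Xc + ridge * np.eye(X_train.shape[1])
    w = np.linalg.solve(A, Xc.T @ (y_train - y_mean))
    return (X_test - x_mean) @ w + y_mean


def r2(y_true, y_pred):
    """Coefficient of determination on held-out data."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def separation(X, y, train_frac, n_perm=50, seed=0):
    """Return (signal, noise, z) for one train/test split.

    signal: test R^2 of a model trained on the true responses.
    noise:  test R^2 values of models trained on permuted responses.
    z:      assumed separation measure, (signal - mean(noise)) / std(noise).
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(train_frac * len(y))
    train, test = idx[:n_train], idx[n_train:]
    signal = r2(y[test], fit_predict(X[train], y[train], X[test]))
    noise = np.array([
        r2(y[test], fit_predict(X[train], rng.permutation(y[train]), X[test]))
        for _ in range(n_perm)
    ])
    z = (signal - noise.mean()) / (noise.std() + 1e-12)
    return signal, noise, z


def smallest_train_fraction(X, y, fractions, reference_frac=0.8, tolerance=0.5):
    """Smallest training fraction whose separation reaches
    tolerance * (separation at reference_frac). Returns None if none does."""
    _, _, z_ref = separation(X, y, reference_frac)
    for frac in sorted(fractions):
        _, _, z = separation(X, y, frac)
        if z >= tolerance * z_ref:
            return frac
    return None


def make_synthetic(n=200, p=10, noise_sd=0.5, seed=1):
    """Synthetic descriptors and activities (illustrative only, not real QSAR data)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    y = X @ rng.normal(size=p) + noise_sd * rng.normal(size=n)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    signal, noise, z = separation(X, y, train_frac=0.8)
    print("signal R2:", signal, "mean noise R2:", noise.mean(), "z:", z)
    fractions = (0.1, 0.2, 0.4, 0.6, 0.8)
    print("smallest fraction:", smallest_train_fraction(X, y, fractions))
```

Whatever fraction the search returns, the remaining compounds form the test set, which is the trade-off the abstract describes: less data for training, more for final testing. A real study would likely repeat the split many times and use the modeler's own learner and significance criterion.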