Training pKa and logP prediction

COMP 273

Jozsef Szegezdi and Ferenc Csizmadia. ChemAxon Kft, Maramaros koz 3/a, 1037 Budapest, Hungary
Existing pKa and logP prediction methods are built on training sets that cover only a limited number of molecule types, so their accuracy is not always satisfactory: in practice, a structure is usually predicted correctly only if its structure type was present in the training set. We therefore developed a training method for the pKa and logP calculations that allows users to build models relevant to their own structures.

Acidic and basic ionization centers are identified using the definitions in our default pKa prediction module. The logP prediction model implements 120 predefined atom types. The learning algorithm is a linear regression solved by singular value decomposition (SVD). The training set, a collection of experimental pKa or logP values, is provided by the user and must be imported as an SDF or MRV file, which can be compiled, for example, with Instant JChem.
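The abstract does not describe the import step itself. As a minimal sketch, assuming the experimental values are stored as an SD data field (the field name `logP` and the function `read_training_set` are hypothetical, and MRV input is not handled), the training pairs could be collected from an SDF file with the Python standard library:

```python
import re

# An SD data header looks like "> <logP>", optionally with extra text (">  <logP>  (1)").
_HEADER = re.compile(r"^>.*?<([^>]+)>")


def _field_value(data_lines, field):
    """Return the value line following the header for `field` as a float, or None."""
    for i, line in enumerate(data_lines):
        match = _HEADER.match(line)
        if match and match.group(1) == field and i + 1 < len(data_lines):
            try:
                return float(data_lines[i + 1].strip())
            except ValueError:
                return None  # non-numeric entry: skip this record
    return None


def read_training_set(sdf_text, field):
    """Collect (name, molblock, value) for every SD record with a numeric `field`."""
    entries = []
    record = []
    for line in sdf_text.splitlines():
        if line.strip() != "$$$$":
            record.append(line)
            continue
        # The connection table ends at "M  END"; SD data items follow it.
        end = next((i for i, l in enumerate(record) if l.startswith("M  END")), None)
        if end is not None:
            value = _field_value(record[end + 1:], field)
            if value is not None:
                entries.append((record[0].strip(), "\n".join(record[: end + 1]), value))
        record = []  # "$$$$" terminates the record
    return entries
```

Records without the requested field are skipped rather than treated as errors, so one SDF file can carry both pKa and logP data and be read once per property.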

For pKa prediction, the training algorithm creates a correction library containing correction values for interacting functional groups. For logP prediction, it calculates a full set of atomic contributions.
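To illustrate how an SVD-based linear regression can yield a full set of atomic contributions, here is a minimal NumPy sketch. It assumes a purely additive model, logP ≈ Σ_i n_i a_i, where n_i counts the atoms of type i in a molecule and a_i is that type's contribution; the abstract does not give the model's exact form, and the names `fit_contributions`, `predict` and the `rcond` cutoff are illustrative, not ChemAxon's API. The solution is the pseudo-inverse form a = V Σ⁺ Uᵀ y.

```python
import numpy as np


def fit_contributions(counts, values, rcond=1e-10):
    """Fit per-atom-type contributions a minimizing ||counts @ a - values||_2 via SVD.

    counts: (n_molecules, n_atom_types) matrix of atom-type counts.
    values: (n_molecules,) experimental logP values.
    """
    A = np.asarray(counts, dtype=float)
    y = np.asarray(values, dtype=float)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Pseudo-inverse of the singular values: directions with tiny singular values
    # (absent or collinear atom types) are dropped instead of dividing by ~0.
    s_inv = np.zeros_like(s)
    keep = s > rcond * s.max()
    s_inv[keep] = 1.0 / s[keep]
    # a = V diag(s_inv) U^T y, the minimum-norm least-squares solution
    return Vt.T @ (s_inv * (U.T @ y))


def predict(counts, contributions):
    """Additive prediction: atom-type counts times their fitted contributions."""
    return np.asarray(counts, dtype=float) @ contributions
```

Truncating small singular values is the main reason to prefer SVD over the normal equations in this setting: atom types that are rare or missing in a user's training set make the count matrix ill-conditioned or rank-deficient, and the pseudo-inverse then returns the minimum-norm solution (a zero contribution for an unseen atom type) instead of numerically unstable values.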


Poster Session
6:00 PM-8:00 PM, Tuesday, August 18, 2009 Walter E. Washington Convention Center -- Ballroom A, Poster

Division of Computers in Chemistry

The 238th ACS National Meeting, Washington, DC, August 16-20, 2009