Differences between vector-based and graph-based coding for ADME prediction

CINF 76

Joerg K. Wegner, wegnerj@informatik.uni-tuebingen.de and Andreas Zell, zell@informatik.uni-tuebingen.de. Department of Bioinformatics (ZBIT), University of Tuebingen, Sand 1, Tuebingen, 72076, Germany

We present an extensive study in which we build classification and regression models on five different ADMET data sets (HIA, LogP, LogS, BBB, and two toxicological data sets on carcinogenicity in rats and mice).

In particular, we compare the relevance of vector-based coding of molecules, using descriptors and fingerprints, with a coordinate-free coding that works directly on the molecular structures and thus avoids an intermediate abstract vector representation.
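To make the distinction between the two codings concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The toy molecules (`ETHANOL`, `METHANOL`, `PROPANE`), the hashed path fingerprint `path_fingerprint`, and the product-graph walk kernel `walk_kernel` are all hypothetical stand-ins chosen for illustration: the first maps each molecule to a fixed-size bit set that is compared with the Tanimoto coefficient, while the second compares two molecular graphs directly without ever building a per-molecule vector.

```python
import zlib
from itertools import product

# Hypothetical toy molecules: heavy-atom labels plus an undirected bond list.
ETHANOL = (["C", "C", "O"], [(0, 1), (1, 2)])
METHANOL = (["C", "O"], [(0, 1)])
PROPANE = (["C", "C", "C"], [(0, 1), (1, 2)])


def neighbours(n_atoms, bonds):
    """Adjacency lists for an undirected molecular graph."""
    adj = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def path_fingerprint(mol, max_len=2, n_bits=1024):
    """Vector coding (assumed example): hash every labelled simple path of up
    to max_len bonds into a fixed-size bit set, computed once per molecule."""
    labels, bonds = mol
    adj = neighbours(len(labels), bonds)
    bits = set()

    def extend(path):
        forward = "-".join(labels[i] for i in path)
        backward = "-".join(labels[i] for i in reversed(path))
        key = min(forward, backward)  # C-O and O-C map to the same feature
        bits.add(zlib.crc32(key.encode()) % n_bits)  # deterministic hash
        if len(path) - 1 < max_len:
            for nxt in adj[path[-1]]:
                if nxt not in path:  # keep paths simple (no repeated atoms)
                    extend(path + [nxt])

    for start in range(len(labels)):
        extend([start])
    return bits


def tanimoto(a, b):
    """Tanimoto similarity of two bit sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def walk_kernel(mol_a, mol_b, max_len=2):
    """Graph-based coding (assumed example): count pairs of equally labelled
    walks of length 0..max_len by walking the direct product graph."""
    (la, ba), (lb, bb) = mol_a, mol_b
    adj_a, adj_b = neighbours(len(la), ba), neighbours(len(lb), bb)
    # Product-graph vertices: atom pairs with identical element labels.
    nodes = [(i, j) for i, j in product(range(len(la)), range(len(lb)))
             if la[i] == lb[j]]
    counts = {v: 1 for v in nodes}  # walks of length 0 ending at each vertex
    total = len(nodes)
    for _ in range(max_len):
        new = {v: 0 for v in nodes}
        for (i, j), c in counts.items():
            for ni in adj_a[i]:
                for nj in adj_b[j]:
                    if (ni, nj) in new:  # both steps must land on matching labels
                        new[(ni, nj)] += c
        counts = new
        total += sum(counts.values())
    return total


if __name__ == "__main__":
    fps = {name: path_fingerprint(m) for name, m in
           [("ethanol", ETHANOL), ("methanol", METHANOL), ("propane", PROPANE)]}
    print("Tanimoto(ethanol, methanol):", tanimoto(fps["ethanol"], fps["methanol"]))
    print("walk kernel(ethanol, methanol):", walk_kernel(ETHANOL, METHANOL))
```

In this sketch the fingerprint is computed once per molecule and reused for every comparison, whereas the walk kernel has to be evaluated for every pair of molecules, which is one plausible way such a coordinate-free comparison can become expensive on large data sets.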

We find that vector coding scales to large data sets at the cost of accuracy, whereas the coordinate-free approach avoids the feature selection problem but is applicable only to smaller data sets. We also briefly discuss the underlying space and time complexities.


ADME/tox Informatics
2:00 PM-5:00 PM, Tuesday, 15 March 2005, Convention Center, Room 33A, Oral

Division of Chemical Information

The 229th ACS National Meeting, in San Diego, CA, March 13-17, 2005