Fractal properties of representations of chemical libraries

CINF 26

Martin Grigorov, martin.grigorov@rdls.nestle.com, BioInformatics, Nestlé Research Center, PO Box 44, Canton de Vaud, Lausanne, 1000, Switzerland
There is emerging evidence that real-world datasets are statistically self-similar and thus fractal. In this work I investigate some global topological properties of the representations of chemical libraries in spaces defined by molecular descriptors. New algorithms are developed and applied, including dimension reduction of chemical data sets by singular value decomposition and the introduction of the correlation dimension as a natural dimension of a chemical space. It is shown that the representations of molecular data sets in chemical spaces possess self-similar properties characteristic of fractal objects. This important insight allows for a compact statistical description of the datasets, as well as for inference of the number of chemically similar structures in the vicinity of any member of such a fractal set.
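
The abstract does not spell out its algorithms, so the following is only a minimal sketch of the two ingredients it names: an SVD projection of a descriptor matrix followed by an estimate of the correlation dimension. It assumes the standard Grassberger-Procaccia estimator, in which the correlation sum C(r) (the fraction of point pairs closer than r) scales as r^D for small r, and it uses synthetic points in place of real molecular descriptors. The function names, the 10-descriptor space, and the choice of radii are all hypothetical.

```python
import numpy as np


def svd_reduce(X, k):
    """Project mean-centred rows of X onto the top-k right singular vectors."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def correlation_sum(X, radii):
    """C(r): fraction of distinct point pairs separated by less than r."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    pair_dist = dist[np.triu_indices(len(X), k=1)]  # each pair counted once
    return np.array([(pair_dist < r).mean() for r in radii])


def correlation_dimension(X, radii):
    """Slope of log C(r) against log r (Grassberger-Procaccia estimate)."""
    radii = np.asarray(radii, dtype=float)
    C = correlation_sum(X, radii)
    mask = C > 0  # avoid log(0) at radii that contain no pairs
    slope, _ = np.polyfit(np.log(radii[mask]), np.log(C[mask]), 1)
    return slope


# Synthetic stand-in for a chemical library (an assumption, not real data):
# 1000 "molecules" lying on a 2-D sheet inside a 10-descriptor space.
rng = np.random.default_rng(0)
latent = rng.uniform(size=(1000, 2))
Q, _ = np.linalg.qr(rng.normal(size=(10, 2)))  # orthonormal embedding
X = latent @ Q.T                               # shape (1000, 10)

Z = svd_reduce(X, 3)                           # reduced representation
radii = np.logspace(np.log10(0.02), np.log10(0.2), 10)
d2 = correlation_dimension(Z, radii)
print(f"estimated correlation dimension: {d2:.2f}")
```

For this synthetic sheet the estimate should fall close to 2, well below the 10 nominal descriptor dimensions. Under the same scaling assumption, the expected number of neighbours within distance r of a library member grows roughly as r^D, which is the kind of neighbourhood-count inference the abstract refers to.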