Searching the impossible: Feature trees in fragment space


Marcus Gastreich, Sally Ann Hindle, and Christian Lemmen. BioSolveIT GmbH, An der Ziegelei 75, 53757 Sankt Augustin, Germany
FTrees is an established and effective program for similarity searching. It is based on the feature tree descriptor: the similarity of two molecules is defined as the score of the best possible alignment of their two feature trees. Because the descriptors are trees, this optimal alignment can be computed very efficiently.

The step beyond simple A-to-B similarity calculations is the on-the-fly assembly of a molecule B from a fragment space such that the virtual molecule B is as similar as possible to the query A. Owing to the combinatorial nature of the problem, this fragment search space comprises roughly 10^18 compounds, far too many to enumerate and search sequentially.
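To make the scale concrete, the following back-of-the-envelope sketch shows how a modest number of stored fragments multiplies into a space of this size. The numbers are purely illustrative assumptions (six attachment positions, 1000 fragments per position, one million comparisons per second) and are not taken from any actual fragment space; real fragment spaces also restrict which fragments may be linked, which this sketch ignores.

```python
from math import prod

SECONDS_PER_YEAR = 3600 * 24 * 365


def space_size(fragments_per_position):
    """Number of products if one fragment is chosen independently per position."""
    return prod(fragments_per_position)


# Hypothetical layout: 6 positions with 1000 interchangeable fragments each.
fragments_per_position = [1000] * 6

stored_fragments = sum(fragments_per_position)   # what has to be stored
products = space_size(fragments_per_position)    # what is implicitly encoded

# Assumed throughput of a sequential search (comparisons per second).
assumed_rate = 1_000_000
years_sequential = products / assumed_rate / SECONDS_PER_YEAR

print(f"stored fragments: {stored_fragments}")
print(f"encoded products: {products:.0e}")
print(f"sequential search: about {years_sequential:,.0f} years")
```

Under these assumptions only a few thousand fragments need to be stored, while scoring every product one by one would take tens of thousands of years.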

We report on the generation of fragment spaces, the technology to search them efficiently, and example applications. Because the underlying Feature Trees technology is available as a Python module, the entire process can be scripted and executed in parallel within an integrated Python environment.
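As a rough illustration of what scripted, parallel execution could look like, the sketch below distributes similarity calculations over a worker pool from the Python standard library. It does not use the Feature Trees module, whose interface is not described here: `toy_similarity` is a hypothetical stand-in (a Jaccard index on sets of feature labels), and `rank_candidates` is likewise an invented helper name.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_similarity(a, b):
    """Stand-in score: Jaccard index of two collections of feature labels.

    This is NOT the feature tree alignment score; it only marks the place
    where a real similarity call would go.
    """
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def rank_candidates(query, candidates, max_workers=4):
    """Score all candidates against the query in a worker pool, best first."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the input order of the candidates.
        scores = list(pool.map(lambda c: toy_similarity(query, c), candidates))
    return sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)


query = ("donor", "acceptor", "aromatic")
candidates = [
    ("donor",),
    ("donor", "acceptor", "aromatic"),
    ("hydrophobic",),
]
ranked = rank_candidates(query, candidates)
for score, candidate in ranked:
    print(f"{score:.2f}  {candidate}")
```

A thread pool keeps the sketch short and portable; for CPU-bound scoring in pure Python, a process pool (`concurrent.futures.ProcessPoolExecutor`) with a module-level scoring function would be the more likely choice.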