Fingerprint-based virtual screening using multiple reference structures

CINF 86

Jérôme Hert, hert@cgl.ucsf.edu (1), Peter Willett, p.willett@sheffield.ac.uk (2), David J. Wilton (2), Pierre Acklin (3), Kamal Azzaoui (4), Edgar Jacoby (3), and Ansgar Schuffenhauer (3). (1) Department of Pharmaceutical Chemistry, University of California, San Francisco, 1700 4th Street, QB3 Building Room 509, San Francisco, CA 94143-2550, (2) Department of Information Studies, University of Sheffield, Western Bank, Sheffield S10 2TN, United Kingdom, (3) Discovery Technologies, Novartis Institute for Biomedical Research, Basel, CH-4002, Switzerland, (4) Lead Finding Platform, Novartis Institutes for BioMedical Research, Lichtstrasse, Basel, 4002, Switzerland
Fingerprint-based similarity searching is widely used for virtual screening when only a single bioactive reference structure is available. This paper considers similarity approaches that can be used when multiple, structurally heterogeneous reference structures are available. Extensive simulated virtual screening searches on the MDL Drug Data Report database suggest that two approaches give the best results: data fusion, specifically fusing the similarity scores from separate searches that each use one reference molecule, and an approximate form of the binary kernel discrimination technique. A detailed comparison of these two approaches was then carried out with 14 different types of 2D fingerprint, evaluating the experiments in terms of both the active molecules retrieved and the chemotypes retrieved. The results demonstrate the effectiveness of fingerprints that encode circular substructure descriptors generated using the Morgan algorithm. Combining these fingerprints with data fusion based on similarity scores appears to provide an approach to virtual screening in lead-discovery programmes that is both effective and efficient.
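
The score-based data fusion described above can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: fingerprints are represented as sets of "on" bit positions, similarity is measured with the Tanimoto coefficient, and the fusion rule takes the maximum score over the reference molecules. The function names and the toy molecules are hypothetical and are not taken from the study.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

# Assumption: a fingerprint is the set of bit positions that are switched on.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient between two binary fingerprints."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def fused_scores(
    references: List[Fingerprint],
    database: Dict[str, Fingerprint],
    fuse: Callable[[Iterable[float]], float] = max,
) -> Dict[str, float]:
    """Score each database molecule against every reference, then fuse.

    The default fusion rule (max) is an assumption made for illustration;
    any callable that reduces an iterable of scores to one number will do.
    """
    return {
        name: fuse(tanimoto(fp, ref) for ref in references)
        for name, fp in database.items()
    }


def rank(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort molecules by decreasing fused score, ties broken by name."""
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy data: two dissimilar references, three database molecules.
REFERENCES = [frozenset({1, 2, 3, 4}), frozenset({10, 11, 12})]
DATABASE = {
    "mol_a": frozenset({1, 2, 3, 5}),
    "mol_b": frozenset({10, 11, 12, 13}),
    "mol_c": frozenset({20, 21}),
}

if __name__ == "__main__":
    for name, score in rank(fused_scores(REFERENCES, DATABASE)):
        print(f"{name}\t{score:.2f}")
```

In this toy case each database molecule resembles at most one of the two references, so fusing by maximum lets a molecule rank highly when it is close to any single reference. That property is one plausible reason to prefer score fusion when the references are structurally heterogeneous.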