Combining natural language processing with substructure search for efficient mining of scientific literature

CINF 71

Shaillay Kumar Dogra, shaillay@strandls.com, and Ramesh Hariharan. Cheminformatics, Strand Life Sciences Pvt. Ltd, No. 237, Sir C V Raman Avenue, Raj Mahal Vilas, Bangalore, India
Running a natural language processing (NLP) engine on scientific literature can yield information on interactions between biological entities such as proteins and small molecules. Such an approach, when run on Medline abstracts in December 2005, yielded around 231,400 protein-small molecule and 110,850 small molecule-small molecule interactions. Clearly, a large body of information is available for analysis. However, the text-driven nature of the search limits such an approach. It is far more useful to run a substructure search, using the query compound of interest, against the database of small-molecule interactions. The resulting hits can then be analyzed to check whether the query compound potentially has similar biological interactions. This gains significance in a drug discovery setting in which compounds are virtually designed and optimized for good ADME properties. An additional dimension to optimize could then be the avoidance of undesirable interactions with specific biological targets or with other small molecules.
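The substructure-search step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes molecules are toy labeled graphs (element symbols and undirected bonds, with bond orders and hydrogens ignored), and the names `Mol`, `has_substructure`, `search_interactions`, and every record in `INTERACTIONS` are hypothetical placeholders rather than data from the NLP-derived database.

```python
from typing import Dict, List, Tuple


class Mol:
    """Toy molecule: element symbols plus undirected bonds (no bond orders)."""

    def __init__(self, atoms: List[str], bonds: List[Tuple[int, int]]):
        self.atoms = atoms
        self.adj = {i: set() for i in range(len(atoms))}
        for a, b in bonds:
            self.adj[a].add(b)
            self.adj[b].add(a)


def has_substructure(target: Mol, query: Mol) -> bool:
    """Backtracking subgraph match.

    Each query atom is mapped to a distinct target atom of the same element
    so that every query bond is also a bond in the target.
    """
    mapping: Dict[int, int] = {}  # query atom index -> target atom index

    def extend(q: int) -> bool:
        if q == len(query.atoms):
            return True  # every query atom has been placed
        for t in range(len(target.atoms)):
            if t in mapping.values() or target.atoms[t] != query.atoms[q]:
                continue
            # Already-mapped neighbours of q must be bonded to t in the target.
            if all(mapping[n] in target.adj[t] for n in query.adj[q] if n in mapping):
                mapping[q] = t
                if extend(q + 1):
                    return True
                del mapping[q]  # backtrack
        return False

    return extend(0)


# Hypothetical interaction records: (small molecule, interaction partner, structure).
ethanol = Mol(["C", "C", "O"], [(0, 1), (1, 2)])
acetic_acid = Mol(["C", "C", "O", "O"], [(0, 1), (1, 2), (1, 3)])
propane = Mol(["C", "C", "C"], [(0, 1), (1, 2)])

INTERACTIONS = [
    ("ethanol", "protein_A", ethanol),
    ("acetic acid", "protein_B", acetic_acid),
    ("propane", "molecule_X", propane),
]


def search_interactions(query: Mol) -> List[Tuple[str, str]]:
    """Return (molecule, partner) pairs whose molecule contains the query."""
    return [(name, partner) for name, partner, mol in INTERACTIONS
            if has_substructure(mol, query)]


query = Mol(["C", "C", "O"], [(0, 1), (1, 2)])  # a C-C-O fragment
hits = search_interactions(query)
print(hits)
```

In practice a cheminformatics toolkit with full SMILES/SMARTS handling would replace the toy matcher; the sketch only shows how structural hits, rather than text matches, select the interaction records to review.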
 

Cheminformatics Techniques in Bioinformatics: Related Applications
1:30 PM-5:20 PM, Wednesday, August 22, 2007 BCEC -- 252 A, Oral

Sci-Mix
8:00 PM-10:00 PM, Monday, August 20, 2007 BCEC -- Exhibit Hall - B2, Sci-Mix

Division of Chemical Information

The 234th ACS National Meeting, Boston, MA, August 19-23, 2007