Genome-scale enzyme-metabolite and drug-target interaction predictions using support vector machines

CINF 58

Jean-Loup Faulon, jfaulon@sandia.gov, Computational Bioscience Dept, Sandia National Laboratories, P.O. Box 5800 - MS 1413, Albuquerque, NM 87185
Biological and chemical databases (KEGG, PubChem, DrugBank, MDDR) are increasingly populated with information linking protein sequences to chemical structures. There is now enough information to apply machine learning techniques to predict interactions between chemicals and proteins on a genome-wide scale. Current machine learning techniques, however, take as input either protein information or chemical information, not both. A novel support vector machine (SVM) method will be presented that predicts protein-chemical interactions from heterogeneous input comprising both protein sequences and chemical structures. The method fuses protein sequence data with chemical structure data by representing both with a common cheminformatics description. The approach will be demonstrated on two tasks: predicting which proteins can catalyze a given reaction, even when the reaction has no known enzymatic catalyst, and predicting whether a given drug binds a given target, even in the absence of prior binding information for that drug and target.
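The abstract does not say how the fused protein-chemical representation enters the SVM. One common construction for pairwise inputs, assumed here rather than taken from the presented method, is a tensor-product kernel, K((p, c), (p', c')) = K_p(p, p') * K_c(c, c'), whose implicit features are all (protein feature, chemical feature) combinations. The sketch below illustrates it with hypothetical stand-in descriptors: overlapping k-mer counts for protein sequences and overlapping character substrings of SMILES strings for chemicals. All function names are illustrative, and a real implementation would use a proper molecular descriptor for both inputs.

```python
from collections import Counter
from itertools import product


def sequence_kmers(seq, k=3):
    """Count overlapping k-mers in a protein sequence.
    Hypothetical stand-in for a shared cheminformatics descriptor."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def chemical_fragments(smiles, k=3):
    """Count overlapping k-character substrings of a SMILES string.
    Crude hypothetical stand-in: a real descriptor would enumerate
    substructures of the molecular graph, not characters of a string."""
    return Counter(smiles[i:i + k] for i in range(len(smiles) - k + 1))


def dot(a, b):
    """Linear kernel on sparse count vectors (sum over shared features)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(v * b[key] for key, v in a.items())


def pair_kernel(pair1, pair2):
    """Tensor-product kernel on (protein, chemical) pairs:
    K((p1, c1), (p2, c2)) = Kp(p1, p2) * Kc(c1, c2)."""
    (p1, c1), (p2, c2) = pair1, pair2
    kp = dot(sequence_kmers(p1), sequence_kmers(p2))
    kc = dot(chemical_fragments(c1), chemical_fragments(c2))
    return kp * kc


def pair_features(protein, chemical):
    """Explicit feature map of the same kernel: one feature per
    (protein k-mer, chemical fragment) combination."""
    pf, cf = sequence_kmers(protein), chemical_fragments(chemical)
    return Counter({(a, b): va * vb
                    for (a, va), (b, vb) in product(pf.items(), cf.items())})


def gram_matrix(pairs):
    """Kernel matrix over training pairs, suitable for an SVM
    that accepts precomputed kernels."""
    return [[pair_kernel(x, y) for y in pairs] for x in pairs]


# Toy (protein, chemical) pairs; sequences and SMILES are made up.
pairs = [("MKVLA", "CCO"), ("MKVLG", "CCO"), ("GGSTA", "CCN")]
K = gram_matrix(pairs)
print(K)  # [[3, 2, 0], [2, 3, 0], [0, 0, 3]]
```

The resulting Gram matrix can be passed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`, with labels marking known interacting and non-interacting pairs. The appeal of this construction is that a protein-chemical pair never seen during training can still be scored, because its similarity to each training pair factors into protein similarity times chemical similarity.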