A natural amino acid classification scheme derived from multiple sequence alignments

COMP 264

Eric B. Fauman, Eric.Fauman@pfizer.com, Computer Assisted Drug Discovery, Pfizer Global Research & Development, 2800 Plymouth Rd, Ann Arbor, MI 48105
A protein structure defines a series of functional requirements that a sequence must satisfy to generate that structure. A multiple sequence alignment provides a number of solutions to these requirements. By comparing thousands of positions across hundreds of multiple sequence alignments, one can derive a small number of discrete functional classes and the fitness of each amino acid for each class. Bayesian inference from these classes reproduces standard substitution matrices when applied to pairwise amino acid comparisons, but also extends naturally to conservation analysis of multiple residues observed at a specific position in a multiple sequence alignment.
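As an illustration only, the kind of inference described above can be sketched as a latent-class mixture: each alignment position belongs to one hidden class, and each class assigns a fitness (treated here as a probability) to each amino acid. The class names, the four-letter alphabet, and every number below are hypothetical placeholders, not values from this work, and the mixture form itself is an assumption about how the classes might be used.

```python
import math

# Hypothetical toy model: three classes over a reduced four-letter alphabet.
# None of these names or numbers come from the abstract.
CLASS_PRIOR = {"hydrophobic": 0.5, "polar": 0.3, "charged": 0.2}
FITNESS = {
    "hydrophobic": {"L": 0.60, "I": 0.30, "S": 0.08, "K": 0.02},
    "polar":       {"L": 0.10, "I": 0.10, "S": 0.70, "K": 0.10},
    "charged":     {"L": 0.05, "I": 0.05, "S": 0.20, "K": 0.70},
}


def column_likelihood(residues):
    """P(residues) = sum over classes c of P(c) * product of P(r | c)."""
    total = 0.0
    for cls, prior in CLASS_PRIOR.items():
        p = prior
        for r in residues:
            p *= FITNESS[cls][r]
        total += p
    return total


def log_odds(residues):
    """Log2 ratio of the shared-class likelihood to the independent one.

    For two residues this plays the role of a substitution-matrix entry;
    for more residues it acts as a conservation score for the column.
    """
    joint = column_likelihood(residues)
    independent = math.prod(column_likelihood([r]) for r in residues)
    return math.log2(joint / independent)


def class_posterior(residues):
    """Bayes' rule: P(c | residues) for every class c."""
    weights = {}
    for cls, prior in CLASS_PRIOR.items():
        p = prior
        for r in residues:
            p *= FITNESS[cls][r]
        weights[cls] = p
    norm = sum(weights.values())
    return {cls: w / norm for cls, w in weights.items()}
```

With these toy numbers, `log_odds(["L", "I"])` is positive and `log_odds(["L", "K"])` is negative, which is the qualitative pattern a substitution matrix encodes. The same function accepts any number of residues, for example `log_odds(["L", "I", "L"])`, which is how a pairwise score would generalize to a whole alignment column under this assumed model.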