Spectral clustering of chemical datasets

CINF 20

Rajarshi Guha, rajarshi.guha@gmail.com, NIH Chemical Genomics Center, Room 3005, 9800 Medical Center Drive, Rockville, MD 20850 and David J Wild, djwild@indiana.edu, School of Informatics, Indiana University, Bloomington, IN 47408.
Spectral clustering uses matrix decompositions to transform a dataset of n dimensions into a lower-dimensional subspace within which clustering can be performed. The most commonly used decomposition is the singular value decomposition (SVD), and it has been shown that the SVD of a data matrix represents a clustering. We investigate the use of this approach in the clustering of an Ames mutagenicity dataset and an aqueous solubility dataset. We also investigate the use of the fast SVD algorithm, which approximates the SVD of a matrix. Our results indicate that the approximation algorithm leads to an order-of-magnitude speedup. Furthermore, the clustering results are similar to those obtained using traditional partitional clustering algorithms.
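The general idea can be illustrated with a minimal Python sketch, which is not the authors' implementation. It assumes a hypothetical descriptor matrix X with one row per compound, projects the rows onto the leading singular vectors, and runs a plain k-means in that subspace. All function names are illustrative. The randomized_svd function is a generic random-projection approximation used here as a stand-in; it is not necessarily the fast SVD algorithm referred to in the abstract.

```python
import numpy as np


def truncated_svd(X, k):
    """Exact rank-k SVD using NumPy's dense solver."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k]


def randomized_svd(X, k, oversample=10, seed=0):
    """Approximate rank-k SVD via a Gaussian random projection.

    Assumption: a generic stand-in for an approximate SVD, not the
    specific fast SVD algorithm mentioned in the abstract.
    """
    rng = np.random.default_rng(seed)
    width = min(k + oversample, min(X.shape))
    omega = rng.standard_normal((X.shape[1], width))
    Q, _ = np.linalg.qr(X @ omega)  # orthonormal basis for the sampled range of X
    Ub, s, Vt = np.linalg.svd(Q.T @ X, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]


def kmeans(Z, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one integer label per row of Z."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster is empty.
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_cluster(X, k, svd=truncated_svd):
    """Cluster the rows of X in the subspace of its top-k singular vectors."""
    Xc = X - X.mean(axis=0)   # center each descriptor column
    U, s, _ = svd(Xc, k)      # rank-k decomposition (exact or approximate)
    return kmeans(U * s, k)   # row coordinates in the reduced subspace
```

Calling spectral_cluster(X, k) uses the exact decomposition, while spectral_cluster(X, k, svd=randomized_svd) swaps in the approximation, so the two labelings and their run times can be compared on the same matrix. Any speedup would depend on the matrix size and the chosen rank.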