Spectral clustering of chemical datasets

CINF 20

Rajarshi Guha, rguha@indiana.edu and David J Wild, djwild@indiana.edu. School of Informatics, Indiana University, 1130 Eigenmann Hall, 1900 E 10th Street, Bloomington, IN 47406
Spectral clustering uses matrix decompositions to transform a dataset of n dimensions into a lower-dimensional subspace within which clustering can be performed. The most common decomposition is the singular value decomposition (SVD), and it has been shown that the SVD of a data matrix represents a clustering. We investigate the use of this approach in clustering an Ames mutagenicity dataset and an aqueous solubility dataset. We also investigate the fast SVD algorithm, which approximates the SVD of a matrix. Our results indicate that the approximation algorithm leads to an order-of-magnitude speedup. Furthermore, the clustering results are similar to those obtained using traditional partitional clustering algorithms.
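
As an illustration of the general idea only, the following minimal Python sketch projects a data matrix onto its leading singular vectors and then clusters the rows in that subspace. It is not the authors' implementation: the function names (svd_project, kmeans, spectral_cluster), the mean-centering step, the use of k-means as the clustering step, and the synthetic two-group data are all assumptions made for this example. It uses the exact SVD from NumPy; an approximate (fast) SVD would take the place of the np.linalg.svd call.

```python
import numpy as np

def svd_project(X, k):
    """Return the coordinates of the rows of X in the top-k singular subspace."""
    Xc = X - X.mean(axis=0)  # centering is an assumption of this sketch
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]  # shape (n_samples, k)

def kmeans(Z, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's k-means; stands in for any partitional clustering step."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster is empty
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def spectral_cluster(X, k, n_clusters):
    """Reduce X to k dimensions with the SVD, then cluster the reduced rows."""
    return kmeans(svd_project(X, k), n_clusters)

# Synthetic stand-in for a descriptor matrix: two well-separated groups
# of 30 "molecules" each, described by 20 hypothetical descriptors.
rng = np.random.default_rng(42)
group_a = rng.normal(loc=0.0, scale=1.0, size=(30, 20))
group_b = rng.normal(loc=10.0, scale=1.0, size=(30, 20))
X_demo = np.vstack([group_a, group_b])

labels_demo = spectral_cluster(X_demo, k=2, n_clusters=2)
print(np.bincount(labels_demo))  # cluster sizes
```

In practice the rows of the matrix would be molecules and the columns computed descriptors; the number of retained singular vectors and the number of clusters are choices the abstract does not specify.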