Sparse Principal Component Analysis and Compressed Sensing: Computing Sparse Approximations to Extreme Eigenvectors

Date(s)
Monday 11th October 2010 (16:00-17:00)
Contact

Brian Logan

Description

Speaker:
Dr Peter Richtarik (http://www.maths.ed.ac.uk/~richtarik/)

Abstract:
A good data analysis tool should not only be capable of efficiently finding the "right" answer; it should also yield "interpretable" results. Consider principal component analysis (PCA). PCA is a well-established tool for making sense of high-dimensional data by reducing the dimension. If A is a matrix encoding p samples of n variables, with n large, PCA aims to find linear combinations of these variables, "principal components", that explain as much of the variance in the data as possible. An interpretable solution in PCA would use combinations of only a few variables. There is clearly a trade-off between statistical fidelity (explained variance) and interpretability (the number of variables in the solution). The goal of sparse principal component analysis ("sparse PCA") is to explore this trade-off computationally. This talk presents a new optimisation approach to sparse PCA. While the initial formulations proposed are non-convex, and therefore computationally intractable, it is shown that they can be recast as the maximisation of a convex function on a compact set, allowing a simple and efficient gradient algorithm to be used. This method can be viewed as a generalisation of the well-known power method for computing the maximal eigenvector of a matrix. The proposed approach outperforms existing methods, both in the quality of the obtained solution and in computational speed, on a set of random problem instances and a gene expression benchmark. Time permitting, the speaker will also elaborate on a recent extension relevant to compressed sensing: a method for the simultaneous computation of sparse approximations to both the minimal and maximal eigenvectors.
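
For a concrete feel for the method, the following is a minimal Python (NumPy) sketch of a single-unit l1-penalised power iteration in the spirit of the GPower scheme from the JMLR paper cited below; the initialisation, iteration count, and variable names are illustrative choices, not the authors' reference implementation.

import numpy as np

def gpower_l1(A, gamma, iters=200):
    """Single-unit sparse PCA loading via an l1-penalised power iteration.

    A     : (p, n) data matrix holding p samples of n variables, so the
            i-th variable corresponds to the column a_i of A.
    gamma : sparsity level; gamma = 0 recovers the classical power
            method on A A^T, i.e. the ordinary first principal component.
    Returns a unit-norm loading vector z in R^n, sparse for gamma > 0.
    """
    # Start from the column of largest norm, normalised onto the sphere.
    j = np.argmax(np.linalg.norm(A, axis=0))
    x = A[:, j] / np.linalg.norm(A[:, j])
    for _ in range(iters):
        s = A.T @ x                                          # a_i^T x for all i
        w = np.sign(s) * np.maximum(np.abs(s) - gamma, 0.0)  # soft threshold
        g = A @ w                                            # ascent direction
        ng = np.linalg.norm(g)
        if ng == 0.0:               # gamma so large that every variable drops out
            break
        x = g / ng                  # project back onto the unit sphere
    # Read the sparse loadings off the final iterate.
    s = A.T @ x
    z = np.sign(s) * np.maximum(np.abs(s) - gamma, 0.0)
    nz = np.linalg.norm(z)
    return z / nz if nz > 0 else z

# With gamma = 0 the update is the plain power method, so z matches the
# leading right singular vector of A; raising gamma trades explained
# variance for sparser, more interpretable loadings.
A = np.random.default_rng(1).standard_normal((50, 20))
print(np.count_nonzero(gpower_l1(A, 0.0)), np.count_nonzero(gpower_l1(A, 2.0)))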

Biography:
Dr Peter Richtarik is a lecturer in the School of Mathematics at the University of Edinburgh, working in research groups focused on optimisation and compressed sensing. Previously, he was a fellow at the Center for Operations Research and Econometrics (CORE) in Louvain, working on convex programming with Yurii Nesterov. He earned his PhD in Operations Research at Cornell University under the supervision of Mike Todd.

References:
This talk is based on "Generalized power method for sparse principal component analysis" by Michel Journee, Yurii Nesterov, Peter Richtarik and Rodolphe Sepulchre, Journal of Machine Learning Research 11:517–553, 2010 (http://jmlr.csail.mit.edu/papers/volume11/journee10a/journee10a.pdf).

For more background reading, please see "Sparse Principal Component Analysis" by Hui Zou, Trevor Hastie and Robert Tibshirani, Journal of Computational and Graphical Statistics 15(2):262–286, 2006 (http://www-stat.stanford.edu/~hastie/Papers/spc_jcgs.pdf), and "Compressed sensing" by David Donoho, IEEE Transactions on Information Theory 52(4):1289–1306, 2006 (http://dx.doi.org/10.1109/TIT.2006.871582), as well as the thousands of papers that reference these.

School of Computer Science

University of Nottingham
Jubilee Campus
Wollaton Road
Nottingham, NG8 1BB

For all enquiries please visit:
www.nottingham.ac.uk/enquire