Professors Sujay Sanghavi and Constantine Caramanis, along with a colleague at UC Berkeley, have been awarded a $1.1M grant to advance the frontiers of large-scale machine learning in the era of big and noisy data.
The proposed research focuses on developing both fundamental new theory and algorithms for data that lives in very high-dimensional spaces, where the dimensionality renders basic statistical techniques ineffective. Applications include, for instance, the design of recommender systems, such as those used by Amazon, Netflix, and other online companies; this involves analyzing large matrices that describe users' past behavior. In sociology, researchers are interested in fitting network models to large-scale data sets involving hundreds or thousands of individuals. In medical imaging, the goal is to reconstruct complicated phenomena (e.g., brain images or videos of a beating heart) from a minimal number of incomplete and possibly corrupted measurements.
Motivated by these applications, the goal of the research is to develop and analyze models and algorithms for extracting relevant structure from high-dimensional data sets in a robust and scalable fashion. Robustness is a particular concern in high dimensions: existing algorithms are especially fragile to the corruption and model mismatch that are rife in modern large data sets.
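To make the recommender-system example above concrete, here is a minimal sketch, assuming NumPy, of one standard way such a matrix problem can be attacked: treat the ratings matrix as approximately low rank and fill in the missing entries by repeatedly soft-thresholding its singular values. This is a generic illustration of the low-rank idea, not the algorithm developed under this grant; the function name soft_impute, the shrinkage weight lam, the iteration count, and the toy data are all illustrative choices.

```python
import numpy as np

def soft_impute(ratings, observed, lam=0.5, n_iters=200):
    """Fill in missing entries of a ratings matrix by iteratively
    soft-thresholding its singular values, which biases the completed
    matrix toward low rank. Illustrative sketch only."""
    X = np.where(observed, ratings, 0.0)
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        s = np.maximum(s - lam, 0.0)                # shrink singular values
        low_rank = (U * s) @ Vt                     # current low-rank estimate
        X = np.where(observed, ratings, low_rank)   # keep observed entries fixed
    return X

# Toy "users x items" matrix of low rank, with roughly 60% of entries missing.
rng = np.random.default_rng(0)
true_ratings = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
observed = rng.random(true_ratings.shape) < 0.4
completed = soft_impute(true_ratings, observed)
rmse = np.sqrt(np.mean((completed[~observed] - true_ratings[~observed]) ** 2))
print("RMSE on unobserved entries:", rmse)
```

The same kind of structural assumption (low rank here, sparsity elsewhere) is what makes reconstruction from a minimal number of incomplete or corrupted measurements plausible in the imaging and network settings as well.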
The research leverages tools from convex optimization, signal processing, and robust statistics. It also reinforces the recent addition of these ideas to the ECE curriculum, in the form of the two-course sequence on optimization and learning co-taught by the PIs in 2012-13.