Sub-quadratic search for significant correlations

22nd November 2016

Speaker: Graham Cormode, University of Warwick

Randomized algorithms have been very helpful for data reduction, with successful applications in log analysis and dimensionality reduction for machine learning.  These techniques have more recently been extended to problems in linear algebra. A basic problem in data analysis is to find correlations within multiple time series: consider a set of observations of values, such as the prices of a large collection of stocks over time.  Finding which pairs have high (linear) correlation would usually take time proportional to the square of the number of series.  By combining dimensionality reduction and hashing ideas, we can reduce this cost, albeit with some rather optimistic assumptions about the non-correlated signals.