Optimizing the Union of Intersections LASSO (UoI_LASSO) and Vector Autoregressive (UoI_VAR) Algorithms for Improved Statistical Estimation at Scale

08/21/2018
by   Mahesh Balasubramanian, et al.

The analysis of scientific data of increasing size and complexity requires statistical machine learning methods that are both interpretable and predictive. Union of Intersections (UoI), a recently developed framework, is a two-step approach that separates model selection from model estimation. UoI_LASSO, a linear regression algorithm based on UoI, simultaneously achieves feature selection with low false-positive and low false-negative rates and parameter estimates with low bias and low variance. Together, these qualities make the results both predictive and interpretable. In this paper, we optimize the UoI_LASSO algorithm for single-node execution on NERSC's Cori Knights Landing, a Xeon Phi based supercomputer. We then scale UoI_LASSO from 68 to 278,528 cores across a range of dataset sizes, demonstrating both weak and strong scaling of the implementation. We also implement UoI_VAR, a variant of UoI_LASSO for vector autoregressive models, to analyze high-dimensional time-series data. We perform single-node optimization and multi-node scaling experiments for UoI_VAR to demonstrate the weak and strong scaling of the algorithm. Our implementations enable us to estimate the largest VAR model (1000 nodes) we are aware of, and to apply it to large neurophysiology data (192 nodes).
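The two-step structure described above can be illustrated with a minimal single-process sketch. It is not the optimized implementation from the paper. It assumes the usual UoI recipe: selection intersects LASSO supports across bootstrap resamples for each penalty value, and estimation refits ordinary least squares on each candidate support, keeps the one that predicts best on held-out rows, and averages across resamples. The function names (`lasso_cd`, `uoi_lasso`), the plain coordinate-descent solver, the bootstrap counts, and the toy data are all illustrative assumptions.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate-descent LASSO for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = np.asarray(y, dtype=float).copy()  # equals y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes feature j.
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def uoi_lasso(X, y, lambdas, n_boot_sel=10, n_boot_est=10, seed=0):
    """Illustrative UoI_LASSO: intersect supports, then average refits."""
    rng = np.random.default_rng(seed)
    n, p = X.shape

    # Selection step: for each penalty, keep only the features that are
    # nonzero in every bootstrap resample (the "intersection").
    supports = []
    for lam in lambdas:
        support = np.ones(p, dtype=bool)
        for _ in range(n_boot_sel):
            idx = rng.integers(0, n, size=n)
            support &= lasso_cd(X[idx], y[idx], lam) != 0.0
        supports.append(support)

    # Estimation step: per resample, refit unpenalized least squares on each
    # candidate support and keep the one with the lowest held-out error.
    best = np.zeros((n_boot_est, p))
    for b in range(n_boot_est):
        idx = rng.integers(0, n, size=n)
        held = np.setdiff1d(np.arange(n), idx)  # rows not drawn in this resample
        best_err = np.inf
        for support in supports:
            beta = np.zeros(p)
            if support.any():
                beta[support] = np.linalg.lstsq(
                    X[idx][:, support], y[idx], rcond=None
                )[0]
            err = np.mean((y[held] - X[held] @ beta) ** 2)
            if err < best_err:
                best_err, best[b] = err, beta

    # Averaging the per-resample winners combines their supports (the "union").
    return best.mean(axis=0)


# Toy data (an assumption for illustration): 3 of 20 features are active.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 20))
true_beta = np.zeros(20)
true_beta[:3] = [3.0, -2.0, 1.5]
y = X @ true_beta + 0.1 * rng.standard_normal(200)
beta_hat = uoi_lasso(X, y, lambdas=[0.05, 0.1, 0.2, 0.5])
```

In this sketch every (penalty, resample) fit is independent of the others, which is the kind of structure that makes the method amenable to distribution across many cores. How the paper actually partitions the work is not shown here.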

