Multi-Task Learning for Sparsity Pattern Heterogeneity: A Discrete Optimization Approach

12/16/2022
by Gabriel Loewinger, et al.

We extend best-subset selection to linear Multi-Task Learning (MTL), where a set of linear models is jointly trained on a collection of datasets (“tasks”). Allowing the regression coefficients of tasks to have different sparsity patterns (i.e., different supports), we propose a modeling framework for MTL that encourages models to share information across tasks, for a given covariate, through two separate mechanisms: 1) shrinking the coefficient supports together, and/or 2) shrinking the coefficient values together. This allows models to borrow strength during variable selection even when the coefficient values differ markedly between tasks. We express our modeling framework as a Mixed-Integer Program and propose efficient and scalable algorithms based on block coordinate descent and combinatorial local search. We show that our estimator achieves statistically optimal prediction rates. Importantly, our theory characterizes how our estimator leverages the shared support information across tasks to achieve better variable selection performance. We evaluate the performance of our method in simulations and two biology applications. Our proposed approaches outperform other sparse MTL methods in variable selection and prediction accuracy. Interestingly, penalties that shrink the supports together often outperform penalties that shrink the coefficient values together. We will release an R package implementing our methods.
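To make the distinction between the two sharing mechanisms concrete, the following Python sketch evaluates one possible penalized objective. It is an illustration under stated assumptions, not the formulation from the paper: the abstract does not give the objective, so the squared-distance-to-the-cross-task-mean penalties, the function names, and the parameters `lam_values` and `lam_support` are all hypothetical. The per-task sparsity constraint of best-subset selection and the optimization algorithms are omitted.

```python
def support(beta, tol=1e-12):
    """Binary support indicator: 1 where a coefficient is nonzero, else 0."""
    return [1 if abs(b) > tol else 0 for b in beta]


def squared_distance_to_mean(rows):
    """Sum of squared deviations of each task's vector from the cross-task mean."""
    k = len(rows)
    center = [sum(col) / k for col in zip(*rows)]
    return sum((v - c) ** 2 for row in rows for v, c in zip(row, center))


def mtl_objective(X, y, betas, lam_values, lam_support):
    """Hypothetical MTL objective: per-task squared-error fit plus two penalties.

    X[k] is a list of rows for task k, y[k] the responses, betas[k] the coefficients.
    lam_values  weights the penalty pulling coefficient VALUES together (assumed form).
    lam_support weights the penalty pulling SUPPORTS together (assumed form).
    """
    fit = 0.0
    for X_k, y_k, beta_k in zip(X, y, betas):
        rss = sum(
            (y_i - sum(x_ij * b_j for x_ij, b_j in zip(x_i, beta_k))) ** 2
            for x_i, y_i in zip(X_k, y_k)
        )
        fit += rss / len(y_k)
    value_penalty = squared_distance_to_mean(betas)
    support_penalty = squared_distance_to_mean([support(b) for b in betas])
    return fit + lam_values * value_penalty + lam_support * support_penalty


# Two tasks, two covariates.
same_support = [[1.0, 0.0], [3.0, 0.0]]   # same covariate selected, different values
diff_support = [[1.0, 0.0], [0.0, 1.0]]   # different covariates selected

# The support penalty ignores how far apart the values are.
print(squared_distance_to_mean([support(b) for b in same_support]))
print(squared_distance_to_mean([support(b) for b in diff_support]))
# The value penalty does not.
print(squared_distance_to_mean(same_support))
```

In this toy setting the support penalty is zero whenever tasks select the same covariates, regardless of how much the selected coefficients differ, which is the sense in which models can borrow strength during variable selection while keeping task-specific coefficient values.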
