Divide and Conquer Kernel Ridge Regression: A Distributed Algorithm with Minimax Optimal Rates

05/22/2013
by   Yuchen Zhang, et al.
0

We establish optimal convergence rates for a decomposition-based scalable approach to kernel ridge regression. The method is simple to describe: it randomly partitions a dataset of size N into m subsets of equal size, computes an independent kernel ridge regression estimator for each subset, then averages the local solutions into a global predictor. This partitioning leads to a substantial reduction in computation time versus the standard approach of performing kernel ridge regression on all N samples. Our two main theorems establish that despite the computational speed-up, statistical optimality is retained: as long as m is not too large, the partition-based estimator achieves the statistical minimax rate over all estimators using the set of N samples. As concrete examples, our theory guarantees that the number of processors m may grow nearly linearly for finite-rank kernels and Gaussian kernels and polynomially in N for Sobolev spaces, which in turn allows for substantial reductions in computational cost. We conclude with experiments on both simulated data and a music-prediction task that complement our theoretical results, exhibiting the computational and statistical benefits of our approach.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/05/2016

Kernel Ridge Regression via Partitioning

In this paper, we investigate a divide and conquer approach to Kernel Ri...
research
10/24/2016

Parallelizing Spectral Algorithms for Kernel Learning

We consider a distributed learning approach in supervised learning for a...
research
05/24/2021

Uncertainty quantification for distributed regression

The ever-growing size of the datasets renders well-studied learning tech...
research
09/10/2023

Nonlinear Granger Causality using Kernel Ridge Regression

I introduce a novel algorithm and accompanying Python library, named mlc...
research
06/23/2021

ParK: Sound and Efficient Kernel Ridge Regression by Feature Space Partitions

We introduce ParK, a new large-scale solver for kernel ridge regression....
research
03/27/2020

Distributed Kernel Ridge Regression with Communications

This paper focuses on generalization performance analysis for distribute...
research
03/22/2019

One-shot distributed ridge regression in high dimensions

In many areas, practitioners need to analyze large datasets that challen...

Please sign up or login with your details

Forgot password? Click here to reset