Optimal Rates of Distributed Regression with Imperfect Kernels

06/30/2020
by Hongwei Sun, et al.

Distributed machine learning systems have been receiving increasing attention for their efficiency in processing large-scale data, and many distributed frameworks have been proposed for different machine learning tasks. In this paper, we study distributed kernel regression via the divide-and-conquer approach. This approach has been proved to be asymptotically minimax optimal if the kernel is perfectly selected, so that the true regression function lies in the associated reproducing kernel Hilbert space. However, this is usually, if not always, impractical, because kernels selected via prior knowledge or a tuning process are hardly perfect. Instead, it is more common that the kernel is good enough but imperfect, in the sense that the true regression function can be well approximated by, but does not lie exactly in, the kernel space. We show that distributed kernel regression can still achieve the capacity-independent optimal rate in this case. To this end, we first establish a general framework that allows one to analyze distributed regression with response-weighted base algorithms by bounding the error of such algorithms on a single data set, provided that the error bounds have factored in the impact of the unexplained variance of the response variable. We then perform a leave-one-out analysis of kernel ridge regression and bias-corrected kernel ridge regression, which, in combination with the aforementioned framework, allows us to derive sharp error bounds and capacity-independent optimal rates for the associated distributed kernel regression algorithms. As a byproduct of this thorough analysis, we also prove that kernel ridge regression can achieve rates faster than N^-1 (where N is the sample size) in the noise-free setting, which, to the best of our knowledge, is the first such observation in regression learning.
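To make the divide-and-conquer scheme concrete, here is a minimal sketch (not the authors' code) of distributed kernel ridge regression: each of m machines fits KRR on its local partition, and the global estimator averages the local predictions. The Gaussian kernel, the bandwidth sigma, the regularization parameter lam, and the partition count are illustrative choices, not values from the paper.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel matrix between rows of X and Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def krr_fit(X, y, lam, sigma=1.0):
    # Local kernel ridge regression on one partition:
    # alpha = (K + n * lam * I)^{-1} y, with n the local sample size.
    n = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return X, alpha

def dc_krr_predict(models, X_test, sigma=1.0):
    # Divide-and-conquer estimator: average the m local KRR predictions.
    preds = [gaussian_kernel(X_test, Xj, sigma) @ aj for Xj, aj in models]
    return np.mean(preds, axis=0)

# Toy usage: N = 1000 samples split across m = 10 machines.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(1000, 1))
y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.standard_normal(1000)
models = [krr_fit(Xj, yj, lam=1e-3)
          for Xj, yj in zip(np.array_split(X, 10), np.array_split(y, 10))]
y_hat = dc_krr_predict(models, X)
```

Averaging local solutions keeps the per-machine cost at O((N/m)^3) instead of O(N^3) for a single global solve; the paper's contribution is showing that this averaged estimator remains rate optimal even when the kernel is only a good, rather than perfect, match for the true regression function.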


