Spectrum of inner-product kernel matrices in the polynomial regime and multiple descent phenomenon in kernel ridge regression

04/21/2022
by Theodor Misiakiewicz, et al.

We study the spectrum of inner-product kernel matrices, i.e., n × n matrices with entries h(⟨x_i, x_j⟩/d), where the (x_i)_{i ≤ n} are i.i.d. random covariates in ℝ^d. In the linear high-dimensional regime n ≍ d, it was shown that these matrices are well approximated by their linearization, which simplifies into the sum of a rescaled Wishart matrix and the identity matrix. In this paper, we generalize this decomposition to the polynomial high-dimensional regime n ≍ d^ℓ, ℓ ∈ ℕ, for data uniformly distributed on the sphere and hypercube. In this regime, the kernel matrix is well approximated by its degree-ℓ polynomial approximation and can be decomposed into a low-rank spike matrix, an identity matrix, and a 'Gegenbauer matrix' with entries Q_ℓ(⟨x_i, x_j⟩), where Q_ℓ is the degree-ℓ Gegenbauer polynomial. We show that the spectrum of the Gegenbauer matrix converges in distribution to a Marchenko-Pastur law. This problem is motivated by the study of the prediction error of kernel ridge regression (KRR) in the polynomial regime n ≍ d^κ, κ > 0. Previous work showed that for κ ∉ ℕ, KRR fits exactly a degree-⌊κ⌋ polynomial approximation to the target function. In this paper, we use our characterization of the kernel matrix to complete this picture and compute the precise asymptotics of the test error in the limit n/d^κ → ψ with κ ∈ ℕ. In this case, the test error can exhibit a double descent behavior, depending on the effective regularization and signal-to-noise ratio at level κ. Because this double descent can occur each time κ crosses an integer, it explains the multiple descent phenomenon in the KRR risk curve observed in several previous works.
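The linear-regime (n ≍ d) approximation the abstract refers to can be checked numerically. The sketch below is an illustration, not code from the paper: the kernel h = exp, the dimensions, and the El Karoui-style linearization constants are assumptions chosen for a quick sanity check. It samples covariates uniformly on the sphere of radius √d, builds the inner-product kernel matrix, and compares it in operator norm to the sum of a rescaled Wishart matrix, a rank-one constant block, and an identity shift.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 400, 200  # linear regime n ≍ d (dimensions chosen for a quick check)

# n i.i.d. covariates, uniform on the sphere of radius sqrt(d) in R^d,
# so that <x_i, x_i>/d = 1 exactly for every i.
X = rng.standard_normal((n, d))
X *= np.sqrt(d) / np.linalg.norm(X, axis=1, keepdims=True)

# Inner-product kernel matrix K_ij = h(<x_i, x_j>/d), with h = exp as an
# illustrative smooth kernel (an assumption, not taken from the paper).
G = X @ X.T / d
K = np.exp(G)

# Linearization: rank-one constant block + rescaled Wishart part h'(0) * G
# + identity shift, with constants h(0), h(1), h'(0), h''(0) of h = exp.
h0, h1, dh0, ddh0 = 1.0, np.e, 1.0, 1.0
K_lin = (
    (h0 + ddh0 / (2 * d)) * np.ones((n, n))
    + dh0 * G
    + (h1 - h0 - dh0 - ddh0 / (2 * d)) * np.eye(n)
)

# The two matrices should be close in operator norm relative to ||K||.
err = np.linalg.norm(K - K_lin, 2) / np.linalg.norm(K, 2)
print(f"relative operator-norm error: {err:.2e}")
```

At these sizes the relative error is already small; the paper's contribution is the analogous decomposition (spike + identity + Gegenbauer matrix) in the polynomial regime n ≍ d^ℓ, which this linear-regime sketch does not cover.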
