Depth and Feature Learning are Provably Beneficial for Neural Network Discriminators
We construct pairs of distributions μ_d, ν_d on ℝ^d such that the discrepancy |𝔼_{x∼μ_d}[F(x)] − 𝔼_{x∼ν_d}[F(x)]| is Ω(1/d²) for some three-layer ReLU network F with polynomial width and weights, yet decays exponentially in d if F is any two-layer network with polynomial weights. This shows that deep GAN discriminators can distinguish distributions that shallow discriminators cannot. Analogously, we construct pairs of distributions μ_d, ν_d on ℝ^d such that |𝔼_{x∼μ_d}[F(x)] − 𝔼_{x∼ν_d}[F(x)]| is Ω(1/(d log d)) for some two-layer ReLU network with polynomial weights, yet decays exponentially for all bounded-norm functions in the associated RKHS. This confirms that feature learning is beneficial for discriminators. Our bounds are based on Fourier transforms.
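The central quantity above, the discrepancy |𝔼_{x∼μ_d}[F(x)] − 𝔼_{x∼ν_d}[F(x)]|, is straightforward to estimate by Monte Carlo. The sketch below is purely illustrative and not the paper's construction: the distributions are placeholder Gaussians standing in for μ_d, ν_d, the weights are random rather than the specific discriminators built in the paper, and the names (`two_layer`, `three_layer`, `discrepancy`) are hypothetical. It only shows what "the discrepancy achieved by a two- vs. three-layer ReLU network" means operationally.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def two_layer(x, W1, b1, w2):
    # F(x) = w2 . ReLU(W1 x + b1): a two-layer ReLU network
    return relu(x @ W1.T + b1) @ w2

def three_layer(x, W1, b1, W2, b2, w3):
    # F(x) = w3 . ReLU(W2 ReLU(W1 x + b1) + b2): a three-layer ReLU network
    h = relu(x @ W1.T + b1)
    return relu(h @ W2.T + b2) @ w3

def discrepancy(F, mu_samples, nu_samples):
    # Monte Carlo estimate of |E_{x~mu}[F(x)] - E_{x~nu}[F(x)]|
    return abs(F(mu_samples).mean() - F(nu_samples).mean())

d, width, n = 16, 64, 100_000
# Placeholder distributions (NOT the paper's mu_d, nu_d): two Gaussians
# with slightly different scales, just to make the estimate nonzero.
mu = rng.standard_normal((n, d))
nu = 1.1 * rng.standard_normal((n, d))

# Random weights of polynomial size; the paper instead constructs
# specific discriminators achieving the stated rates.
W1 = rng.standard_normal((width, d)) / np.sqrt(d)
b1 = rng.standard_normal(width)
w2 = rng.standard_normal(width) / np.sqrt(width)
W2 = rng.standard_normal((width, width)) / np.sqrt(width)
b2 = rng.standard_normal(width)
w3 = rng.standard_normal(width) / np.sqrt(width)

print("two-layer  :", discrepancy(lambda x: two_layer(x, W1, b1, w2), mu, nu))
print("three-layer:", discrepancy(lambda x: three_layer(x, W1, b1, W2, b2, w3), mu, nu))
```

The separation results say that for the paper's μ_d, ν_d, no choice of polynomial-weight two-layer F (respectively, bounded-norm RKHS function) can make this quantity larger than exponentially small in d, while some three-layer (respectively, two-layer) network keeps it polynomially large.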