Phase Transitions in Rate Distortion Theory and Deep Learning
Rate distortion theory is concerned with optimally encoding a given signal class 𝒮 using a budget of R bits, as R → ∞. We say that 𝒮 can be compressed at rate s if we can achieve an error of 𝒪(R^-s) for encoding 𝒮; the supremal compression rate is denoted s^*(𝒮). Given a fixed coding scheme, there usually are elements of 𝒮 that are compressed at a higher rate than s^*(𝒮) by the given coding scheme; we study the size of this set of signals. We show that for certain "nice" signal classes 𝒮, a phase transition occurs: We construct a probability measure ℙ on 𝒮 such that for every coding scheme 𝒞 and any s > s^*(𝒮), the set of signals encoded with error 𝒪(R^-s) by 𝒞 forms a ℙ-null-set. In particular, our results apply to balls in Besov and Sobolev spaces that embed compactly into L^2(Ω) for a bounded Lipschitz domain Ω. As an application, we show that several existing sharpness results concerning function approximation using deep neural networks are generically sharp. We also provide quantitative and non-asymptotic bounds on the probability that a random f ∈ 𝒮 can be encoded to within accuracy ε using R bits. This result is applied to the problem of approximately representing f ∈ 𝒮 to within accuracy ε by a (quantized) neural network that is constrained to have at most W nonzero weights and is generated by an arbitrary "learning" procedure. We show that for any s > s^*(𝒮) there are constants c, C such that, no matter how we choose the "learning" procedure, the probability of success is bounded from above by min{1, 2^(C·W⌈log_2(1+W)⌉^2 - c·ε^(-1/s))}.
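
As a reading aid, the following LaTeX sketch records one common way to formalize the quantities appearing above. The encoder/decoder notation E_R, D_R is introduced here purely for illustration and is not taken verbatim from the paper; the precise definitions in the full text may differ (for instance in how the error and the supremum over coding schemes are taken).

\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}

% Hypothetical formalization of the abstract's quantities (notation assumed here).
A coding scheme $\mathcal{C}$ for a signal class $\mathcal{S} \subset L^2(\Omega)$
can be described by a family of encoder/decoder pairs
\[
  E_R \colon \mathcal{S} \to \{0,1\}^R,
  \qquad
  D_R \colon \{0,1\}^R \to L^2(\Omega),
  \qquad R \in \mathbb{N}.
\]
The class $\mathcal{S}$ is compressed at rate $s > 0$ by $\mathcal{C}$ if
\[
  \sup_{f \in \mathcal{S}} \bigl\| f - D_R(E_R(f)) \bigr\|_{L^2(\Omega)}
  = \mathcal{O}(R^{-s}) \quad \text{as } R \to \infty,
\]
and $s^\ast(\mathcal{S})$ denotes the supremum of all rates $s$ achievable by
some coding scheme. The phase transition described above asserts the existence
of a probability measure $\mathbb{P}$ on $\mathcal{S}$ such that, for every
coding scheme $\mathcal{C}$ and every $s > s^\ast(\mathcal{S})$,
\[
  \mathbb{P}\Bigl( \bigl\{ f \in \mathcal{S} \,:\,
    \| f - D_R(E_R(f)) \|_{L^2(\Omega)} = \mathcal{O}(R^{-s}) \bigr\} \Bigr) = 0 .
\]

\end{document}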