From Hard to Soft: Understanding Deep Network Nonlinearities via Vector Quantization and Statistical Inference

10/22/2018
by Randall Balestriero, et al.

Nonlinearity is crucial to the performance of a deep (neural) network (DN). To date there has been little progress in understanding the menagerie of available nonlinearities, but recently progress has been made on understanding the role played by piecewise affine and convex nonlinearities like the ReLU and absolute value activation functions and max-pooling. In particular, DN layers constructed from these operations can be interpreted as max-affine spline operators (MASOs) that have an elegant link to vector quantization (VQ) and K-means. While this is good theoretical progress, the entire MASO approach is predicated on the requirement that the nonlinearities be piecewise affine and convex, which precludes important activation functions like the sigmoid, hyperbolic tangent, and softmax. This paper extends the MASO framework to these and an infinitely large class of new nonlinearities by linking deterministic MASOs with probabilistic Gaussian Mixture Models (GMMs). We show that, under a GMM, piecewise affine, convex nonlinearities like ReLU, absolute value, and max-pooling can be interpreted as solutions to certain natural "hard" VQ inference problems, while sigmoid, hyperbolic tangent, and softmax can be interpreted as solutions to corresponding "soft" VQ inference problems. We further extend the framework by hybridizing the hard and soft VQ optimizations to create a β-VQ inference that interpolates between hard, soft, and linear VQ inference. A prime example of a β-VQ DN nonlinearity is the swish nonlinearity, which offers state-of-the-art performance in a range of computer vision tasks but was developed ad hoc by experimentation. Finally, we validate with experiments an important assertion of our theory, namely that DN performance can be significantly improved by enforcing orthogonality in its linear filters.
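The hard/soft/β-VQ distinction above can be illustrated numerically. The sketch below (an illustrative reading of the abstract, not the paper's implementation) treats ReLU as a "hard" selection of the larger of the two affine pieces {0, x}, and its β-VQ relaxation as a sigmoid-weighted ("soft") selection of those same pieces, which yields x·sigmoid(βx), i.e., the swish nonlinearity; β = 1 recovers standard swish and β → ∞ recovers ReLU.

```python
import numpy as np

def hard_vq_relu(x):
    """Hard VQ: pick the maximum of the two affine pieces {0, x} -> ReLU."""
    return np.maximum(0.0, x)

def beta_vq_relu(x, beta=1.0):
    """β-VQ relaxation: weight the piece `x` by a sigmoid of the
    β-scaled input instead of a hard argmax.

    The weight on the piece `x` is sigmoid(beta * x), giving
    x * sigmoid(beta * x) -- the swish nonlinearity.  As beta -> inf
    the weight saturates to a 0/1 indicator and ReLU is recovered.
    """
    return x / (1.0 + np.exp(-beta * x))

x = np.linspace(-4.0, 4.0, 9)
hard = hard_vq_relu(x)            # piecewise affine, convex
soft = beta_vq_relu(x, beta=1.0)  # smooth swish interpolation
sharp = beta_vq_relu(x, beta=50.0)  # nearly indistinguishable from ReLU
```

Plotting `hard`, `soft`, and `sharp` over this grid shows the smooth swish curve converging to the ReLU "kink" as β grows, which is the interpolation between soft and hard VQ inference described in the abstract.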


research
05/26/2023

Representing Piecewise Linear Functions by Functions with Small Arity

A piecewise linear function can be described in different forms: as an a...
research
01/17/2019

Activation Functions for Generalized Learning Vector Quantization - A Performance Comparison

An appropriate choice of the activation function (like ReLU, sigmoid or ...
research
01/30/2023

Formalizing Piecewise Affine Activation Functions of Neural Networks in Coq

Verification of neural networks relies on activation functions being pie...
research
10/09/2019

Dissecting Deep Neural Networks

In exchange for large quantities of data and processing power, deep neur...
research
07/16/2023

A max-affine spline approximation of neural networks using the Legendre transform of a convex-concave representation

This work presents a novel algorithm for transforming a neural network i...
research
05/17/2018

A Spline Theory of Deep Networks (Extended Version)

We build a rigorous bridge between deep networks (DNs) and approximation...
research
08/15/2023

Max-affine regression via first-order methods

We consider regression of a max-affine model that produces a piecewise l...
