Deriving Neural Network Design and Learning from the Probabilistic Framework of Chain Graphs

by Yuesong Shen, et al.

The last decade has witnessed a boom in neural network (NN) research and applications achieving state-of-the-art results in various domains. Yet most advances in architecture and learning have been discovered empirically, by trial and error, which makes a more systematic exploration difficult. Theoretical analyses of these advances are limited, and a unifying framework is absent. In this paper, we tackle this issue by identifying NNs as chain graphs (CGs) whose chain components are modeled as bipartite pairwise conditional random fields, and the feed-forward pass as a form of approximate probabilistic inference. We show that this CG interpretation lets us systematically formulate an extensive range of the empirically discovered results, including various network designs (e.g., CNN, RNN, ResNet), activation functions (e.g., sigmoid, tanh, softmax, (leaky) ReLU) and regularizations (e.g., weight decay, dropout, BatchNorm). Furthermore, guided by this interpretation, we are able to derive "the preferred form" of the residual block, recover the simple yet powerful IndRNN model, and discover a new stochastic inference procedure: the partially collapsed feed-forward inference. We believe that our work can provide a well-founded formulation to analyze the nature and design of NNs, and can serve as a unifying theoretical framework for deep learning research.
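
As a rough illustration of how an activation function can fall out of a probabilistic reading, the sketch below considers a single binary unit whose log-potential is linear in its parents. This parameterization, and the names `binary_unit_conditional_mean`, `w`, `b`, and `parents`, are assumptions made for the example and are not taken from the paper. Under that assumption, the conditional mean of the unit, computed by enumerating its two states, coincides with the familiar sigmoid of the pre-activation.

```python
import numpy as np


def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def binary_unit_conditional_mean(w, b, parents):
    """E[x | parents] for a binary unit x in {0, 1}.

    Assumed (illustrative) model: the unnormalized log-potential of state x
    is x * (w . parents + b). The mean is obtained by explicitly normalizing
    over the two states rather than by calling sigmoid.
    """
    z = float(np.dot(w, parents) + b)
    scores = np.array([0.0 * z, 1.0 * z])   # log-potentials for x = 0 and x = 1
    probs = np.exp(scores - scores.max())   # subtract the max for stability
    probs /= probs.sum()
    return probs[1]                         # P(x = 1 | parents) is the mean


# Hypothetical weights, bias, and parent values, chosen only for the demo.
w = np.array([0.5, -1.2, 2.0])
b = 0.3
parents = np.array([1.0, 0.0, 1.0])

exact = binary_unit_conditional_mean(w, b, parents)
feed_forward = sigmoid(np.dot(w, parents) + b)
print(exact, feed_forward)
```

Stacking such units and passing only these means from layer to layer is one way to picture a feed-forward pass as approximate inference; whether it matches the paper's construction in detail should be checked against the full text.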



Deep Kronecker neural networks: A general framework for neural networks with adaptive activation functions

We propose a new type of neural networks, Kronecker neural networks (KNN...

A Framework for the construction of upper bounds on the number of affine linear regions of ReLU feed-forward neural networks

In this work we present a new framework to derive upper bounds on the nu...

Noise-Resilient Designs for Optical Neural Networks

All analog signal processing is fundamentally subject to noise, and this...

Expectation propagation: a probabilistic view of Deep Feed Forward Networks

We present a statistical mechanics model of deep feed forward neural net...

Using activation histograms to bound the number of affine regions in ReLU feed-forward neural networks

Several current bounds on the maximal number of affine regions of a ReLU...

A Neural Network Based on First Principles

In this paper, a Neural network is derived from first principles, assumi...

PUFFIN: A Path-Unifying Feed-Forward Interfaced Network for Vapor Pressure Prediction

Accurately predicting vapor pressure is vital for various industrial and...
