Tensor Switching Networks

10/31/2016
by Chuan-Yung Tsai, et al.

We present a novel neural network algorithm, the Tensor Switching (TS) network, which generalizes the Rectified Linear Unit (ReLU) nonlinearity to tensor-valued hidden units. The TS network copies its entire input vector to different locations in an expanded representation, with the location determined by its hidden unit activity. In this way, even a simple linear readout from the TS representation can implement a highly expressive deep-network-like function. The TS network hence avoids the vanishing gradient problem by construction, at the cost of larger representation size. We develop several methods to train the TS network, including equivalent kernels for infinitely wide and deep TS networks, a one-pass linear learning algorithm, and two backpropagation-inspired representation learning algorithms. Our experimental results demonstrate that the TS network is indeed more expressive and consistently learns faster than standard ReLU networks.
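The switching mechanism described above can be made concrete with a small sketch. The NumPy code below is illustrative only, not the paper's implementation; the function name ts_layer and all shapes are assumptions. In a standard ReLU layer each hidden unit emits a single scalar, max(0, w_i·x). In the TS view, the sign of w_i·x is used purely as a switch: every active unit copies the entire input vector x into its own slot of an expanded representation, so the hidden state becomes a matrix (hidden units × input dimensions) rather than a vector.

```python
import numpy as np

def ts_layer(x, W):
    """Illustrative Tensor Switching layer (a sketch, not the paper's code).

    A ReLU layer would compute h_i = max(0, w_i . x). Here the sign of
    w_i . x only determines a binary switch s_i; each active unit copies
    the full input x into its row of the expanded representation Z.
    """
    s = (W @ x > 0).astype(x.dtype)   # binary switching pattern, one bit per hidden unit
    Z = s[:, None] * x[None, :]       # active rows hold a full copy of x; inactive rows are zero
    return Z                          # tensor-valued hidden representation, shape (n_hidden, n_input)

# Toy usage: even a simple linear readout of the flattened TS representation
# can implement an expressive, ReLU-network-like function of the input.
rng = np.random.default_rng(0)
x = rng.normal(size=4)                # input vector
W = rng.normal(size=(3, 4))           # 3 hidden units (weights are hypothetical)
Z = ts_layer(x, W)                    # shape (3, 4)
readout = rng.normal(size=Z.size)     # hypothetical linear readout weights
y = readout @ Z.ravel()
```

Note that the ordinary ReLU layer is recoverable as a special case: if the readout weight for slot (i, j) is tied to W[i, j], then readout @ Z.ravel() reproduces the sum of ReLU activations, which is why a linear readout from the TS representation is at least as expressive as the corresponding ReLU network.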


