Trellis Networks for Sequence Modeling

10/15/2018
by   Shaojie Bai, et al.
0

We present trellis networks, a new architecture for sequence modeling. On the one hand, a trellis network is a temporal convolutional network with special structure, characterized by weight tying across depth and direct injection of the input into deep layers. On the other hand, we show that truncated recurrent networks are equivalent to trellis networks with special sparsity structure in their weight matrices. Thus trellis networks with general weight matrices generalize truncated recurrent networks. We leverage these connections to design high-performing trellis networks that absorb structural and algorithmic elements from both recurrent and convolutional models. Experiments demonstrate that trellis networks outperform the current state of the art on a variety of challenging benchmarks, including word-level language modeling on Penn Treebank and WikiText-103, character-level language modeling on Penn Treebank, and stress tests designed to evaluate long-term memory retention. The code is available at https://github.com/locuslab/trellisnet .

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/03/2019

Deep Equilibrium Models

We present a new approach to modeling sequential data: the deep equilibr...
research
12/22/2021

The Importance of the Current Input in Sequence Modeling

The last advances in sequence modeling are mainly based on deep learning...
research
01/14/2020

Block-wise Dynamic Sparseness

Neural networks have achieved state of the art performance across a wide...
research
08/09/2018

Character-Level Language Modeling with Deeper Self-Attention

LSTMs and other RNN variants have shown strong performance on character-...
research
07/25/2017

Dual Rectified Linear Units (DReLUs): A Replacement for Tanh Activation Functions in Quasi-Recurrent Neural Networks

In this paper, we introduce a novel type of Rectified Linear Unit (ReLU)...
research
03/21/2023

LoRCoN-LO: Long-term Recurrent Convolutional Network-based LiDAR Odometry

We propose a deep learning-based LiDAR odometry estimation method called...
research
03/02/2020

Tensor Networks for Language Modeling

The tensor network formalism has enjoyed over two decades of success in ...

Please sign up or login with your details

Forgot password? Click here to reset