Deja-vu: Double Feature Presentation in Deep Transformer Networks

10/23/2019
by Andros Tjandra et al.

Deep acoustic models typically receive features in the first layer of the network and process increasingly abstract representations in the subsequent layers. Here, we propose to feed the input features at multiple depths in the acoustic model. Since our motivation is to allow acoustic models to re-examine their input features in light of partial hypotheses, we introduce intermediate model heads and loss functions. We study this architecture in the context of deep Transformer networks, using an attention mechanism over both the previous layer's activations and the input features. To train the model's intermediate output hypotheses, we apply the objective function at each layer right before feature re-use. We find that the use of such intermediate losses significantly improves performance by itself, as well as enabling input feature re-use. We present results on both Librispeech and a large-scale video dataset, with relative improvements of 10 - 20% for videos.
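The core mechanism described above, attending over both the previous layer's activations and the original input features, can be sketched in a few lines. This is a minimal single-head, unbatched NumPy illustration of the general idea, not the paper's implementation; the function and weight names are made up for the example:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def feature_reuse_attention(prev, feats, Wq, Wk, Wv):
    """Hypothetical 'deja-vu' block: queries come from the previous
    layer's activations, while keys/values are drawn from the
    concatenation of those activations and the raw input features,
    so the layer can re-examine its input mid-network."""
    memory = np.concatenate([prev, feats], axis=0)      # (2T, d)
    q, k, v = prev @ Wq, memory @ Wk, memory @ Wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))      # (T, 2T)
    return attn @ v                                     # (T, d)

rng = np.random.default_rng(0)
T, d = 5, 8                                   # toy sequence length / model dim
feats = rng.normal(size=(T, d))               # input features
prev = rng.normal(size=(T, d))                # previous layer activations
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = feature_reuse_attention(prev, feats, Wq, Wk, Wv)
print(out.shape)  # prints (5, 8)
```

In a full model, an intermediate head and loss would be applied to `prev` just before each such re-use block; here only the attention over the joint memory is shown.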


Related research

03/31/2021 · SpecAugment++: A Hidden Space Data Augmentation Method for Acoustic Scene Classification
06/18/2020 · Self-and-Mixed Attention Decoder with Deep Acoustic Structure for Transformer-based LVCSR
12/12/2022 · A Neural ODE Interpretation of Transformer Layers
11/23/2016 · Adaptive Feature Abstraction for Translating Video to Text
06/28/2019 · Widening the Representation Bottleneck in Neural Machine Translation with Lexical Shortcuts
07/01/2019 · Multilingual, Multi-scale and Multi-layer Visualization of Intermediate Representations
01/17/2023 · Tracing and Manipulating Intermediate Values in Neural Math Problem Solvers
