One Model To Learn Them All

06/16/2017
by Łukasz Kaiser, et al.

Deep learning yields great results across many fields, from speech recognition and image classification to translation. But for each problem, getting a deep model to work well involves research into the architecture and a long period of tuning. We present a single model that yields good results on a number of problems spanning multiple domains. In particular, this single model is trained concurrently on ImageNet, multiple translation tasks, image captioning (COCO dataset), a speech recognition corpus, and an English parsing task. Our model architecture incorporates building blocks from multiple domains. It contains convolutional layers, an attention mechanism, and sparsely-gated layers. Each of these computational blocks is crucial for a subset of the tasks we train on. Interestingly, even if a block is not crucial for a task, we observe that adding it never hurts performance and in most cases improves it on all tasks. We also show that tasks with less data benefit greatly from joint training with other tasks, while performance on large tasks degrades only slightly, if at all.
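Of the building blocks the abstract names, the sparsely-gated layer is the least familiar: a gating network scores a pool of expert sub-networks and only the top-k experts are evaluated for a given input, keeping compute sparse while capacity stays large. The following is a minimal NumPy sketch of that idea, not the authors' implementation; all names, shapes, and the choice of k are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparsely_gated_moe(x, expert_weights, gate_weights, k=2):
    """Toy sparsely-gated mixture-of-experts layer (illustrative, not the paper's code).

    x              : (d_in,) input vector
    expert_weights : list of (d_in, d_out) matrices, one per expert
    gate_weights   : (d_in, n_experts) gating matrix
    Only the k experts with the highest gate scores are evaluated.
    """
    logits = x @ gate_weights                 # one gate score per expert
    top_k = np.argsort(logits)[-k:]           # indices of the k largest scores
    # softmax over the selected experts only
    g = np.exp(logits[top_k] - logits[top_k].max())
    g /= g.sum()
    # weighted combination of the selected experts' outputs
    return sum(w * (x @ expert_weights[i]) for w, i in zip(g, top_k))

# Hypothetical dimensions, chosen for illustration only.
d_in, d_out, n_experts = 8, 4, 6
experts = [rng.standard_normal((d_in, d_out)) for _ in range(n_experts)]
gates = rng.standard_normal((d_in, n_experts))
y = sparsely_gated_moe(rng.standard_normal(d_in), experts, gates, k=2)
print(y.shape)  # (4,)
```

Because only k of the n_experts weight matrices are multiplied per input, the cost per example scales with k rather than with the total number of experts.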

Related research:

- Attention-Based Models for Speech Recognition (06/24/2015)
- Describing Multimedia Content using Attention-based Encoder-Decoder Networks (07/04/2015)
- A neural attention model for speech command recognition (08/27/2018)
- Persian phonemes recognition using PPNet (12/17/2018)
- Comparative study of Transformer and LSTM Network with attention mechanism on Image Captioning (03/05/2023)
- Improving Minimal Gated Unit for Sequential Data (05/21/2019)
