Depthwise Separable Convolutions for Neural Machine Translation

06/09/2017
by Łukasz Kaiser, et al.

Depthwise separable convolutions reduce the number of parameters and computation used in convolutional operations while increasing representational efficiency. They have been shown to be successful in image classification models, both in obtaining better models than previously possible for a given parameter count (the Xception architecture) and in considerably reducing the number of parameters required to perform at a given level (the MobileNets family of architectures). Recently, convolutional sequence-to-sequence networks have been applied to machine translation tasks with good results. In this work, we study how depthwise separable convolutions can be applied to neural machine translation. We introduce a new architecture inspired by Xception and ByteNet, called SliceNet, which enables a significant reduction of the parameter count and amount of computation needed to obtain results comparable to ByteNet, and, with a similar parameter count, achieves new state-of-the-art results. In addition to showing that depthwise separable convolutions perform well for machine translation, we investigate the architectural changes that they enable: we observe that thanks to depthwise separability, we can increase the length of convolution windows, removing the need for filter dilation. We also introduce a new "super-separable" convolution operation that further reduces the number of parameters and computational cost required to obtain state-of-the-art results.
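The parameter savings described in the abstract can be illustrated with simple counting. A minimal sketch, assuming a 1D convolution over sequence channels with illustrative sizes (window 3, 512 channels, 16 groups for the super-separable case; bias terms ignored):

```python
def conv_params(k, c_in, c_out):
    """Parameters in a standard 1D convolution: one k-wide filter per
    (input channel, output channel) pair."""
    return k * c_in * c_out

def separable_params(k, c_in, c_out):
    """Depthwise separable convolution: a depthwise step (k * c_in) followed
    by a 1x1 pointwise step (c_in * c_out)."""
    return k * c_in + c_in * c_out

def super_separable_params(k, c, g):
    """Super-separable convolution: channels split into g groups, with a
    separable convolution applied independently within each group, so the
    pointwise step shrinks from c*c to g * (c/g)^2."""
    assert c % g == 0
    return k * c + g * (c // g) ** 2

# Illustrative sizes (not taken from the paper's exact configurations):
print(conv_params(3, 512, 512))             # 786432
print(separable_params(3, 512, 512))        # 263680
print(super_separable_params(3, 512, 16))   # 17920
```

The separable form costs roughly a factor of k less than the standard convolution once channel counts dominate, which is what makes longer convolution windows affordable and filter dilation unnecessary; grouping shrinks the pointwise term by a further factor of g.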


