Encodings of Source Syntax: Similarities in NMT Representations Across Target Languages

05/17/2020
by   Tyler A. Chang, et al.

We train neural machine translation (NMT) models from English to six target languages and use the NMT encoder representations to predict ancestor constituent labels of source-language words. We find that NMT encoders learn similar source syntax regardless of the NMT target language, relying on explicit morphosyntactic cues to extract syntactic features from source sentences. Furthermore, the NMT encoders outperform RNNs trained directly on several of the constituent label prediction tasks, suggesting that NMT encoder representations can be used effectively for natural language tasks involving syntax. However, both the NMT encoders and the directly trained RNNs learn syntactic information that differs substantially from that extracted by a probabilistic context-free grammar (PCFG) parser. Despite lower overall accuracy, the PCFG often performs well on sentences for which the RNN-based models perform poorly, suggesting that RNN architectures are constrained in the types of syntax they can learn.
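To make the probing setup concrete, below is a minimal sketch of the kind of classifier the abstract describes: a lightweight probe trained on frozen NMT encoder states to predict an ancestor constituent label for each source word. The hidden size, label inventory, and PyTorch implementation are illustrative assumptions, not the authors' released code.

```python
# Hedged sketch of the probing setup: a linear probe over frozen NMT
# encoder states predicts an ancestor constituent label (e.g. NP, VP)
# per source word. All names and shapes here are assumptions.
import torch
import torch.nn as nn

NUM_LABELS = 8      # assumed constituent label inventory size
HIDDEN_DIM = 512    # assumed NMT encoder hidden size

class ConstituentProbe(nn.Module):
    """Linear probe mapping one encoder state to a constituent label."""
    def __init__(self, hidden_dim: int, num_labels: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_dim, num_labels)

    def forward(self, encoder_states: torch.Tensor) -> torch.Tensor:
        # encoder_states: (batch, seq_len, hidden_dim); detached so the
        # NMT encoder itself is never updated by the probe's loss.
        return self.classifier(encoder_states.detach())

probe = ConstituentProbe(HIDDEN_DIM, NUM_LABELS)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-in batch: in practice the states would come from a trained
# English->X NMT encoder and the labels from a parsed treebank.
states = torch.randn(4, 10, HIDDEN_DIM)          # fake encoder outputs
labels = torch.randint(0, NUM_LABELS, (4, 10))   # fake ancestor labels

logits = probe(states)                            # (4, 10, NUM_LABELS)
loss = loss_fn(logits.view(-1, NUM_LABELS), labels.view(-1))
loss.backward()
optimizer.step()
```

Comparing such a probe's per-sentence accuracy across encoders trained toward different target languages is one way the paper's cross-lingual similarity claim could be tested.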


Related research

05/02/2017
Modeling Source Syntax for Neural Machine Translation
Even though a linguistics-free sequence to sequence model in neural mach...

02/03/2017
Predicting Target Language CCG Supertags Improves Neural Machine Translation
Neural machine translation (NMT) models are able to partially learn synt...

01/23/2018
Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks
While neural machine translation (NMT) models provide improved translati...

05/08/2019
Syntax-Enhanced Neural Machine Translation with Syntax-Aware Word Representations
Syntax has been demonstrated highly effective in neural machine translat...

12/31/2021
How do lexical semantics affect translation? An empirical study
Neural machine translation (NMT) systems aim to map text from one langua...

07/13/2021
On the Difficulty of Translating Free-Order Case-Marking Languages
Identifying factors that make certain languages harder to model than oth...

05/11/2018
Deep RNNs Encode Soft Hierarchical Syntax
We present a set of experiments to demonstrate that deep recurrent neura...
