Evaluating Syntactic Properties of Seq2seq Output with a Broad Coverage HPSG: A Case Study on Machine Translation

09/06/2018
by Johnny Tian-Zheng Wei, et al.

Sequence-to-sequence (seq2seq) models are often employed in settings where the target output is natural language. However, the syntactic properties of the language these models generate are not well understood. We explore whether such output belongs to a formal and realistic grammar by employing the English Resource Grammar (ERG), a broad-coverage, linguistically precise HPSG-based grammar of English. Using a French-to-English parallel corpus, we analyze the parseability of, and the grammatical constructions occurring in, the output of a seq2seq translation model. Over 93% of the model translations are parseable, suggesting that the model learns to generate output that conforms to a grammar. The model has trouble learning the distribution of rarer syntactic rules, and we pinpoint several constructions that distinguish the reference translations from the model's translations.
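The two measurements described above, parseability and a comparison of construction frequencies between reference and model translations, can be illustrated with a minimal Python sketch. It assumes parsing has already been done elsewhere: each sentence is represented either by a list of rule names taken from its parse or by `None` when no parse was found. The function names and the rule labels in the usage note are hypothetical placeholders, not ERG rule names or the authors' code.

```python
from collections import Counter


def parseability(parse_results):
    """Return the fraction of sentences that received a parse.

    Each entry is a list of rule names, or None if the sentence
    could not be parsed (an assumed representation).
    """
    if not parse_results:
        return 0.0
    parsed = sum(1 for rules in parse_results if rules is not None)
    return parsed / len(parse_results)


def rule_frequencies(parse_results):
    """Return the relative frequency of each rule over all parsed sentences."""
    counts = Counter()
    for rules in parse_results:
        if rules is not None:
            counts.update(rules)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {rule: n / total for rule, n in counts.items()}


def frequency_gaps(reference, model):
    """Return (rule, model_freq - reference_freq) pairs, largest gap first."""
    ref_freq = rule_frequencies(reference)
    mod_freq = rule_frequencies(model)
    rules = set(ref_freq) | set(mod_freq)
    gaps = {r: mod_freq.get(r, 0.0) - ref_freq.get(r, 0.0) for r in rules}
    return sorted(gaps.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

Calling `parseability(model_parses)` gives the share of parseable translations, and `frequency_gaps(reference_parses, model_parses)` ranks rules by how much their relative frequency differs between the two sets; a rule with a large negative gap is one the model underuses relative to the references.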


