DeepAI AI Chat
Log In Sign Up

Hierarchical Phrase-based Sequence-to-Sequence Learning

by   Bailin Wang, et al.

We describe a neural transducer that maintains the flexibility of standard sequence-to-sequence (seq2seq) models while incorporating hierarchical phrases as a source of inductive bias during training and as explicit constraints during inference. Our approach trains two models: a discriminative parser based on a bracketing transduction grammar whose derivation tree hierarchically aligns source and target phrases, and a neural seq2seq model that learns to translate the aligned phrases one-by-one. We use the same seq2seq model to translate at all phrase scales, which results in two inference modes: one mode in which the parser is discarded and only the seq2seq component is used at the sequence-level, and another in which the parser is combined with the seq2seq model. Decoding in the latter mode is done with the cube-pruned CKY algorithm, which is more involved but can make use of new translation rules during inference. We formalize our model as a source-conditioned synchronous grammar and develop an efficient variational inference algorithm for training. When applied on top of both randomly initialized and pretrained seq2seq models, we find that both inference modes performs well compared to baselines on small scale machine translation benchmarks.


page 1

page 2

page 3

page 4


Sequence-to-Sequence Learning with Latent Neural Grammars

Sequence-to-sequence learning with neural networks has become the de fac...

Incorporating Syntactic Uncertainty in Neural Machine Translation with Forest-to-Sequence Model

Incorporating syntactic information in Neural Machine Translation models...

Neural Phrase-to-Phrase Machine Translation

In this paper, we propose Neural Phrase-to-Phrase Machine Translation (N...

Identifying Hierarchical Structure in Sequences: A linear-time algorithm

SEQUITUR is an algorithm that infers a hierarchical structure from a seq...

Progressive Multi-Granularity Training for Non-Autoregressive Translation

Non-autoregressive translation (NAT) significantly accelerates the infer...

Approximate Distribution Matching for Sequence-to-Sequence Learning

Sequence-to-Sequence models were introduced to tackle many real-life pro...

Code Repositories