Towards Better Modeling Hierarchical Structure for Self-Attention with Ordered Neurons

09/04/2019
by Jie Hao et al.

Recent studies have shown that a hybrid of self-attention networks (SANs) and recurrent neural networks (RNNs) outperforms both individual architectures, yet little is known about why such hybrid models work. Believing that the modeling of hierarchical structure is an essential way in which RNNs complement SANs, we propose to further strengthen hybrid models with an advanced RNN variant, the Ordered Neurons LSTM (ON-LSTM), which introduces a syntax-oriented inductive bias to perform tree-like composition. Experimental results on the benchmark machine translation task show that the proposed approach outperforms both individual architectures and a standard hybrid model. Further analyses on targeted linguistic evaluation and logical inference tasks demonstrate that the proposed approach indeed benefits from better modeling of hierarchical structure.
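The ingredient the abstract highlights is ON-LSTM's ordered-neuron mechanism: in addition to the usual LSTM gates, two "master" gates computed with a cumulative softmax (cumax) decide how large a prefix of the cell state is forgotten or overwritten at each step, so higher-ranked neurons persist over longer spans and composition becomes tree-like. The following is a minimal, illustrative PyTorch sketch (not the authors' implementation; all names are assumptions, and the chunking of master gates used in the original ON-LSTM paper is omitted for brevity) showing how the master gates interact with the standard gates.

```python
# Minimal ON-LSTM cell sketch (illustrative only, names are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F


def cumax(x, dim=-1):
    """Cumulative softmax: a soft, monotone gate that 'switches on' once along dim."""
    return torch.cumsum(F.softmax(x, dim=dim), dim=dim)


class ONLSTMCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        # 4 standard LSTM gates + 2 master gates, produced by one linear map.
        self.proj = nn.Linear(input_size + hidden_size, 6 * hidden_size)
        self.hidden_size = hidden_size

    def forward(self, x, state):
        h_prev, c_prev = state
        gates = self.proj(torch.cat([x, h_prev], dim=-1))
        i, f, o, g, mf, mi = gates.chunk(6, dim=-1)

        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g)

        # Master forget/input gates: cumax makes them monotone over the neuron
        # ordering, which is the syntax-oriented inductive bias.
        master_f = cumax(mf)            # increases from 0 towards 1
        master_i = 1.0 - cumax(mi)      # decreases from 1 towards 0
        overlap = master_f * master_i   # region where both gates are active

        f_hat = f * overlap + (master_f - overlap)
        i_hat = i * overlap + (master_i - overlap)

        c = f_hat * c_prev + i_hat * g
        h = o * torch.tanh(c)
        return h, (h, c)
```

In the hybrid setting the abstract describes, such an ON-LSTM recurrence would be stacked or interleaved with standard self-attention sublayers in the encoder, letting attention capture global dependencies while the ordered neurons supply the hierarchical inductive bias.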


Related research

- Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks (10/22/2018)
  Recurrent neural network (RNN) models are widely used for processing seq...
- Can Neural Networks Understand Logical Entailment? (02/23/2018)
  We introduce a new dataset of logical entailments for the purpose of mea...
- Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures (08/27/2018)
  Recently, non-recurrent architectures (convolutional, self-attentional) ...
- Modeling Localness for Self-Attention Networks (10/24/2018)
  Self-attention networks have proven to be of profound value for its stre...
- Automatic Business Process Structure Discovery using Ordered Neurons LSTM: A Preliminary Study (01/05/2020)
  Automatic process discovery from textual process documentations is highl...
- QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension (04/23/2018)
  Current end-to-end machine reading and question answering (Q&A) models a...
- FastTrees: Parallel Latent Tree-Induction for Faster Sequence Encoding (11/28/2021)
  Inducing latent tree structures from sequential data is an emerging tren...
