
Towards Better Modeling Hierarchical Structure for Self-Attention with Ordered Neurons

by Jie Hao, et al.
Florida State University

Recent studies have shown that a hybrid of self-attention networks (SANs) and recurrent neural networks (RNNs) outperforms either architecture alone, yet little is known about why hybrid models work. Believing that modeling hierarchical structure is an essential way in which RNNs complement SANs, we propose to further strengthen hybrid models with an advanced RNN variant, the Ordered Neurons LSTM (ON-LSTM), which introduces a syntax-oriented inductive bias to perform tree-like composition. Experimental results on the benchmark machine translation task show that the proposed approach outperforms both individual architectures and a standard hybrid model. Further analyses on targeted linguistic evaluation and logical inference tasks demonstrate that the proposed approach indeed benefits from better modeling of hierarchical structure.
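For readers unfamiliar with the ON-LSTM mechanism referenced above, the sketch below illustrates the core idea from the "Ordered Neurons" paper (listed under the related research that follows): a cumulative-softmax activation (cumax) yields monotone master gates that split the cell state into low- and high-level neurons, so high-level (long-span) information is overwritten less often, mimicking tree-like composition. This is a minimal PyTorch sketch, not the authors' implementation; the class and variable names are our own.

```python
import torch
import torch.nn as nn

def cumax(x, dim=-1):
    # cumax(x) = cumsum(softmax(x)): a monotonically non-decreasing
    # vector in [0, 1] that acts as a soft boundary between neuron segments.
    return torch.cumsum(torch.softmax(x, dim=dim), dim=dim)

class ONLSTMCell(nn.Module):
    """One ON-LSTM step: a standard LSTM cell augmented with
    cumax-based master forget/input gates (hypothetical minimal version)."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        # Six gates: forget, input, output, cell candidate, plus two master gates.
        self.linear = nn.Linear(input_size + hidden_size, 6 * hidden_size)

    def forward(self, x, state):
        h_prev, c_prev = state
        f, i, o, g, mf, mi = self.linear(
            torch.cat([x, h_prev], dim=-1)).chunk(6, dim=-1)

        f = torch.sigmoid(f)           # standard forget gate
        i = torch.sigmoid(i)           # standard input gate
        o = torch.sigmoid(o)           # output gate
        g = torch.tanh(g)              # candidate cell update

        f_master = cumax(mf)           # ~[0,...,0,1,...,1]: protect high-level neurons
        i_master = 1 - cumax(mi)       # ~[1,...,1,0,...,0]: write low-level neurons
        overlap = f_master * i_master  # segment where both master gates are open

        # Standard gates act only inside the overlap; outside it, the master
        # gates fully erase (low levels) or fully preserve (high levels).
        f_hat = f * overlap + (f_master - overlap)
        i_hat = i * overlap + (i_master - overlap)
        c = f_hat * c_prev + i_hat * g
        h = o * torch.tanh(c)
        return h, c
```

A single step would be invoked as `h, c = cell(x_t, (h, c))` inside a loop over the sequence. How exactly the paper wires such cells into the SAN/RNN hybrid (e.g., which encoder layers use them) is detailed in the full text; this sketch only shows the ordered-neuron gating itself.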


Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks

Recurrent neural network (RNN) models are widely used for processing seq...

Can Neural Networks Understand Logical Entailment?

We introduce a new dataset of logical entailments for the purpose of mea...

Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures

Recently, non-recurrent architectures (convolutional, self-attentional) ...

Automatic Business Process Structure Discovery using Ordered Neurons LSTM: A Preliminary Study

Automatic process discovery from textual process documentation is highl...

Modeling Localness for Self-Attention Networks

Self-attention networks have proven to be of profound value for their stre...

QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension

Current end-to-end machine reading and question answering (Q&A) models a...

FastTrees: Parallel Latent Tree-Induction for Faster Sequence Encoding

Inducing latent tree structures from sequential data is an emerging tren...