Seeing Both the Forest and the Trees: Multi-head Attention for Joint Classification on Different Compositional Levels

11/01/2020
by Miruna Pislar, et al.

In natural languages, words are used in association to construct sentences. It is not words in isolation, but the appropriate combination of hierarchical structures that conveys the meaning of the whole sentence. Neural networks can capture expressive language features; however, insights into the link between words and sentences are difficult to acquire automatically. In this work, we design a deep neural network architecture that explicitly wires lower and higher linguistic components; we then evaluate its ability to perform the same task at different hierarchical levels. Settling on broad text classification tasks, we show that our model, MHAL, learns to simultaneously solve them at different levels of granularity by fluidly transferring knowledge between hierarchies. Using a multi-head attention mechanism to tie the representations between single words and full sentences, MHAL systematically outperforms equivalent models that are not incentivized towards developing compositional representations. Moreover, we demonstrate that, with the proposed architecture, the sentence information flows naturally to individual words, allowing the model to behave like a sequence labeller (which is a lower, word-level task) even without any word supervision, in a zero-shot fashion.
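The core mechanism described above — attention heads that tie word-level and sentence-level predictions together — can be illustrated with a minimal pure-Python sketch. This is not MHAL's actual implementation (the paper's model uses learned neural encoders); it is a toy under the assumption of one attention head per label class, with illustrative function names. Each head pools word vectors into a sentence vector via attention, and the same per-word scores can be read back out as zero-shot word labels:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_pool(word_vecs, label_queries):
    """One attention head per label: each query scores every word, and the
    head's sentence vector is the attention-weighted sum of word vectors.
    Returns (weights, pooled) per head; the pooled vectors feed the
    sentence-level classifier."""
    heads = []
    for q in label_queries:
        weights = softmax([dot(q, w) for w in word_vecs])
        dim = len(word_vecs[0])
        pooled = [sum(a * w[i] for a, w in zip(weights, word_vecs))
                  for i in range(dim)]
        heads.append((weights, pooled))
    return heads

def word_labels(word_vecs, label_queries):
    """Zero-shot sequence labelling: tag each word with the label whose
    head attends to it most strongly (argmax over per-head scores)."""
    labels = []
    for w in word_vecs:
        scores = [dot(q, w) for q in label_queries]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels
```

For example, with two-dimensional word vectors `[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]` and one query per label, `attention_pool` yields a normalized attention distribution per head, and `word_labels` assigns each word to the head that scores it highest — mirroring how sentence-level supervision alone can induce word-level tags.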


Related research:

- Jointly Learning to Label Sentences and Tokens (11/14/2018): Learning to construct text representations in end-to-end systems can be ...
- A Deep Network with Visual Text Composition Behavior (07/05/2017): While natural languages are compositional, how state-of-the-art neural m...
- Approximating How Single Head Attention Learns (03/13/2021): Why do models often attend to salient words, and how does this evolve th...
- Composing Conversational Negation (07/14/2021): Negation in natural language does not follow Boolean logic and is theref...
- Hierarchical Memory Networks for Answer Selection on Unknown Words (09/28/2016): Recently, end-to-end memory networks have shown promising results on Que...
- Generating Sentences Using a Dynamic Canvas (06/13/2018): We introduce the Attentive Unsupervised Text (W)riter (AUTR), which is a...
- An Iterative Contextualization Algorithm with Second-Order Attention (03/03/2021): Combining the representations of the words that make up a sentence into ...
