Multi-Head Multi-Layer Attention to Deep Language Representations for Grammatical Error Detection

04/15/2019
by Masahiro Kaneko, et al.

It is known that a deep neural network model pre-trained with large-scale data greatly improves the accuracy of various tasks, especially when there are resource constraints. However, the information needed to solve a given task can vary, and simply using the output of the final layer is not necessarily sufficient. Moreover, to our knowledge, exploiting large language representation models to detect grammatical errors has not yet been studied. In this work, we investigate the effect of utilizing information not only from the final layer but also from intermediate layers of a pre-trained language representation model to detect grammatical errors. We propose a multi-head multi-layer attention model that determines the appropriate layers in Bidirectional Encoder Representations from Transformers (BERT). The proposed method achieved the best scores on three datasets for grammatical error detection tasks, outperforming the current state-of-the-art method by 6.0 points on FCE, 8.2 points on CoNLL14, and 12.2 points on JFLEG in terms of F_0.5. We also demonstrate that by using multi-head multi-layer attention, our model can exploit a broader range of information for each token in a sentence than a model that uses only the final layer's information.
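Below is a minimal sketch, in PyTorch with the Hugging Face `transformers` library, of the general idea described above: stack the hidden states of every BERT layer and let a small multi-head attention over the layer axis decide, per token, how to mix them before a token-level error classifier. The head count, learned queries, and classifier head here are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class MultiHeadLayerAttention(nn.Module):
    """Mix the hidden states of all BERT layers, per token,
    with a small multi-head attention over the layer axis."""
    def __init__(self, hidden_size=768, num_heads=4):
        super().__init__()
        assert hidden_size % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        # One learned query vector per head, shared across all tokens (assumption).
        self.query = nn.Parameter(torch.randn(num_heads, self.head_dim))
        self.key = nn.Linear(hidden_size, hidden_size)
        self.value = nn.Linear(hidden_size, hidden_size)

    def forward(self, layer_states):
        # layer_states: (num_layers, batch, seq_len, hidden)
        num_layers, batch, seq_len, hidden = layer_states.shape
        x = layer_states.permute(1, 2, 0, 3)  # (batch, seq, layers, hidden)
        k = self.key(x).view(batch, seq_len, num_layers, self.num_heads, self.head_dim)
        v = self.value(x).view(batch, seq_len, num_layers, self.num_heads, self.head_dim)
        # Score each head's query against every layer's key for every token.
        scores = torch.einsum('hd,bslhd->bslh', self.query, k) / self.head_dim ** 0.5
        weights = scores.softmax(dim=2)  # softmax over the layer axis
        mixed = torch.einsum('bslh,bslhd->bshd', weights, v)
        return mixed.reshape(batch, seq_len, hidden)  # (batch, seq, hidden)

class BertErrorDetector(nn.Module):
    """Token-level grammatical error detector on top of all BERT layers."""
    def __init__(self, model_name='bert-base-cased', num_labels=2):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name, output_hidden_states=True)
        self.layer_attn = MultiHeadLayerAttention(self.bert.config.hidden_size)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # hidden_states: tuple of embedding output + one tensor per layer,
        # each of shape (batch, seq_len, hidden)
        layers = torch.stack(out.hidden_states, dim=0)
        token_repr = self.layer_attn(layers)
        return self.classifier(token_repr)  # per-token correct/incorrect logits
```

Because each head carries its own learned query, different heads can favour different layers, which is one way a model can draw on a broader range of information per token than a final-layer-only baseline.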

