BERTQA – Attention on Steroids

12/14/2019
by Ankit Chadha, et al.

In this work, we extend the Bidirectional Encoder Representations from Transformers (BERT) model with an emphasis on directed coattention to obtain improved F1 performance on the SQuAD 2.0 dataset. The Transformer architecture on which BERT is based places hierarchical global attention on the concatenation of the context and query. Our additions to the BERT architecture augment this attention with more focused context-to-query (C2Q) and query-to-context (Q2C) attention via a set of modified Transformer encoder units. In addition, we explore adding convolution-based feature extraction within the coattention architecture to add localized information to self-attention. We found that coattention significantly improves the no-answer F1, by 4 points in the base architecture and 1 point in the large architecture. After adding skip connections, the no-answer F1 improved further without an additional loss in has-answer F1. Adding localized feature extraction to attention produced an overall dev F1 of 77.03 in the base architecture. We applied our findings to the large BERT model, which contains twice as many layers, and further used our own augmented version of the SQuAD 2.0 dataset created by back translation, which we have named SQuAD 2.Q. Finally, we performed hyperparameter tuning and ensembled our best models for a final F1/EM of 82.317/79.442 (Attention on Steroids, PCE Test Leaderboard).
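To make the directed coattention idea concrete, below is a minimal PyTorch sketch (not the authors' released code) of one C2Q/Q2C block applied on top of BERT hidden states, with a 1-D convolution for localized features and skip connections as described above. The module and parameter names (DirectedCoattentionBlock, d_model, n_heads, kernel_size) are hypothetical illustrations, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class DirectedCoattentionBlock(nn.Module):
    """Context-to-query (C2Q) and query-to-context (Q2C) attention with an
    optional 1-D convolution for localized features and skip connections,
    loosely following the description in the abstract."""

    def __init__(self, d_model: int = 768, n_heads: int = 8, kernel_size: int = 5):
        super().__init__()
        self.c2q = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.q2c = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Convolution over the context sequence extracts localized features.
        self.conv = nn.Conv1d(d_model, d_model, kernel_size, padding=kernel_size // 2)
        self.norm_c = nn.LayerNorm(d_model)
        self.norm_q = nn.LayerNorm(d_model)

    def forward(self, context: torch.Tensor, query: torch.Tensor):
        # C2Q: each context token attends over the query tokens.
        c2q_out, _ = self.c2q(context, query, query)
        # Q2C: each query token attends over the context tokens.
        q2c_out, _ = self.q2c(query, context, context)
        # Conv1d expects (batch, channels, length), so transpose around it.
        local = self.conv(context.transpose(1, 2)).transpose(1, 2)
        # Skip connections keep the original BERT representations in the mix.
        new_context = self.norm_c(context + c2q_out + local)
        new_query = self.norm_q(query + q2c_out)
        return new_context, new_query


if __name__ == "__main__":
    block = DirectedCoattentionBlock()
    ctx = torch.randn(2, 384, 768)   # (batch, context_len, hidden)
    qry = torch.randn(2, 32, 768)    # (batch, query_len, hidden)
    new_ctx, new_qry = block(ctx, qry)
    print(new_ctx.shape, new_qry.shape)
```

In practice, the context and query representations would come from splitting BERT's output over the concatenated (query, context) input; several such blocks can be stacked before the span-prediction head.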
