Improving Question Answering Performance Using Knowledge Distillation and Active Learning

09/26/2021
by Yasaman Boreshban, et al.

Contemporary question answering (QA) systems, including transformer-based architectures, suffer from increasing computational and model complexity, which renders them inefficient for real-world applications with limited resources. Further, training or even fine-tuning such models requires a vast amount of labeled data, which is often not available for the task at hand. In this manuscript, we conduct a comprehensive analysis of these challenges and introduce suitable countermeasures. We propose a novel knowledge distillation (KD) approach to reduce the parameter and model complexity of a pre-trained BERT system, and utilize multiple active learning (AL) strategies for an immense reduction in annotation effort. In particular, we demonstrate that our model achieves the performance of a 6-layer TinyBERT and DistilBERT, whilst using only 2% of their total parameters. Finally, by integrating our AL approaches into the BERT framework, we show that state-of-the-art results on the SQuAD dataset can be achieved when we use only 20% of the training data.
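The abstract combines two ideas: knowledge distillation to compress a pre-trained BERT, and active learning to pick the most informative examples for annotation. The snippet below is a minimal sketch of both, assuming PyTorch; the loss blend, the temperature and alpha hyperparameters, and the least-confidence acquisition function are generic illustrations of the techniques, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Generic KD objective: blend hard-label cross-entropy with a soft-target
    KL term against the teacher. Hyperparameters here are illustrative.
    For extractive QA this would typically be applied to the start and end
    span logits separately."""
    # Soft targets: match the student's tempered distribution to the teacher's.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kd_term = F.kl_div(soft_student, soft_teacher, reduction="batchmean")
    kd_term = kd_term * (temperature ** 2)  # standard temperature scaling

    # Hard targets: cross-entropy on the gold labels.
    ce_term = F.cross_entropy(student_logits, labels)

    return alpha * kd_term + (1.0 - alpha) * ce_term


def least_confidence_scores(logits):
    """One common AL acquisition strategy: rank unlabeled examples by how
    unsure the model is (lower top probability = more informative).
    The paper evaluates multiple AL strategies; this is just one example."""
    probs = F.softmax(logits, dim=-1)
    top_prob, _ = probs.max(dim=-1)
    return 1.0 - top_prob  # higher score = less confident = annotate first
```

In a typical loop, the student is trained with `distillation_loss` on the currently labeled pool, then `least_confidence_scores` (or another acquisition function) is used to select the next batch of unlabeled questions to send for annotation.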


Related research

02/24/2022
BERTVision – A Parameter-Efficient Approach for Question Answering
We present a highly parameter efficient approach for Question Answering ...

10/18/2019
Model Compression with Two-stage Multi-teacher Knowledge Distillation for Web Question Answering System
Deep pre-training and fine-tuning models (such as BERT and OpenAI GPT) h...

08/05/2021
Decoupled Transformer for Scalable Inference in Open-domain Question Answering
Large transformer models, such as BERT, achieve state-of-the-art results...

09/04/2023
On the Query Strategies for Efficient Online Active Distillation
Deep Learning (DL) requires lots of time and data, resulting in high com...

10/06/2022
To Softmax, or not to Softmax: that is the question when applying Active Learning for Transformer Models
Despite achieving state-of-the-art results in nearly all Natural Languag...

04/21/2019
Model Compression with Multi-Task Knowledge Distillation for Web-scale Question Answering System
Deep pre-training and fine-tuning models (like BERT, OpenAI GPT) have de...

10/07/2020
DiPair: Fast and Accurate Distillation for Trillion-Scale Text Matching and Pair Modeling
Pre-trained models like BERT (Devlin et al., 2018) have dominated NLP / ...
