Ensemble Transformer for Efficient and Accurate Ranking Tasks: an Application to Question Answering Systems

01/15/2022
by Yoshitomo Matsubara, et al.

Large transformer models can significantly improve performance on the Answer Sentence Selection (AS2) task, but their high computational cost prevents their use in many real-world applications. In this paper, we explore the following research question: how can we make AS2 models more accurate without significantly increasing their complexity? To address this question, we propose the Multiple Heads Student (MHS) architecture, an efficient neural network designed to distill an ensemble of large transformers into a single smaller model. An MHS model consists of two components: a stack of transformer layers used to encode inputs, and a set of ranking heads, each trained by distilling a different large transformer architecture. Unlike traditional distillation techniques, our approach uses the individual models of the ensemble as teachers in a way that preserves the diversity of the ensemble members. The resulting model captures the knowledge of different types of transformer models using only a few extra parameters. We show the effectiveness of MHS on three English datasets for AS2; our proposed approach outperforms all single-model distillations we consider, rivaling state-of-the-art large AS2 models that have 2.7x more parameters and run 2.5x slower.
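The abstract does not include implementation details, so the following is only a minimal PyTorch sketch of the architecture it describes: a single shared transformer encoder feeding one small ranking head per teacher. All names, hyperparameters, and the MSE distillation objective are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class MultiHeadsStudent(nn.Module):
    """Sketch of a Multiple Heads Student (MHS): one shared transformer
    encoder plus several lightweight ranking heads, each intended to be
    distilled from a different large teacher model (hypothetical sizes)."""

    def __init__(self, vocab_size=30522, d_model=256, n_layers=4,
                 n_heads=4, num_ranking_heads=3, max_len=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # One small linear head per teacher; only these add extra parameters.
        self.ranking_heads = nn.ModuleList(
            [nn.Linear(d_model, 1) for _ in range(num_ranking_heads)])

    def forward(self, input_ids):
        # input_ids: (batch, seq_len) token ids for a question/answer pair.
        positions = torch.arange(input_ids.size(1), device=input_ids.device)
        hidden = self.encoder(self.embed(input_ids) + self.pos(positions))
        cls = hidden[:, 0]  # first-token representation of the pair
        # Each head emits its own relevance score: (batch, num_ranking_heads).
        return torch.cat([head(cls) for head in self.ranking_heads], dim=-1)


def distillation_loss(student_scores, teacher_scores):
    """Match each ranking head to its corresponding teacher's score.
    MSE is a placeholder; the paper's exact objective may differ."""
    return nn.functional.mse_loss(student_scores, teacher_scores)


# Toy usage with random data (shapes only, no real teachers involved).
model = MultiHeadsStudent(num_ranking_heads=3)
ids = torch.randint(0, 30522, (8, 64))   # batch of 8 pairs, 64 tokens each
teacher_scores = torch.randn(8, 3)       # one score per teacher per pair
loss = distillation_loss(model(ids), teacher_scores)
loss.backward()
```

At inference time, the per-head scores could be averaged to approximate the teacher ensemble's ranking with a single forward pass through the shared encoder, which is where the claimed efficiency gain would come from.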

Related research

05/05/2020 · The Cascade Transformer: an Application for Efficient Answer Sentence Selection
Large transformer-based language models have been shown to be very effec...

08/05/2021 · Decoupled Transformer for Scalable Inference in Open-domain Question Answering
Large transformer models, such as BERT, achieve state-of-the-art results...

06/01/2020 · Context-based Transformer Models for Answer Sentence Selection
An important task for the design of Question Answering systems is the se...

07/21/2020 · XD at SemEval-2020 Task 12: Ensemble Approach to Offensive Language Identification in Social Media Using Transformer Encoders
This paper presents six document classification models using the latest ...

08/13/2023 · An Ensemble Approach to Question Classification: Integrating Electra Transformer, GloVe, and LSTM
This paper introduces a novel ensemble approach for question classificat...

03/17/2022 · DP-KB: Data Programming with Knowledge Bases Improves Transformer Fine Tuning for Answer Sentence Selection
While transformers demonstrate impressive performance on many knowledge ...

06/30/2020 · Correction of Faulty Background Knowledge based on Condition Aware and Revise Transformer for Question Answering
The study of question answering has received increasing attention in rec...
