TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection

by Siddhant Garg, et al.

We propose TANDA, an effective technique for fine-tuning pre-trained Transformer models for natural language tasks. Specifically, we first transfer a pre-trained model into a model for a general task by fine-tuning it with a large and high-quality dataset. We then perform a second fine-tuning step to adapt the transferred model to the target domain. We demonstrate the benefits of our approach for answer sentence selection, which is a well-known inference task in Question Answering. We built a large-scale dataset to enable the transfer step, exploiting the Natural Questions dataset. Our approach establishes the state of the art on two well-known benchmarks, WikiQA and TREC-QA, achieving MAP scores of 92, which outperform the previous highest scores of 83.4 reported in recent work. We empirically show that TANDA generates more stable and robust models, reducing the effort required for selecting optimal hyper-parameters. Additionally, we show that the transfer step of TANDA makes the adaptation step more robust to noise. This enables a more effective use of noisy datasets for fine-tuning. Finally, we also confirm the positive impact of TANDA in an industrial setting, using domain-specific datasets subject to different types of noise.
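
The two-step procedure described above (transfer on a large general dataset, then adapt on the target domain) can be illustrated with a toy sketch. This is not the authors' implementation: it swaps the Transformer for a tiny logistic-regression scorer so that it runs with the standard library only, and the feature layout, data values, learning rates, and function names are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, features):
    """Probability that a candidate sentence answers the question."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))

def fine_tune(weights, examples, lr=0.5, epochs=200):
    """Continue training from `weights` with per-example gradient steps."""
    w = list(weights)  # copy, so the starting checkpoint is left untouched
    for _ in range(epochs):
        for features, label in examples:
            error = score(w, features) - label  # log-loss gradient w.r.t. the logit
            for i, x in enumerate(features):
                w[i] -= lr * error * x
    return w

# Hypothetical features per (question, sentence) pair:
# [bias, word overlap, in-domain flag]. All values are made up.
general_data = [([1.0, 0.9, 0.0], 1), ([1.0, 0.1, 0.0], 0),
                ([1.0, 0.8, 0.0], 1), ([1.0, 0.2, 0.0], 0)]
target_data = [([1.0, 0.7, 1.0], 1), ([1.0, 0.3, 1.0], 0)]

pretrained = [0.0, 0.0, 0.0]  # stand-in for a pre-trained checkpoint

# Step 1 (transfer): fine-tune on the large, general dataset.
transferred = fine_tune(pretrained, general_data)

# Step 2 (adapt): fine-tune the transferred model on the small target dataset.
adapted = fine_tune(transferred, target_data, lr=0.1, epochs=20)

# Rank two target-domain candidates by their adapted scores.
candidates = [[1.0, 0.3, 1.0], [1.0, 0.7, 1.0]]
ranked = sorted(candidates, key=lambda f: score(adapted, f), reverse=True)
```

The design point the sketch tries to capture is that the second call starts from `transferred` rather than from `pretrained`, so the small target dataset only has to adjust a model that already scores answer candidates sensibly.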



