TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection

11/11/2019
by   Siddhant Garg, et al.

We propose TANDA, an effective technique for fine-tuning pre-trained Transformer models for natural language tasks. Specifically, we first transfer a pre-trained model into a model for a general task by fine-tuning it with a large and high-quality dataset. We then perform a second fine-tuning step to adapt the transferred model to the target domain. We demonstrate the benefits of our approach for answer sentence selection, which is a well-known inference task in Question Answering. We built a large-scale dataset to enable the transfer step, exploiting the Natural Questions dataset. Our approach establishes the state of the art on two well-known benchmarks, WikiQA and TREC-QA, achieving a MAP score of 92, which outperforms the previous highest score of 83.4 reported in recent work. We empirically show that TANDA generates more stable and robust models, reducing the effort required for selecting optimal hyper-parameters. Additionally, we show that the transfer step of TANDA makes the adaptation step more robust to noise. This enables a more effective use of noisy datasets for fine-tuning. Finally, we also confirm the positive impact of TANDA in an industrial setting, using domain-specific datasets subject to different types of noise.
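The two sequential fine-tuning steps described above (transfer, then adapt) can be illustrated with a minimal sketch. This is not the authors' implementation: a tiny logistic-regression scorer stands in for the Transformer, and the feature vectors, dataset sizes, learning rates, epoch counts, and all function names (`fine_tune`, `make_pairs`, `accuracy`) are hypothetical choices made only to show the order of operations. The synthetic data stands in for a large general-task dataset and a small target-domain dataset.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights, features):
    # Probability that a candidate sentence answers the question.
    return sigmoid(sum(w * x for w, x in zip(weights, features)))


def fine_tune(weights, examples, lr, epochs):
    # Plain SGD on binary cross-entropy; returns a new list, input is not mutated.
    w = list(weights)
    for _ in range(epochs):
        for features, label in examples:
            error = score(w, features) - label  # gradient of the loss w.r.t. the logit
            w = [wi - lr * error * xi for wi, xi in zip(w, features)]
    return w


def accuracy(weights, examples):
    hits = sum((score(weights, x) > 0.5) == (y == 1) for x, y in examples)
    return hits / len(examples)


def make_pairs(n, true_weights, seed):
    # Synthetic (features, label) pairs; a stand-in for real question/sentence pairs.
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in true_weights]
        y = 1 if sum(t * xi for t, xi in zip(true_weights, x)) > 0 else 0
        pairs.append((x, y))
    return pairs


# Stand-in for pre-trained parameters (assumption: zeros, for illustration only).
pretrained = [0.0, 0.0, 0.0]
general_data = make_pairs(500, [1.0, 1.0, 0.0], seed=0)  # large, general task
target_data = make_pairs(40, [1.0, 0.6, 0.4], seed=1)    # small, target domain

# Step 1 (transfer): fine-tune the pre-trained model on the general dataset.
transferred = fine_tune(pretrained, general_data, lr=0.1, epochs=5)
# Step 2 (adapt): continue fine-tuning the transferred model on the target data.
adapted = fine_tune(transferred, target_data, lr=0.05, epochs=3)

print("general accuracy after transfer:", accuracy(transferred, general_data))
print("target accuracy after adapt:", accuracy(adapted, target_data))
```

The point of the sketch is only the ordering: the adapt step starts from the transferred weights rather than from the pre-trained ones, so the small target dataset refines a model that has already been specialized on the general task.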
