Learning Neural Models for Natural Language Processing in the Face of Distributional Shift

09/03/2021
by Paul Michel, et al.

The dominant NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications (e.g., sentiment classification, span-prediction-based question answering, or machine translation). However, it builds upon the assumption that the data distribution is stationary, i.e., that the data is sampled from a fixed distribution both at training and test time. This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information. Moreover, it is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime. The first goal of this thesis is to characterize the different forms this shift can take in the context of natural language processing, and to propose benchmarks and evaluation metrics to measure its effect on current deep learning architectures. We then take steps to mitigate the effect of distributional shift on NLP models. To this end, we develop methods based on parametric reformulations of the distributionally robust optimization framework, and show empirically that these approaches yield more robust models on a selection of realistic problems. In the third and final part of this thesis, we explore ways of efficiently adapting existing models to new domains or tasks. Our contribution to this topic takes inspiration from information geometry to derive a new gradient update rule which alleviates catastrophic forgetting during adaptation.
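To make the robust-optimization idea concrete, the sketch below shows the min-max training loop that distributionally robust optimization (DRO) methods build on, in the group-DRO style: an adversary re-weights groups of data toward those on which the model currently performs worst, and the model descends on the re-weighted loss. This is an illustrative PyTorch sketch of the general framework, not the thesis's specific parametric formulation; the model, optimizer, step size eta, and the grouping of the data are assumed to be supplied by the caller.

    import torch
    import torch.nn.functional as F

    def group_dro_step(model, optimizer, batches_by_group, group_weights, eta=0.01):
        # batches_by_group: list of (inputs, targets) pairs, one batch per group.
        # group_weights: 1-D tensor of per-group probabilities (sums to 1).
        losses = []
        for inputs, targets in batches_by_group:
            logits = model(inputs)
            losses.append(F.cross_entropy(logits, targets))
        losses = torch.stack(losses)

        # Adversary: exponentiated-gradient ascent on the group weights,
        # up-weighting the groups where the model currently does worst.
        with torch.no_grad():
            group_weights = group_weights * torch.exp(eta * losses)
            group_weights = group_weights / group_weights.sum()

        # Model: gradient descent on the worst-case (re-weighted) loss.
        robust_loss = (group_weights * losses).sum()
        optimizer.zero_grad()
        robust_loss.backward()
        optimizer.step()
        return group_weights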
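The information-geometric update rule of the final part is only named at this level of detail. As a rough illustration of the underlying idea, the sketch below preconditions the fine-tuning gradient with a diagonal estimate of the Fisher information computed on the original task, so that parameters the old task relies on move less during adaptation. The function name and the fisher_diag estimate are assumptions for illustration, not the thesis's actual algorithm; in spirit this is close to natural-gradient and elastic-weight-consolidation-style updates.

    import torch

    def fisher_preconditioned_step(model, loss, fisher_diag, lr=1e-3, eps=1e-8):
        # fisher_diag: dict mapping parameter name to a diagonal Fisher
        # information estimate gathered on the *original* task/domain.
        # Directions important to old knowledge (high Fisher) take small
        # steps; unimportant directions (low Fisher) move freely.
        model.zero_grad()
        loss.backward()
        with torch.no_grad():
            for name, p in model.named_parameters():
                if p.grad is None:
                    continue
                p -= lr * p.grad / (fisher_diag[name] + eps)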


