On the Universality of Deep Contextual Language Models

09/15/2021
by Shaily Bhatt, et al.

Deep Contextual Language Models (LMs) like ELMo, BERT, and their successors dominate the landscape of Natural Language Processing due to their ability to scale rapidly across multiple tasks by pre-training a single model followed by task-specific fine-tuning. Furthermore, multilingual versions of such models, like XLM-R and mBERT, have given promising results in zero-shot cross-lingual transfer, potentially enabling NLP applications in many under-served and under-resourced languages. Due to this initial success, pre-trained models are being used as 'Universal Language Models', serving as the starting point across diverse tasks, domains, and languages. This work explores the notion of 'Universality' by identifying seven dimensions across which a universal model should be able to scale, that is, perform equally well or reasonably well, to be useful across diverse settings. We outline the current theoretical and empirical results that support model performance across these dimensions, along with extensions that may help address some of their current limitations. Through this survey, we lay the foundation for understanding the capabilities and limitations of massive contextual language models and help discern research gaps and directions for future work to make these LMs inclusive and fair to diverse applications, users, and linguistic phenomena.
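As a rough illustration of the workflow the abstract refers to (not code from the paper), the sketch below fine-tunes a single multilingual checkpoint (mBERT, loaded through the Hugging Face Transformers library as "bert-base-multilingual-cased") on a couple of English examples and then applies it zero-shot to a Spanish sentence. The example sentences, labels, and hyperparameters are illustrative assumptions, not data from the survey.

```python
# Minimal sketch of the pre-train-then-fine-tune paradigm with zero-shot
# cross-lingual transfer, assuming the Hugging Face Transformers API.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-multilingual-cased"  # mBERT; XLM-R would be "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tiny illustrative English training batch (labels: 0 = negative, 1 = positive).
texts = ["The film was wonderful.", "A dull and tedious movie."]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# One task-specific fine-tuning step on English data.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()

# Zero-shot cross-lingual transfer: the same fine-tuned model scores a Spanish
# sentence it was never trained on, relying on the shared multilingual
# representation space learned during pre-training.
model.eval()
with torch.no_grad():
    es_batch = tokenizer(["La película fue maravillosa."], return_tensors="pt")
    probs = torch.softmax(model(**es_batch).logits, dim=-1)
print(probs)  # class probabilities for the Spanish example
```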

