
Pre-Training a Language Model Without Human Language

12/22/2020
by Cheng-Han Chiang, et al.

In this paper, we study how the intrinsic nature of pre-training data contributes to fine-tuned downstream performance. To this end, we pre-train different transformer-based masked language models on several corpora with certain features, and we fine-tune those language models on the GLUE benchmark. We find that models pre-trained on unstructured data beat those trained from scratch on downstream tasks. Our results also show that pre-training on structured data does not always make the model acquire an ability that can be transferred to natural-language downstream tasks. To our great astonishment, we uncover that pre-training on certain non-human-language data gives GLUE performance close to that of a model pre-trained on another non-English language.
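As a rough illustration of the setup the abstract describes, and not the authors' released code, the sketch below uses the Hugging Face transformers and datasets libraries to pre-train a BERT-style masked language model from scratch on an arbitrary text corpus and then fine-tune the resulting encoder on a GLUE task. The corpus file name (pretraining_corpus.txt), the choice of SST-2, the default BertConfig model size, and all hyperparameters are placeholder assumptions.

# Minimal sketch, assuming Hugging Face transformers/datasets; paths and
# hyperparameters are illustrative, not the paper's actual configuration.
from datasets import load_dataset
from transformers import (BertConfig, BertForMaskedLM,
                          BertForSequenceClassification, BertTokenizerFast,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

# 1) Masked-LM pre-training from scratch on a chosen corpus (any text file).
corpus = load_dataset("text", data_files={"train": "pretraining_corpus.txt"})["train"]
corpus = corpus.map(lambda x: tokenizer(x["text"], truncation=True, max_length=128),
                    batched=True, remove_columns=["text"])
mlm_model = BertForMaskedLM(BertConfig())  # randomly initialized, no checkpoint
Trainer(
    model=mlm_model,
    args=TrainingArguments("mlm_out", per_device_train_batch_size=32,
                           num_train_epochs=1),
    train_dataset=corpus,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
).train()
mlm_model.save_pretrained("mlm_out")

# 2) Fine-tune the pre-trained encoder on a GLUE task (here: SST-2).
glue = load_dataset("glue", "sst2").map(
    lambda x: tokenizer(x["sentence"], truncation=True, max_length=128), batched=True)
clf = BertForSequenceClassification.from_pretrained("mlm_out", num_labels=2)
Trainer(
    model=clf,
    args=TrainingArguments("glue_out", per_device_train_batch_size=32,
                           num_train_epochs=3),
    train_dataset=glue["train"],
    eval_dataset=glue["validation"],
).train()

In the paper's framing, step 1 would be repeated over corpora with different intrinsic properties (structured, unstructured, non-human-language), while step 2 stays fixed so that GLUE scores reflect only what the pre-training data contributed.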


Related research:

09/08/2021 · On the Transferability of Pre-trained Language Models: A Study from Artificial Datasets
Pre-training language models (LMs) on large-scale unlabeled text data ma...

11/18/2020 · Predictions For Pre-training Language Models
Language model pre-training has proven to be useful in many language und...

03/24/2022 · Multi-armed bandits for online optimization of language model pre-training: the use case of dynamic masking
Transformer-based language models (TLMs) provide state-of-the-art perfor...

10/23/2022 · Language Model Pre-Training with Sparse Latent Typing
Modern large-scale Pre-trained Language Models (PLMs) have achieved trem...

10/25/2022 · Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models
Language modeling on large-scale datasets leads to impressive performanc...

12/11/2019 · FlauBERT: Unsupervised Language Model Pre-training for French
Language models have become a key step to achieve state-of-the-art resul...

11/17/2022 · On the Effect of Pre-training for Transformer in Different Modality on Offline Reinforcement Learning
We empirically investigate how pre-training on data of different modalit...