
Pre-Training a Language Model Without Human Language

by Cheng-Han Chiang, et al.

In this paper, we study how the intrinsic nature of pre-training data contributes to fine-tuned downstream performance. To this end, we pre-train several transformer-based masked language models on corpora with specific properties, and we fine-tune those models on tasks from the GLUE benchmark. We find that models pre-trained on unstructured data outperform models trained from scratch on downstream tasks. Our results also show that pre-training on structured data does not always give the model abilities that transfer to natural-language downstream tasks. Surprisingly, we find that pre-training on certain non-human-language data yields GLUE performance close to that of models pre-trained on another non-English language.
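The abstract describes masked-language-model pre-training on artificial corpora. As a minimal illustration (not the authors' code), the sketch below builds a synthetic "non-human language" corpus of random token IDs and applies BERT-style masked-language-modeling corruption with a 15% masking probability; the token IDs, mask placeholder, and corpus sizes are all assumptions chosen for the sketch.

```python
import random

def make_synthetic_corpus(vocab_size=50, n_sents=100, sent_len=12, seed=0):
    # Synthetic "non-human language": sentences of uniformly random token IDs.
    rng = random.Random(seed)
    return [[rng.randrange(vocab_size) for _ in range(sent_len)]
            for _ in range(n_sents)]

MASK_ID = -1  # placeholder [MASK] token ID used only in this sketch

def mask_tokens(sentence, mask_prob=0.15, rng=None):
    # BERT-style MLM corruption: each token is replaced by the mask token
    # with probability mask_prob. Labels keep the original ID at masked
    # positions and -100 elsewhere (the ignore index used by common
    # cross-entropy loss implementations), so the loss is computed only
    # on masked positions.
    rng = rng or random.Random(0)
    inputs, labels = [], []
    for tok in sentence:
        if rng.random() < mask_prob:
            inputs.append(MASK_ID)
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(-100)
    return inputs, labels

corpus = make_synthetic_corpus()
inp, lab = mask_tokens(corpus[0], rng=random.Random(1))
```

A model pre-trained this way would then be fine-tuned on GLUE tasks with a classification head, which is the transfer setup the paper evaluates.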




Related papers:

- On the Transferability of Pre-trained Language Models: A Study from Artificial Datasets
- Predictions For Pre-training Language Models
- Multi-armed bandits for online optimization of language model pre-training: the use case of dynamic masking
- Language Model Pre-Training with Sparse Latent Typing
- Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models
- FlauBERT: Unsupervised Language Model Pre-training for French
- On the Effect of Pre-training for Transformer in Different Modality on Offline Reinforcement Learning