FPM: A Collection of Large-scale Foundation Pre-trained Language Models

11/09/2021
by Dezhou Shen, et al.

Recent work in language modeling has shown that training large-scale Transformer models drives the latest advances in natural language processing applications. However, little work has been done to unify the currently most effective models. In this work, we use the most effective model architectures and today's mainstream training techniques to release a collection of models, which we believe can serve as foundation models going forward. For Chinese, using the GPT-2[9] architecture, we trained a 10.3 billion parameter language model and, in particular, a 2.9 billion parameter language model on dialogue data; we trained a 495 million parameter BERT model on the Chinese dataset; and we trained a 5.6 billion parameter Transformer language model on the Chinese dataset. For English, we did the corresponding work: using the GPT-2 architecture, we trained a 6.4 billion parameter language model on the English dataset; we trained a 1.24 billion parameter BERT[3] language model and, in particular, a 688 million parameter language model using single-GPU training techniques; and we trained a 5.6 billion parameter Transformer language model on the English dataset. On the TNEWS classification task of the CLUE[13] benchmark, the BERT-C model reached an accuracy of 59.99%, exceeding the 59.46% of ALBERT-xxlarge by 0.53%; in a GLUE classification task, its accuracy of 78.95% is an improvement of 6.85%, exceeding the reference GLUE evaluation score of 75.2.
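The abstract does not spell out the "single-GPU training techniques" behind the 688 million parameter English model. A common recipe for fitting a model of that size on one card combines gradient checkpointing (recomputing activations in the backward pass instead of storing them) with mixed-precision training. The sketch below illustrates that recipe with PyTorch and the Hugging Face transformers library; the checkpoint name, batch, and learning rate are illustrative assumptions, not the paper's actual configuration.

# Minimal sketch of single-GPU training of a large language model,
# assuming gradient checkpointing + mixed precision (fp16). This is an
# illustration of the general technique, not the FPM paper's setup.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = torch.device("cuda")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2-large")   # ~774M params
model = GPT2LMHeadModel.from_pretrained("gpt2-large").to(device)

# Trade compute for activation memory: recompute activations on backward.
model.gradient_checkpointing_enable()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
scaler = torch.cuda.amp.GradScaler()   # loss scaling keeps fp16 stable

# Toy batch; for causal LM training the labels are the inputs themselves.
batch = tokenizer("Recent work in language modeling has shown that ...",
                  return_tensors="pt").to(device)

model.train()
optimizer.zero_grad()
with torch.cuda.amp.autocast():        # forward pass in mixed precision
    out = model(**batch, labels=batch["input_ids"])
scaler.scale(out.loss).backward()      # scaled loss -> scaled gradients
scaler.step(optimizer)                 # unscales, then applies the update
scaler.update()

Gradient checkpointing alone typically cuts activation memory by a large constant factor at the cost of roughly one extra forward pass; combined with fp16 weights and activations, it is often enough to fine-tune sub-billion-parameter models on a single modern GPU.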

Related research

10/23/2020 · HateBERT: Retraining BERT for Abusive Language Detection in English
In this paper, we introduce HateBERT, a re-trained BERT model for abusiv...

11/15/2022 · RobBERT-2022: Updating a Dutch Language Model to Account for Evolving Language Use
Large transformer-based language models, e.g. BERT and GPT-3, outperform...

05/09/2023 · Investigating the effect of sub-word segmentation on the performance of transformer language models
We would like to explore how morphemes can affect the performance of a l...

01/13/2023 · In BLOOM: Creativity and Affinity in Artificial Lyrics and Art
We apply a large multilingual language model (BLOOM-176B) in open-ended ...

08/16/2023 · FootGPT: A Large Language Model Development Experiment on a Minimal Setting
With recent empirical observations, it has been argued that the most sig...

09/17/2023 · A Few-Shot Approach to Dysarthric Speech Intelligibility Level Classification Using Transformers
Dysarthria is a speech disorder that hinders communication due to diffic...

09/17/2019 · Megatron-LM: Training Multi-Billion Parameter Language Models Using GPU Model Parallelism
Recent work in unsupervised language modeling demonstrates that training...
