BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model

09/20/2023
by Nolan Dey, et al.

We introduce the Bittensor Language Model, called "BTLM-3B-8K", a new state-of-the-art 3 billion parameter open-source language model. BTLM-3B-8K was trained on 627B tokens from the SlimPajama dataset with a mixture of 2,048 and 8,192 context lengths. BTLM-3B-8K outperforms all existing 3B parameter models by 2-5.5% across downstream tasks and is even competitive with some 7B parameter models. Additionally, BTLM-3B-8K provides excellent long context performance, outperforming MPT-7B-8K and XGen-7B-8K on tasks up to 8,192 tokens of context. We trained the model on a cleaned and deduplicated SlimPajama dataset; aggressively tuned the µP hyperparameters and schedule; used ALiBi position embeddings; and adopted the SwiGLU nonlinearity. On Hugging Face, the most popular models have 7B parameters, indicating that users prefer the quality-size ratio of 7B models. Compacting a 7B parameter model into one with 3B parameters, with little performance impact, is an important milestone. BTLM-3B-8K needs only 3 GB of memory at 4-bit precision and requires 2.5x less inference compute than 7B models, helping to open up access to a powerful language model on mobile and edge devices. BTLM-3B-8K is available under an Apache 2.0 license on Hugging Face: https://huggingface.co/cerebras/btlm-3b-8k-base.
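The abstract names ALiBi position biases and the SwiGLU nonlinearity as key architectural choices. The sketch below, with assumed tensor shapes and module names (not the authors' implementation), shows what each looks like in PyTorch: ALiBi adds a per-head linear penalty to attention logits based on query-key distance, and SwiGLU gates the feed-forward projection with a SiLU-activated branch.

```python
# Minimal sketch of the two architectural choices named above; shapes and names
# are illustrative assumptions, not the BTLM-3B-8K implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Per-head linear biases added to attention logits; more distant keys are penalized more."""
    # Geometric head slopes as in the ALiBi paper (assumes num_heads is a power of two).
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    distance = torch.arange(seq_len)[:, None] - torch.arange(seq_len)[None, :]  # query index minus key index
    return -slopes[:, None, None] * distance.clamp(min=0).float()  # shape: (heads, seq, seq)

class SwiGLU(nn.Module):
    """Feed-forward block with the SwiGLU nonlinearity: (SiLU(x W_gate) * x W_up) W_down."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_hidden, bias=False)
        self.up = nn.Linear(d_model, d_hidden, bias=False)
        self.down = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))
```

To illustrate the memory claim, a minimal loading sketch using Hugging Face transformers with bitsandbytes follows; the 4-bit quantization settings and generation parameters are illustrative assumptions rather than the authors' configuration.

```python
# Minimal sketch: loading BTLM-3B-8K with 4-bit weights (assumes transformers, accelerate,
# and bitsandbytes are installed; settings here are illustrative, not the authors' setup).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "cerebras/btlm-3b-8k-base"

# 4-bit NF4 quantization keeps the ~3B weights in roughly 3 GB of memory.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,  # the checkpoint ships custom model code on the Hub
)

prompt = "The Bittensor Language Model is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```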
