Language Models with Transformers

04/20/2019
by Chenguang Wang, et al.

The Transformer architecture is superior to RNN-based models in computational efficiency. Recently, GPT and BERT have demonstrated the efficacy of Transformer models on various NLP tasks by pre-training language models on large-scale corpora. Surprisingly, these Transformer architectures are suboptimal for language modeling itself. Neither self-attention nor the positional encoding in the Transformer is able to efficiently incorporate the word-level sequential context crucial to language modeling. In this paper, we explore effective Transformer architectures for language modeling, including adding additional LSTM layers to better capture the sequential context while keeping the computation efficient. We propose Coordinate Architecture Search (CAS) to find an effective architecture through iterative refinement of the model. Experimental results on PTB, WikiText-2, and WikiText-103 show that CAS achieves perplexities between 20.42 and 34.11 on all problems, i.e., an average improvement of 12.0 perplexity units over state-of-the-art LSTMs.
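The core architectural idea in the abstract is appending LSTM layers after Transformer blocks so the model regains word-level sequential context. Below is a minimal PyTorch sketch of that idea, not the authors' code: the class name `TransformerWithLSTM`, the hyperparameters, and the fixed placement of the LSTM are illustrative assumptions, and the CAS search procedure itself is not reproduced.

```python
import torch
import torch.nn as nn

class TransformerWithLSTM(nn.Module):
    """Illustrative sketch: Transformer blocks followed by an LSTM layer."""
    def __init__(self, vocab_size=10000, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=1024,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        # Appended LSTM reinjects the word-level sequential context that
        # self-attention plus positional encoding captures only weakly.
        self.lstm = nn.LSTM(d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer ids
        x = self.embed(tokens)
        seq_len = tokens.size(1)
        # Causal mask so each position attends only to earlier tokens.
        causal = torch.triu(torch.full((seq_len, seq_len), float("-inf")),
                            diagonal=1)
        x = self.blocks(x, mask=causal)
        x, _ = self.lstm(x)           # sequential pass over the sequence
        return self.head(x)           # next-token logits: (batch, seq, vocab)

model = TransformerWithLSTM()
logits = model(torch.randint(0, 10000, (2, 16)))   # toy batch
print(logits.shape)                                # torch.Size([2, 16, 10000])
```

In the paper, CAS chooses where such layers go through iterative refinement of the model; the fixed placement above is just one candidate architecture under the stated assumptions.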

