Low-Rank Softmax Can Have Unargmaxable Classes in Theory but Rarely in Practice

03/12/2022
by Andreas Grivas, et al.

Classifiers in natural language processing (NLP) often have a large number of output classes. For example, neural language models (LMs) and machine translation (MT) models both predict tokens from a vocabulary of thousands. The Softmax output layer of these models typically receives as input a dense feature representation, which has much lower dimensionality than the output. In theory, this means that some words may be impossible to predict via argmax, irrespective of input features, and there is empirical evidence that this happens in small language models. In this paper we ask whether it can happen in practical large language models and translation models. To do so, we develop algorithms to detect such unargmaxable tokens in public models. We find that 13 out of 150 models do indeed have such tokens; however, they are very infrequent and unlikely to impact model quality. We release our algorithms and code to the public.
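To make the notion of an unargmaxable class concrete, here is a toy sketch; it is not the detection algorithm developed in the paper. It assumes a bias-free output layer with 2-dimensional features and 4 classes, where the weight vector of class 3 lies strictly inside the triangle formed by the other three. Since the argmax of a bias-free layer depends only on the direction of the input, scanning angles on the unit circle is enough for this toy case. The names `WEIGHTS` and `argmaxable_classes` are hypothetical and chosen for illustration only.

```python
import math

# Hypothetical output-layer weights: 4 classes, 2 feature dimensions, no bias.
# Class 3 lies strictly inside the triangle spanned by classes 0, 1 and 2.
WEIGHTS = [
    (2.0, 0.0),    # class 0
    (-1.0, 2.0),   # class 1
    (-1.0, -2.0),  # class 2
    (0.1, 0.2),    # class 3 (interior point)
]


def argmaxable_classes(weights, n_angles=3600):
    """Return the set of classes that win the argmax for at least one
    input direction sampled on the unit circle (2-D, bias-free case)."""
    winners = set()
    for k in range(n_angles):
        angle = 2.0 * math.pi * k / n_angles
        x = (math.cos(angle), math.sin(angle))
        # Logit of each class is the dot product of its weight vector with x.
        logits = [w[0] * x[0] + w[1] * x[1] for w in weights]
        winners.add(max(range(len(weights)), key=lambda c: logits[c]))
    return winners


if __name__ == "__main__":
    found = argmaxable_classes(WEIGHTS)
    missing = set(range(len(WEIGHTS))) - found
    print("argmaxable:", sorted(found))
    print("never the argmax:", sorted(missing))
```

In this toy setup class 3 can still receive non-zero probability under Softmax, but it is never the top-ranked class, because a linear function over a triangle is always maximised at a vertex. An angle scan like this only works in two dimensions; real models with hundreds of feature dimensions require a different detection approach.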
