Attention Scheme Inspired Softmax Regression

04/20/2023
by Yichuan Deng, et al.

Large language models (LLMs) have brought transformative changes to human society. One of the key computations in LLMs is the softmax unit. This operation is important in LLMs because it allows the model to generate a distribution over possible next words or phrases, given a sequence of input words. This distribution is then used to select the most likely next word or phrase, based on the probabilities assigned by the model. The softmax unit also plays a crucial role in training LLMs, as it allows the model to learn from the data by adjusting the weights and biases of the neural network. In the area of convex optimization, for example when using the central path method to solve linear programming, the softmax function has served as a crucial tool for controlling the progress and stability of the potential function [Cohen, Lee and Song STOC 2019; Brand SODA 2020]. In this work, inspired by the softmax unit, we define a softmax regression problem. Formally speaking, given a matrix A ∈ ℝ^{n × d} and a vector b ∈ ℝ^n, the goal is to use a greedy-type algorithm to solve min_x ‖⟨exp(Ax), 1_n⟩^{-1} exp(Ax) − b‖_2^2. In a certain sense, our provable convergence result provides theoretical support for why greedy algorithms can be used to train the softmax function in practice.
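To make the objective concrete, here is a minimal Python sketch of the regression problem stated above, with plain gradient descent standing in for the greedy-type algorithm; the paper's actual method, step sizes, and guarantees may differ, and all function names below are illustrative.

```python
import numpy as np

def softmax_regression_loss(x, A, b):
    """Loss ||<exp(Ax), 1_n>^{-1} exp(Ax) - b||_2^2 from the abstract."""
    u = np.exp(A @ x)          # exp(Ax), entrywise
    f = u / u.sum()            # <exp(Ax), 1_n>^{-1} exp(Ax), i.e. softmax(Ax)
    return np.sum((f - b) ** 2)

def softmax_regression_grad(x, A, b):
    """Gradient of the loss via the standard softmax Jacobian."""
    u = np.exp(A @ x)
    f = u / u.sum()
    # Jacobian of softmax(z) wrt z = Ax is diag(f) - f f^T
    J = np.diag(f) - np.outer(f, f)
    return 2.0 * A.T @ (J @ (f - b))

def greedy_descent(A, b, lr=0.1, iters=500):
    """Plain gradient descent as a stand-in 'greedy-type' method."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x -= lr * softmax_regression_grad(x, A, b)
    return x

# Tiny usage example: b is a probability vector generated by the model itself.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))
u = np.exp(A @ rng.standard_normal(3))
b = u / u.sum()
x_hat = greedy_descent(A, b)
print(softmax_regression_loss(x_hat, A, b))  # should be close to 0
```

Since b here lies in the range of the normalized exponential model, the loss can be driven near zero; for an arbitrary b, the iterate converges to a stationary point of the (nonconvex) objective.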
