Convergence of Two-Layer Regression with Nonlinear Units

08/16/2023
by Yichuan Deng, et al.

Large language models (LLMs), such as ChatGPT and GPT-4, have shown outstanding performance on many tasks in daily life. Attention computation plays an important role in training LLMs, and the softmax unit and the ReLU unit are the key structures in attention computation. Inspired by them, we put forward a softmax ReLU regression problem. Broadly, our goal is to find an optimal solution to the regression problem involving the ReLU unit. In this work, we calculate a closed-form representation for the Hessian of the loss function. Under certain assumptions, we prove the Lipschitz continuity and the positive semidefiniteness (PSD) of the Hessian. We then introduce a greedy algorithm based on an approximate Newton method, which converges in the sense of distance to the optimal solution. Finally, we relax the Lipschitz condition and prove convergence in the sense of loss value.
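To make the approximate Newton idea concrete, here is a minimal sketch of such an iteration on a plain ReLU regression loss L(w) = 0.5 * ||relu(Xw) - y||^2. This is not the paper's exact softmax ReLU formulation; the names `X`, `y`, the damping term `lam`, and the Gauss-Newton PSD approximation of the Hessian are assumptions made for this illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def approx_newton_relu_regression(X, y, steps=50, lam=1e-3):
    """Approximate Newton iteration for L(w) = 0.5*||relu(Xw) - y||^2.

    Illustrative sketch only: the Hessian is replaced by a damped
    Gauss-Newton (PSD) approximation, X^T diag(g) X + lam * I.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        z = X @ w
        r = relu(z) - y                       # residual
        g = (z >= 0).astype(float)            # (sub)gradient of relu
        grad = X.T @ (g * r)                  # gradient of L at w
        # PSD surrogate of the Hessian, damped for numerical stability
        H = X.T @ (g[:, None] * X) + lam * np.eye(d)
        w = w - np.linalg.solve(H, grad)      # approximate Newton step
    return w
```

On noiseless data whose preactivations stay positive, the first damped step already lands near the least-squares solution and later steps refine it, which mirrors the fast local convergence Newton-type methods are chosen for.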


Related research

04/20/2023: Attention Scheme Inspired Softmax Regression
Large language models (LLMs) have made transformative changes for human soc...

03/28/2023: Solving Regularized Exp, Cosh and Sinh Regression Problems
In modern machine learning, attention computation is a fundamental task ...

05/01/2023: An Iterative Algorithm for Rescaled Hyperbolic Functions Regression
Large language models (LLMs) have numerous real-life applications across...

11/30/2022: Newton Method with Variable Selection by the Proximal Gradient Method
In sparse estimation, in which the sum of the loss function and the regu...

06/15/2020: Globally Injective ReLU Networks
We study injective ReLU neural networks. Injectivity plays an important ...

07/05/2023: In-Context Learning for Attention Scheme: from Single Softmax Regression to Multiple Softmax Regression via a Tensor Trick
Large language models (LLMs) have brought significant and transformative...

02/01/2019: Understanding Impacts of High-Order Loss Approximations and Features in Deep Learning Interpretation
Current methods to interpret deep learning models by generating saliency...
