Algorithm and Hardness for Dynamic Attention Maintenance in Large Language Models

04/05/2023
by Jan van den Brand, et al.

Large language models (LLMs) are fundamentally changing human life, and the attention scheme is one of the key components across LLMs such as BERT, GPT-1, Transformers, GPT-2, GPT-3, GPT-3.5, and GPT-4. Inspired by previous theoretical studies of the static version of the attention multiplication problem [Zandieh, Han, Daliri, and Karbasi, arXiv 2023; Alman and Song, arXiv 2023], we formally define a dynamic version of the attention matrix multiplication problem. We are given matrices Q, K, V ∈ ℝ^(n×d), representing the query, key, and value in LLMs. In each iteration, an update changes one entry of K or V. In the query stage, we receive (i,j) ∈ [n]×[d] as input and want to answer (D^(-1) A V)_(i,j), where A := exp(QK^⊤) ∈ ℝ^(n×n) is a square matrix, D := diag(A 1_n) ∈ ℝ^(n×n) is a diagonal matrix, and 1_n denotes the length-n all-ones vector.

We provide two results: an algorithm and a conditional lower bound. Here ω(1,1,τ) denotes the exponent of multiplying an n×n matrix by an n×n^τ matrix.

∙ On one hand, inspired by the lazy update idea from [Demetrescu and Italiano, FOCS 2000; Sankowski, FOCS 2004; Cohen, Lee, and Song, STOC 2019; Brand, SODA 2020], we provide a data structure that uses O(n^(ω(1,1,τ)-τ)) amortized update time and O(n^(1+τ)) worst-case query time.

∙ On the other hand, we show that unless the hinted matrix-vector multiplication conjecture [Brand, Nanongkai, and Saranurak, FOCS 2019] is false, no algorithm can achieve both O(n^(ω(1,1,τ)-τ-Ω(1))) amortized update time and O(n^(1+τ-Ω(1))) worst-case query time.

In conclusion, our algorithmic result is conditionally optimal: it cannot be improved unless the hinted matrix-vector multiplication conjecture is false.
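To make the target quantity concrete, here is a minimal NumPy sketch that answers a single query (D^(-1) A V)_(i,j) from scratch, directly following the definitions above. The function name attention_entry is illustrative only; this is the quantity the data structure maintains, not the data structure itself.

    import numpy as np

    def attention_entry(Q, K, V, i, j):
        """Compute (D^-1 A V)_(i,j) for A = exp(Q K^T), D = diag(A 1_n)."""
        A = np.exp(Q @ K.T)       # n x n attention score matrix
        d = A.sum(axis=1)         # row sums of A: the diagonal of D
        return (A[i] @ V[:, j]) / d[i]

And here is a toy sketch of the lazy update idea: buffer single-entry updates to K, rebuild the cached attention matrix only after roughly n^τ columns have gone stale, and patch the stale columns exactly at query time. The class name, the default τ = 0.5, and the naive rebuild are assumptions for illustration; the paper's actual data structure achieves the stated bounds via fast rectangular matrix multiplication, which this sketch does not implement.

    import numpy as np

    class LazyAttention:
        """Toy lazy-update scheme for queries (D^-1 A V)_(i,j)."""

        def __init__(self, Q, K, V, tau=0.5):
            self.Q = Q.astype(float).copy()
            self.K = K.astype(float).copy()
            self.V = V.astype(float).copy()
            self.n = Q.shape[0]
            # Rebuild the cache once ~n^tau columns of A are stale.
            self.threshold = max(1, int(self.n ** tau))
            self._rebuild()

        def _rebuild(self):
            self.A = np.exp(self.Q @ self.K.T)  # cached A = exp(Q K^T)
            self.d = self.A.sum(axis=1)         # cached diagonal of D
            self.dirty = set()                  # columns invalidated by K updates

        def update(self, which, p, q, val):
            if which == "K":
                self.K[p, q] = val
                self.dirty.add(p)               # changing row p of K stales column p of A
                if len(self.dirty) >= self.threshold:
                    self._rebuild()             # lazy batched rebuild
            else:
                self.V[p, q] = val              # V updates never stale the cache

        def query(self, i, j):
            num = self.A[i] @ self.V[:, j]      # contribution from the cache
            den = self.d[i]
            for p in self.dirty:                # patch each stale column exactly
                fresh = np.exp(self.Q[i] @ self.K[p])
                num += (fresh - self.A[i, p]) * self.V[p, j]
                den += fresh - self.A[i, p]
            return num / den

In this sketch a query costs O(n + |dirty|·d) time and the full rebuild is amortized over the n^τ buffered updates, which conveys the update/query trade-off governed by τ but falls short of the ω(1,1,τ)-based bounds in the paper.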


Related research

02/26/2018: Dynamic Effective Resistances and Approximate Schur Complement on Separable Graphs
We consider the problem of dynamically maintaining (approximate) all-pai...

02/06/2019: New Amortized Cell-Probe Lower Bounds for Dynamic Problems
We build upon the recent papers by Weinstein and Yu (FOCS'16), Larsen (F...

04/10/2023: Randomized and Deterministic Attention Sparsification Algorithms for Over-parameterized Feature Dimension
Large language models (LLMs) have shown their power in different areas. ...

10/25/2020: On Updating and Querying Submatrices
In this paper, we study the d-dimensional update-query problem. We provi...

04/09/2018: Counting Triangles under Updates in Worst-Case Optimal Time
We consider the problem of incrementally maintaining the triangle count ...

04/07/2020: Maintaining Triangle Queries under Updates
We consider the problem of incrementally maintaining the triangle querie...
