Formal Algorithms for Transformers

07/19/2022
by Mary Phuong, et al.

This document aims to be a self-contained, mathematically precise overview of transformer architectures and algorithms (*not* results). It covers what transformers are, how they are trained, what they are used for, their key architectural components, and a preview of the most prominent models. The reader is assumed to be familiar with basic ML terminology and simpler neural network architectures such as MLPs.


research
04/20/2023

An Introduction to Transformers

The transformer is a neural network component that can be used to learn ...
research
02/02/2023

Mnemosyne: Learning to Train Transformers with Transformers

Training complex machine learning (ML) architectures requires a compute ...
research
09/16/2022

Quantum Vision Transformers

We design and analyse quantum transformers, extending the state-of-the-a...
research
12/07/2021

Relating transformers to models and neural representations of the hippocampal formation

Many deep neural network architectures loosely based on brain networks h...
research
10/25/2022

Audio MFCC-gram Transformers for respiratory insufficiency detection in COVID-19

This work explores speech as a biomarker and investigates the detection ...
research
09/19/2023

Interpret Vision Transformers as ConvNets with Dynamic Convolutions

There has been a debate about the superiority between vision Transformer...
research
05/24/2023

Can Transformers Learn to Solve Problems Recursively?

Neural networks have in recent years shown promise for helping software ...
