research ∙ 06/17/2022
Debugging using Orthogonal Gradient Descent
In this report we consider the following problem: Given a trained model ...
research ∙ 10/05/2021
Language Modeling using LMUs: 10x Better Data Efficiency or Improved Scaling Compared to Transformers
Recent studies have demonstrated that the performance of transformers on...
research ∙ 02/22/2021