Transformers have impressive generalization capabilities on tasks with a...
Unsupervised learning of discrete representations from continuous ones i...
Well-designed diagnostic tasks have played a key role in studying the fa...
The cornerstone of neural algorithmic reasoning is the ability to solve...
Linear layers in neural networks (NNs) trained by gradient descent can b...
The weight matrix (WM) of a neural network (NN) is its program. The prog...
We share our experience with the recently released WILDS benchmark, a co...
Despite successes across a broad range of applications, Transformers hav...
Recently, many datasets have been proposed to test the systematic genera...
Transformers with linearised attention ("linear Transformers") have demo...
Neural networks (NNs) whose subnetworks implement reusable functions are...
The Differentiable Neural Computer (DNC) can learn algorithmic and quest...