James Martens

research

∙ 02/20/2023

Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation

Skip connections and normalisation layers form two standard architectura...

0 Bobby He, et al. ∙

research

∙ 05/31/2022

Pre-training via Denoising for Molecular Property Prediction

Many important problems involving molecular property prediction from 3D ...

0 Sheheryar Zaidi, et al. ∙

research

∙ 03/15/2022

Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers

Training very deep neural networks is still an extremely challenging tas...

0 Guodong Zhang, et al. ∙

research

∙ 04/13/2021

On the validity of kernel approximations for orthogonally-initialized neural networks

In this note we extend kernel function approximation results for neural ...

0 James Martens, et al. ∙

research

∙ 07/09/2019

Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model

Increasing the batch size is a popular way to speed up neural network tr...

3 Guodong Zhang, et al. ∙

research

∙ 07/04/2019

Adversarial Robustness through Local Linearization

Adversarial training is an effective methodology for training deep neura...

3 Chongli Qin, et al. ∙

research

∙ 05/27/2019

Fast Convergence of Natural Gradient Descent for Overparameterized Neural Networks

Natural gradient descent has proven effective at mitigating the effects ...

35 Guodong Zhang, et al. ∙

research

∙ 05/13/2019

Differentiable Game Mechanics

Deep learning is built on the foundational guarantee that gradient desce...

0 Alistair Letcher, et al. ∙

research

∙ 02/06/2019

On the Variance of Unbiased Online Recurrent Optimization

The recently proposed Unbiased Online Recurrent Optimization algorithm (...

0 Tim Cooijmans, et al. ∙

research

∙ 02/15/2018

The Mechanics of n-Player Differentiable Games

The cornerstone underpinning deep learning is the guarantee that gradien...

0 David Balduzzi, et al. ∙

research

∙ 02/03/2016

A Kronecker-factored approximate Fisher matrix for convolution layers

Second-order optimization methods such as natural gradient descent have ...

0 Roger Grosse, et al. ∙

research

∙ 11/21/2015

Adding Gradient Noise Improves Learning for Very Deep Networks

Deep feedforward and recurrent networks have achieved impressive results...

0 Arvind Neelakantan, et al. ∙

research

∙ 03/19/2015

Optimizing Neural Networks with Kronecker-factored Approximate Curvature

We propose an efficient method for approximating natural gradient descen...

0 James Martens, et al. ∙

research

∙ 12/03/2014

New insights and perspectives on the natural gradient method

Natural gradient descent is an optimization method traditionally motivat...

0 James Martens, et al. ∙

research

∙ 11/27/2014

On the Expressive Efficiency of Sum Product Networks

Sum Product Networks (SPNs) are a recently developed class of deep gener...

0 James Martens, et al. ∙

research

∙ 06/27/2012

Estimating the Hessian by Back-propagating Curvature

In this work we develop Curvature Propagation (CP), a general technique ...

0 James Martens, et al. ∙
