Razvan Pascanu

research

∙ 09/11/2023

Uncovering mesa-optimization algorithms in Transformers

Transformers have become the dominant model in deep learning, but the re...

0 Johannes von Oswald, et al. ∙

research

∙ 07/21/2023

On the Universality of Linear Recurrences Followed by Nonlinear Projections

In this note (work in progress towards a full-length paper) we show that...

0 Antonio Orvieto, et al. ∙

research

∙ 07/18/2023

Promoting Exploration in Memory-Augmented Adam using Critical Momenta

Adaptive gradient-based optimizers, particularly Adam, have left their m...

0 Pranshu Malviya, et al. ∙

research

∙ 07/17/2023

Latent Space Representations of Neural Algorithmic Reasoners

Neural Algorithmic Reasoning (NAR) is a research area focused on designi...

0 Vladimir V. Mirjanić, et al. ∙

research

∙ 07/11/2023

Towards Robust and Efficient Continual Language Learning

As the application space of language models continues to evolve, a natur...

0 Adam Fisch, et al. ∙

research

∙ 06/27/2023

Asynchronous Algorithmic Alignment with Cocycles

State-of-the-art neural algorithmic reasoners make use of message passin...

0 Andrew Dudzik, et al. ∙

research

∙ 06/26/2023

Learning to Modulate pre-trained Models in RL

Reinforcement Learning (RL) has been successful in various domains like ...

0 Thomas Schmied, et al. ∙

research

∙ 06/14/2023

Kalman Filter for Online Classification of Non-Stationary Data

In Online Continual Learning (OCL) a learning system receives a stream o...

0 Michalis K. Titsias, et al. ∙

research

∙ 05/31/2023

The Tunnel Effect: Building Data Representations in Deep Neural Networks

Deep neural networks are widely known for their remarkable effectiveness...

0 Wojciech Masarczyk, et al. ∙

research

∙ 04/25/2023

Towards Compute-Optimal Transfer Learning

The field of transfer learning is undergoing a significant shift with th...

0 Massimo Caccia, et al. ∙

research

∙ 03/11/2023

Resurrecting Recurrent Neural Networks for Long Sequences

Recurrent Neural Networks (RNNs) offer fast inference on long sequences ...

10 Antonio Orvieto, et al. ∙

research

∙ 03/02/2023

Understanding plasticity in neural networks

Plasticity, the ability of a neural network to quickly change its predic...

0 Clare Lyle, et al. ∙

research

∙ 01/12/2023

SemPPL: Predicting pseudo-labels for better contrastive representations

Learning from large amounts of unsupervised data and a small amount of s...

0 Matko Bošnjak, et al. ∙

research

∙ 09/28/2022

Disentangling Transfer in Continual Reinforcement Learning

The ability of continual learning systems to transfer knowledge from pre...

0 Maciej Wołczyk, et al. ∙

research

∙ 07/05/2022

An Empirical Study of Implicit Regularization in Deep Offline RL

Deep neural networks are the most commonly used function approximators i...

12 Caglar Gulcehre, et al. ∙

research

∙ 06/20/2022

When Does Re-initialization Work?

Re-initializing a neural network during training has been observed to im...

0 Sheheryar Zaidi, et al. ∙

research

∙ 05/31/2022

Pre-training via Denoising for Molecular Property Prediction

Many important problems involving molecular property prediction from 3D ...

0 Sheheryar Zaidi, et al. ∙

research

∙ 05/31/2022

The CLRS Algorithmic Reasoning Benchmark

Learning representations of algorithms is an emerging area of machine le...

0 Petar Veličković, et al. ∙

research

∙ 02/01/2022

Architecture Matters in Continual Learning

A large body of research in continual learning is devoted to overcoming ...

0 Seyed-Iman Mirzadeh, et al. ∙

research

∙ 01/13/2022

Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet?

Despite recent progress made by self-supervised methods in representatio...

0 Nenad Tomašev, et al. ∙

research

∙ 10/21/2021

Wide Neural Networks Forget Less Catastrophically

A growing body of research in continual learning is devoted to overcomin...

0 Seyed-Iman Mirzadeh, et al. ∙

research

∙ 10/01/2021

Powerpropagation: A sparsity inducing weight reparameterisation

The training of sparse neural networks is becoming an increasingly impor...

0 Jonathan Schwarz, et al. ∙

research

∙ 07/27/2021

On the Role of Optimization in Double Descent: A Least Squares Study

Empirically it has been observed that the performance of deep neural net...

0 Ilja Kuzborskij, et al. ∙

research

∙ 06/24/2021

Task-agnostic Continual Learning with Hybrid Probabilistic Models

Learning new tasks continuously without forgetting on a constantly chang...

0 Polina Kirichenko, et al. ∙

research

∙ 06/15/2021

Predicting Unreliable Predictions by Shattering a Neural Network

Piecewise linear neural networks can be split into subfunctions, each wi...

7 Xu Ji, et al. ∙

research

∙ 06/07/2021

Top-KAST: Top-K Always Sparse Training

Sparse neural networks are becoming increasingly important as the field ...

0 Siddhant M. Jayakumar, et al. ∙

research

∙ 05/31/2021

A study on the plasticity of neural networks

One aim shared by multiple settings, such as continual learning or trans...

0 Tudor Berariu, et al. ∙

research

∙ 05/27/2021

Drawing Multiple Augmentation Samples Per Image During Training Efficiently Decreases Test Error

In computer vision, it is standard practice to draw a single sample from...

0 Stanislav Fort, et al. ∙

research

∙ 05/23/2021

Continual World: A Robotic Benchmark For Continual Reinforcement Learning

Continual learning (CL) – the ability to continuously learn, building on...

0 Maciej Wołczyk, et al. ∙

research

∙ 05/11/2021

Spectral Normalisation for Deep Reinforcement Learning: an Optimisation Perspective

Most of the recent deep reinforcement learning advances take an RL-centr...

7 Florin Gogianu, et al. ∙

research

∙ 03/17/2021

Regularized Behavior Value Estimation

Offline reinforcement learning restricts the learning process to rely on...

8 Caglar Gulcehre, et al. ∙

research

∙ 10/27/2020

Behavior Priors for Efficient Reinforcement Learning

As we deploy reinforcement learning agents to solve increasingly challen...

10 Dhruva Tirumala, et al. ∙

research

∙ 10/20/2020

BYOL works even without batch statistics

Bootstrap Your Own Latent (BYOL) is a self-supervised learning approach ...

0 Pierre H. Richemond, et al. ∙

research

∙ 10/09/2020

Linear Mode Connectivity in Multitask and Continual Learning

Continual (sequential) training and multitask (simultaneous) training ar...

0 Seyed-Iman Mirzadeh, et al. ∙

research

∙ 06/12/2020

Understanding the Role of Training Regimes in Continual Learning

Catastrophic forgetting affects the training of neural networks, limitin...

0 Seyed-Iman Mirzadeh, et al. ∙

research

∙ 06/11/2020

Pointer Graph Networks

Graph neural networks (GNNs) are typically applied to static graphs that...

33 Petar Veličković, et al. ∙

research

∙ 12/16/2019

A Deep Neural Network's Loss Surface Contains Every Low-dimensional Pattern

The work "Loss Landscape Sightseeing with Multi-Point Optimization" (Sko...

32 Wojciech Marian Czarnecki, et al. ∙

research

∙ 10/31/2019

Continual Unsupervised Representation Learning

Continual learning aims to improve the ability of modern learning system...

52 Dushyant Rao, et al. ∙

research

∙ 10/22/2019

Improving the Gating Mechanism of Recurrent Neural Networks

Gating mechanisms are widely used in neural network models, where they a...

30 Albert Gu, et al. ∙

research

∙ 10/13/2019

Stabilizing Transformers for Reinforcement Learning

Owing to their ability to both effectively integrate information over lo...

7 Emilio Parisotto, et al. ∙

research

∙ 08/30/2019

Meta-Learning with Warped Gradient Descent

A versatile and effective approach to meta-learning is to infer a gradie...

0 Sebastian Flennerhag, et al. ∙

research

∙ 06/12/2019

Task Agnostic Continual Learning via Meta Learning

While neural networks are powerful function approximators, they suffer f...

0 Xu He, et al. ∙

research

∙ 05/08/2019

Meta-learning of Sequential Strategies

In this report we review memory-based meta-learning as a tool for buildi...

16 Pedro A. Ortega, et al. ∙

research

∙ 05/03/2019

Information asymmetry in KL-regularized RL

Many real world tasks exhibit rich structure that is repeated across dif...

6 Alexandre Galashov, et al. ∙

research

∙ 04/25/2019

Ray Interference: a Source of Plateaus in Deep Reinforcement Learning

Rather than proposing a new method, this paper investigates an issue pre...

24 Tom Schaul, et al. ∙

research

∙ 03/18/2019

A RAD approach to deep mixture models

Flow based models such as Real NVP are an extremely powerful approach to...

4 Laurent Dinh, et al. ∙

research

∙ 03/18/2019

Exploiting Hierarchy for Learning and Transfer in KL-regularized RL

As reinforcement learning agents are tasked with solving more challengin...

20 Dhruva Tirumala, et al. ∙

research

∙ 02/06/2019

Distilling Policy Distillation

The transfer of knowledge from one policy to another is an important too...

14 Wojciech Marian Czarnecki, et al. ∙

research

∙ 01/31/2019

Functional Regularisation for Continual Learning using Gaussian Processes

We introduce a novel approach for supervised continual learning based on...

14 Michalis K. Titsias, et al. ∙

research

∙ 12/05/2018

Adapting Auxiliary Losses Using Gradient Similarity

One approach to deal with the statistical inefficiency of neural network...

10 Yunshu Du, et al. ∙
