Sparse mixture of expert architectures (MoEs) scale model capacity witho...
The ubiquitous and demonstrably suboptimal choice of resizing images to ...
The top-k operator returns a k-sparse vector, where the non-zero values...
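As a small, hedged illustration of the operator described in the entry above, here is a minimal NumPy sketch of a hard top-k that keeps the k largest entries and zeroes out the rest; the helper name topk_sparse and the example vector are illustrative, not taken from the paper.

    import numpy as np

    def topk_sparse(x, k):
        """Keep the k largest entries of x and zero the rest (a k-sparse vector).

        Illustrative sketch only; assumes a 1-D input array.
        """
        out = np.zeros_like(x)
        idx = np.argpartition(x, -k)[-k:]  # indices of the k largest values
        out[idx] = x[idx]
        return out

    # Usage: with k=2 only the two largest entries survive.
    print(topk_sparse(np.array([0.1, 3.0, -1.2, 2.5, 0.7]), k=2))  # [0.  3.  0.  2.5 0. ]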
Training large, deep neural networks to convergence can be prohibitively...
Adversarial robustness is a key desirable property of neural networks. I...
Regularized optimal transport (OT) is now increasingly used as a loss or...
Effective scaling and a flexible task interface enable large language mo...
Large sparsely-activated models have obtained excellent performance in m...
Transformers are widely applied to solve natural language understanding ...
Machine learning models based on the aggregated outputs of submodels, ei...
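To make the prediction-level aggregation mentioned in the entry above concrete, here is a hedged NumPy sketch that simply averages the class-probability outputs of the ensemble members; the helper ensemble_predict and the example values are hypothetical.

    import numpy as np

    def ensemble_predict(member_probs):
        """Average per-member probability vectors into one ensemble prediction."""
        return np.mean(np.stack(member_probs, axis=0), axis=0)

    # Usage: three members, three classes.
    members = [np.array([0.7, 0.2, 0.1]),
               np.array([0.5, 0.3, 0.2]),
               np.array([0.6, 0.3, 0.1])]
    print(ensemble_predict(members))  # [0.6        0.26666667 0.13333333]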
Sparsely-gated Mixture of Experts networks (MoEs) have demonstrated exce...
In the low-data regime, it is difficult to train good supervised models ...
Transfer learning has been recently popularized as a data-efficient alte...
Transfer of pre-trained representations can improve sample efficiency an...
Modern deep convolutional networks (CNNs) are often criticized for not g...
Transfer of pre-trained representations improves sample efficiency and s...