Sparse mixture of expert architectures (MoEs) scale model capacity witho...
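For orientation only, the sketch below illustrates the basic top-1 token routing used by sparse MoE layers in general; the function name, NumPy implementation, and layer shapes are illustrative assumptions, not the architecture proposed in this work.

```python
import numpy as np

def top1_moe_layer(x, router_w, expert_ws):
    """Minimal top-1 sparse MoE layer: each token is routed to a single expert.

    x:         (T, D) token representations.
    router_w:  (D, E) router weights, one logit per expert.
    expert_ws: list of E expert weight matrices, each (D, D).
    """
    gate_logits = x @ router_w                                # (T, E) routing scores
    gates = np.exp(gate_logits - gate_logits.max(axis=1, keepdims=True))
    gates /= gates.sum(axis=1, keepdims=True)                 # softmax over experts
    choice = gates.argmax(axis=1)                             # hard top-1 expert per token
    out = np.zeros_like(x)
    for e, w in enumerate(expert_ws):
        mask = choice == e
        out[mask] = (x[mask] @ w) * gates[mask, e:e + 1]      # weight expert output by its gate value
    return out
```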
The ubiquitous and demonstrably suboptimal choice of resizing images to ...
We introduce Three Towers (3T), a flexible method to improve the contras...
We propose a simple pairwise sigmoid loss for image-text pre-training. U...
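A minimal sketch of a pairwise sigmoid loss of this kind follows, assuming L2-normalised embeddings and fixed temperature and bias constants; the function name and constant values are illustrative, not the paper's exact formulation.

```python
import numpy as np

def pairwise_sigmoid_loss(img_emb, txt_emb, temperature=10.0, bias=-10.0):
    """Pairwise sigmoid loss over every image-text pair in a batch.

    img_emb, txt_emb: (N, D) L2-normalised embeddings; row i of each is a matching pair.
    Matching pairs get label +1, all other pairs -1, and each pair is scored independently.
    """
    logits = temperature * img_emb @ txt_emb.T + bias          # (N, N) pairwise similarities
    labels = 2.0 * np.eye(len(img_emb)) - 1.0                  # +1 on the diagonal, -1 off it
    # -log sigmoid(label * logit) == log(1 + exp(-label * logit)), computed stably
    return np.mean(np.logaddexp(0.0, -labels * logits))
```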
Heteroscedastic classifiers, which learn a multivariate Gaussian distrib...
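For intuition, the sketch below shows a Monte-Carlo prediction step for a classifier whose logits follow an input-dependent Gaussian; the diagonal covariance and the function name are simplifying assumptions rather than the construction used in the paper.

```python
import numpy as np

def heteroscedastic_predict(mean_logits, logit_scale, num_samples=100, seed=0):
    """Average softmax over logits sampled from a per-example Gaussian.

    mean_logits: (C,) predicted mean logits for one example.
    logit_scale: (C,) predicted per-class standard deviations (diagonal covariance).
    """
    rng = np.random.default_rng(seed)
    samples = mean_logits + logit_scale * rng.standard_normal((num_samples, mean_logits.shape[0]))
    samples -= samples.max(axis=1, keepdims=True)              # stabilise the softmax
    probs = np.exp(samples)
    probs /= probs.sum(axis=1, keepdims=True)                  # softmax per sample
    return probs.mean(axis=0)                                  # MC estimate of the predictive distribution
```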
Multimodal models are becoming increasingly effective, in part due to un...
Training large, deep neural networks to convergence can be prohibitively...
Effective scaling and a flexible task interface enable large language mo...
Large sparsely-activated models have obtained excellent performance in m...
Recent progress in Medical Artificial Intelligence (AI) has delivered sy...
Transformers are widely applied to solve natural language understanding ...
This paper presents contrastive-tuning, a simple method employing contra...
Machine learning models based on the aggregated outputs of submodels, ei...
Sparsely-gated Mixture of Experts networks (MoEs) have demonstrated exce...
Large scale image classification datasets often contain noisy labels. We...
We develop and rigorously evaluate a deep learning based system that can...
Transfer learning is a standard technique to improve performance on task...
Self-supervised pretraining followed by supervised fine-tuning has seen...
In the low-data regime, it is difficult to train good supervised models ...
Transfer of pre-trained representations can improve sample efficiency an...
Modelling uncertainty arising from input-dependent label noise is an inc...
Semantic segmentation of medical images is a crucial step for the quanti...