Jean-Baptiste Alayrac

research

∙ 05/03/2023

Making the Most of What You Have: Adapting Pre-trained Visual Language Models in the Low-data Regime

Large-scale visual language models are widely used as pre-trained models...

0 Chuhan Zhang, et al. ∙

research

∙ 03/23/2023

Three ways to improve feature alignment for open vocabulary detection

The core problem in zero-shot open vocabulary detection is how to align ...

0 Relja Arandjelović, et al. ∙

research

∙ 01/23/2023

Zorro: the masked multimodal transformer

Attention-based models are appealing for multimodal processing because i...

17 Adria Recasens, et al. ∙

research

∙ 11/24/2022

Multi-Task Learning of Object State Changes from Uncurated Videos

We aim to learn to temporally localize object state changes and the corr...

0 Tomáš Souček, et al. ∙

research

∙ 04/29/2022

Flamingo: a Visual Language Model for Few-Shot Learning

Building models that can be rapidly adapted to numerous tasks using only...

7 Jean-Baptiste Alayrac, et al. ∙

research

∙ 03/22/2022

Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos

Human actions often induce changes of object states such as "cutting an ...

4 Tomáš Souček, et al. ∙

research

∙ 02/15/2022

General-purpose, long-context autoregressive modeling with Perceiver AR

Real-world data is high-dimensional: a book, image, or musical performan...

2 Curtis Hawthorne, et al. ∙

research

∙ 11/23/2021

Towards Learning Universal Audio Representations

The ability to learn universal audio representations that can solve dive...

0 Luyu Wang, et al. ∙

research

∙ 07/30/2021

Perceiver IO: A General Architecture for Structured Inputs & Outputs

The recently-proposed Perceiver model obtains good results on several do...

6 Andrew Jaegle, et al. ∙

research

∙ 05/01/2021

Generative Art Using Neural Visual Grammars and Dual Encoders

Whilst there are perhaps only a few scientific methods, there seem to be...

42 Chrisantha Fernando, et al. ∙

research

∙ 04/26/2021

Multimodal Self-Supervised Learning of General Audio Representations

We present a multimodal framework to learn general audio representations...

0 Luyu Wang, et al. ∙

research

∙ 04/12/2021

Machine Translation Decoding beyond Beam Search

Beam search is the go-to method for decoding auto-regressive machine tra...

0 Rémi Leblond, et al. ∙

research

∙ 03/30/2021

Broaden Your Views for Self-Supervised Video Learning

Most successful self-supervised learning methods are trained to align th...

3 Adria Recasens, et al. ∙

research

∙ 03/30/2021

Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers

Our objective is language-based search of large-scale image and video da...

7 Antoine Miech, et al. ∙

research

∙ 03/19/2021

Efficient Visual Pretraining with Contrastive Detection

Self-supervised pretraining has been shown to yield powerful representat...

0 Olivier J. Hénaff, et al. ∙

research

∙ 01/31/2021

Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers

Recently multimodal transformer models have gained popularity because th...

0 Lisa Anne Hendricks, et al. ∙

research

∙ 08/03/2020

RareAct: A video dataset of unusual interactions

This paper introduces a manually annotated video dataset of unusual acti...

15 Antoine Miech, et al. ∙

research

∙ 06/29/2020

Self-Supervised MultiModal Versatile Networks

Videos are a rich source of multi-modal supervision. In this work, we le...

82 Jean-Baptiste Alayrac, et al. ∙

research

∙ 05/07/2020

Learning to Segment Actions from Observation and Narration

We apply a generative segmental model of task structure, guided by narra...

4 Daniel Fried, et al. ∙

research

∙ 03/11/2020

Visual Grounding in Video for Unsupervised Word Translation

There are thousands of actively spoken languages on Earth, but a single ...

8 Gunnar A. Sigurdsson, et al. ∙

research

∙ 12/13/2019

End-to-End Learning of Visual Representations from Uncurated Instructional Videos

Annotating videos is cumbersome, expensive and not scalable. Yet, many s...

35 Antoine Miech, et al. ∙

research

∙ 10/24/2019

Controllable Attention for Structured Layered Video Decomposition

The objective of this paper is to be able to separate a video into its n...

24 Jean-Baptiste Alayrac, et al. ∙

research

∙ 06/07/2019

HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips

Learning text-video embeddings usually requires a dataset of video clips...

1 Antoine Miech, et al. ∙

research

∙ 05/31/2019

Are Labels Required for Improving Adversarial Robustness?

Recent work has uncovered the interesting (and somewhat surprising) find...

0 Jonathan Uesato, et al. ∙

research

∙ 03/19/2019

Cross-task weakly supervised learning from instructional videos

In this paper we investigate learning visual models for the steps of ord...

4 Dimitri Zhukov, et al. ∙

research

∙ 12/04/2018

The Visual Centrifuge: Model-Free Layered Video Representations

True video understanding requires making sense of non-lambertian scenes ...

2 Jean-Baptiste Alayrac, et al. ∙

research

∙ 09/22/2018

Learning to Localize and Align Fine-Grained Actions to Sparse Instructions

Automatic generation of textual video descriptions that are time-aligned...

6 Meera Hahn, et al. ∙

research

∙ 06/29/2018

A flexible model for training action localization with varying levels of supervision

Spatio-temporal action detection in videos is typically addressed in a f...

2 Guilhem Chéron, et al. ∙

research

∙ 07/27/2017

Learning from Video and Text via Large-Scale Discriminative Clustering

Discriminative clustering has been successfully applied to a number of w...

0 Antoine Miech, et al. ∙

research

∙ 06/14/2017

SEARNN: Training RNNs with Global-Local Losses

We propose SEARNN, a novel training algorithm for recurrent neural netwo...

0 Rémi Leblond, et al. ∙

research

∙ 02/09/2017

Joint Discovery of Object States and Manipulation Actions

Many human activities involve object manipulations aiming to modify the ...

0 Jean-Baptiste Alayrac, et al. ∙

research

∙ 05/30/2016

Minding the Gaps for Block Frank-Wolfe Optimization of Structured SVMs

In this paper, we propose several improvements on the block-coordinate F...

0 Anton Osokin, et al. ∙

research

∙ 06/30/2015

Unsupervised Learning from Narrated Instruction Videos

We address the problem of automatically learning the main steps to compl...

0 Jean-Baptiste Alayrac, et al. ∙
