State-of-the-art rehearsal-free continual learning methods exploit the
p...
Animating still face images with deep generative models using a speech i...
This work builds on a previous work on unsupervised speech enhancement u...
Self-supervised learning models have been shown to learn rich visual
rep...
Pose and motion priors are crucial for recovering realistic and accurate...
In this paper, we present a multimodal and dynamical VAE (MDVAE)
applied...
The dynamical variational autoencoders (DVAEs) are a family of
latent-va...
We address the task of unconditional head motion generation to animate s...
This paper tackles the problem of human motion prediction, consisting in...
With the increasing presence of robots in our every-day environments,
im...
The recent success of audio-visual representation learning can be largel...
Understanding and controlling latent representations in deep generative
...
Studies on the automatic processing of 3D human pose data have flourishe...
A fundamental and challenging problem in deep learning is catastrophic
f...
In this paper, we present an unsupervised probabilistic model and associ...
Over the past years, semantic segmentation, as many other tasks in compu...
Self-supervised models have been shown to produce comparable or better v...
A longstanding goal in reinforcement learning is to build intelligent ag...
Transfer in Reinforcement Learning aims to improve learning performance ...
Dynamical variational auto-encoders (DVAEs) are a class of deep generati...
The Variational Autoencoder (VAE) is a powerful deep generative model th...
Human motion prediction aims to forecast future human poses given a sequ...
Transformer networks have proven extremely powerful for a wide variety o...
Prediction of human actions in social interactions has important applica...
Convolutional neural networks have enabled major progress in addressing
...
Multi-scale representations deeply learned via convolutional neural netw...
Recent literature addressed the monocular 3D pose estimation task very
s...
The Variational Autoencoder (VAE) is a powerful deep generative model th...
In this paper, we are interested in audio-visual speech separation given...
Manipulating visual attributes of images through human-written text is a...
Modeling the temporal behavior of data is of primordial importance in ma...
We address the problem of analyzing the performance of 3D face alignment...
Unsupervised image-to-image translation (UNIT) aims at learning a mappin...
In this paper, we are interested in unsupervised speech enhancement usin...
Recently, an audio-visual speech generative model based on variational
a...
This paper presents a generative approach to speech enhancement based on...
Variational auto-encoders (VAEs) are deep generative latent variable mod...
Multiple Object Tracking accuracy and precision (MOTA and MOTP) are two
...
Unsupervised person re-identification (Re-ID) methods consist of trainin...
In this paper, we address the problem of simultaneously tracking several...
This paper presents an online multiple-speaker localization and tracking...
In this paper we address the problem of tracking multiple speakers via t...
This paper addresses the problem of online multiple-speaker localization...
In this paper, we address the problem of how to robustly train a ConvNet...
Deep learning revolutionized data science, and recently, its popularity ...
In this paper we address the problem of learning robust cross-domain
rep...
Each smile is unique: one person surely smiles in different ways (e.g.,
...
Recent works have shown that exploiting multi-scale representations deep...
Recent works have shown that it is possible to automatically predict
int...
In our overly-connected world, the automatic recognition of virality - t...