Bodily behavioral language is an important social cue, and its automated...
Interpretability and explainability of neural networks is continuously
i...
Automatic Speech Recognition (ASR) systems often struggle with transcrib...
Graph Neural Networks (GNNs) have shown considerable success in neural
a...
Segmentation of objects in a video is challenging due to the nuances suc...
Face parsing is defined as the per-pixel labeling of images containing h...
The vision community has explored numerous pose guided human editing met...
The task of human reposing involves generating a realistic image of a pe...
We aim to resolve this problem by introducing a comprehensive distribute...
Despite recent advancements in deep learning technologies, Child Speech
...
Speech synthesis has come a long way as current text-to-speech (TTS) mod...
Image-based virtual try-on involves synthesizing perceptually convincing...
Can we develop visually grounded dialog agents that can efficiently adap...
In this paper, we focus on quantifying model stability as a function of
...
Explaining and interpreting the decisions of recommender systems are bec...
We introduce EvalAI, an open source platform for evaluating and comparin...
Image captioning models have achieved impressive results on datasets
con...