Significant progress has been made in speaker-dependent Lip-to-Speech sy...
Humans have a natural ability to perform semantic associations with the ...
We present MParrotTTS, a unified multilingual, multi-speaker text-to-spe...
Text-to-speech (TTS) systems are modelled as mel-synthesizers followed b...
We investigate the problem of reducing mistake severity for fine-grained...
We investigate the Vision-and-Language Navigation (VLN) problem in the c...
Humans have a natural ability to effortlessly comprehend linguistic comm...
Machine-generated speech is characterized by its limited or unnatural em...
Domain generalization (DG) of machine learning algorithms is defined as ...
Multi-view Detection (MVD) is highly effective for occlusion reasoning a...
The combination of range sensors with color cameras can be very useful f...
We investigate Referring Image Segmentation (RIS), which outputs a segme...
There has been increasing interest in building deep hierarchy-aware clas...
We propose the AViNet architecture for audiovisual saliency prediction. ...
We present GAZED: eye GAZe-guided EDiting for videos captured by a solit...
In this paper, we present a simple baseline for visual grounding for aut...
In this paper, we investigate a constrained formulation of neural networ...
Multi-object tracking has seen a lot of progress recently, albeit with s...
Detecting small obstacles on the road is critical for autonomous driving...
Learning computational models for visual attention (saliency estimation)...
Learning to mimic the smooth and deliberate camera movement of a human c...
Recent works have proposed several long-term tracking benchmarks and hig...
Monocular head pose estimation requires learning a model that computes t...
We present a novel approach to optimally retarget videos for varied disp...
We present a novel network architecture called MergeNet for discov...
Localizing natural language phrases in images is a challenging problem t...
In this paper, we propose a new long video dataset (called Track Long an...
In this paper, we propose a novel method to register football broadcast ...
The prose storyboard language is a formal language for describing movies...