Sound can convey significant information for spatial reasoning in our da...
Visual localization is the task of estimating a 6-DoF camera pose of a q...
Generating intermediate steps, or Chain of Thought (CoT), is an effectiv...
Large language models (LLMs) learn not only natural text generation abil...
In order to build self-consistent personalized dialogue agents, previous...
Large language models for code have recently shown remarkable performanc...
Variational autoencoders employ an amortized inference model to approxim...
360^∘ video saliency detection is one of the challenging benchmarks for
...
Displacement is an important measurement for the assessment of structura...
We study the problem of unsupervised skill discovery, whose goal is to l...
We present neural activation coding (NAC) as a novel approach for learni...
In reinforcement learning, continuous time is often discretized by a tim...
Continually learning in the real world must overcome many challenges, am...
360^∘ videos convey holistic views for the surroundings of a scene. It
p...
Empathy is a complex cognitive ability based on the reasoning of others'...
Having the ability to acquire inherent skills from environments without ...
We propose a novel information bottleneck (IB) method named Drop-Bottlen...
Large-scale datasets are the cornerstone of self-supervised representati...
The recent success of Transformers in the language domain has motivated
...
Learning from imperfect data becomes an issue in many industrial applica...
Continual learning from a sequential stream of data is a crucial challen...
We present a novel data augmentation technique, CRA (Contextual Response...
Although consistency has been a long-standing issue in dialogue agents, ...
We present an approach named CurlingNet that can measure the semantic
di...
Knowledge-grounded dialogue is a task of generating an informative respo...
Despite the growing interest in continual learning, most of its contempo...
Recent advances in conditional image generation tasks, such as image-to-...
Although deep convolutional networks have achieved improved performance ...
We address the problem of abstractive summarization in two directions:
p...
We present an approach named JSFusion (Joint Sequence Fusion) that can
m...
Video prediction aims to generate realistic future frames by learning dy...
We address the problem of story-based temporal summarization of long
360...
Variational autoencoders (VAE) combined with hierarchical RNNs have emer...
We propose an approach to address two issues that commonly occur during
...
We address the problem of highlight detection from a 360 degree video by...
We propose a novel memory network model named Read-Write Memory Network
...
The attention mechanisms in deep neural networks are inspired by human's...
YouTube-8M is the largest video dataset for multi-label video classifica...
We address personalization issues of image captioning, which have not be...
We propose a high-level concept word detector that can be integrated wit...
Deep learning (DL) has achieved notable successes in many machine learni...