Large Language Models (LLMs) have the capacity of performing complex
sch...
Humans talk in free-form while negotiating the expressed meanings or com...
Pragmatic reasoning plays a pivotal role in deciphering implicit meaning...
We introduce MoviePuzzle, a novel challenge that targets visual narrativ...
We introduce CDBERT, a new learning paradigm that enhances the semantics...
Video-grounded dialogue understanding is a challenging problem that requ...
The emergent few-shot reasoning capabilities of Large Language Models (L...
Prior works on Information Extraction (IE) typically predict different t...
We propose a new task to benchmark scene understanding of embodied agent...
Semantic Web technology has successfully facilitated many RDF models wit...
Understanding realistic visual scene images together with language
descr...
Conventional saliency prediction models typically learn a deterministic
...
Exploiting internal statistics of a single natural image has long been r...
Humans possess a unique social cognition capability; nonverbal communica...
Due to the intractable partition function, training energy-based models
...
Aiming to understand how human (false-)belief–a core socio-cognitive
abi...
We propose a generative model of unordered point sets, such as point clo...
Dynamic patterns are characterized by complex spatial and motion pattern...
We propose a novel model to address the task of Visual Dialog which exhi...
This paper studies the supervised learning of the conditional distributi...
This paper studies the dynamic generator model for spatial-temporal proc...
This paper proposes a 3D shape descriptor network, which is a deep
convo...