In recent years, substantial advancements in pre-trained language models...
The capabilities and use cases of automatic natural language processing ...
Document-level relation extraction aims to identify relationships betwee...
This paper highlights the need to bring document classification benchmar...
The goal of News Image Captioning is to generate an image caption accord...
The focal point of egocentric video understanding is modelling hand-obje...
Decoding visual stimuli from neural responses recorded by functional Mag...
Denoising Diffusion Probabilistic Models (DDPM) have shown remarkable
ef...
Existing question answering methods often assume that the input content
...
Current research on the advantages and trade-offs of using characters,
i...
Medical multiple-choice question answering (MCQA) is particularly diffic...
Leveraging contextual knowledge has become standard practice in automate...
The detection and prevention of illegal fishing is critical to maintaini...
In this work, we study the problem of Embodied Referring Expression
Grou...
Most spoken language understanding systems use a pipeline approach compo...
We explore the benefits that multitask learning offer to speech processi...
Words of estimative probability (WEP) are expressions of a statement's
p...
We revisit the weakly supervised cross-modal face-name alignment task; t...
The focal point of egocentric video understanding is modelling hand-obje...
Lifelong language learning seeks to have models continuously learn multi...
A dialogue policy module is an essential part of task-completion dialogu...
The ability to continuously learn remains elusive for deep learning mode...
In this work, we investigate the knowledge learned in the embeddings of
...
This paper attacks the problem of language-guided navigation in a new
pe...
Visual dialog is a vision-language task where an agent needs to answer a...
Knowledge-based visual question answering (VQA) is a vision-language tas...
Discrete and continuous representations of content (e.g., of language or...
By leveraging deep learning to automatically classify camera trap images...
Task embeddings are low-dimensional representations that are trained to
...
In recent years, we have seen significant steps taken in the development...
Recognizing human actions is fundamentally a spatio-temporal reasoning
p...
Current language models are usually trained using a self-supervised sche...
Though language model text embeddings have revolutionized NLP research, ...
User-generated content (e.g., tweets and profile descriptions) and share...
The lottery ticket hypothesis states that sparse subnetworks exist in
ra...
Current technology for autonomous cars primarily focuses on getting the
...
We describe our approach for SemEval-2021 task 6 on detection of persuas...
This paper proposes an iterative inference algorithm for multi-hop
expla...
Many top-performing image captioning models rely solely on object featur...
The task of visual grounding requires locating the most relevant region ...
Truth can vary over time. Therefore, fact-checking decisions on claim
ve...
Public, professional and academic interest in automated fact-checking ha...
The paper proposes a novel technique for representing templates and inst...
Recent generative models for 2D images achieve impressive visual results...
Time is deeply woven into how people perceive, and communicate about the...
This paper presents our system entitled `LIIR' for SemEval-2020 Task 12 ...
We propose a new spatial memory module and a spatial reasoner for the Vi...
The correct use of Dutch pronouns 'die' and 'dat' is a stumbling block f...
A variety of approaches have been proposed to automatically infer the
pr...
A long-term goal of artificial intelligence is to have an agent execute
...