Multilingual semantic search is the task of retrieving relevant contents...
Punctuation restoration is an important task in automatic speech recogni...
Being able to perceive the semantics and the spatial structure of the
en...
Over the last few years, large language models (LLMs) have emerged as th...
Vulnerability to lexical perturbation is a critical weakness of automati...
The goal of multimodal summarization is to extract the most important
in...
Livestream videos have become a significant part of online learning, whe...
Multimedia summarization with multimodal output (MSMO) is a recently exp...
Since BERT (Devlin et al., 2018), learning contextualized word embedding...
Modern image captioning models are usually trained with text similarity
...
Despite the recent advancements in abstractive summarization systems
lev...
Multimedia summarization with multimodal output can play an essential ro...
Demand for image editing has been increasing as users' desire for expres...
Explaining how important each input feature is to a classifier's decisio...
In this work, we focus on a more challenging few-shot intent detection
s...
With the explosive growth of livestream broadcasting, there is an urgent...
Since the first end-to-end neural coreference resolution model was
intro...
Despite the success of various text generation metrics such as BERTScore...
Recently, language-guided global image editing draws increasing attentio...
Multilingual models, such as M-BERT and XLM-R, have gained increasing
po...
Event coreference resolution is an important research problem with many
...
Do state-of-the-art natural language understanding models care about wor...
With the renaissance of deep learning, neural networks have achieved
pro...
Keyphrase extraction is the task of extracting a small set of phrases th...
Structured representations like graphs and parse trees play a crucial ro...
Language-driven image editing can significantly save the laborious image...
We consider the problem of segmenting image regions given a natural lang...
The performance of many machine learning models depends on their
hyper-p...
Despite the growth of e-commerce, brick-and-mortar stores are still the
...
Open-domain question answering aims at solving the task of locating the
...
Many users communicate with chatbots and AI assistants in order to help ...
Many users communicate with chatbots and AI assistants in order to help ...
Visual Dialog involves "understanding" the dialog history (what has been...
For the automatic evaluation of Generative Question Answering (genQA)
sy...
In this paper, we present a multimodal dialogue system for Conversationa...
Natural Language Image Editing (NLIE) aims to use natural language
instr...
Recent works have shown that generative data augmentation, where synthet...
Recent works have shown that generative data augmentation, where synthet...
Attention mechanisms have improved the performance of NLP tasks while
pr...
In a task-oriented dialog system, the goal of dialog state tracking (DST...
Answer selection is an important research problem, with applications in ...
In this study, we propose a novel graph neural network, called
propagate...
In this paper, we explore various approaches for semi supervised learnin...
Describing images with text is a fundamental problem in vision-language
...
In this paper, we propose a novel method for a sentence-level
answer-sel...
This work presents computational methods for transferring body movements...
Popular e-commerce websites such as Amazon offer community question answ...
This work presents the task of modifying images in an image editing prog...
Neural abstractive summarization models have led to promising results in...
As two of the five traditional human senses (sight, hearing, taste, smel...