In high-stakes settings, Machine Learning models that can provide predic...
Multimodal demand forecasting aims at predicting product demand utilizin...
Existing datasets for manually labelled query-based video summarization ...
Deep neural networks have been critical in the task of Visual Question A...
Medical Visual Question Answering (VQA) is an important challenge, as it...
Multimodal few-shot learning is challenging due to the large domain gap ...
An important component of human analysis of medical images and their con...
There is a bidirectional relationship between culture and AI; AI models ...
Medical image datasets and their annotations are not growing as fast as ...
In this paper, we focus on multi-task classification, where related clas...
Large collections of geo-referenced panoramic images are freely availabl...
Deep learning models have shown great effectiveness in recognition of ...
Visual Place Recognition (VPR) is generally concerned with localizing ou...
Neural processes have recently emerged as a class of powerful neural lat...
Multi-task learning aims to explore task relatedness to improve individu...
In this paper, we discuss the initial attempts at boosting understanding...
Automating report generation for medical imaging promises to reduce work...
Automatically generating medical reports for retinal images is one of th...
We propose ArtSAGENet, a novel multimodal architecture that integrates G...
Medical image captioning automatically generates a medical description t...
Traditional video summarization methods generate fixed video representat...
Disease classification relying solely on imaging data attracts great int...
In this work, we propose an AI-based method that intends to improve the ...
Artificial, CNN-generated images are now of such high quality that human...
We introduce II-20 (Image Insight 2020), a multimedia analytics approach...
When video collections become huge, how to explore both within and acros...
Deep neural networks have been playing an essential role in the task of ...
The shift operation was recently introduced as an alternative to spatial...
Motivated by the promising performance of pre-trained language models, w...
Multimodal datasets contain an enormous amount of relational information...
In this paper we present a novel interactive multimodal learning system,...
In this paper we seek methods to effectively detect urban micro-events. ...
Increasing scale is a dominant trend in today's multimedia collections, ...
Multimedia applications often require concurrent solutions to multiple t...
Typical multi-task learning (MTL) methods rely on architectural adjustme...
Vast amounts of artistic data are scattered online from both museums and...
We introduce an unsupervised discriminative model for the task of retrie...