Multimodal Neural Databases

05/02/2023
by Giovanni Trappolini, et al.

The rise in loosely structured data available through text, images, and other modalities has called for new ways of querying them. Multimedia Information Retrieval has filled this gap and has witnessed exciting progress in recent years. Tasks such as search and retrieval over extensive multimedia archives have seen massive performance improvements, driven to a large extent by recent developments in multimodal deep learning. However, methods in this field remain limited in the kinds of queries they support; in particular, they cannot answer database-like queries. For this reason, inspired by recent work on neural databases, we propose a new framework, which we name Multimodal Neural Databases (MMNDBs). MMNDBs can answer complex database-like queries that involve reasoning over different input modalities, such as text and images, at scale. In this paper, we present the first architecture able to fulfill this set of requirements and test it with several baselines, showing the limitations of currently available models. The results show the potential of these new techniques to process unstructured data coming from different modalities, paving the way for future research in the area. Code to replicate the experiments will be released at https://github.com/GiovanniTRA/MultimodalNeuralDatabases

research
09/28/2022

Mr. Right: Multimodal Retrieval on Representation of ImaGe witH Text

Multimodal learning is a recent challenge that extends unimodal learning...
research
01/14/2021

OrigamiSet1.0: Two New Datasets for Origami Classification and Difficulty Estimation

Origami is becoming more and more relevant to research. However, there i...
research
06/02/2021

Database Reasoning Over Text

Neural models have shown impressive performance gains in answering queri...
research
07/30/2023

Unified Model for Image, Video, Audio and Language Tasks

Large Language Models (LLMs) have made the ambitious quest for generalis...
research
02/06/2023

MuG: A Multimodal Classification Benchmark on Game Data with Tabular, Textual, and Visual Fields

Multimodal learning has attracted the interest of the machine learning c...
research
02/18/2022

A Review on Methods and Applications in Multimodal Deep Learning

Deep Learning has implemented a wide range of applications and has becom...
research
01/30/2018

The New Modality: Emoji Challenges in Prediction, Anticipation, and Retrieval

Over the past decade, emoji have emerged as a new and widespread form of...
