Building a Large-scale Multimodal Knowledge Base System for Answering Visual Queries

07/20/2015
by   Yuke Zhu, et al.
0

The complexity of the visual world creates significant challenges for comprehensive visual understanding. In spite of recent successes in visual recognition, today's vision systems would still struggle to deal with visual queries that require a deeper reasoning. We propose a knowledge base (KB) framework to handle an assortment of visual queries, without the need to train new classifiers for new tasks. Building such a large-scale multimodal KB presents a major challenge of scalability. We cast a large-scale MRF into a KB representation, incorporating visual, textual and structured data, as well as their diverse relations. We introduce a scalable knowledge base construction system that is capable of building a KB with half billion variables and millions of parameters in a few hours. Our system achieves competitive results compared to purpose-built models on standard recognition and retrieval tasks, while exhibiting greater flexibility in answering richer visual queries.

READ FULL TEXT

page 1

page 6

page 8

page 14

research
05/01/2020

Knowledge Base Inference for Regular Expression Queries

Two common types of tasks on Knowledge Bases have been studied – single ...
research
11/28/2017

Hyper-dimensional computing for a visual question-answering system that is trainable end-to-end

In this work we propose a system for visual question answering. Our arch...
research
07/28/2022

Visual Recognition by Request

In this paper, we present a novel protocol of annotation and evaluation ...
research
09/23/2019

Explainable High-order Visual Question Reasoning: A New Benchmark and Knowledge-routed Network

Explanation and high-order reasoning capabilities are crucial for real-w...
research
11/07/2016

Gaussian Attention Model and Its Application to Knowledge Base Embedding and Question Answering

We propose the Gaussian attention model for content-based neural memory ...
research
12/10/2022

MAPS-KB: A Million-scale Probabilistic Simile Knowledge Base

The ability to understand and generate similes is an imperative step to ...
research
02/11/2023

Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis

Often, deep network models are purely inductive during training and whil...

Please sign up or login with your details

Forgot password? Click here to reset