An Efficient Modern Baseline for FloodNet VQA

05/30/2022
by   Aditya Kane, et al.
23

Designing efficient and reliable VQA systems remains a challenging problem, more so in the case of disaster management and response systems. In this work, we revisit fundamental combination methods like concatenation, addition and element-wise multiplication with modern image and text feature abstraction models. We design a simple and efficient system which outperforms pre-existing methods on the FloodNet dataset and achieves state-of-the-art performance. This simplified system requires significantly less training and inference time than modern VQA architectures. We also study the performance of various backbones and report their consolidated results. Code is available at https://github.com/sahilkhose/floodnet_vqa.

READ FULL TEXT
04/03/2021

MMBERT: Multimodal BERT Pretraining for Improved Medical VQA

Images in the medical domain are fundamentally different from the genera...
05/29/2020

UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content

Recent years have witnessed an explosion of user-generated content (UGC)...
06/01/2021

Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models

With large-scale pre-training, the past two years have witnessed signifi...
09/18/2022

Overcoming Language Priors in Visual Question Answering via Distinguishing Superficially Similar Instances

Despite the great progress of Visual Question Answering (VQA), current V...
06/25/2019

Deep Modular Co-Attention Networks for Visual Question Answering

Visual Question Answering (VQA) requires a fine-grained and simultaneous...
07/26/2018

Pythia v0.1: the Winning Entry to the VQA Challenge 2018

This document describes Pythia v0.1, the winning entry from Facebook AI ...
04/09/2020

X3D: Expanding Architectures for Efficient Video Recognition

This paper presents X3D, a family of efficient video networks that progr...