An Efficient Modern Baseline for FloodNet VQA

by Aditya Kane, et al.

Designing efficient and reliable VQA systems remains a challenging problem, especially for disaster management and response systems. In this work, we revisit fundamental combination methods such as concatenation, addition, and element-wise multiplication, paired with modern image and text feature-extraction models. We design a simple and efficient system that outperforms pre-existing methods on the FloodNet dataset and achieves state-of-the-art performance. This simplified system requires significantly less training and inference time than modern VQA architectures. We also study the performance of various backbones and report their consolidated results. Code is available at
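The combination methods named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature names, the shared dimension `d`, and the `fuse` helper are all assumptions for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pooled feature vectors from an image backbone and a text
# encoder, assumed projected to a shared dimension d (illustrative only).
d = 512
img_feat = rng.standard_normal(d)
txt_feat = rng.standard_normal(d)

def fuse(image, text, method="concat"):
    """Combine modality features with one of the simple operators
    mentioned in the abstract: concatenation, addition, or
    element-wise (Hadamard) multiplication."""
    if method == "concat":
        return np.concatenate([image, text])  # shape (2d,)
    if method == "add":
        return image + text                   # shape (d,)
    if method == "multiply":
        return image * text                   # shape (d,)
    raise ValueError(f"unknown fusion method: {method}")

# In a full system, the fused vector would feed a small classifier head
# over the FloodNet answer vocabulary.
print(fuse(img_feat, txt_feat, "concat").shape)  # (1024,)
```

Concatenation doubles the fused dimension, while addition and multiplication preserve it, which is why the latter two require the modality features to share a common dimension.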

