Deep Multimodal Neural Architecture Search

04/25/2020
by Zhou Yu, et al.

Designing effective neural networks is fundamentally important in deep multimodal learning. Most existing works focus on a single task and design neural architectures manually; the resulting architectures are highly task-specific and hard to generalize to other tasks. In this paper, we devise a generalized deep multimodal neural architecture search (MMnas) framework for various multimodal learning tasks. Given multimodal input, we first define a set of primitive operations and then construct a unified backbone based on a deep encoder-decoder, where each encoder or decoder block corresponds to an operation searched from a predefined operation pool. On top of the unified backbone, we attach task-specific heads to tackle different multimodal learning tasks. A gradient-based NAS algorithm then learns the optimal architecture for each task efficiently. Extensive ablation studies, comprehensive analysis, and experimental results show that MMnasNet significantly outperforms existing state-of-the-art approaches on three multimodal learning tasks (visual question answering, image-text matching, and visual grounding) over five datasets. Code will be made available.


