Deep Multimodal Neural Architecture Search

04/25/2020
by Zhou Yu, et al.
HUAWEI Technologies Co., Ltd.
The University of Sydney
Hangzhou Dianzi University

Designing effective neural networks is fundamentally important in deep multimodal learning. Most existing works focus on a single task and design neural architectures manually; the resulting architectures are highly task-specific and hard to generalize to different tasks. In this paper, we devise a generalized deep multimodal neural architecture search (MMnas) framework for various multimodal learning tasks. Given multimodal input, we first define a set of primitive operations, and then construct a deep encoder-decoder based unified backbone, where each encoder or decoder block corresponds to an operation searched from a predefined operation pool. On top of the unified backbone, we attach task-specific heads to tackle different multimodal learning tasks. By using a gradient-based NAS algorithm, the optimal architectures for different tasks are learned efficiently. Extensive ablation studies, comprehensive analysis, and experimental results show that the resulting networks (MMnasNet) significantly outperform existing state-of-the-art approaches across three multimodal learning tasks (over five datasets): visual question answering, image-text matching, and visual grounding. Code will be made available.
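The abstract says each backbone block is an operation "searched from a predefined operation pool" with a gradient-based NAS algorithm, but does not spell out the algorithm. The sketch below illustrates one common gradient-based scheme, a DARTS-style continuous relaxation, and should not be read as the MMnas method itself. Assumptions, all hypothetical: the operation pool (`OPS`) is a set of toy element-wise stand-ins rather than the paper's multimodal primitive operations; the helpers `mixed_op`, `loss_and_grad`, `search_block`, and `derive_architecture` are illustrative names; only a single block is searched, and because the toy operations have no trainable weights there is no alternating update of network weights and architecture weights.

```python
import math

# Hypothetical operation pool: toy element-wise stand-ins for the primitive
# operations that a real multimodal backbone block would choose between.
OPS = {
    "identity": lambda x: list(x),
    "double": lambda x: [2.0 * v for v in x],
    "zero": lambda x: [0.0 for _ in x],
    "negate": lambda x: [-v for v in x],
}
NAMES = list(OPS)


def softmax(alpha):
    """Numerically stable softmax over one block's architecture weights."""
    m = max(alpha)
    exps = [math.exp(a - m) for a in alpha]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(x, alpha):
    """Continuous relaxation: softmax-weighted sum of every candidate op."""
    p = softmax(alpha)
    outs = [OPS[name](x) for name in NAMES]
    y = [sum(p[i] * outs[i][k] for i in range(len(NAMES))) for k in range(len(x))]
    return y, p, outs


def loss_and_grad(x, target, alpha):
    """Squared-error loss of the mixed op and its gradient w.r.t. alpha.

    Uses dy/d(alpha_j) = p_j * (op_j(x) - y), which follows from the
    softmax Jacobian dp_i/d(alpha_j) = p_i * (delta_ij - p_j).
    """
    y, p, outs = mixed_op(x, alpha)
    err = [yk - tk for yk, tk in zip(y, target)]
    loss = sum(e * e for e in err)
    grad = [
        sum(2.0 * err[k] * p[j] * (outs[j][k] - y[k]) for k in range(len(x)))
        for j in range(len(NAMES))
    ]
    return loss, grad


def search_block(x, target, steps=300, lr=0.05):
    """Plain gradient descent on one block's architecture weights."""
    alpha = [0.0] * len(NAMES)  # start from a uniform mixture
    for _ in range(steps):
        _, grad = loss_and_grad(x, target, alpha)
        alpha = [a - lr * g for a, g in zip(alpha, grad)]
    return alpha


def derive_architecture(block_alphas):
    """Discretize: keep the highest-weighted operation in each block."""
    return [NAMES[max(range(len(a)), key=a.__getitem__)] for a in block_alphas]


if __name__ == "__main__":
    x = [1.0, -2.0, 0.5]
    target = OPS["double"](x)  # the data is generated by the "double" op
    alpha = search_block(x, target)
    print(derive_architecture([alpha]))  # ['double']
```

Here the gradient is derived by hand so the example stays dependency-free; a practical search over a deep encoder-decoder backbone would instead rely on an autodiff framework, keep one weight vector per block, and attach task-specific heads on top of the derived architecture.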

Related research

BM-NAS: Bilevel Multimodal Neural Architecture Search (04/19/2021)
Deep neural networks (DNNs) have shown superior performances on various ...

Neural Architecture Search for Deep Image Prior (01/14/2020)
We present a neural architecture search (NAS) technique to enhance the p...

Neural Architecture Transfer (05/12/2020)
Neural architecture search (NAS) has emerged as a promising avenue for a...

Neural Architecture Search From Fréchet Task Distance (03/23/2021)
We formulate a Fréchet-type asymmetric distance between tasks based on F...

Generic Neural Architecture Search via Regression (08/04/2021)
Most existing neural architecture search (NAS) algorithms are dedicated ...

SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization (12/10/2019)
Convolutional neural networks typically encode an input image into a ser...

FSD: Fully-Specialized Detector via Neural Architecture Search (05/26/2023)
In this paper, we first propose and examine a fully-automatic pipeline t...