Routing with Self-Attention for Multimodal Capsule Networks

12/01/2021
by Kevin Duarte, et al.

Multimodal learning has attracted growing interest recently, as it allows neural architectures to be trained on different modalities such as vision, text, and audio. One challenge in training such models is that they need to jointly learn semantic concepts and their relationships across different input representations. Capsule networks have been shown to perform well at capturing the relation between low-level input features and higher-level concepts. However, capsules have so far mainly been used in small-scale, fully supervised settings because of the resource demands of conventional routing algorithms. We present a new multimodal capsule network that allows us to leverage the strength of capsules in a multimodal learning framework on large amounts of video data. To adapt the capsules to large-scale input data, we propose a novel routing-by-self-attention mechanism that selects relevant capsules, which are then used to generate a final joint multimodal feature representation. This not only allows robust training on noisy video data but also lets the capsule network scale up in size compared to traditional routing methods while remaining computationally efficient. We evaluate the proposed architecture by pretraining it on a large-scale multimodal video dataset and applying it to four datasets across two challenging downstream tasks. Results show that the proposed multimodal capsule network not only improves results compared to other routing techniques but also achieves competitive performance on the task of multimodal learning.
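
To make the idea of routing by self-attention more concrete, the following is a minimal sketch of scaled dot-product self-attention applied to a set of capsule feature vectors. It is an illustration under stated assumptions, not the authors' implementation: the function names, the random projection matrices `w_q`, `w_k`, `w_v`, the capsule count and dimension, and the mean pooling into a joint vector are all hypothetical choices that the abstract does not specify.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_attention_routing(capsules, w_q, w_k, w_v):
    """Hypothetical routing step: each capsule attends to all capsules.

    capsules: N rows of D features; w_q, w_k, w_v: D x D projections.
    Returns the attended capsule features and the N x N attention weights.
    """
    q = matmul(capsules, w_q)
    k = matmul(capsules, w_k)
    v = matmul(capsules, w_v)
    scale = math.sqrt(len(k[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)               # pairwise capsule similarities
    weights = [softmax([s / scale for s in row]) for row in scores]
    routed = matmul(weights, v)           # weighted mix of capsule values
    return routed, weights


def random_matrix(rows, cols, rng):
    """Random projection, standing in for learned parameters (assumption)."""
    return [[rng.gauss(0.0, 1.0) / math.sqrt(rows) for _ in range(cols)]
            for _ in range(rows)]


rng = random.Random(0)
num_capsules, dim = 6, 4  # illustrative sizes only
capsules = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_capsules)]
w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))

routed, weights = self_attention_routing(capsules, w_q, w_k, w_v)

# Assumed pooling: average the attended capsules into one joint feature vector.
joint = [sum(col) / num_capsules for col in zip(*routed)]
```

In this sketch the attention weights are computed in a single pass, in contrast to routing schemes that refine coupling coefficients over several iterations. How the paper selects capsules and builds the joint representation may differ from the simple averaging used here.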


