Training Multimodal Systems for Classification with Multiple Objectives

08/26/2020
by Jason Armitage, et al.

We learn about the world from a diverse range of sensory information. Automated systems lack this ability because research has centred on processing information presented in a single form. Adapting architectures to learn from multiple modalities creates the potential to learn rich representations of the world, but current multimodal systems deliver only marginal improvements over unimodal approaches. Neural networks learn sampling noise during training, which degrades performance on unseen data. This research introduces a second objective over the multimodal fusion process, learned with variational inference. Regularisation methods are implemented in the inner training loop to control variance, and the modular structure stabilises performance as additional neurons are added to layers. The framework is evaluated on a multilabel classification task with textual and visual inputs to demonstrate the potential of multiple objectives and probabilistic methods to lower variance and improve generalisation.
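
The abstract does not spell out the form of the two objectives, so the following is only a generic sketch of how a multilabel classification loss might be combined with a variational term over a fused representation. It assumes a diagonal Gaussian posterior over the fused latent, a standard normal prior, and a weighting factor `beta`; all function and parameter names are hypothetical and are not taken from the paper.

```python
import math

def bce_multilabel(logits, targets):
    """Mean binary cross-entropy over labels, computed from raw logits."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form of -[y*log(s) + (1-y)*log(1-s)], s = sigmoid(z)
        total += max(z, 0.0) - y * z + math.log1p(math.exp(-abs(z)))
    return total / len(logits)

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))

def two_objective_loss(logits, targets, mu, log_var, beta=1.0):
    """Assumed combination: classification loss plus a beta-weighted KL term
    on the fused latent (mu, log_var). Illustrative only."""
    return bce_multilabel(logits, targets) + beta * kl_to_standard_normal(mu, log_var)

# Example: two labels, a one-dimensional fused latent
loss = two_objective_loss([0.0, 0.0], [1, 0], mu=[1.0], log_var=[0.0], beta=0.5)
```

In this sketch the KL term acts as a regulariser on the fused representation, which is one common way a variational objective can reduce variance; the weighting and the exact placement of the second objective in the paper may differ.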

research
11/14/2020

On the Benefits of Early Fusion in Multimodal Representation Learning

Intelligently reasoning about the world often requires integrating data ...
research
05/29/2018

Disentangling by Partitioning: A Representation Learning Framework for Multimodal Sensory Data

Multimodal sensory data resembles the form of information perceived by h...
research
11/18/2018

Multimodal Densenet

Humans make accurate decisions by interpreting complex data from multipl...
research
03/15/2022

Modular and Parameter-Efficient Multimodal Fusion with Prompting

Recent research has made impressive progress in large-scale multimodal p...
research
10/09/2019

Multimodal representation models for prediction and control from partial information

Similar to humans, robots benefit from interacting with their environmen...
research
07/28/2019

Making Sense of Vision and Touch: Learning Multimodal Representations for Contact-Rich Tasks

Contact-rich manipulation tasks in unstructured environments often requi...
research
08/06/2018

Liquid Pouring Monitoring via Rich Sensory Inputs

Humans have the amazing ability to perform very subtle manipulation task...
