Multimodal Transformer for Parallel Concatenated Variational Autoencoders

10/28/2022
by   Stephen D. Liang, et al.

In this paper, we propose a multimodal transformer using a parallel concatenated architecture. Instead of patches, we use column stripes from the R, G, and B channels of images as the transformer input; the column stripes preserve the spatial relations of the original image. We combine the multimodal transformer with a variational autoencoder for synthetic cross-modal data generation. The multimodal transformer is designed using multiple compression matrices, and it serves as the encoders for Parallel Concatenated Variational AutoEncoders (PC-VAE). The PC-VAE consists of multiple encoders, one latent space, and two decoders. The encoders are based on random Gaussian matrices and do not require any training. We propose a new loss function based on the interaction information from partial information decomposition; the interaction information evaluates the cross-modal input information and the decoder output. The PC-VAE is trained by minimizing this loss function. Experiments are performed to validate the proposed multimodal transformer for PC-VAE.
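The abstract's two structural ideas — column-stripe inputs instead of patches, and fixed random Gaussian matrices as training-free encoders — can be sketched as follows. This is a minimal illustration under assumed dimensions (image size, stripe width, and latent size are placeholders, not the paper's values), and it uses a single compression matrix where the paper describes multiple.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy dimensions (not from the paper).
H, W, stripe_w, latent_dim = 32, 32, 4, 16
image = rng.random((H, W, 3))  # toy RGB image

# One column stripe per (horizontal position, channel):
# each stripe spans full image height, preserving vertical
# spatial relations, and is flattened into a vector.
stripes = [
    image[:, c0:c0 + stripe_w, ch].reshape(-1)
    for c0 in range(0, W, stripe_w)
    for ch in range(3)
]

# Fixed (untrained) random Gaussian compression matrix, shared
# here for simplicity; the paper uses multiple such matrices.
A = rng.normal(size=(latent_dim, H * stripe_w)) / np.sqrt(H * stripe_w)

# Compress each stripe into a low-dimensional code.
codes = np.stack([A @ s for s in stripes])
print(codes.shape)  # (24, 16): 8 stripe positions x 3 channels
```

Because `A` is drawn once and never updated, the encoding stage needs no training; only the decoders (and whatever maps the codes into the shared latent space) would be learned by minimizing the interaction-information loss.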


