Learning Generalized Transformation Equivariant Representations via Autoencoding Transformations

06/19/2019
by Guo-Jun Qi, et al.

Learning Transformation Equivariant Representations (TERs) seeks to capture the intrinsic visual structures of images through representations that equivary to the applied transformations. The underlying assumption is that a transformation can be decoded from sufficiently expressive representations of images before and after the transformation. This greatly expands the scope of the translation equivariance underpinning the success of Convolutional Neural Networks (CNNs), yielding a generic class of transformation equivariant representations. Unlike group equivariant convolutions, which are limited to discrete transformations or linear transformation equivariance, we present a more flexible and tractable AutoEncoding Transformation (AET) framework that can handle various types of transformations. Both a deterministic AET model and a probabilistic Autoencoding Variational Transformations (AVT) model are presented: the former learns transformation equivariant representations by directly reconstructing the applied transformations, while the latter is trained by maximizing the joint mutual information between the representations and the transformations. This leads to Generalized TERs (GTERs) that equivary against transformations in a more general manner, capturing more complex patterns of transformed visual structures beyond the linear TERs of a transformation group. We further show that the presented approach can be extended to (semi-)supervised models by jointly maximizing the mutual information between the learned representations and both the input labels and the transformations. Experimental results under standard evaluation protocols demonstrate the superior performance of the proposed models over existing state-of-the-art unsupervised and (semi-)supervised approaches in the literature.
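The deterministic AET objective described above can be summarized in a few lines. The sketch below is a minimal, hypothetical illustration of the idea rather than the authors' released code: a shared encoder embeds an image and its transformed counterpart, and a decoder reconstructs the applied transformation from the pair of representations. Transformations are parameterized here as 2x3 affine matrices, and the `AET` architecture and `random_affine` sampler are illustrative placeholders.

```python
# Minimal AET training sketch (hypothetical; not the paper's released code).
# Assumes transformations are parameterized as 2x3 affine matrices.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AET(nn.Module):
    def __init__(self, feat_dim=128, t_dim=6):
        super().__init__()
        # Shared encoder E(.) applied to both the original and transformed images.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Decoder D(.) reconstructs the transformation parameters
        # from the concatenated pair of representations.
        self.decoder = nn.Sequential(
            nn.Linear(2 * feat_dim, 128), nn.ReLU(),
            nn.Linear(128, t_dim),
        )

    def forward(self, x, x_t):
        z, z_t = self.encoder(x), self.encoder(x_t)
        return self.decoder(torch.cat([z, z_t], dim=1))

def random_affine(batch_size):
    """Sample random 2x3 affine matrices (small rotations + translations)."""
    angle = (torch.rand(batch_size) - 0.5) * 1.0        # radians
    tx, ty = (torch.rand(2, batch_size) - 0.5) * 0.4    # translations
    theta = torch.zeros(batch_size, 2, 3)
    theta[:, 0, 0], theta[:, 0, 1], theta[:, 0, 2] = angle.cos(), -angle.sin(), tx
    theta[:, 1, 0], theta[:, 1, 1], theta[:, 1, 2] = angle.sin(), angle.cos(), ty
    return theta

model = AET()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(16, 3, 32, 32)                     # a batch of images
theta = random_affine(x.size(0))                   # applied transformation t
grid = F.affine_grid(theta, x.size(), align_corners=False)
x_t = F.grid_sample(x, grid, align_corners=False)  # transformed images t(x)

theta_hat = model(x, x_t)                          # decoded transformation
loss = F.mse_loss(theta_hat, theta.flatten(1))     # reconstruct t from (E(x), E(t(x)))
opt.zero_grad(); loss.backward(); opt.step()
```

Because the reconstruction loss is driven entirely by the sampled transformation, no labels are needed; the encoder is pushed to retain whatever visual structure is required to tell the transformed image apart from the original.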
