Multi-Perspective LSTM for Joint Visual Representation Learning

05/06/2021
by Alireza Sepas-Moghaddam, et al.

We present a novel LSTM cell architecture capable of learning both intra- and inter-perspective relationships available in visual sequences captured from multiple perspectives. Our architecture adopts a novel recurrent joint learning strategy that uses additional gates and memories at the cell level. We demonstrate that a network built from the proposed cell learns more effective and richer visual representations for recognition tasks. We validate the performance of our proposed architecture on two multi-perspective visual recognition tasks, namely lip reading and face recognition. Three relevant datasets are considered, and the results are compared against fusion strategies, other existing multi-input LSTM architectures, and alternative recognition solutions. The experiments show the superior performance of our solution over the considered benchmarks, in terms of both recognition accuracy and complexity. We make our code publicly available at https://github.com/arsm/MPLSTM.
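The abstract does not spell out the gating equations, so the sketch below only illustrates the general idea of a two-perspective recurrent cell that keeps per-perspective memories (intra-perspective path) and adds an extra gate and joint memory for inter-perspective mixing. All names here (MultiPerspectiveLSTMCell, cross_gate, joint_update, c_joint) are illustrative assumptions, not the authors' implementation; the actual gate set and equations are defined in the paper and the released code at https://github.com/arsm/MPLSTM.

```python
import torch
import torch.nn as nn


class MultiPerspectiveLSTMCell(nn.Module):
    """Illustrative two-perspective recurrent cell (not the authors' exact equations).

    Each perspective updates its own memory with standard LSTM gates, and a
    hypothetical cross gate mixes the two hidden states into a shared joint
    memory whose hidden state can be used for recognition.
    """

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        # Standard LSTM gates per perspective (intra-perspective relationships).
        self.lstm_a = nn.LSTMCell(input_size, hidden_size)
        self.lstm_b = nn.LSTMCell(input_size, hidden_size)
        # Hypothetical additional gate controlling inter-perspective flow.
        self.cross_gate = nn.Linear(2 * hidden_size, hidden_size)
        # Hypothetical candidate update for the additional joint memory.
        self.joint_update = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, x_a, x_b, state):
        (h_a, c_a), (h_b, c_b), c_joint = state
        # Intra-perspective updates, one memory per perspective.
        h_a, c_a = self.lstm_a(x_a, (h_a, c_a))
        h_b, c_b = self.lstm_b(x_b, (h_b, c_b))
        # Inter-perspective gate decides how much new joint information is written.
        both = torch.cat([h_a, h_b], dim=-1)
        g = torch.sigmoid(self.cross_gate(both))
        c_joint = g * c_joint + (1 - g) * torch.tanh(self.joint_update(both))
        # Joint hidden representation for the downstream recognition head.
        h_joint = torch.tanh(c_joint)
        return h_joint, ((h_a, c_a), (h_b, c_b), c_joint)


# Example usage with random inputs (batch of 4, 128-d features per perspective).
cell = MultiPerspectiveLSTMCell(input_size=128, hidden_size=64)
x_a, x_b = torch.randn(4, 128), torch.randn(4, 128)
zeros = lambda: torch.zeros(4, 64)
state = ((zeros(), zeros()), (zeros(), zeros()), zeros())
h_joint, state = cell(x_a, x_b, state)
```

In a full model, a cell along these lines would be unrolled over the visual sequence and the joint hidden state fed to a classifier head for lip reading or face recognition; the released repository defines the architecture actually evaluated in the paper.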

