Temporal Lift Pooling for Continuous Sign Language Recognition

07/18/2022
by   Liqing Gao, et al.

Pooling methods are essential in modern neural networks for enlarging receptive fields and lowering computational costs. However, commonly used hand-crafted pooling approaches, e.g., max pooling and average pooling, may not preserve discriminative features well. While many researchers have designed pooling variants in the spatial domain to address these limitations, with much progress, the temporal aspect is rarely visited, and directly applying hand-crafted methods or these specialized spatial variants there may not be optimal. In this paper, we derive temporal lift pooling (TLP) from the Lifting Scheme in signal processing to intelligently downsample features of different temporal hierarchies. The Lifting Scheme factorizes input signals into sub-bands of different frequencies, which can be viewed as different temporal movement patterns. Our TLP is a three-stage procedure that performs signal decomposition, component weighting, and information fusion to generate a refined downsized feature map. We select a typical temporal task with long sequences, i.e., continuous sign language recognition (CSLR), as our testbed to verify the effectiveness of TLP. Experiments on two large-scale datasets show TLP outperforms hand-crafted methods and specialized spatial variants by a large margin (1.5%). As a feature extractor, TLP exhibits great generalizability across multiple backbones and datasets and achieves new state-of-the-art results on two large-scale CSLR datasets. Visualizations further demonstrate the mechanism of TLP in correcting gloss borders. Code is released.
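The three stages named in the abstract (signal decomposition, component weighting, information fusion) can be illustrated with a minimal pure-Python sketch. This is not the released implementation: it assumes a fixed Haar-style predictor and updater in place of the learned operators a trainable TLP layer would use, it works on a single 1-D sequence instead of a feature map, and the names `lift_split`, `lift_pool`, `w_approx`, and `w_detail` are hypothetical.

```python
from typing import List, Tuple


def lift_split(x: List[float]) -> Tuple[List[float], List[float]]:
    """One Lifting Scheme step on an even-length 1-D sequence.

    Assumption: fixed Haar-style predict/update rules stand in for
    learned operators.
    """
    if len(x) % 2 != 0:
        raise ValueError("sequence length must be even")
    even, odd = x[0::2], x[1::2]
    # Predict: estimate each odd sample from its even neighbour;
    # the residual is the high-frequency (detail) sub-band.
    detail = [o - e for e, o in zip(even, odd)]
    # Update: correct the even samples with the detail so the
    # low-frequency (approximation) sub-band keeps the pairwise mean.
    approx = [e + 0.5 * d for e, d in zip(even, detail)]
    return approx, detail


def lift_pool(x: List[float], w_approx: float = 1.0,
              w_detail: float = 0.5) -> List[float]:
    """Downsample by 2: decompose, weight each sub-band, then fuse by summing.

    The scalar weights are hypothetical placeholders for learned weighting.
    """
    approx, detail = lift_split(x)
    return [w_approx * a + w_detail * d for a, d in zip(approx, detail)]


if __name__ == "__main__":
    frames = [1.0, 3.0, 2.0, 6.0]
    print(lift_split(frames))               # ([2.0, 4.0], [2.0, 4.0])
    print(lift_pool(frames))                # [3.0, 6.0]
    print(lift_pool(frames, w_detail=0.0))  # [2.0, 4.0], i.e. average pooling
```

With `w_detail=0.0` the sketch reduces to average pooling with stride 2, which shows how a lifting-based pooling layer can cover a hand-crafted method as a special case while still letting the detail sub-band contribute.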

