Extreme Low Resolution Activity Recognition with Multi-Siamese Embedding Learning
This paper presents an approach for recognizing human activities from extreme low resolution (e.g., 16x12) videos. Extreme low resolution recognition is not only necessary for analyzing actions at a distance but is also crucial for enabling privacy-preserving recognition of human activities. We propose a new approach to learn an embedding (i.e., representation) optimized for low resolution (LR) videos by taking advantage of an inherent property of LR imagery: two images originating from the exact same scene often have very different pixel (i.e., RGB) values depending on the LR transforms applied to them. We design a new two-stream multi-Siamese convolutional neural network that learns an embedding space shared by LR videos created with different LR transforms, thereby enabling the learning of transform-robust activity classifiers. We experimentally confirm that our approach of jointly learning the optimal LR video representation and the classifier outperforms previous state-of-the-art low resolution recognition approaches on two public standard datasets by a meaningful margin.
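The core idea above can be illustrated with a minimal sketch. The code below is a hypothetical toy, not the paper's implementation: `lr_transform` emulates different LR camera transforms (sub-pixel shifts followed by average pooling to 16x12), `embed` is a toy linear stand-in for the two-stream CNN, and `multi_siamese_loss` is a generic contrastive hinge that pulls embeddings of differently-transformed copies of the same scene together while pushing a different scene away. All function names, shapes, and the margin value are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def lr_transform(frame, offset):
    """Average-pool a 24x32 frame down to 12x16 after a sub-pixel shift.

    Different offsets emulate different LR transforms: the exact same
    scene yields noticeably different 16x12 pixel values (hypothetical
    stand-in for the paper's LR transform family).
    """
    shifted = np.roll(frame, offset, axis=(0, 1))
    return shifted.reshape(12, 2, 16, 2).mean(axis=(1, 3))

def embed(lr_frame, W):
    """Toy linear embedding (stand-in for the two-stream CNN),
    L2-normalized so distances are comparable."""
    v = W @ lr_frame.ravel()
    return v / (np.linalg.norm(v) + 1e-8)

def multi_siamese_loss(embs_same_scene, emb_other_scene, margin=0.5):
    """Generic contrastive objective: pull embeddings of the same
    scene (under different LR transforms) together, and push a
    different scene at least `margin` away."""
    anchor = embs_same_scene[0]
    pos = np.mean([np.sum((anchor - e) ** 2) for e in embs_same_scene[1:]])
    neg = np.sum((anchor - emb_other_scene) ** 2)
    return pos + max(0.0, margin - neg)
```

In the paper's full method this objective would be minimized jointly with the activity classification loss, so the shared embedding space becomes robust to the choice of LR transform; the sketch only shows the shape of that contrastive term.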