Exploiting Temporal Information for DCNN-based Fine-Grained Object Classification

08/01/2016
by ZongYuan Ge, et al.

Fine-grained classification is a relatively new field that has concentrated on using information from a single image, while ignoring the enormous potential of using video data to improve classification. In this work we present the novel task of video-based fine-grained object classification, propose a corresponding new video dataset, and perform a systematic study of several recent deep convolutional neural network (DCNN) based approaches, which we specifically adapt to the task. We evaluate three-dimensional DCNNs, two-stream DCNNs, and bilinear DCNNs. Two forms of the two-stream approach are used, where spatial and temporal data from two independent DCNNs are fused either via early fusion (combination of the fully-connected layers) or late fusion (concatenation of the softmax outputs of the DCNNs). For bilinear DCNNs, information from the convolutional layers of the spatial and temporal DCNNs is combined via local co-occurrences. We then fuse the bilinear DCNN and early fusion of the two-stream approach to combine the spatial and temporal information at the local and global level (Spatio-Temporal Co-occurrence). Using the new and challenging video dataset of birds, classification performance is improved from 23.1% by the Spatio-Temporal Co-occurrence system. Incorporating automatically detected bounding box location further improves the classification accuracy to 53.6%.
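
The three ways of combining the spatial and temporal streams described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. The function names are hypothetical, "combination of the fully-connected layers" is assumed here to mean simple concatenation, and the local co-occurrence is assumed to be a sum over locations of the outer product of the two streams' feature vectors, in the style of bilinear pooling. Real systems would operate on DCNN activations and pass the fused vector to a classifier.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def early_fusion(fc_spatial, fc_temporal):
    """Early fusion (assumed form): concatenate the fully-connected
    features of the spatial and temporal streams."""
    return list(fc_spatial) + list(fc_temporal)


def late_fusion(logits_spatial, logits_temporal):
    """Late fusion: concatenate the softmax outputs of the two streams."""
    return softmax(logits_spatial) + softmax(logits_temporal)


def bilinear_cooccurrence(conv_spatial, conv_temporal):
    """Local co-occurrence (assumed form): for each location, take the outer
    product of the spatial and temporal channel vectors, then sum over
    locations and flatten.

    Each argument is a list of locations; each location is a list of
    channel activations. Both streams must cover the same locations.
    """
    if len(conv_spatial) != len(conv_temporal):
        raise ValueError("streams must have the same number of locations")
    n_s = len(conv_spatial[0])
    n_t = len(conv_temporal[0])
    pooled = [[0.0] * n_t for _ in range(n_s)]
    for vec_s, vec_t in zip(conv_spatial, conv_temporal):
        for i, a in enumerate(vec_s):
            for j, b in enumerate(vec_t):
                pooled[i][j] += a * b
    # Flatten the (n_s x n_t) co-occurrence matrix row by row.
    return [v for row in pooled for v in row]


if __name__ == "__main__":
    # Toy activations: two locations, two channels per stream.
    spatial = [[1.0, 2.0], [3.0, 4.0]]
    temporal = [[1.0, 0.0], [0.0, 1.0]]
    print(bilinear_cooccurrence(spatial, temporal))
    print(early_fusion([0.2, 0.7], [0.1]))
    print(late_fusion([0.0, 0.0], [1.0, 1.0, 1.0]))
```

Under these assumptions, a combined local and global descriptor in the spirit of the Spatio-Temporal Co-occurrence system could be formed by concatenating the output of `bilinear_cooccurrence` with the output of `early_fusion`. How the paper actually merges the two is not specified in the abstract.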


