Lightweight Delivery Detection on Doorbell Cameras

05/13/2023
by Pirazh Khorramshahi, et al.

Despite recent advances in video-based action recognition and robust spatio-temporal modeling, most proposed approaches depend on abundant computational resources to run large, computation-intensive convolutional or transformer-based neural networks and obtain satisfactory results. This limits the deployment of such models on edge devices with limited power and computing resources. In this work we investigate an important smart home application, video-based delivery detection, and present a simple and lightweight pipeline for this task that can run on resource-constrained doorbell cameras. Our proposed pipeline relies on motion cues to generate a set of coarse activity proposals, which are then classified by a mobile-friendly 3D CNN. For training, we design a novel semi-supervised attention module that helps the network learn robust spatio-temporal features, and we adopt an evidence-based optimization objective that allows the uncertainty of the network's predictions to be quantified. Experimental results on our curated delivery dataset show the effectiveness of our pipeline compared to alternatives and highlight that our training-phase novelties yield considerable performance gains at no additional inference-time cost.
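
The abstract does not spell out how proposals are generated or how uncertainty is computed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes simple frame differencing as the motion cue and the common Dirichlet-based formulation of evidential classification, in which non-negative per-class evidence e_k gives alpha_k = e_k + 1 and an uncertainty of K / sum(alpha). All function names, thresholds, and parameters are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumption: a grayscale frame is a list of rows of pixel intensities.
Frame = Sequence[Sequence[int]]


def motion_score(prev: Frame, curr: Frame) -> float:
    """Mean absolute pixel difference between two consecutive frames."""
    total = 0
    count = 0
    for row_prev, row_curr in zip(prev, curr):
        for a, b in zip(row_prev, row_curr):
            total += abs(a - b)
            count += 1
    return total / count if count else 0.0


def motion_proposals(
    frames: Sequence[Frame],
    threshold: float = 10.0,  # hypothetical motion threshold
    max_gap: int = 1,         # inactive frames tolerated inside a proposal
    min_len: int = 2,         # shortest proposal kept, in frames
) -> List[Tuple[int, int]]:
    """Group frames with motion into coarse (start, end) ranges, end exclusive."""
    active = [
        i + 1
        for i in range(len(frames) - 1)
        if motion_score(frames[i], frames[i + 1]) > threshold
    ]
    proposals: List[List[int]] = []
    for idx in active:
        if proposals and idx - proposals[-1][1] <= max_gap:
            proposals[-1][1] = idx + 1  # extend the current proposal
        else:
            proposals.append([idx, idx + 1])  # start a new proposal
    return [(s, e) for s, e in proposals if e - s >= min_len]


def evidential_uncertainty(evidence: Sequence[float]) -> Tuple[List[float], float]:
    """Turn non-negative class evidence into expected probabilities and uncertainty.

    Assumes the Dirichlet parameterisation alpha_k = evidence_k + 1.
    """
    num_classes = len(evidence)
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)
    probs = [a / strength for a in alpha]
    return probs, num_classes / strength
```

In a pipeline of this shape, each proposal would be passed to the classifier, and a prediction with high uncertainty (little total evidence) could be discarded or deferred rather than reported as a delivery.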


