Towards Streaming Egocentric Action Anticipation

by Antonino Furnari, et al.

Egocentric action anticipation is the task of predicting the future actions a camera wearer is likely to perform, based on past video observations. In a real-world system, such predictions must be output before the action begins, yet past works have generally not considered model runtime during evaluation. Indeed, current evaluation schemes assume that predictions can be made offline and hence that computational resources are unlimited. In contrast, we propose a “streaming” egocentric action anticipation evaluation protocol that explicitly accounts for model runtime in performance assessment: a prediction is assumed to become available only after the current video segment has been processed, so its availability depends on the processing time of the method. Following the proposed evaluation scheme, we benchmark different state-of-the-art approaches for egocentric action anticipation on two popular datasets. Our analysis shows that models with a smaller runtime tend to outperform heavier models in the considered streaming scenario, changing the rankings generally observed in standard offline evaluations. Based on this observation, we propose a lightweight action anticipation model consisting of a simple feed-forward 3D CNN, which we optimize using knowledge distillation techniques and a custom loss. The results show that the proposed approach outperforms prior art in the streaming scenario, also when combined with other lightweight models.
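To make the effect of runtime in a streaming setting concrete, the following is a minimal sketch and not the protocol defined in the paper. It assumes a model that processes the video back to back: each pass starts when the previous one ends, sees only the video observed up to its start time, and delivers its prediction one runtime later. The function names, the 1-second anticipation time, and the example runtimes are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

# Each entry is (observed_until_ms, available_at_ms) for one forward pass.
Pass = Tuple[int, int]


def prediction_schedule(runtime_ms: int, horizon_ms: int) -> List[Pass]:
    """List the passes of a model that runs back to back (assumed behaviour).

    A pass started at time t only sees video up to t and delivers its
    prediction at t + runtime_ms. Passes finishing after horizon_ms are dropped.
    """
    schedule = []
    start = 0
    while start + runtime_ms <= horizon_ms:
        schedule.append((start, start + runtime_ms))
        start += runtime_ms
    return schedule


def latest_usable(schedule: List[Pass], deadline_ms: int) -> Optional[Pass]:
    """Return the most recent pass whose prediction is ready by the deadline."""
    usable = [p for p in schedule if p[1] <= deadline_ms]
    return usable[-1] if usable else None


def staleness_ms(runtime_ms: int, action_start_ms: int,
                 anticipation_ms: int = 1000) -> Optional[int]:
    """Gap between the deadline and the last video frame the usable pass saw.

    The deadline is action_start_ms - anticipation_ms (illustrative value).
    Returns None when no prediction is ready in time.
    """
    deadline = action_start_ms - anticipation_ms
    chosen = latest_usable(prediction_schedule(runtime_ms, deadline), deadline)
    return None if chosen is None else deadline - chosen[0]


if __name__ == "__main__":
    # Hypothetical runtimes: a light model (250 ms) and a heavy one (800 ms).
    for runtime in (250, 800):
        print(runtime, staleness_ms(runtime, action_start_ms=4000))
```

Under these assumptions, the heavier model has to rely on older observations at the moment a prediction is needed, which is one way to read why lighter models can rank higher in a streaming evaluation than in an offline one.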




