Multi-modal Egocentric Activity Recognition using Audio-Visual Features

by Mehmet Ali Arabaci, et al.

Egocentric activity recognition in first-person videos is of increasing importance for applications such as lifelogging, video summarization, assisted living, and activity tracking. Existing methods for this task interpret information from various sensors using pre-determined weights for each feature. In this work, we propose a new framework for egocentric activity recognition that combines audio-visual features with multiple kernel learning (MKL) and multiple kernel boosting (MKBoost). To that end, grid optical-flow, virtual-inertia, log-covariance, and cuboid features are first extracted from the video. The audio signal is characterized by a "supervector", obtained from Gaussian mixture modelling of frame-level features followed by maximum a posteriori (MAP) adaptation. The extracted multi-modal features are then adaptively fused by MKL classifiers, in which feature and kernel selection/weighting are performed jointly with recognition. The proposed framework was evaluated on a number of egocentric datasets. The results show that using multi-modal features with MKL outperforms existing methods.
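The audio "supervector" step in the abstract follows the common GMM-UBM recipe: fit a background Gaussian mixture on pooled frame-level features, MAP-adapt its means to each clip, and concatenate the adapted means into one fixed-length vector. A minimal sketch is shown below, assuming MFCC-like frame features and a relevance factor of 16; these specifics, and the helper name `gmm_supervector`, are illustrative assumptions rather than details taken from the paper.

```python
# Sketch of a GMM supervector via MAP mean adaptation (assumed recipe).
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_supervector(ubm, frames, relevance=16.0):
    """MAP-adapt the UBM means to `frames` (T x D) and stack them."""
    post = ubm.predict_proba(frames)            # (T, C) frame posteriors
    n_c = post.sum(axis=0)                      # soft counts per component
    f_c = post.T @ frames                       # (C, D) first-order statistics
    alpha = (n_c / (n_c + relevance))[:, None]  # data-dependent adaptation weights
    safe_n = np.maximum(n_c, 1e-10)[:, None]    # avoid division by zero
    adapted = alpha * (f_c / safe_n) + (1.0 - alpha) * ubm.means_
    return adapted.ravel()                      # supervector of length C * D

rng = np.random.default_rng(0)
background = rng.normal(size=(500, 13))         # pooled training frames (toy data)
ubm = GaussianMixture(n_components=8, covariance_type="diag",
                      random_state=0).fit(background)
clip = rng.normal(loc=0.5, size=(120, 13))      # frame features from one audio clip
sv = gmm_supervector(ubm, clip)
print(sv.shape)                                 # 8 components x 13 dims = (104,)
```

Per-clip supervectors of this form can then feed one kernel of the MKL fusion alongside the visual features.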




