AudioAR: Audio-Based Activity Recognition with Large-Scale Acoustic Embeddings from YouTube Videos

10/19/2018

Activity sensing and recognition have been shown to be critical in health care and smart home applications. Compared with traditional methods that use accelerometers or gyroscopes for activity recognition, acoustic-based methods can capture rich information about human activities together with their context, and are therefore better suited to recognizing high-level compound activities. In practice, however, audio-based activity recognition suffers from the tedious and time-consuming process of collecting ground-truth audio data from individual users. In this paper, we propose a new mechanism for audio-based activity recognition that is entirely free from user training data, using millions of embedding features from general YouTube video sound clips. Based on a combination of oversampling and deep learning approaches, our scheme requires no further feature extraction or outlier filtering. We developed our scheme to recognize 15 common home-related activities and evaluated its performance in dedicated scenarios and in-the-wild scripted scenarios. In the dedicated recording test, our scheme yielded 81.1% [...] activities. In the in-the-wild scripted tests, we obtained an averaged top-1 classification accuracy of 64.9% [...] of 80.6% [...]. Design considerations, including the association between dataset labels and target activities, the effects of segmentation size, and privacy concerns, are also discussed in the paper.
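For readers unfamiliar with the top-1 metric reported above, the following is a minimal sketch of how top-k classification accuracy is commonly computed. It is not the authors' evaluation code: the function name `top_k_accuracy`, the toy scores, and the three-class setup are all hypothetical and chosen only for illustration.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices from highest to lowest score for this sample.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical classifier scores for 4 audio clips over 3 made-up classes.
scores = [
    [0.7, 0.2, 0.1],  # true class 0: correct at top-1
    [0.3, 0.5, 0.2],  # true class 0: missed at top-1, found at top-2
    [0.1, 0.3, 0.6],  # true class 2: correct at top-1
    [0.5, 0.4, 0.1],  # true class 2: found only at top-3
]
labels = [0, 0, 2, 2]

top1 = top_k_accuracy(scores, labels, 1)
top2 = top_k_accuracy(scores, labels, 2)
print(top1, top2)
```

Top-k accuracy can only stay the same or grow as k increases, since a prediction counts as correct whenever the true label appears anywhere in the k best guesses.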
