Multimodal active speaker detection and virtual cinematography for video conferencing

02/10/2020
by   Ross Cutler, et al.

Active speaker detection (ASD) and virtual cinematography (VC) can significantly improve the remote user experience of a video conference by automatically panning, tilting, and zooming a video conferencing camera: users subjectively rate an expert video cinematographer's video significantly higher than unedited video. We describe a new automated ASD and VC system that performs within 0.3 MOS (mean opinion score) of an expert cinematographer, based on subjective ratings on a 1-5 scale. The system uses a 4K wide-FOV camera, a depth camera, and a microphone array; it extracts features from each modality and trains an ASD using an AdaBoost machine learning system that is very efficient and runs in real time. A VC is similarly trained using machine learning to optimize the subjective quality of the overall experience. To avoid distracting the room participants and to reduce switching latency, the system has no moving parts: the VC works by cropping and zooming the 4K wide-FOV video stream. The system was tuned and evaluated using extensive crowdsourcing techniques on a dataset of N=100 meetings, each 2-5 minutes in length.
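
To make the "no moving parts" idea concrete, the following is a minimal sketch of how a crop window around a detected speaker might be computed in a fixed 4K frame. It is not the authors' trained VC: the `Box` type, the `crop_for_speaker` function, the 3840x2160 frame size, the margin, and the 16:9 output aspect ratio are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumed 4K UHD frame size; the abstract only says "4K wide-FOV".
FRAME_W, FRAME_H = 3840, 2160


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixels (hypothetical helper type)."""
    x: int
    y: int
    w: int
    h: int


def crop_for_speaker(speaker: Box, margin: float = 0.5, aspect: float = 16 / 9,
                     frame_w: int = FRAME_W, frame_h: int = FRAME_H) -> Box:
    """Return a crop window centered on the speaker, clamped to the frame.

    Illustrative only: a hand-written rule, not a learned cinematographer.
    """
    # Pad the speaker box by `margin` of its size on every side.
    w = speaker.w * (1 + 2 * margin)
    h = speaker.h * (1 + 2 * margin)

    # Grow the shorter dimension so the window has the target aspect ratio.
    if w / h < aspect:
        w = h * aspect
    else:
        h = w / aspect

    # Shrink uniformly if the window would be larger than the frame.
    scale = min(1.0, frame_w / w, frame_h / h)
    w, h = w * scale, h * scale

    # Center on the speaker, then slide the window back inside the frame.
    cx = speaker.x + speaker.w / 2
    cy = speaker.y + speaker.h / 2
    x = min(max(cx - w / 2, 0), frame_w - w)
    y = min(max(cy - h / 2, 0), frame_h - h)

    # Truncate to whole pixels so the window never extends past the frame.
    return Box(int(x), int(y), int(w), int(h))


if __name__ == "__main__":
    print(crop_for_speaker(Box(1800, 900, 200, 300)))
```

Zooming then amounts to resampling the returned window to the output resolution. A real system would also smooth or rate-limit window changes over time, which this sketch omits.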

research
11/02/2018

Machine learning architectures to predict motion sickness using a Virtual Reality rollercoaster simulation tool

Virtual Reality (VR) can cause an unprecedented immersion and feeling of...
research
03/01/2017

Making 360° Video Watchable in 2D: Learning Videography for Click Free Viewing

360° video requires human viewers to actively control "where" to look w...
research
09/15/2023

A Real-Time Active Speaker Detection System Integrating an Audio-Visual Signal with a Spatial Querying Mechanism

We introduce a distinctive real-time, causal, neural network-based activ...
research
03/29/2016

Cross-modal Supervision for Learning Active Speaker Detection in Video

In this paper, we show how to use audio to supervise the learning of act...
research
08/11/2020

Content Format and Quality of Experience in Virtual Reality

In this paper, we investigate three forms of virtual reality content pro...
research
11/19/2018

Quantifying Human Behavior on the Block Design Test Through Automated Multi-Level Analysis of Overhead Video

The block design test is a standardized, widely used neuropsychological ...
research
10/30/2017

Prediction of Satisfied User Ratio for Compressed Video

A large-scale video quality dataset called the VideoSet has been constru...
