STHG: Spatial-Temporal Heterogeneous Graph Learning for Advanced Audio-Visual Diarization

06/18/2023
by   Kyle Min, et al.

This report introduces STHG, our novel method for the Audio-Visual Diarization task of the Ego4D Challenge 2023. Our key innovation is to model all the speakers in a video with a single, unified heterogeneous graph learning framework. Unlike previous approaches, which require a separate component solely for the camera wearer, STHG jointly detects the speech activities of all people, including the camera wearer. Our final method obtains 61.1% DER on the test set of Ego4D, which significantly outperforms all the baselines as well as last year's winner. Our submission achieved 1st place in the Ego4D Challenge 2023. We additionally demonstrate that applying an off-the-shelf speech recognition system to the speech segments diarized by STHG yields competitive performance on the Speech Transcription task of this challenge.

research
10/14/2022

Intel Labs at Ego4D Challenge 2022: A Better Baseline for Audio-Visual Diarization

This report describes our approach for the Audio-Visual Diarization (AVD...
research
03/11/2023

The Multimodal Information based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization and Recognition

The Multi-modal Information based Speech Processing (MISP) challenge aim...
research
08/11/2023

Lip2Vec: Efficient and Robust Visual Speech Recognition via Latent-to-Latent Visual to Audio Representation Mapping

Visual Speech Recognition (VSR) differs from the common perception tasks...
research
09/15/2023

The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction

Previous Multimodal Information based Speech Processing (MISP) challenge...
research
11/13/2020

The SLT 2021 children speech recognition challenge: Open datasets, rules and baselines

Automatic speech recognition (ASR) has been significantly advanced with ...
research
06/15/2023

Team AcieLee: Technical Report for EPIC-SOUNDS Audio-Based Interaction Recognition Challenge 2023

In this report, we describe the technical details of our submission to t...
research
11/01/2018

Introduction to the 1st Place Winning Model of OpenImages Relationship Detection Challenge

This article describes the model we built that achieved 1st place in the...
