
-
Look who's not talking
The objective of this work is speaker diarisation of speech recordings '...
-
Supervised attention for speaker recognition
The recently proposed self-attentive pooling (SAP) has shown good perfor...
-
The ins and outs of speaker recognition: lessons from VoxSRC 2020
The VoxCeleb Speaker Recognition Challenge (VoxSRC) at Interspeech 2020 ...
-
Graph Attention Networks for Speaker Verification
This work presents a novel back-end framework for speaker verification u...
-
Clova Baseline System for the VoxCeleb Speaker Recognition Challenge 2020
This report describes our submission to the VoxCeleb Speaker Recognition...
-
Cross attentive pooling for speaker verification
The goal of this paper is text-independent speaker verification where ut...
-
Self-Supervised Learning of Audio-Visual Objects from Video
Our objective is to transform a video into a set of discrete audio-visua...
-
BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues
Recent progress in fine-grained gesture and action classification, and m...
-
Augmentation adversarial training for unsupervised speaker recognition
The goal of this work is to train robust speaker recognition models with...
-
Spot the conversation: speaker diarisation in the wild
The goal of this paper is speaker diarisation of videos collected 'in th...
-
Metric Learning for Keyword Spotting
The goal of this work is to train effective representations for keyword ...
-
FaceFilter: Audio-visual speech separation using still images
The objective of this paper is to separate a target speaker's speech fro...
-
Seeing voices and hearing voices: learning discriminative embeddings using cross-modal self-supervision
The goal of this work is to train discriminative cross-modal embeddings ...
-
In defence of metric learning for speaker recognition
The objective of this paper is 'open-set' speaker recognition of unseen ...
-
Disentangled Speech Embeddings using Cross-modal Self-supervision
The objective of this paper is to learn representations of speaker ident...
-
VoxSRC 2019: The first VoxCeleb Speaker Recognition Challenge
The VoxCeleb Speaker Recognition Challenge 2019 aimed to assess how well...
-
ASR is all you need: cross-modal distillation for lip reading
The goal of this work is to train strong models for visual speech recogn...
-
The sound of my voice: speaker representation loss for target voice separation
Research on content and style representations has been widely studied in...
-
Delving into VoxCeleb: environment invariant speaker recognition
Research in speaker recognition has recently seen significant progress d...
-
My lips are concealed: Audio-visual speech enhancement through obstructions
Our objective is an audio-visual model for separating a single speaker f...
-
Naver at ActivityNet Challenge 2019 -- Task B Active Speaker Detection (AVA)
This report describes our submission to the ActivityNet Challenge at CVP...
-
Who said that?: Audio-visual speaker diarisation of real-world meetings
The goal of this work is to determine 'who spoke when' in real-world mee...
-
Utterance-level Aggregation For Speaker Recognition In The Wild
The objective of this paper is speaker recognition "in the wild", where u...
-
Perfect match: Improved cross-modal embeddings for audio-visual synchronisation
This paper proposes a new strategy for learning powerful cross-modal emb...
-
Deep Audio-Visual Speech Recognition
The goal of this work is to recognise phrases and sentences being spoken...
-
LRS3-TED: a large-scale dataset for visual speech recognition
This paper introduces a new multi-modal dataset for visual and audio-vis...
-
Deep Lip Reading: a comparison of models and an online application
The goal of this paper is to develop state-of-the-art models for lip rea...
-
VoxCeleb2: Deep Speaker Recognition
The objective of this paper is speaker recognition under noisy and uncon...
-
The Conversation: Deep Audio-Visual Speech Enhancement
Our goal is to isolate individual speakers from multi-talker simultaneou...
-
VoxCeleb: a large-scale speaker identification dataset
Most existing datasets for speaker identification contain samples obtain...
-
You said that?
We present a method for generating a video of a talking face. The method...
-
Lip Reading Sentences in the Wild
The goal of this work is to recognise phrases and sentences being spoken...
-
Signs in time: Encoding human motion as a temporal image
The goal of this work is to recognise and localise short temporal signal...