We present RECAP (REtrieval-Augmented Audio CAPtioning), a novel and eff...
We present AdVerb, a novel audio-visual dereverberation framework that u...
Neural image classifiers can often learn to make predictions by overly r...
Complex Named Entity Recognition (NER) is the task of detecting linguist...
Biomedical Named Entity Recognition (BioNER) is the fundamental task of ...
In this paper, we introduce UnFuSeD, a novel approach to leverage self-s...
The tremendous growth of social media users interacting in online conver...
Disfluency, though originating from human spoken utterances, is primaril...
We present a new Self-Supervised Learning (SSL) approach to pre-train en...
We present Multiscale Audio Spectrogram Transformer (MAST) for audio cla...
In this paper, we propose a new Self-Supervised Learning (SSL) algorithm...
While Self-Supervised Learning has helped reap the benefit of the scale ...
Self-supervised learning (SSL) to learn high-level speech representation...
While self-supervised speech representation learning (SSL) models serve ...
The expression of emotions is a crucial part of daily human communicatio...
Emotion Recognition (ER) aims to classify human utterances into differen...
Existing approaches in disfluency detection focus on solving a token-lev...
Inspired by the recent progress in self-supervised learning for computer...
In the current era of the internet, where social media platforms are eas...
We introduce DECAR, a self-supervised pre-training approach for learning...
Toxic speech, also known as hate speech, is regarded as one of the cruci...
Social network platforms are generally used to share positive, construct...
This paper describes our proposed system for the AAAI-CAD21 shared task:...
Named entity recognition (NER) from text has been a widely studied probl...