We propose a highly controllable voice manipulation system that can perf...
There has been significant research on developing pretrained transformer...
Metrics to evaluate audio captions simply provide a score without much e...
While many recent any-to-any voice conversion models succeed in transfer...
Advancement in large pretrained language models has significantly improv...
Model architectures such as wav2vec 2.0 and HuBERT have been proposed to...
Audio captioning quality metrics which are typically borrowed from the m...
In the era of IoT (Internet of Things), we are surrounded by a plethora o...
Self-supervised learning methods such as wav2vec 2.0 have shown promisin...
This paper presents a new learning strategy for the Sound Event Detectio...