The Multimodal Video Search by Examples (MVSE) project investigates usin...
Sound event detection (SED), as a core module of acoustic environmental
...
Although the lower layers of a deep neural network learn features which ...
Self-attention models such as Transformers, which can capture temporal
r...
Recently, Transformers have shown competitive automatic speech recogniti...
Recently, self-attention models such as Transformers have given competit...
Raw waveform acoustic modelling has recently gained interest due to neur...