BeAts: Bengali Speech Acts Recognition using Multimodal Attention Fusion

06/05/2023
by Ahana Deb, et al.

Spoken languages often utilise intonation, rhythm, intensity, and structure to communicate intention, and the same utterance can be interpreted differently depending on the rhythm of speech. These speech acts form the foundation of communication, and their expression is specific to each language. Attention-based models, which have recently been shown to learn powerful representations from multilingual datasets, perform well on speech tasks and are well suited to modelling specific tasks in low-resource languages. Here, we develop a novel multimodal approach that combines two models, wav2vec2.0 for audio and MarianMT for text translation, using multimodal attention fusion to predict speech acts in our prepared Bengali speech corpus. We also show that our model, BeAts (Bengali speech acts recognition using Multimodal Attention Fusion), significantly outperforms both a unimodal baseline that uses only speech data and a simpler bimodal fusion that uses both speech and text data. Project page: https://soumitri2001.github.io/BeAts
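The abstract does not spell out how the attention fusion is built, so the following is only a minimal sketch of one common way to fuse two modalities with cross-attention, not the authors' implementation. Random vectors stand in for wav2vec2.0 frame features and MarianMT token features, there are no learned projection layers, and the names `cross_attention`, `fuse_and_classify`, `DIM`, and `NUM_CLASSES` (as well as the class count of 4) are hypothetical choices made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries, keys_values):
    """Scaled dot-product attention: each query attends over keys_values.

    The same vectors serve as keys and values (no learned projections),
    which is a simplification made for this sketch.
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(dim) for k in keys_values])
        attended.append(
            [sum(w * kv[i] for w, kv in zip(weights, keys_values)) for i in range(dim)]
        )
    return attended


def mean_pool(vectors):
    """Average a sequence of vectors into one vector."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def fuse_and_classify(audio_feats, text_feats, class_weights):
    """Fuse both modalities with bidirectional cross-attention, then classify."""
    audio_ctx = cross_attention(audio_feats, text_feats)  # audio attends to text
    text_ctx = cross_attention(text_feats, audio_feats)   # text attends to audio
    fused = mean_pool(audio_ctx) + mean_pool(text_ctx)    # list concatenation
    logits = [dot(w, fused) for w in class_weights]
    return softmax(logits)


random.seed(0)
DIM, NUM_CLASSES = 8, 4  # hypothetical sizes, not taken from the paper

# Stand-ins for encoder outputs: 10 audio frames and 5 text tokens.
audio_feats = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(10)]
text_feats = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(5)]
# Untrained linear classifier over the concatenated (2 * DIM) fused vector.
class_weights = [[random.gauss(0, 1) for _ in range(2 * DIM)] for _ in range(NUM_CLASSES)]

probs = fuse_and_classify(audio_feats, text_feats, class_weights)
print(probs)
```

In a real system the stand-in features would come from the pretrained encoders and the attention and classifier weights would be trained on labelled speech acts; the sketch only shows how information from one modality can reweight the other before classification.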


