ITALIC: An Italian Intent Classification Dataset

06/14/2023
by Alkis Koudounas, et al.

Recent large-scale Spoken Language Understanding (SLU) datasets focus predominantly on English and do not account for language-specific phenomena such as particular phonemes or words in different lects. We introduce ITALIC, the first large-scale speech dataset designed for intent classification in Italian. The dataset comprises 16,521 crowdsourced audio samples recorded by 70 speakers from various Italian regions and annotated with intent labels and additional metadata. We explore the versatility of ITALIC by evaluating current state-of-the-art speech and text models. Results on intent classification suggest that increasing scale and running language adaptation yield better speech models, that monolingual text models outscore multilingual ones, and that speech recognition on ITALIC is more challenging than on existing Italian benchmarks. We release both the dataset and the annotation scheme to streamline the development of new Italian SLU models and language-specific datasets.

