Predicting gender and age categories in English conversations using lexical, non-lexical, and turn-taking features

02/26/2021
by Andreas Liesenfeld, et al.

This paper examines gender and age salience and (stereo)typicality in British English talk, with the aim of predicting gender and age categories from lexical, phrasal and turn-taking features. We examine the SpokenBNC, a corpus of around 11.4 million words of British English conversations, and identify behavioural differences between speakers who are labelled for gender and age categories. We explore differences in language use and turn-taking dynamics and identify a range of characteristics that set the categories apart. We find that female speakers tend to produce more and slightly longer turns, while turns by male speakers feature a higher type-token ratio and a distinct range of minimal particles such as "eh", "uh" and "em". Across age groups, we observe, for instance, that swear words and laughter characterize young speakers' talk, while old speakers tend to produce more truncated words. We then treat the prediction of speakers' gender and age labels, per conversation and per turn, as a classification task based on the observed characteristics. The results show that non-lexical utterances such as minimal particles, which are usually left out of dialog data, can help to set the categories apart.
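To make the turn-level measures mentioned in the abstract concrete, the following is a minimal Python sketch of how turn counts, mean turn length, and type-token ratio could be computed per speaker label. It is not the authors' pipeline: the function name `turn_features`, the `(label, text)` input format, the whitespace tokenization, and the toy turns are all assumptions made for illustration.

```python
from collections import defaultdict


def turn_features(turns):
    """Compute simple per-label turn statistics.

    `turns` is a list of (speaker_label, text) pairs. This input format is a
    hypothetical one chosen for the sketch, not the SpokenBNC format.
    """
    tokens_by_label = defaultdict(list)
    turn_counts = defaultdict(int)

    for label, text in turns:
        # Assumption: lowercase and split on whitespace; a real study would
        # likely use the corpus's own tokenization.
        tokens = text.lower().split()
        tokens_by_label[label].extend(tokens)
        turn_counts[label] += 1

    features = {}
    for label, tokens in tokens_by_label.items():
        n_turns = turn_counts[label]
        features[label] = {
            "turns": n_turns,
            # Mean number of tokens per turn.
            "mean_turn_length": len(tokens) / n_turns,
            # Type-token ratio: distinct tokens divided by all tokens.
            "type_token_ratio": len(set(tokens)) / len(tokens) if tokens else 0.0,
        }
    return features


# Invented toy turns, for illustration only (not corpus data).
example = [
    ("F", "oh that was so so nice"),
    ("M", "eh yeah"),
    ("F", "we went there last year"),
    ("M", "uh right"),
]

features = turn_features(example)
for label, stats in sorted(features.items()):
    print(label, stats)
```

Because the type-token ratio generally falls as the amount of text grows, comparing it across groups that produce different amounts of talk would probably require some length normalization; the sketch leaves that out.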

