Multi-label classification of open-ended questions with BERT

04/06/2023
by Matthias Schonlau, et al.

Open-ended questions in surveys are valuable because they do not constrain the respondent's answer, thereby avoiding biases. However, answers to open-ended questions are text data, which are harder to analyze. Traditionally, answers were manually classified as specified in the coding manual. Most of the effort to automate coding has gone into the easier problem of single-label prediction, where each answer is classified into a single code. However, open-ends that require multi-label classification, i.e., answers that are assigned multiple codes, occur frequently. This paper focuses on multi-label classification of text answers to open-ended survey questions in social science surveys. We evaluate the performance of the transformer-based architecture BERT for the German language in comparison to traditional multi-label algorithms (Binary Relevance, Label Powerset, ECC) in a German social science survey, the GLES Panel (N=17,584, 55 labels). We find that classification with BERT (forcing at least one label) has the smallest 0/1 loss (13.1%) among the methods considered (18.9% to 21.6%). As expected, it is much easier to correctly predict answer texts that correspond to a single label (7.1% loss) than those that correspond to multiple labels (∼50% loss). Because BERT predicts zero labels for only 1.5% of the answers, forcing at least one label, while recommended, ultimately does not lower the 0/1 loss by much. Our work has important implications for social scientists: 1) We have shown that multi-label classification with BERT works in the German language for open-ends. 2) For mildly multi-label classification tasks, the loss now appears small enough to allow for fully automatic classification (as compared to semi-automatic approaches). 3) Multi-label classification with BERT requires only a single model, whereas the leading competitor, ECC, iterates through individual single-label predictions.
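The two quantities at the center of these results, predictions that "force at least one label" and the 0/1 loss, can be made concrete with a small sketch. It assumes the classifier emits one independent probability per label (e.g. a sigmoid output layer in a single multi-label model) and uses an illustrative 0.5 threshold; the function names `decode_labels` and `zero_one_loss` are hypothetical and not taken from the paper. Under the 0/1 loss, an answer counts as wrong unless its entire predicted label set matches the true set.

```python
from typing import List, Sequence


def decode_labels(probs: Sequence[float], threshold: float = 0.5,
                  force_one: bool = True) -> List[int]:
    """Convert per-label probabilities into a 0/1 label vector.

    Assumes one independent probability per label (e.g. sigmoid outputs);
    the 0.5 threshold is an illustrative default, not the paper's setting.
    """
    pred = [1 if p >= threshold else 0 for p in probs]
    if force_one and not any(pred):
        # No label cleared the threshold: assign the single most probable one.
        best = max(range(len(probs)), key=lambda j: probs[j])
        pred[best] = 1
    return pred


def zero_one_loss(y_true: Sequence[Sequence[int]],
                  y_pred: Sequence[Sequence[int]]) -> float:
    """Share of answers whose predicted label set differs from the true set.

    An answer counts as correct only if every one of its labels matches.
    """
    errors = sum(1 for t, p in zip(y_true, y_pred) if list(t) != list(p))
    return errors / len(y_true)


# Toy example: 3 answers, 4 labels (the GLES data have 55 labels).
probs = [[0.9, 0.1, 0.7, 0.2],   # two labels above threshold
         [0.3, 0.4, 0.1, 0.2],   # no label above threshold
         [0.6, 0.2, 0.1, 0.05]]  # one label above threshold
y_true = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 0]]

forced = [decode_labels(p) for p in probs]
unforced = [decode_labels(p, force_one=False) for p in probs]
print(round(zero_one_loss(y_true, forced), 3))    # 0.333
print(round(zero_one_loss(y_true, unforced), 3))  # 0.667
```

With forcing, the second answer, whose probabilities all fall below the threshold, receives its most probable label and is classified correctly; without forcing it gets an empty label set and counts as an error. The third answer misses one of its two true labels and is wrong either way, which mirrors why multi-label answers are much harder to get exactly right. For real data, scikit-learn's `sklearn.metrics.zero_one_loss` applied to binary label-indicator arrays computes the same subset 0/1 loss.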



Related research
01/03/2021

Multi-label Ranking: Mining Multi-label and Label Ranking Data

We survey multi-label ranking tasks, specifically multi-label classifica...
04/22/2013

Multi-Label Classifier Chains for Bird Sound

Bird sound data collected with unattended microphones for automatic surv...
10/10/2019

Multi-label Categorization of Accounts of Sexism using a Neural Framework

Sexism, an injustice that subjects women and girls to enormous suffering...
04/19/2019

Reliable Multi-label Classification: Prediction with Partial Abstention

In contrast to conventional (single-label) classification, the setting o...
03/02/2023

Adopting the Multi-answer Questioning Task with an Auxiliary Metric for Extreme Multi-label Text Classification Utilizing the Label Hierarchy

Extreme multi-label text classification utilizes the label hierarchy to ...
05/15/2023

sustain.AI: a Recommender System to analyze Sustainability Reports

We present sustain.AI, an intelligent, context-aware recommender system ...
05/21/2023

F-PABEE: Flexible-patience-based Early Exiting for Single-label and Multi-label text Classification Tasks

Computational complexity and overthinking problems have become the bottl...
