Automated classification for open-ended questions with BERT

09/13/2022
by Hyukjun Gweon, et al.

Manual coding of text data from open-ended questions into different categories is time-consuming and expensive. Automated coding uses statistical/machine learning methods trained on a small subset of manually coded text answers. Recently, pre-training a general language model on vast amounts of unrelated data and then adapting it to the specific application has proven effective in natural language processing. Using two data sets, we empirically investigate whether BERT, the currently dominant pre-trained language model, is more effective at automated coding of answers to open-ended questions than non-pre-trained statistical learning approaches. First, we found that fine-tuning the pre-trained BERT parameters is essential, as otherwise BERT is not competitive. Second, we found that fine-tuned BERT barely beats the non-pre-trained statistical learning approaches in classification accuracy when trained on 100 manually coded observations. However, BERT's relative advantage increases rapidly when more manually coded observations (e.g., 200-400) are available for training. We conclude that, for automatically coding answers to open-ended questions, BERT is preferable to non-pre-trained models such as support vector machines and boosting.
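The abstract's key finding hinges on fine-tuning all of BERT's pre-trained parameters rather than training only a classifier on top of frozen features. A minimal sketch of such a setup, assuming the Hugging Face transformers library (the paper does not specify an implementation), might look like the following; the example answers, codes, checkpoint name, and hyperparameters are illustrative placeholders, not the authors' data or settings:

```python
# Minimal sketch: fine-tuning pre-trained BERT to classify
# open-ended survey answers. Assumes `torch` and `transformers`
# are installed; all specifics below are illustrative.
import torch
from torch.utils.data import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Hypothetical open-ended answers and their manually assigned codes.
texts = ["I mostly buy groceries online.", "Price is what matters most to me."]
labels = [0, 1]
num_codes = 2

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# num_labels attaches a fresh classification head on top of pre-trained BERT.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=num_codes
)

class AnswerDataset(Dataset):
    """Wraps tokenized answers and their codes for the Trainer."""
    def __init__(self, texts, labels):
        self.encodings = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

# Fine-tuning updates *all* BERT parameters, not just the new head;
# the abstract reports that this is essential for competitive accuracy.
trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="bert-coder",
        num_train_epochs=3,
        per_device_train_batch_size=8,
    ),
    train_dataset=AnswerDataset(texts, labels),
)
trainer.train()
```

Freezing the BERT body and training only the classification head would correspond to the non-fine-tuned setting the abstract finds uncompetitive; updating all weights, as sketched above, is the configuration the paper favors.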

Related research

08/06/2023
Spanish Pre-trained BERT Model and Evaluation Data
The Spanish language is one of the top 5 spoken languages in the world. ...

03/28/2019
Building Automated Survey Coders via Interactive Machine Learning
Software systems trained via machine learning to automatically classify ...

06/03/2021
Auto-tagging of Short Conversational Sentences using Transformer Methods
The problem of categorizing short speech sentences according to their se...

07/06/2022
Coding Reliability with Aclus – Did I correctly characterize my observations?
Describing observations or objects in non-mathematical disciplines can o...

10/09/2022
Spread Love Not Hate: Undermining the Importance of Hateful Pre-training for Hate Speech Detection
Pre-training large neural language models, such as BERT, has led to impr...

06/03/2021
BERT meets LIWC: Exploring State-of-the-Art Language Models for Predicting Communication Behavior in Couples' Conflict Interactions
Many processes in psychology are complex, such as dyadic interactions be...

05/09/2021
Improving Patent Mining and Relevance Classification using Transformers
Patent analysis and mining are time-consuming and costly processes for c...
