TexPrax: A Messaging Application for Ethical, Real-time Data Collection and Annotation

08/16/2022
by Lorenz Stangier, et al.

Collecting and annotating task-oriented dialog data is difficult, especially for highly specific domains that require expert knowledge. At the same time, informal communication channels such as instant messengers are increasingly being used at work. As a result, a large amount of work-relevant information is disseminated through those channels and must be post-processed manually by the employees. To alleviate this problem, we present TexPrax, a messaging system to collect and annotate problems, causes, and solutions that occur in work-related chats. TexPrax uses a chatbot that directly engages employees to provide lightweight annotations on their conversations and eases their documentation work. To comply with data privacy and security regulations, we use end-to-end message encryption and give our users full control over their data, which has various advantages over conventional annotation tools. We evaluate TexPrax in a user study with German factory employees who ask their colleagues for solutions to problems that arise during their daily work. Overall, we collect 201 task-oriented German dialogues containing 1,027 sentences with sentence-level expert annotations. Our data analysis also reveals that real-world conversations frequently contain code-switching, varying abbreviations for the same entity, and dialects, all of which NLP systems should be able to handle.

