Privacy-Aware Crowd Labelling for Machine Learning Tasks

02/03/2022
by Giannis Haralabopoulos et al.

The extensive use of online social media has highlighted the importance of privacy in the digital space. As more scientists analyse the data created on these platforms, privacy concerns have extended to data usage within academia. Although text analysis is a well-documented topic in the academic literature, with a multitude of applications, ensuring the privacy of user-generated content has been overlooked. Most sentiment analysis methods require emotion labels, which can be obtained through crowdsourcing, where non-expert individuals contribute to scientific tasks. In that setting, the text itself has to be exposed to third parties in order to be labelled. To reduce the exposure of online users' information, we propose a privacy-preserving, crowdsourcing-based text labelling method suitable for a range of applications. We transform text at different levels of privacy and analyse the effectiveness of each transformation with regard to label correlation and consistency. Our results suggest that privacy can be built into labelling while retaining the annotational diversity and subjectivity of traditional labelling.
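The abstract does not say how label correlation is measured. As a minimal sketch, assuming each text receives numeric emotion-intensity scores from several crowd workers, one could average the scores per text and compute a Pearson correlation between the averages collected on the original text and those collected on its privacy-transformed version. The helper names `mean_scores` and `pearson` and the example scores are hypothetical; this is not the authors' method or data.

```python
from math import sqrt
from statistics import mean


def mean_scores(labels_per_text):
    """Average the crowd scores for each text (one list of worker scores per text)."""
    return [mean(scores) for scores in labels_per_text]


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (sx * sy)


# Hypothetical scores (1-5) from three workers for four texts,
# labelled once on the original text and once on a transformed version.
original = [[4, 5, 4], [1, 2, 1], [3, 3, 4], [5, 5, 4]]
transformed = [[4, 4, 3], [2, 1, 1], [3, 4, 3], [5, 4, 4]]

r = pearson(mean_scores(original), mean_scores(transformed))
print(f"label correlation: {r:.3f}")
```

A value close to 1 would indicate that the transformation preserved the relative emotion judgements; consistency across workers would need a separate agreement measure.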
