Automated PII Extraction from Social Media for Raising Privacy Awareness: A Deep Transfer Learning Approach

11/11/2021
by Yizhi Liu, et al.

Internet users have been exposing an increasing amount of Personally Identifiable Information (PII) on social media. Such exposed PII can cause severe losses to users, and informing users of their PII exposure is crucial for raising their privacy awareness and encouraging them to take protective measures. To this end, advanced automatic techniques are needed. Information Extraction (IE) techniques can extract PII automatically, and Deep Learning (DL)-based IE models further alleviate the need for feature engineering and improve efficiency. However, DL-based IE models often require large-scale labeled data for training, and PII-labeled social media posts are difficult to obtain due to privacy concerns. These models also rely heavily on pre-trained word embeddings, whereas PII in social media varies in form and therefore often has no fixed representation in pre-trained word embeddings. In this study, we propose the Deep Transfer Learning for PII Extraction (DTL-PIIE) framework to address these two limitations. First, DTL-PIIE transfers knowledge learned from publicly available PII data to social media to address the scarcity of PII-labeled data. Second, the framework leverages Graph Convolutional Networks (GCNs) to incorporate syntactic patterns that guide PII extraction (PIIE) without relying on pre-trained word embeddings. Evaluation against benchmark IE models indicates that our approach outperforms state-of-the-art DL-based IE models. Our framework can facilitate various applications, such as PII misuse prediction and privacy risk assessment, thereby helping protect the privacy of internet users.
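The abstract says the framework uses GCNs over syntactic patterns instead of depending on pre-trained word embeddings. The paper's exact architecture is not described here, so the following is only a minimal sketch of a standard GCN layer, ReLU(D^-1/2 (A + I) D^-1/2 H W), applied to a dependency graph of a single post. The surface features (`word_shape`), the hand-written dependency arcs, and the fixed weight matrix are hypothetical stand-ins chosen for illustration; they are not the authors' method.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def normalized_adjacency(n: int, edges: List[Tuple[int, int]]) -> Matrix:
    """Build D^-1/2 (A + I) D^-1/2 for an undirected token graph."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0  # self-loop so each token keeps its own features
    for head, dep in edges:
        a[head][dep] = 1.0
        a[dep][head] = 1.0  # treat dependency arcs as undirected
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product x @ y."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def gcn_layer(a_hat: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph convolution: ReLU(A_hat @ H @ W)."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]


def word_shape(token: str) -> List[float]:
    """Hypothetical surface features used instead of pre-trained embeddings:
    [has_digit, has_upper, has_hyphen]."""
    return [
        float(any(c.isdigit() for c in token)),
        float(any(c.isupper() for c in token)),
        float("-" in token),
    ]


if __name__ == "__main__":
    tokens = ["My", "phone", "number", "is", "555-0199"]
    # Hypothetical hand-written dependency arcs (head, dependent);
    # a real system would obtain these from a dependency parser.
    edges = [(2, 0), (2, 1), (4, 2), (4, 3)]
    h = [word_shape(t) for t in tokens]
    w = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # fixed toy weights, not learned
    out = gcn_layer(normalized_adjacency(len(tokens), edges), h, w)
    for tok, row in zip(tokens, out):
        print(tok, [round(v, 3) for v in row])
```

After one layer, the head token "number" picks up the digit/hyphen signal from its syntactic neighbor "555-0199", which is the kind of structure-driven propagation a GCN offers when a token such as a phone number has no reliable pre-trained vector.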

research
04/24/2019

Integrating Social Media into a Pan-European Flood Awareness System: A Multilingual Approach

This paper describes a prototype system that integrates social media ana...
research
01/19/2018

Deep Learning for Detecting Cyberbullying Across Multiple Social Media Platforms

Harassment by cyberbullies is a significant phenomenon on the social med...
research
10/27/2016

Word Embeddings to Enhance Twitter Gang Member Profile Identification

Gang affiliates have joined the masses who use social media to share tho...
research
11/15/2022

SexWEs: Domain-Aware Word Embeddings via Cross-lingual Semantic Specialisation for Chinese Sexism Detection in Social Media

The goal of sexism detection is to mitigate negative online content targ...
research
07/14/2023

A Topical Approach to Capturing Customer Insight In Social Media

The age of social media has opened new opportunities for businesses. Thi...
research
07/03/2020

Depression Detection with Multi-Modalities Using a Hybrid Deep Learning Model on Social Media

Social networks enable people to interact with one another by sharing in...
research
01/10/2019

Automatic detection of passable roads after floods in remote sensed and social media data

This paper addresses the problem of floods classification and floods aft...