SexWEs: Domain-Aware Word Embeddings via Cross-lingual Semantic Specialisation for Chinese Sexism Detection in Social Media

11/15/2022
by Aiqi Jiang, et al.

The goal of sexism detection is to mitigate negative online content targeting particular gender groups. However, the limited availability of labeled sexism-related datasets makes it difficult to identify online sexism in low-resource languages. In this paper, we address the task of automatic sexism detection in social media for one low-resource language, Chinese. Rather than collecting new sexism data or building cross-lingual transfer learning models, we develop a cross-lingual domain-aware semantic specialisation system that makes the most of existing data. Semantic specialisation is a technique for retrofitting pre-trained distributional word vectors by integrating external linguistic knowledge (such as lexico-semantic relations) into the specialised feature space. To do this, we leverage semantic resources for sexism from a high-resource language (English) to specialise pre-trained word vectors in the target language (Chinese), thereby injecting domain knowledge. We demonstrate the benefit of our sexist word embeddings (SexWEs), specialised by our framework, via intrinsic evaluation of word similarity and extrinsic evaluation of sexism detection. Compared with other specialisation approaches and Chinese baseline word vectors, SexWEs show average score improvements of 0.033 in intrinsic evaluation and 0.064 in extrinsic evaluation. The ablation results and visualisation of SexWEs further support the effectiveness of our framework for retrofitting word vectors in low-resource languages. Our code and sexism-related word vectors will be made publicly available.
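
To make the idea of retrofitting concrete, the following is a minimal sketch of a generic retrofitting-style update, not the SexWEs framework itself. It is monolingual and uses toy data: the function names (`retrofit`, `cosine`), the weights `alpha` and `beta`, and the placeholder tokens `word_a`, `word_b`, and `word_c` are all assumptions made for illustration. Each constrained vector is repeatedly replaced by a weighted average of its original value and the current vectors of the words it is linked to in an external lexicon.

```python
import numpy as np


def retrofit(vectors, constraints, iterations=10, alpha=1.0, beta=1.0):
    """Pull each constrained word toward its lexicon neighbours.

    vectors:     dict mapping word -> 1-D numpy array (pre-trained embedding)
    constraints: dict mapping word -> list of related words (e.g. synonyms)
    alpha:       weight that keeps a word close to its original vector
    beta:        weight that pulls a word toward each neighbour
    """
    original = {w: np.asarray(v, dtype=float) for w, v in vectors.items()}
    updated = {w: v.copy() for w, v in original.items()}
    for _ in range(iterations):
        for word, neighbours in constraints.items():
            # Skip words or neighbours that have no embedding.
            known = [n for n in neighbours if n in updated]
            if word not in updated or not known:
                continue
            neighbour_sum = sum(updated[n] for n in known)
            updated[word] = (alpha * original[word] + beta * neighbour_sum) / (
                alpha + beta * len(known)
            )
    return updated


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical toy embeddings; real use would load pre-trained vectors.
toy_vectors = {
    "word_a": np.array([1.0, 0.0]),
    "word_b": np.array([0.0, 1.0]),
    "word_c": np.array([1.0, 1.0]),
}
# Hypothetical lexicon stating that word_a and word_b are related.
toy_constraints = {"word_a": ["word_b"], "word_b": ["word_a"]}

specialised = retrofit(toy_vectors, toy_constraints)
print("before:", cosine(toy_vectors["word_a"], toy_vectors["word_b"]))
print("after: ", cosine(specialised["word_a"], specialised["word_b"]))
```

In this sketch only words that appear in the constraints move, so `word_c` keeps its original vector. The cross-lingual setting described in the abstract additionally requires carrying English constraints over to Chinese vectors, which this toy example does not attempt.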

research
01/15/2022

Addressing the Challenges of Cross-Lingual Hate Speech Detection

The goal of hate speech detection is to filter negative online content a...
research
03/30/2018

Robust Cross-lingual Hypernymy Detection using Dependency Context

Cross-lingual Hypernymy Detection involves determining if a word in one ...
research
09/08/2021

Cross-lingual Offensive Language Identification for Low Resource Languages: The Case of Marathi

The widespread presence of offensive language on social media motivated ...
research
09/05/2022

Cross-Lingual and Cross-Domain Crisis Classification for Low-Resource Scenarios

Social media data has emerged as a useful source of timely information a...
research
09/11/2018

Adversarial Propagation and Zero-Shot Cross-Lingual Transfer of Word Vector Specialization

Semantic specialization is the process of fine-tuning pre-trained distri...
research
11/11/2021

Automated PII Extraction from Social Media for Raising Privacy Awareness: A Deep Transfer Learning Approach

Internet users have been exposing an increasing amount of Personally Ide...
research
03/28/2022

Isomorphic Cross-lingual Embeddings for Low-Resource Languages

Cross-Lingual Word Embeddings (CLWEs) are a key component to transfer li...
