For the Underrepresented in Gender Bias Research: Chinese Name Gender Prediction with Heterogeneous Graph Attention Network

02/01/2023
by Zihao Pan, et al.

Achieving gender equality is an important pillar of humankind's sustainable future. Pioneering data-driven gender bias research draws on large-scale public records such as scientific papers, patents, and company registrations, which cover female researchers, inventors, entrepreneurs, and others. Since gender information is often missing from these datasets, studies rely on tools that infer gender from names. However, the available open-source Chinese gender-guessing tools are not yet suitable for scientific purposes, which may be partly responsible for Chinese women being underrepresented in mainstream gender bias research and may limit the universality of that research. Specifically, these tools focus on character-level information and overlook the fact that the combinations of characters in multi-character names, as well as the components and pronunciations of characters, convey important messages. As a first effort, we design a Chinese Heterogeneous Graph Attention (CHGAT) model to capture the heterogeneity in component relationships and to incorporate the pronunciations of characters. Our model largely surpasses current tools and also outperforms the state-of-the-art algorithm. Last but not least, the most popular Chinese name-gender dataset is single-character based, has far less female coverage, and comes from an unreliable source, which naturally hinders relevant studies. We open-source a more balanced multi-character dataset from an official source, together with our code, in the hope of helping future research that promotes gender equality.

