The rise of racism and xenophobia has become a remarkable social phenomenon stemming from Covid-19 as a global pandemic. In particular, increasing attention has been drawn to Covid-19-related racism and xenophobia, which have manifested an even more infectious nature and more harmful consequences than the virus itself. According to a BBC report, anti-Asian hate crimes increased by nearly 150% throughout 2020, with around 3,800 anti-Asian racist incidents recorded. It has therefore become urgent to understand public opinion regarding racism and xenophobia so that effective intervention policies can be enacted to prevent racist hate crimes and social exclusion from evolving under Covid-19. Social media, as a critical public sphere for opinion expression, provides a platform for big social data analytics to understand and capture the dynamics of racist and xenophobic discourse alongside the development of Covid-19.
An early and probably first attempt to analyse the emergence of Sinophobic behaviour on the Twitter and Reddit platforms was made in . Soon after, the role of counter-hate speech in facilitating the spread of hate and racism against the Chinese and Asian community was studied in . The authors in  attempted to study the effect of hate speech on Twitter targeted at specific groups, such as the older community and the Asian community in general. The work in  demonstrated the dynamic changes in sentiment, along with the major racist and xenophobic hashtags discussed, across the early period of Covid-19. The authors in  explored the user behaviour that triggers hate speech on Twitter and how it later diffuses via retweets across the network. All these methods have used highly advanced computational techniques and state-of-the-art language models for extracting insights from data mined from Twitter and other platforms.
While focusing on technical advancement, many studies tend to neglect the foundation of accurate data detection and analysis – that is, how to define racism and xenophobia. In particular, computational techniques and models tend to apply a binary definition (either racist or non-racist) to categorise the linguistic features of texts, with limited attention paid to the nuances of racist and xenophobic behaviours. However, understanding these nuances is critical for mapping a comprehensive picture of how racist and xenophobic discourse develops alongside the evolution of Covid-19 – whether and how the topics through which racism and xenophobia are expressed change over time. More importantly, capturing these changes as reflected in the online public sphere will enable a more accurate comprehension, and even prediction, of public opinion and action regarding racism and xenophobia in the offline world.
Reaching this goal demands a combination of computational methods and social science perspectives, which is the focus of this research. With the aid of BERT (Bidirectional Encoder Representations from Transformers)  and topic modelling , the main contribution of this research is twofold:
Development of a four-dimensional categorization of racist and xenophobic texts into stigmatization, offensiveness, blame, and exclusion;
Performing a stage-wise analysis of the categorized racist and xenophobic data to capture the dynamic changes in the discussion across the development of Covid-19.
Specifically, this research situates its examination on Twitter, the most influential platform for online political discussion, and focuses on the most turbulent early phase of Covid-19 (January to April 2020), when the unexpected and constant global spread of the virus kept changing people's perception of this public health crisis and of how it relates to race and nationality. This research divides the early phase into three stages based on the changing definitions of Covid-19 given by the World Health Organization (WHO): (1) 1st to 31st January 2020, as a domestic epidemic, referred to as stage 1 (S1); (2) 1st February to 11th March 2020, as an International Public Health Emergency (after the announcement made by WHO on 1st February), referred to as stage 2 (S2); (3) 12th March to 30th April 2020, as a global pandemic (based on the new definition given by WHO on 11th March), referred to as stage 3 (S3).
The rest of the paper is organized as follows. In Section 2, we outline the dataset mined from Twitter. Section 3 has two parts: first, it presents the data and method employed for category-based racism and xenophobia detection; second, it details the topic modelling employed for extracting topics from the categorized data. In Section 4, we discuss the findings of the overall process, focusing on the topics emerging in the different racism and xenophobia categories across the early development of Covid-19. Finally, we conclude the paper in Section 5.
The dataset of this research comprises 247,153 tweets extracted through the Tweepy API (https://www.tweepy.org/) from the eighteen most circulated racist and xenophobic hashtags related to Covid-19, from 1st January to 30th April 2020. The list of selected hashtags is as follows: #chinavirus, #chinesevirus, #boycottchina, #ccpvirus, #chinaflu, #china_is_terrorist, #chinaliedandpeopledied, #chinaliedpeopledied, #chinalies, #chinamustpay, #chinapneumonia, #chinazi, #chinesebioterrorism, #chinesepneumonia, #chinesevirus19, #chinesewuhanvirus, #viruschina, and #wuflu. The extracted tweets are further divided into the three stages that define the early development of Covid-19, as mentioned earlier.
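The stage division described above amounts to binning tweets by date against the two WHO announcement boundaries. A minimal sketch of that mapping; the function name is illustrative, not from the paper's code:

```python
# Sketch: assign a tweet's date to one of the three WHO-based stages (S1-S3).
# Boundary dates follow the paper: S1 ends 31 Jan, S2 ends 11 Mar, S3 ends 30 Apr 2020.
from datetime import date

def covid_stage(d: date) -> str:
    """Map a date in the studied period to stage S1, S2, or S3."""
    if date(2020, 1, 1) <= d <= date(2020, 1, 31):
        return "S1"  # domestic epidemic
    if date(2020, 2, 1) <= d <= date(2020, 3, 11):
        return "S2"  # international public health emergency
    if date(2020, 3, 12) <= d <= date(2020, 4, 30):
        return "S3"  # global pandemic
    raise ValueError("date outside the studied early phase of Covid-19")
```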
Category | Definition | Example
Stigmatization | Confirming negative stereotypes for conveying a devalued social identity within a particular context | "For all the #ChinaVirus jumped from a bat at the wet market"
Offensiveness | Attacking a particular social group through aggressive and abusive language | "Real misogyny in communist China. #chinazi #China_is_terrorist #China_is_terrorists #FuckTheCCP"
Blame | Attributing the responsibility for the negative consequences of the crisis to one social group | "These Chinese are absolutely disgusting. They spread the #ChineseVirus. Their lies created a pandemic #ChinaMustPay"
Exclusion | The process of othering to draw a clear boundary between in-group and out-group members | "China deserves to be isolated by all means forever. SARS was also initiated in China, 2003 by eating anything & everything #BoycottChina"
3.1 Category-based racism and xenophobia detection
Beyond a binary categorization of racism and xenophobia, this research applies a social science perspective to categorize racism and xenophobia into four dimensions, as demonstrated in Table 1. This translates into a five-class text classification problem, where four classes represent the racism and xenophobia categories and the fifth corresponds to non-racist and non-xenophobic content.
3.1.1 Annotated dataset
For this purpose, we annotate a dataset of 6000 tweets. These tweets were randomly selected from all hashtags across the three development stages and annotated by four research assistants, with inter-coder reliability above 70%. The annotation followed a coding scheme with 0 representing stigmatization, 1 offensiveness, 2 blame, and 3 exclusion, in alignment with the linguistic features of the tweets. Non-marked tweets were regarded as non-racist and non-xenophobic and assigned to class 4. We limit the annotation of each tweet to a single label, corresponding to the strongest category. The distribution of the 6000 tweets amongst the five classes is as follows: 1318 stigmatization, 1172 offensiveness, 1045 blame, 1136 exclusion, and 1329 non-racist and non-xenophobic.
We view the classification of the above-mentioned categories as a supervised learning problem and aim to develop machine learning and deep learning techniques for it. We first pre-process the input text by removing punctuation and URLs and converting it to lower case before using it to train our models. We split the data into random train and test splits with a 90:10 ratio for training and evaluating the performance of our models, respectively.
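The pre-processing and split described above can be sketched in a few lines of pure Python; the exact punctuation handling (e.g. whether hashtags are preserved) is an assumption, since the paper does not specify it:

```python
# Illustrative sketch of the tweet cleaning and 90:10 split described above.
# Hashtag/mention symbols are kept here as an assumption; the authors' exact
# punctuation rules are not stated in the paper.
import re
import random

def preprocess(text: str) -> str:
    """Lower-case a tweet and strip URLs and punctuation."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"[^\w\s#@]", " ", text)              # remove punctuation
    return re.sub(r"\s+", " ", text).strip().lower()

def train_test_split(samples, test_ratio=0.10, seed=42):
    """Random split of labelled samples into train and test portions."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]
```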
Recently, language models such as Bidirectional Encoder Representations from Transformers (BERT)  have become extremely popular due to their state-of-the-art performance on natural language processing tasks. Owing to its bidirectional training, BERT can learn powerful word representations from unlabelled text data, enabling better performance than other machine learning and deep learning techniques . The common approach to adopting BERT for a specific task on a smaller dataset is to fine-tune a pre-trained BERT model that has already learnt deep context-dependent representations. We select the "bert-base-uncased" model, which comprises 12 layers, 12 self-attention heads, and a hidden size of 768, totalling 110M parameters. We fine-tune the BERT model with a categorical cross-entropy loss over the five categories. The various hyperparameters used for fine-tuning the BERT model are selected as recommended in . We use the AdamW optimizer with the standard learning rate of 2e-5 and a batch size of 16, and train for 5 epochs. For selecting the maximum sequence length, we tokenize the whole dataset using the BERT tokenizer and check the distribution of token lengths. We notice that the minimum token length is 8, the maximum is 130, the median is 37, and the mean is 42. Based on the density distribution shown in Fig. 1, we experiment with two sequence lengths – 64 and 128 – and find that a sequence length of 64 provides better performance.
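A hedged sketch of this fine-tuning setup, assuming the Hugging Face `transformers` and PyTorch libraries (the paper does not name its implementation); the dataset variables are placeholders, not the authors' code:

```python
# Hyperparameters as reported in the text: AdamW at 2e-5, batch size 16,
# 5 epochs, sequence length 64, five output classes.
HPARAMS = {
    "model": "bert-base-uncased",
    "lr": 2e-5,
    "batch_size": 16,
    "epochs": 5,
    "max_length": 64,   # chosen over 128 after the token-length analysis
    "num_labels": 5,    # four racism/xenophobia categories + non-racist
}

def fine_tune(train_texts, train_labels):
    """Fine-tune bert-base-uncased with categorical cross-entropy (sketch)."""
    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from transformers import BertTokenizerFast, BertForSequenceClassification

    tok = BertTokenizerFast.from_pretrained(HPARAMS["model"])
    enc = tok(train_texts, truncation=True, padding="max_length",
              max_length=HPARAMS["max_length"], return_tensors="pt")
    ds = TensorDataset(enc["input_ids"], enc["attention_mask"],
                       torch.tensor(train_labels))
    model = BertForSequenceClassification.from_pretrained(
        HPARAMS["model"], num_labels=HPARAMS["num_labels"])
    opt = torch.optim.AdamW(model.parameters(), lr=HPARAMS["lr"])
    model.train()
    for _ in range(HPARAMS["epochs"]):
        for ids, mask, labels in DataLoader(ds, batch_size=HPARAMS["batch_size"],
                                            shuffle=True):
            opt.zero_grad()
            out = model(input_ids=ids, attention_mask=mask, labels=labels)
            out.loss.backward()  # cross-entropy computed internally by the model
            opt.step()
    return model
```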
As additional baselines, we also train two more techniques. Long Short-Term Memory networks (LSTMs)  have been very popular for text data, as they can learn the dependencies of words within the context of a text. Machine learning algorithms such as Support Vector Machines (SVMs) have also been used previously by researchers for text classification tasks. We adopt the same data pre-processing and implementation approach as mentioned earlier and train an SVM with grid search, a 5-layer LSTM (using pre-trained GloVe  embeddings), and the BERT model for category detection of the racist and xenophobic tweets.
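The SVM-with-grid-search baseline might look as follows with scikit-learn; the TF-IDF representation and the parameter grid are assumptions for illustration, since the paper does not specify the SVM's feature representation or search space:

```python
# Illustrative SVM baseline: TF-IDF features plus a small grid search over
# the SVM's C and kernel. These choices are assumptions, not the authors'.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

def make_svm_baseline():
    """Grid-searched SVM text classifier over a TF-IDF representation."""
    pipe = Pipeline([("tfidf", TfidfVectorizer()), ("svm", SVC())])
    grid = {"svm__C": [0.1, 1, 10], "svm__kernel": ["linear", "rbf"]}
    return GridSearchCV(pipe, grid, cv=2)
```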
For evaluating the machine learning and deep learning approaches on our test dataset, we use average accuracy and the weighted F1-score over the five categories. The performance of the models is shown in Table 2, from which it can be seen that the fine-tuned BERT model performs best compared to the SVM and LSTM in terms of both accuracy and F1-score. Thus, we employ this fine-tuned BERT model to categorize all tweets in the remaining dataset, which yields a refined dataset of the four categories of tweets spread across the three stages, as shown in Table 3.
3.2 Topic modelling
Topic modelling is one of the most extensively used methods in natural language processing for finding relationships across text documents, discovering and clustering topics, and extracting semantic meaning from a corpus of unstructured data . Researchers have developed many techniques, such as Latent Semantic Analysis (LSA)  and Probabilistic Latent Semantic Analysis (pLSA) , for extracting semantic topic clusters from a corpus. In the last decade, Latent Dirichlet Allocation (LDA)  has become a successful and standard technique for inferring topic clusters from texts in various applications, such as opinion mining , social media analysis , and event detection , and various variants of LDA have consequently been developed  and .
For our research, we adopt the baseline LDA model with Variational Bayes sampling from Gensim (https://pypi.org/project/gensim/) and the LDA Mallet model  with Gibbs sampling for extracting topic clusters from the text data. Before passing the corpus to the LDA models, we perform data pre-processing and cleaning in the following steps. First, we remove any newline characters, punctuation, URLs, mentions, and hashtags. We then tokenize the texts in the corpus and remove stopwords, using the Gensim pre-processing utilities and the stopwords defined in the NLTK (https://pypi.org/project/nltk/) corpus. Finally, we form bigrams and lemmatize the words in the text.
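The cleaning steps above can be sketched as follows. The regex-based cleaner is pure Python and illustrative; the tokenization, stopword removal, bigram, and lemmatization steps use the documented Gensim and NLTK utilities, with thresholds chosen here as assumptions:

```python
# Sketch of the LDA pre-processing pipeline described above.
import re

def clean_tweet(text: str) -> str:
    """Strip newlines, URLs, mentions, hashtags, and punctuation."""
    text = text.replace("\n", " ")
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"[@#]\w+", " ", text)                # mentions and hashtags
    text = re.sub(r"[^\w\s]", " ", text)                # punctuation
    return re.sub(r"\s+", " ", text).strip()

def to_tokens(cleaned_docs):
    """Tokenize, drop stopwords, form bigrams, lemmatize (Gensim + NLTK)."""
    from gensim.utils import simple_preprocess
    from gensim.models.phrases import Phrases, Phraser
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer

    stops = set(stopwords.words("english"))
    docs = [[w for w in simple_preprocess(d) if w not in stops]
            for d in cleaned_docs]
    # min_count/threshold are illustrative defaults, not the paper's settings
    bigram = Phraser(Phrases(docs, min_count=5, threshold=100))
    lemma = WordNetLemmatizer()
    return [[lemma.lemmatize(w) for w in bigram[d]] for d in docs]
```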
After employing the above pre-processing on our corpus, we perform topic modelling using LDA from Gensim and LDA Mallet. We experiment by varying the number of topics from 5 to 25 at intervals of 5 and checking the corresponding coherence score of the models, as was done in . We train the models for 1000 iterations with each number of topics, optimizing the hyperparameters every 10 passes after each 100-pass period. We set the values of α and β, which control the distribution of topics across documents and of vocabulary words amongst topics, to the default setting of 1 divided by the number of topics. We notice from our experiments that LDA Mallet yields a higher coherence score (0.60–0.65) than the Gensim LDA model (0.49–0.55), and we therefore select the LDA Mallet model for topic modelling on our corpus.
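The coherence-based model selection above can be sketched as a loop over candidate topic counts; this assumes an older Gensim (pre-4.0) where the `LdaMallet` wrapper is available, and the corpus, dictionary, and Mallet path are placeholders:

```python
# Sketch: pick the topic count (5..25 in steps of 5) with the highest
# c_v coherence, using the Gensim LDA Mallet wrapper as described above.
def topic_grid(start=5, stop=25, step=5):
    """Candidate topic counts: 5, 10, 15, 20, 25."""
    return list(range(start, stop + 1, step))

def best_num_topics(corpus, dictionary, texts, mallet_path):
    """Return the topic count with the highest coherence score (sketch)."""
    from gensim.models.wrappers import LdaMallet   # Gensim < 4.0
    from gensim.models import CoherenceModel
    scores = {}
    for k in topic_grid():
        model = LdaMallet(mallet_path, corpus=corpus, num_topics=k,
                          id2word=dictionary, iterations=1000)
        cm = CoherenceModel(model=model, texts=texts,
                            dictionary=dictionary, coherence="c_v")
        scores[k] = cm.get_coherence()
    return max(scores, key=scores.get)
```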
The above strategy is employed for each racist and xenophobic category and for every stage individually. We find the highest coherence score corresponding to a specific number of topics for each category and stage. To analyse the results, we reduce the number of topics to 5 by clustering closely related topics using Equation 1:

$$T_r(w_j) = \frac{1}{N}\sum_{i=1}^{N} p_{i,j}, \qquad j = 1, \ldots, M \quad (1)$$

where $N$ refers to the number of topics to be clustered, $M$ represents the number of keywords in each topic, $p_{i,j}$ corresponds to the probability of word $w_j$ in topic $i$, and $T_r$ is the resultant topic containing the average probabilities of all the words from the $N$ topics. We then present the 10 highest-probability words of the resultant topic for every category and stage, as shown in Tables 4 to 7.
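A direct reading of Equation 1 is that the clustered topic is the element-wise average of the word probabilities of the topics being merged. A pure-Python sketch, with invented probabilities for illustration:

```python
# Equation 1 as code: average word probabilities across the topics being
# merged, then report the highest-probability words (as in Tables 4-7).
def merge_topics(topics):
    """Average word probabilities across N topics.

    `topics` is a list of dicts mapping keywords to probabilities; the
    result maps each keyword to its mean probability over the N topics.
    """
    n = len(topics)
    words = set().union(*(t.keys() for t in topics))
    return {w: sum(t.get(w, 0.0) for t in topics) / n for w in words}

def top_words(merged, k=10):
    """The k highest-probability words of the merged topic."""
    return [w for w, _ in sorted(merged.items(), key=lambda kv: -kv[1])[:k]]
```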
Tables 4, 5, 6 and 7 demonstrate the ten most salient terms for the five generated topics at each stage (S1, S2, and S3) of the four categories, and we summarize each topic through the correlations between its ten terms. We put a question mark for topics from which no pattern can be discerned. In general, across the four categories, China and Chinese are always at the centre of discussion. When considering the dynamics across stages, tweets of all four categories extended the discussion to the world situation, and terms representing other nations and races/ethnicities besides China and Chinese started to emerge.
Notably, the category-based detection and analysis enable us to capture the nuances of themes and how they develop through different trajectories across the stages. Specifically, the topics in the category of stigmatization centre on the virus. Discussion tends to associate China and Chinese with the infection and outbreak of the virus as well as its negative influences (e.g., emergency; travel). In stage 3, discussion around America became a new focus, with terms such as trump, president, and propaganda showing up.
The discussion in the category of offensiveness is more politically oriented than in the other categories. In the first two stages especially, discussion included sensitive political terms concerning China (e.g., hk, uyghyr, taiwan). Besides these, ccp (Chinese Communist Party) and human right are two important topics. Only by stage 3 did the topics in offensiveness gradually switch focus to the virus.
The data in the category of blame focuses, in the early stages of the discussion, on attributing the cause and consequences of the virus to a particular political system (e.g., lie; autocracy; deceit). As with stigmatization, american and president emerged as new topics in stage 3 for the category of blame, although across all three stages the focus remained on terms like lie and cover-up by the government.
The category of exclusion emphasizes the virus, trade, and human right. In terms of trade especially, increasingly negative words became associated with it as Covid-19 developed (e.g., from stop in stage 2 to stop and boycott in stage 3). Additionally, in stage 3, india and indian became related to china under the topic of trade.
5 Discussion and Conclusions
Bridging computational methods with social science theories, this research proposes a four-dimensional categorization for the detection of racist and xenophobic texts in the context of Covid-19. This categorization, combined with a stage-wise analysis, enables us to capture the diversity of the topics emerging from racist and xenophobic expression on Twitter and their dynamic changes across the early stages of Covid-19. It also enables the methodological advancement proposed by this research to be translated into constructive policy suggestions. For instance, as demonstrated in the findings, the topics falling under the category of offensiveness are more likely to be associated with sensitive political issues around China than with the virus in stages 1 and 2. How to decouple the discussion of the virus from its association with other political topics should therefore draw attention from governments of different countries, and this agenda should be incorporated into official government media coverage. Another example comes from the category of blame. As shown in the findings, blame usually targets the transparency of information from the government (especially the Chinese government in early Covid-19). Consequently, it is critical for governments of different countries to work on effective and prompt communication with the public under Covid-19. We believe the contribution of this research can be generalized beyond the context of Covid-19 to provide insights for future research on racism and xenophobia on digital platforms.
-  (2005) Racialised ‘othering’. Journalism: critical issues, pp. 274–286. Cited by: Table 1.
-  (2003) Hierarchical topic models and the nested chinese restaurant process. In NIPS, Vol. 16. Cited by: §3.2.
-  (2010) Supervised topic models. arXiv preprint arXiv:1003.0783. Cited by: §3.2.
-  (2003) Latent dirichlet allocation. the Journal of machine Learning research 3, pp. 993–1022. Cited by: §1, §3.2.
-  (2020) The covid-19 social media infodemic. Scientific Reports 10 (1), pp. 1–10. Cited by: §1.
-  (2013) Classifying political orientation on twitter: it’s not easy! In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 7. Cited by: §3.2.
-  (2000) An empirical analysis of image restoration: texaco’s racism crisis. Journal of Public Relations Research 12 (2), pp. 163–178. Cited by: Table 1.
-  (1990) Indexing by latent semantic analysis. Journal of the American society for information science 41 (6), pp. 391–407. Cited by: §3.2.
-  (2018) Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. Cited by: §1, §3.1.2.
-  (2016) Exploring key hackers and cybersecurity threats in chinese hacker communities. In 2016 IEEE Conference on Intelligence and Security Informatics (ISI), pp. 13–18. Cited by: §3.2.
-  (2020) Causal modeling of twitter activity during covid-19. Computation 8 (4), pp. 85. Cited by: §1.
-  (2021) How covid-19 is changing our language: detecting semantic shift in twitter word embeddings. arXiv preprint arXiv:2102.07836. Cited by: §1.
-  Support vector machines. IEEE Intelligent Systems and their Applications 13 (4), pp. 18–28. Cited by: §3.1.2.
-  (1997) Long short-term memory. Neural computation 9 (8), pp. 1735–1780. Cited by: §3.1.2.
-  (1999) Probabilistic latent semantic indexing. In Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, pp. 50–57. Cited by: §3.2.
-  (2019) Latent dirichlet allocation (lda) and topic modeling: models, applications, a survey. Multimedia Tools and Applications 78 (11), pp. 15169–15211. Cited by: §3.2.
-  (2013) Expressivism and the offensiveness of slurs. Philosophical Perspectives 27 (1), pp. 231–259. Cited by: Table 1.
-  (2020) Analyzing covid-19 on online social media: trends, sentiments and emotions. arXiv preprint arXiv:2005.14464. Cited by: §1.
-  (2010) Pet: a statistical model for popular events tracking in social communities. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 929–938. Cited by: §3.2.
-  (2020) Hate is the new infodemic: a topic-aware modeling of hate speech diffusion on twitter. arXiv preprint arXiv:2010.04377. Cited by: §1.
-  (2002) Mallet: a machine learning for language toolkit. http://mallet.cs.umass.edu. Cited by: §3.2.
-  (2001) A theoretical perspective on coping with stigma. Journal of social issues 57 (1), pp. 73–92. Cited by: Table 1.
-  (2020) # coronavirus or# chinesevirus?!: understanding the negative sentiment reflected in tweets with racist hashtags across the development of covid-19. arXiv preprint arXiv:2005.08224. Cited by: §1.
-  (2014) Glove: global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp. 1532–1543. Cited by: §3.1.2.
-  (2020) ” Go eat a bat, chang!”: an early look on the emergence of sinophobic behavior on web communities in the face of covid-19. arXiv preprint arXiv:2004.04046. Cited by: §1.
-  Exploring casual covid-19 data visualizations on twitter: topics and challenges. In Informatics, Vol. 7, pp. 35. Cited by: §1.
-  (2020) On analyzing covid-19-related hate speech using bert attention. In 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 669–676. Cited by: §1.
-  (2021) ‘I’m more afraid of racism than of the virus!’: racism awareness and resistance among chinese migrants and their descendants in france during the covid-19 pandemic. European Societies 23 (sup1), pp. S721–S742. Cited by: §1.
-  (2011) Constrained lda for grouping product features in opinion mining. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 448–459. Cited by: §3.2.
-  (2020) Racism is a virus: anti-asian hate and counterhate in social media during the covid-19 crisis. arXiv preprint arXiv:2005.12423. Cited by: §1.