Prevalence of code mixing in semi-formal patient communication in low resource languages of South Africa
In this paper we address the problem of code-mixing in resource-poor language settings. We examine data consisting of 182k unique questions generated by users of the MomConnect helpdesk, part of a national scale public health platform in South Africa. We show evidence of code-switching at the level of approximately 10 challenges for future services. We use a natural language processing library (Polyglot) that supports detection of 196 languages and attempt to evaluate its performance at identifying English, isiZulu and code-mixed questions.
READ FULL TEXT