Instate: Predicting the State of Residence From Last Name

03/13/2023
by Atul Dhingra, et al.

India has twenty-two official languages. Serving such a diverse language base is a challenge for survey statisticians, call center operators, software developers, and other service providers. To help these providers localize their services for different language communities, we introduce a new machine learning model that predicts the language(s) a user can speak from their name. Using nearly 438M records spanning 33 Indian states and 1.13M unique last names from the Indian Electoral Rolls Corpus (?), we build a character-level, transformer-based machine learning model that predicts the state of residence from the last name. The model has a top-3 accuracy of 85.3%, and its predicted states can be used to infer the languages the respondent understands. We provide open-source software that implements the method discussed in the paper.
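To make the two technical pieces of the abstract concrete, the sketch below shows (a) one way to turn a last name into the fixed-length sequence of character IDs that a character-level model consumes, and (b) how a top-3 accuracy figure is computed from per-state scores. It does not reproduce the paper's transformer or its software. The function names `encode_last_name` and `top_k_accuracy`, the lowercase a-z vocabulary, the padding ID 0, the unknown-character ID 1, and `max_len` are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence


def encode_last_name(name: str, vocab: Dict[str, int], max_len: int = 24) -> List[int]:
    """Map a last name to a fixed-length list of character IDs (hypothetical scheme).

    Characters are lowercased and truncated to max_len; characters missing
    from vocab map to 1, and the sequence is right-padded with 0.
    """
    ids = [vocab.get(ch, 1) for ch in name.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))


def top_k_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 3) -> float:
    """Fraction of names whose true state index is among the k highest-scoring states."""
    if not labels:
        raise ValueError("labels must be non-empty")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k states with the largest scores for this name.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += int(label in top_k)
    return hits / len(labels)


if __name__ == "__main__":
    # Assumed vocabulary: 'a' -> 2, 'b' -> 3, ..., 'z' -> 27 (0 = pad, 1 = unknown).
    vocab = {ch: i + 2 for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz")}
    print(encode_last_name("Sood", vocab, max_len=6))

    # Toy scores over 4 hypothetical states for 3 names, with true state indices.
    scores = [[0.1, 0.5, 0.3, 0.2], [0.6, 0.2, 0.15, 0.05], [0.05, 0.15, 0.2, 0.6]]
    labels = [1, 3, 2]
    print(top_k_accuracy(scores, labels, k=3))
```

Top-3 accuracy counts a prediction as correct if the true state appears anywhere among the three highest-scoring states, which is why it is higher than top-1 accuracy on the same scores. This suits the localization use case: a service can offer a short list of likely languages rather than committing to one.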


