Language Detection Engine for Multilingual Texting on Mobile Devices

01/07/2021
by Sourabh Vasant Gothe, et al.

More than 2 billion mobile users worldwide type in multiple languages on soft keyboards. On a monolingual keyboard, 38% of falsely auto-corrected words are valid in another language. This can be easily avoided by detecting the language of the typed words and then validating them in their respective language. Language detection is a well-known problem in natural language processing. In this paper, we present a fast, light-weight and accurate Language Detection Engine (LDE) for multilingual typing that dynamically adapts to the user-intended language in real time. We propose a novel approach in which a character N-gram model is fused with a logistic regression based selector model to identify the language. Additionally, we present a unique method of significantly reducing the inference time through a parameter reduction technique. We also discuss various optimizations applied across LDE to resolve ambiguity in input text among languages with the same character pattern. Our method demonstrates an average accuracy of 94.5% for Indian languages in Latin script and 98% for European languages on code-switched data. The model outperforms fastText by 60.39% and ML-Kit by 23.67% in F1 score for European languages. LDE is faster on mobile devices, with an average inference time of 25.91 microseconds.
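The idea of pairing per-language character N-gram scores with a logistic selector can be illustrated with a small sketch. This is not the authors' implementation: the toy word lists, the trigram order, the add-one smoothing, and the hand-set `weight` and `bias` of the selector are all assumptions made for illustration, and a real selector would be trained on labelled data.

```python
import math
from collections import Counter

# Toy vocabularies, invented for illustration only (not the paper's training data).
TOY_WORDS = {
    "en": ["the", "there", "then", "this", "that",
           "with", "which", "weather", "thing", "think"],
    "de": ["der", "die", "das", "und", "nicht",
           "ich", "schon", "schule", "zwischen", "sprache"],
}


def char_ngrams(word, n=3):
    """Return the character n-grams of a word padded with boundary markers."""
    padded = f"^{word.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramModel:
    """Add-one smoothed character n-gram model for a single language."""

    def __init__(self, words, n=3):
        self.n = n
        self.counts = Counter(g for w in words for g in char_ngrams(w, n))
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # one extra slot for unseen n-grams

    def score(self, word):
        """Average log-probability of the word's n-grams under this model."""
        grams = char_ngrams(word, self.n)
        denom = self.total + self.vocab
        if not grams:  # empty input: treat it as a single unseen n-gram
            return math.log(1 / denom)
        return sum(math.log((self.counts[g] + 1) / denom) for g in grams) / len(grams)


def detect(word, models, weight=2.0, bias=0.0):
    """Pick the best-scoring language and a logistic confidence for it.

    The confidence is a sigmoid over the score margin between the two
    best languages; weight and bias are hand-set assumptions, not trained.
    Requires at least two language models.
    """
    ranked = sorted(((m.score(word), lang) for lang, m in models.items()), reverse=True)
    (best, lang), (second, _) = ranked[0], ranked[1]
    confidence = 1.0 / (1.0 + math.exp(-(weight * (best - second) + bias)))
    return lang, confidence


MODELS = {lang: CharNgramModel(words) for lang, words in TOY_WORDS.items()}

if __name__ == "__main__":
    for typed in ["these", "schnell"]:
        print(typed, detect(typed, MODELS))
```

With these toy lists, "these" is attributed to "en" and "schnell" to "de". A keyboard could use the confidence to decide whether to switch the validation dictionary or stay with the previously detected language; that policy is likewise an assumption, not something the abstract specifies.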

12/22/2016
Continuous multilinguality with language vectors
Most existing models for multilingual natural language processing (NLP) ...

01/07/2021
Real-Time Optimized N-gram For Mobile Devices
With the increasing number of mobile devices, there has been continuous ...

01/12/2017
LanideNN: Multilingual Language Identification on Character Window
In language identification, a common first step in natural language proc...

05/07/2021
Generalising Multilingual Concept-to-Text NLG with Language Agnostic Delexicalisation
Concept-to-text Natural Language Generation is the task of expressing an...

03/10/2022
SATLab at SemEval-2022 Task 4: Trying to Detect Patronizing and Condescending Language with only Character and Word N-grams
A logistic regression model only fed with character and word n-grams is ...

09/24/2020
Novel Keyword Extraction and Language Detection Approaches
Fuzzy string matching and language classification are important tools in...

01/17/2022
Handling Compounding in Mobile Keyboard Input
This paper proposes a framework to improve the typing experience of mobi...