Handling Compounding in Mobile Keyboard Input

01/17/2022
by Andreas Kabel, et al.

This paper proposes a framework to improve the typing experience of mobile users in morphologically rich languages. Smartphone keyboards typically support features such as input decoding, corrections and predictions, all of which rely on language models. For latency reasons, these operations happen on device, so the models are of limited size and cannot easily cover all the words users need for their daily tasks, especially in morphologically rich languages. In particular, the compounding nature of Germanic languages makes their vocabulary virtually infinite. Similarly, heavily inflecting and agglutinative languages (e.g. Slavic, Turkic or Finno-Ugric languages) tend to have much larger vocabularies than morphologically simpler languages, such as English or Mandarin. We propose to model such languages with automatically selected subword units annotated with what we call binding types, allowing the decoder to know when to bind subword units into words. We show that this method brings an improvement of around 20%. This is more than twice the improvement we previously obtained with a more basic approach, also described in the paper.
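To make the idea of binding types concrete, here is a minimal Python sketch of how a decoder's output of annotated subword units might be joined into words. It is an illustration only: the abstract does not specify the paper's actual binding types or joining rule, so the `Binding` values, the `Unit` class, the `bind_units` function and the rule "join only when both neighbours agree" are all hypothetical assumptions.

```python
from dataclasses import dataclass
from enum import Flag


class Binding(Flag):
    """Hypothetical binding types; the paper's actual inventory may differ."""
    NONE = 0   # stands alone as a word
    LEFT = 1   # may attach to the unit before it
    RIGHT = 2  # may attach to the unit after it
    BOTH = 3   # may attach on either side


@dataclass(frozen=True)
class Unit:
    text: str
    binding: Binding = Binding.NONE


def bind_units(units):
    """Join adjacent units into one word when the left unit binds rightward
    and the right unit binds leftward (an assumed rule, for illustration)."""
    words = []
    prev = None
    for unit in units:
        if (prev is not None
                and Binding.RIGHT in prev.binding
                and Binding.LEFT in unit.binding):
            words[-1] += unit.text   # glue onto the word being built
        else:
            words.append(unit.text)  # start a new word
        prev = unit
    return " ".join(words)


example = [
    Unit("die"),
    Unit("Haus", Binding.RIGHT),
    Unit("tür", Binding.LEFT),
]
print(bind_units(example))
```

With this kind of annotation the model's vocabulary only needs the units "Haus" and "tür", yet the keyboard can still produce the German compound "Haustür" as a single word.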
