Hierarchical Character-Word Models for Language Identification

08/10/2016
by Aaron Jaech, et al.

Social media messages' brevity and unconventional spelling pose a challenge to language identification. We introduce a hierarchical model that learns character and contextualized word-level representations for language identification. Our method performs well against strong baselines, and can also reveal code-switching.

research
03/10/2018

Language Identification of Bengali-English Code-Mixed data using Character & Phonetic based LSTM Models

Language identification of social media text still remains a challenging...
research
09/22/2020

Ghmerti at SemEval-2019 Task 6: A Deep Word- and Character-based Approach to Offensive Language Identification

This paper presents the models submitted by Ghmerti team for subtasks A ...
research
01/08/2017

Sentence-level dialects identification in the greater China region

Identifying the different varieties of the same language is more challen...
research
05/11/2016

Tweet2Vec: Character-Based Distributed Representations for Social Media

Text from social media provides a set of challenges that can cause tradi...
research
07/20/2017

A Sub-Character Architecture for Korean Language Processing

We introduce a novel sub-character architecture that exploits a unique c...
research
12/11/2017

Social Media Writing Style Fingerprint

We present our approach for computer-aided social media text authorship ...
