Hierarchical Character-Word Models for Language Identification

08/10/2016
by Aaron Jaech, et al.
University of Washington

Social media messages are short and often spelled unconventionally, which makes language identification difficult. We introduce a hierarchical model that learns character-level and contextualized word-level representations for language identification. Our method performs well against strong baselines and can also reveal code-switching.
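
To make the two-level idea concrete, here is a toy sketch in plain Python. It is not the authors' model. The paper learns neural character and word representations, and this sketch substitutes hand-built components for them. Everything in it is hypothetical and invented for illustration: the names `char_ngrams` and `ToyHierarchicalLangID`, the tiny English/Spanish word lists, the add-one-smoothed character-bigram scoring at the character level, and the neighbour averaging that stands in for a learned contextual word encoder. It does show how per-word decisions, rather than a single sentence-level label, can expose a switch between languages.

```python
import math
from collections import Counter


def char_ngrams(word, n=2):
    """Character n-grams of a lower-cased word, with '<' and '>' boundary markers."""
    padded = f"<{word.lower()}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


class ToyHierarchicalLangID:
    """Hypothetical two-level language identifier (not the paper's architecture).

    Character level: each word is scored per language from add-one-smoothed
    character-bigram counts.
    Word level: each word's distribution is averaged with its neighbours'
    distributions, a crude stand-in for a learned contextual word encoder.
    """

    def __init__(self, training_words, window=1):
        # training_words maps a language code to a list of example words (toy data).
        self.window = window
        self.langs = sorted(training_words)
        self.counts = {
            lang: Counter(g for w in words for g in char_ngrams(w))
            for lang, words in training_words.items()
        }
        # Bigram vocabulary size; +1 below reserves mass for unseen bigrams.
        self.vocab_size = len(set().union(*self.counts.values()))

    def word_distribution(self, word):
        """Per-language probabilities for one word, from its characters only."""
        log_scores = []
        for lang in self.langs:
            counts = self.counts[lang]
            denom = sum(counts.values()) + self.vocab_size + 1
            log_scores.append(
                sum(math.log((counts[g] + 1) / denom) for g in char_ngrams(word))
            )
        return softmax(log_scores)

    def predict(self, tokens):
        """Return (per-word language labels, sentence-level label)."""
        local = [self.word_distribution(t) for t in tokens]
        contextual = []
        for i in range(len(tokens)):
            lo, hi = max(0, i - self.window), min(len(tokens), i + self.window + 1)
            # The word itself gets weight 2; each neighbour in the window gets weight 1.
            weights = {j: (2.0 if j == i else 1.0) for j in range(lo, hi)}
            total = sum(weights.values())
            contextual.append([
                sum(w * local[j][k] for j, w in weights.items()) / total
                for k in range(len(self.langs))
            ])
        labels = [self.langs[argmax(p)] for p in contextual]
        # Sentence label: average the contextual per-word distributions.
        sentence = [sum(p[k] for p in contextual) / len(contextual)
                    for k in range(len(self.langs))]
        return labels, self.langs[argmax(sentence)]


if __name__ == "__main__":
    model = ToyHierarchicalLangID({
        "en": ["the", "house", "is", "green"],
        "es": ["la", "casa", "es", "verde"],
    })
    print(model.predict(["the", "house", "la", "casa"]))
```

On the mixed token sequence `["the", "house", "la", "casa"]`, the per-word labels are `en, en, es, es`. The change of label partway through the sequence is the kind of signal a word-level model can use to flag code-switching. A single sentence-level label would hide it.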

03/10/2018

Language Identification of Bengali-English Code-Mixed data using Character & Phonetic based LSTM Models

Language identification of social media text still remains a challenging...
01/08/2017

Sentence-level dialects identification in the greater China region

Identifying the different varieties of the same language is more challen...
05/11/2016

Tweet2Vec: Character-Based Distributed Representations for Social Media

Text from social media provides a set of challenges that can cause tradi...
07/20/2017

A Sub-Character Architecture for Korean Language Processing

We introduce a novel sub-character architecture that exploits a unique c...
01/24/2019

Semantic Classification of Tabular Datasets via Character-Level Convolutional Neural Networks

A character-level convolutional neural network (CNN) motivated by applic...

Code Repositories

twitter_langid

A hierarchical character-word neural network for language identification
