Contrastive Learning of Emoji-based Representations for Resource-Poor Languages

04/03/2018
by Nurendra Choudhary, et al.

The introduction of emojis (or emoticons) in social media platforms has given users greater potential for expression. We propose a novel method called Classification of Emojis using Siamese Network Architecture (CESNA) to learn emoji-based representations of resource-poor languages by jointly training them with resource-rich languages using a Siamese network. The CESNA model consists of twin Bi-directional Long Short-Term Memory Recurrent Neural Networks (Bi-LSTM RNNs) with shared parameters, joined by a contrastive loss function based on a similarity metric. The model learns representations of resource-poor and resource-rich languages in a common emoji space, using a similarity metric based on the emojis present in sentences from both languages. Hence, the model projects sentences with similar emojis closer to each other and sentences with different emojis farther from one another. Experiments on large-scale Twitter datasets of resource-rich languages (English and Spanish) and resource-poor languages (Hindi and Telugu) reveal that CESNA outperforms state-of-the-art emoji prediction approaches based on distributional semantics, semantic rules, lexicon lists, and deep neural network representations without shared parameters.
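The abstract does not specify the exact similarity metric or loss formula, so the following is only a minimal sketch of how a shared-parameter Siamese pair with a contrastive loss could be wired up. It assumes (1) emoji similarity is the Jaccard overlap of the two sentences' emoji sets, thresholded into a binary similar/dissimilar label, and (2) the common margin-based contrastive loss of Hadsell, Chopra and LeCun (2006). To stay dependency-free, a toy averaging encoder over one shared embedding table stands in for the twin Bi-LSTMs. All names (`emoji_similarity`, `encode`, `pair_loss`, `threshold`, `margin`) and the toy vocabulary are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set


def emoji_similarity(emojis_a: Set[str], emojis_b: Set[str]) -> float:
    """Jaccard overlap of two sentences' emoji sets (ASSUMED metric)."""
    if not emojis_a and not emojis_b:
        return 0.0
    return len(emojis_a & emojis_b) / len(emojis_a | emojis_b)


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two sentence representations."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(d: float, y: float, margin: float = 1.0) -> float:
    """Margin-based contrastive loss (ASSUMED form).

    y = 1: similar pair, penalised by its squared distance (pulled together).
    y = 0: dissimilar pair, penalised only while closer than `margin`.
    """
    return 0.5 * y * d ** 2 + 0.5 * (1.0 - y) * max(0.0, margin - d) ** 2


def encode(tokens: List[str], table: Dict[str, List[float]], dim: int) -> List[float]:
    """Toy stand-in for the Bi-LSTM: mean of token vectors from ONE shared table."""
    vecs = [table[t] for t in tokens if t in table]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def pair_loss(tokens_rich: List[str], emojis_rich: Set[str],
              tokens_poor: List[str], emojis_poor: Set[str],
              table: Dict[str, List[float]], dim: int,
              margin: float = 1.0, threshold: float = 0.5) -> float:
    """Loss for one (resource-rich, resource-poor) sentence pair."""
    # Both branches use the same encoder and the same table: shared parameters,
    # so both languages are mapped into a single representation space.
    h_rich = encode(tokens_rich, table, dim)
    h_poor = encode(tokens_poor, table, dim)
    # Emoji overlap decides whether the pair should be pulled together or pushed apart.
    y = 1.0 if emoji_similarity(emojis_rich, emojis_poor) >= threshold else 0.0
    return contrastive_loss(euclidean(h_rich, h_poor), y, margin)


# Hypothetical toy vocabulary: an English word and a romanised Hindi word.
table = {"happy": [1.0, 0.0], "khush": [0.9, 0.1], "sad": [0.0, 1.0]}
print(pair_loss(["happy"], {":joy:"}, ["khush"], {":joy:"}, table, dim=2))
print(pair_loss(["sad"], {":cry:"}, ["khush"], {":joy:"}, table, dim=2))
```

The first pair shares an emoji, so it is penalised by its small squared distance (about 0.01). The second pair shares none and is already farther apart than the margin, so it contributes zero loss. In a real model, gradients of this loss would update the shared Bi-LSTM weights instead of a fixed lookup table.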


research
04/03/2018

Emotions are Universal: Learning Sentiment Based Representations of Resource-Poor Languages using Siamese Networks

Machine learning approaches in sentiment analysis principally rely on th...
research
06/10/2018

Cross-Lingual Task-Specific Representation Learning for Text Classification in Resource Poor Languages

Neural network models have shown promising results for text classificati...
research
01/23/2014

Improving Statistical Machine Translation for a Resource-Poor Language Using Related Resource-Rich Languages

We propose a novel language-independent approach for improving machine t...
research
04/18/2021

Chinese Sentences Similarity via Cross-Attention Based Siamese Network

Measuring sentence similarity is a key research area nowadays as it allo...
research
11/23/2018

Learning pronunciation from a foreign language in speech synthesis networks

Although there are more than 65,000 languages in the world, the pronunci...
research
04/28/2020

Learning to Learn Morphological Inflection for Resource-Poor Languages

We propose to cast the task of morphological inflection - mapping a lemm...
research
06/24/2019

SylNet: An Adaptable End-to-End Syllable Count Estimator for Speech

Automatic syllable count estimation (SCE) is used in a variety of applic...
