Character-based Neural Embeddings for Tweet Clustering

03/15/2017
by Svitlana Vakulenko, et al.

In this paper we show how character-based neural networks can improve the performance of tweet clustering. The proposed approach overcomes the vocabulary explosion that limits word-based models and allows seamless processing of multilingual content. Our evaluation results and code are available online at https://github.com/vendi12/tweet2vec_clustering
