Charagram: Embedding Words and Sentences via Character n-grams

07/10/2016
By John Wieting et al.

We present Charagram embeddings, a simple approach for learning character-based compositional models to embed textual sequences. A word or sentence is represented using a character n-gram count vector, followed by a single nonlinear transformation to yield a low-dimensional embedding. We use three tasks for evaluation: word similarity, sentence similarity, and part-of-speech tagging. We demonstrate that Charagram embeddings outperform more complex architectures based on character-level recurrent and convolutional neural networks, achieving new state-of-the-art performance on several similarity tasks.
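The model the abstract describes — a character n-gram count vector followed by a single nonlinear transformation — can be sketched in a few lines. Everything below is an illustrative assumption rather than the paper's trained model: the tiny vocabulary, the choice of `tanh` as the nonlinearity, and the random, untrained parameters `W` and `b`.

```python
import numpy as np

def char_ngrams(text, ns=(2, 3)):
    """All character n-grams of the padded string, for each n in ns."""
    padded = "#" + text + "#"  # boundary markers around the sequence
    return [padded[i:i + n] for n in ns for i in range(len(padded) - n + 1)]

def charagram_embed(text, vocab, W, b):
    """Count vector over character n-grams, then one nonlinear transform."""
    x = np.zeros(len(vocab))
    for g in char_ngrams(text):
        idx = vocab.get(g)
        if idx is not None:
            x[idx] += 1.0
    return np.tanh(W @ x + b)  # single layer -> low-dimensional embedding

# Toy demo with a tiny vocabulary and untrained (random) parameters.
corpus = ["cat", "cats", "dog"]
vocab = {g: i for i, g in
         enumerate(sorted({g for w in corpus for g in char_ngrams(w)}))}
rng = np.random.default_rng(0)
dim = 5  # embedding dimension (illustrative; real models are much larger)
W = rng.normal(scale=0.1, size=(dim, len(vocab)))
b = np.zeros(dim)
e_cat = charagram_embed("cat", vocab, W, b)
e_cats = charagram_embed("cats", vocab, W, b)
```

In a real setting, `W` and `b` would be learned end-to-end on the evaluation tasks; words sharing many character n-grams (like "cat" and "cats") then receive similar embeddings, which is what makes the approach robust to morphological variation and spelling differences.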


Related research

05/21/2018 · Character-based Neural Networks for Sentence Pair Modeling
Sentence pair modeling is critical for many NLP tasks, such as paraphras...

10/29/2018 · Learning Better Internal Structure of Words for Sequence Labeling
Character-based neural models have recently proven very useful for many ...

05/02/2016 · Compositional Sentence Representation from Character within Large Context Text
This paper describes a Hierarchical Composition Recurrent Network (HCRN)...

04/22/2018 · A Study on Passage Re-ranking in Embedding based Unsupervised Semantic Search
State of the art approaches for (embedding based) unsupervised semantic ...

02/13/2018 · Sentence Boundary Detection for French with Subword-Level Information Vectors and Convolutional Neural Networks
In this work we tackle the problem of sentence boundary detection applie...

04/11/2019 · Gating Mechanisms for Combining Character and Word-level Word Representations: An Empirical Study
In this paper we study how different ways of combining character and wor...

11/25/2015 · Towards Universal Paraphrastic Sentence Embeddings
We consider the problem of learning general-purpose, paraphrastic senten...
