Nonsymbolic Text Representation

10/03/2016
by Hinrich Schuetze et al.

We introduce the first generic text representation model that is completely nonsymbolic, i.e., it does not require a segmentation or tokenization method that attempts to identify words or other symbolic units in the text. This holds both for training the model's parameters on a corpus and for applying the trained model to compute the representation of a new text. We show that our model outperforms prior work on an information extraction task and a text denoising task.
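The abstract does not spell out how a representation can be computed without any tokenizer, so the following is a minimal sketch of the general idea, assuming a character n-gram design: every overlapping character n-gram of the raw string (whitespace included) is hashed into an embedding table, and the text representation is the average of those embeddings. The names `char_ngrams` and `embed_text`, the hash-bucket table, and all sizes are illustrative assumptions, not the paper's actual architecture.

```python
# Sketch of a nonsymbolic text representation (assumed design, not the
# paper's model): embed raw character n-grams directly, so no word
# segmentation or tokenization step appears anywhere in the pipeline.
import numpy as np

DIM = 50         # embedding dimensionality (arbitrary choice)
BUCKETS = 2**16  # hash buckets standing in for an n-gram vocabulary

rng = np.random.default_rng(0)
ngram_embedding = rng.normal(scale=0.1, size=(BUCKETS, DIM))

def char_ngrams(text: str, n_min: int = 2, n_max: int = 4):
    """Yield all overlapping character n-grams, whitespace included."""
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            yield text[i:i + n]

def embed_text(text: str) -> np.ndarray:
    """Average the hashed n-gram embeddings; works on any raw string."""
    vec = np.zeros(DIM)
    count = 0
    for gram in char_ngrams(text):
        # hash() is per-process salted in Python; fine for a sketch.
        vec += ngram_embedding[hash(gram) % BUCKETS]
        count += 1
    return vec / max(count, 1)

# The same function applies at training and inference time, to any text,
# including scripts without whitespace word boundaries.
print(embed_text("nonsymbolic text representation").shape)  # (50,)
```

Because the representation is built from raw characters, a new text at inference time is handled exactly as a training text, which is the property the abstract emphasizes.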

Related research

02/16/2023 · JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition
We propose JEIT, a joint end-to-end (E2E) model and internal language mo...

09/08/2022 · Text-Free Learning of a Natural Language Interface for Pretrained Face Generators
We propose Fast text2StyleGAN, a natural language interface that adapts ...

09/18/2017 · Word Vector Enrichment of Low Frequency Words in the Bag-of-Words Model for Short Text Multi-class Classification Problems
The bag-of-words model is a standard representation of text for many lin...

12/08/2019 · Attentive Representation Learning with Adversarial Training for Short Text Clustering
Short text clustering has far-reaching effects on semantic analysis, sho...

10/30/2019 · Contextual Text Denoising with Masked Language Models
Recently, with the help of deep learning models, significant advances ha...

06/24/2017 · Cluster Based Symbolic Representation for Skewed Text Categorization
In this work, a problem associated with imbalanced text corpora is addre...

02/25/2020 · Declarative Memory-based Structure for the Representation of Text Data
In the era of intelligent computing, computational progress in text proc...
