Sent2Matrix: Folding Character Sequences in Serpentine Manifolds for Two-Dimensional Sentence Representation

03/15/2021
by   Hongyang Gao, et al.

We study text representation methods using deep models. Current methods, such as word-level and character-level embedding schemes, treat a text as either a sequence of atomic words or a sequence of characters. As a result, they ignore either word morphologies or word boundaries. To overcome these limitations, we propose to convert texts into 2-D representations and develop the Sent2Matrix method. Our method allows for the explicit incorporation of both word morphologies and word boundaries. When coupled with a novel serpentine padding method, Sent2Matrix leads to an interesting visualization in which 1-D character sequences are folded into 2-D serpentine manifolds. Notably, our method is the first attempt to represent texts in 2-D formats. Experimental results on text classification tasks show that our method consistently outperforms prior embedding methods.
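To make the folding idea concrete, here is a minimal Python sketch of how a 1-D character sequence could be laid out as a 2-D serpentine grid. It is an illustration only, not the authors' padding scheme: the function name `fold_serpentine`, the one-word-per-row layout, the fixed `width`, the truncation of long words, and the `_` pad symbol are all assumptions made for this example.

```python
def fold_serpentine(words, width, pad="_"):
    """Lay words out one per row, alternating the reading direction.

    Hypothetical illustration: even-indexed rows run left to right and
    odd-indexed rows run right to left, so the end of one row sits next
    to the start of the following row, like a folded ribbon.
    """
    rows = []
    for i, word in enumerate(words):
        # Truncate words longer than the row, then pad to a fixed width.
        row = word[:width].ljust(width, pad)
        if i % 2 == 1:
            # Reverse every other row to produce the serpentine path.
            row = row[::-1]
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in fold_serpentine("the cat sat down".split(), 5):
        print(row)
```

In this layout each row keeps the characters of one word together (morphology), while the row breaks mark word boundaries. In a real model each character cell would hold an embedding vector rather than a letter.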


