Effective Subword Segmentation for Text Comprehension

11/06/2018
by Zhuosheng Zhang, et al.

Character-level representations have been broadly adopted to alleviate the problem of effectively representing rare or complex words. However, a character by itself is not a natural minimal linguistic unit for representation or for composing word embeddings, because character-level modeling ignores the linguistic coherence of consecutive characters inside a word. This paper presents a general subword-augmented embedding framework for learning and composing computationally derived subword-level representations. We survey a series of unsupervised segmentation methods for subword acquisition and different subword-augmented strategies for text understanding, and show that subword-augmented embedding significantly improves our baselines on multiple text understanding tasks in both English and Chinese.
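The abstract does not list the segmentation methods it surveys. As an illustration only, the sketch below implements byte pair encoding (BPE), a widely used unsupervised subword segmentation method; it is not presented as the paper's own procedure. The function names `learn_bpe`, `segment`, and `_merge` are hypothetical, and the `</w>` end-of-word marker follows the common BPE convention so that merges never cross word boundaries.

```python
from collections import Counter


def _merge(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_counts, num_merges):
    """Learn an ordered list of BPE merge operations from word frequencies."""
    # Start from characters, plus an end-of-word marker on every word.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties go to the pair counted first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = {_merge(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def segment(word, merges):
    """Split a (possibly unseen) word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = _merge(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    merges = learn_bpe(counts, num_merges=5)
    print(merges)
    # [('e', 's'), ('es', 't'), ('est', '</w>'), ('l', 'o'), ('lo', 'w')]
    print(segment("lowest", merges))
    # ['low', 'est</w>']
```

On this toy corpus, "lowest" never appears as a whole word, yet it is segmented into the learned subwords "low" and "est", which is the kind of rare-word coverage that subword units aim to provide over raw characters.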
