Elementwise Language Representation

02/27/2023
by Dunam Kim, et al.

We propose a new technique for computational language representation called elementwise embedding, in which a material (a semantic unit) is abstracted as a horizontal concatenation of lower-dimensional element (character) embeddings. While elements are always characters, materials can be semantic units of arbitrary granularity, so the method generalizes to any type of tokenization. To focus only on the important letters, the n-th characters of each semantic unit are aligned in the n-th attention head and then concatenated back into their original forms, creating unique embedding representations; these are jointly projected, thereby determining their own contextual importance. Technically, this framework is realized by passing a sequence of materials, each consisting of v elements, to a transformer with h = v attention heads. As a pure embedding technique, elementwise embedding replaces the w-dimensional embedding table of a transformer model with 256 c-dimensional elements (one per UTF-8 byte), where c = w/v. Using this approach, we show that the standard transformer architecture can be reused for all levels of language representation and can process much longer sequences at the same time complexity, without any architectural modification or additional overhead. BERT trained with elementwise embedding outperforms its subword equivalent (the original implementation) in multilabel patent document classification, exhibiting superior robustness to domain specificity and data imbalance despite using only 0.005% of the embedding parameters. Experiments demonstrate the generalizability of the proposed method by successfully transferring these improvements to the differently architected transformers CANINE and ALBERT.
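Since the abstract fixes the embedding arithmetic (c = w/v, a 256-row UTF-8 byte table, h = v attention heads), a minimal PyTorch sketch of such a layer is given below. The class `ElementwiseEmbedding`, the helper `materials_to_bytes`, and the concrete choice v = 12, w = 768 are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch of elementwise embedding, assuming w = 768 (model width),
# v = 12 elements (UTF-8 bytes) per material, and c = w / v = 64.
import torch
import torch.nn as nn


class ElementwiseEmbedding(nn.Module):
    """Embed a material (semantic unit) as the concatenation of its
    v element (byte) embeddings drawn from a 256-row, c-dimensional table."""

    def __init__(self, model_dim: int = 768, elements_per_material: int = 12):
        super().__init__()
        assert model_dim % elements_per_material == 0
        self.v = elements_per_material
        self.c = model_dim // elements_per_material
        # 256 rows, one per UTF-8 byte value; replaces the subword table.
        self.byte_table = nn.Embedding(256, self.c)

    def forward(self, byte_ids: torch.Tensor) -> torch.Tensor:
        # byte_ids: (batch, seq_len, v) -- each material padded/truncated to v bytes.
        b, s, v = byte_ids.shape
        assert v == self.v
        elem = self.byte_table(byte_ids)           # (batch, seq_len, v, c)
        return elem.reshape(b, s, v * self.c)      # (batch, seq_len, w)


def materials_to_bytes(words, v: int = 12) -> torch.Tensor:
    """Encode each material (here: a whitespace-split word) as v UTF-8 bytes,
    zero-padded and truncated to length v. Illustrative tokenization only."""
    rows = []
    for word in words:
        bs = list(word.encode("utf-8"))[:v]
        rows.append(bs + [0] * (v - len(bs)))
    return torch.tensor(rows)


if __name__ == "__main__":
    emb = ElementwiseEmbedding(model_dim=768, elements_per_material=12)
    ids = materials_to_bytes("elementwise language representation".split()).unsqueeze(0)
    x = emb(ids)                                   # (1, 3, 768)
    # As in the abstract, the encoder uses h = v = 12 heads, so each head's
    # 64-dimensional slice lines up with one element embedding.
    encoder = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
    print(encoder(x).shape)                        # torch.Size([1, 3, 768])
```

The design point of the sketch is the shape bookkeeping: the per-material concatenation restores a w-dimensional vector, so the surrounding transformer is untouched, matching the abstract's claim of no architectural modification.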
