Coloring the Black Box: What Synesthesia Tells Us about Character Embeddings

01/26/2021
by Katharina Kann, et al.

In contrast to their word- or sentence-level counterparts, character embeddings are still poorly understood. We aim to close this gap with an in-depth study of English character embeddings. For this, we use resources from research on grapheme-color synesthesia, a neuropsychological phenomenon in which letters are associated with colors. These resources give us insight into which characters are similar for synesthetes and how characters are organized in color space. Comparing 10 different character embeddings, we ask: How similar are character embeddings to a synesthete's perception of characters? And how similar are character embeddings extracted from different models? We find that LSTMs agree with humans more than transformers do. Comparing across tasks, grapheme-to-phoneme conversion results in the most human-like character embeddings. Finally, ELMo embeddings differ from both humans and other models.
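To make the first question concrete, here is a minimal sketch of one way to measure agreement between a set of character embeddings and letter-color associations. It is not necessarily the procedure used in the paper. It assumes that each letter has one embedding vector and one color given as an RGB triple, and that agreement is scored by rank-correlating the two sets of pairwise distances. The data is random toy data, and all function names (`pairwise_distances`, `spearman`, `similarity_agreement`) are hypothetical.

```python
import numpy as np


def pairwise_distances(vectors):
    """Euclidean distance between every pair of rows."""
    diffs = vectors[:, None, :] - vectors[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))


def upper_triangle(matrix):
    """Flatten the entries above the diagonal (each unordered pair once)."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]


def average_ranks(values):
    """Rank a 1-D array, giving tied values the mean of their ranks."""
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values), dtype=float)
    _, inverse, counts = np.unique(
        values, return_inverse=True, return_counts=True
    )
    rank_sums = np.bincount(inverse, weights=ranks)
    return (rank_sums / counts)[inverse]


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])


def similarity_agreement(embeddings, colors):
    """Rank correlation between embedding distances and color distances."""
    emb_dist = upper_triangle(pairwise_distances(embeddings))
    col_dist = upper_triangle(pairwise_distances(colors))
    return spearman(emb_dist, col_dist)


# Toy data (assumption): 5 letters, random 8-dim embeddings, random RGB colors.
rng = np.random.default_rng(0)
letters = list("abcde")
embeddings = rng.normal(size=(len(letters), 8))
colors = rng.uniform(0.0, 1.0, size=(len(letters), 3))

score = similarity_agreement(embeddings, colors)
print(f"agreement between embeddings and colors: {score:.3f}")
```

A score near 1 would mean that letters close together in embedding space also tend to have similar colors. The same function could be applied to two embedding sets from different models to address the second question, since it only compares distance structure and does not require the two spaces to share dimensions.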


