Language Through a Prism: A Spectral Approach for Multiscale Language Representations

11/09/2020
by Alex Tamkin, et al.

Language exhibits structure at different scales, ranging from subwords to words, sentences, paragraphs, and documents. To what extent do deep models capture information at these scales, and can we force them to better capture structure across this hierarchy? We approach this question by focusing on individual neurons, analyzing the behavior of their activations at different timescales. We show that signal processing provides a natural framework for separating structure across scales, enabling us to 1) disentangle scale-specific information in existing embeddings and 2) train models to learn more about particular scales. Concretely, we apply spectral filters to the activations of a neuron across an input, producing filtered embeddings that perform well on part-of-speech tagging (word-level), dialog speech act classification (utterance-level), or topic classification (document-level), while performing poorly on the other tasks. We also present a prism layer for training models, which uses spectral filters to constrain different neurons to model structure at different scales. Our proposed BERT + Prism model can better predict masked tokens using long-range context and produces multiscale representations that perform better at utterance- and document-level tasks. Our methods are general and readily applicable to other domains besides language, such as images, audio, and video.
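The core idea of spectral filtering over activations can be sketched with a simple FFT-based band-pass filter applied along the token axis of a sequence of hidden states. The function below is a hypothetical illustration, not the authors' exact implementation: the band boundaries, filter shape, and normalization are assumptions for the sketch.

```python
import numpy as np

def spectral_filter(activations, low, high):
    """Band-pass filter each neuron's activation sequence.

    activations: (seq_len, hidden_dim) array of per-token activations.
    low, high: fractions of the frequency range to keep, in [0, 1].
    Hypothetical helper illustrating the paper's idea; the actual
    filter design in the paper may differ.
    """
    seq_len = activations.shape[0]
    # Real FFT along the sequence (time) axis: one spectrum per neuron.
    spectrum = np.fft.rfft(activations, axis=0)
    n_freqs = spectrum.shape[0]
    keep = np.zeros(n_freqs, dtype=bool)
    keep[int(low * n_freqs):int(high * n_freqs)] = True
    # Zero frequencies outside the band, then invert the transform.
    spectrum[~keep] = 0
    return np.fft.irfft(spectrum, n=seq_len, axis=0)

# Low frequencies vary slowly across tokens (document-scale structure);
# high frequencies vary token to token (word-scale structure).
acts = np.random.randn(128, 768)              # (tokens, neurons)
doc_level = spectral_filter(acts, 0.0, 0.05)  # slowly varying component
word_level = spectral_filter(acts, 0.5, 1.0)  # rapidly varying component
```

Keeping the full band recovers the original activations, so the filters decompose the signal rather than discard information globally; each band can then be probed on a task at the matching scale.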

Related research
04/22/2023

N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language Models

Understanding the function of individual neurons within language models ...
10/17/2019

Universal Text Representation from BERT: An Empirical Study

We present a systematic investigation of layer-wise BERT activations for...
02/18/2020

Hierarchical Transformer Network for Utterance-level Emotion Recognition

While there have been significant advances in detecting emotions in tex...
12/16/2019

Scale-dependent Relationships in Natural Language

Natural language exhibits statistical dependencies at a wide range of sc...
04/09/2019

HiGRU: Hierarchical Gated Recurrent Units for Utterance-level Emotion Recognition

In this paper, we address three challenges in utterance-level emotion re...
05/02/2023

Finding Neurons in a Haystack: Case Studies with Sparse Probing

Despite rapid adoption and deployment of large language models (LLMs), t...
07/01/2021

What do End-to-End Speech Models Learn about Speaker, Language and Channel Information? A Layer-wise and Neuron-level Analysis

End-to-end DNN architectures have pushed the state-of-the-art in speech ...