Finding Experts in Transformer Models

05/15/2020
by Xavier Suau, et al.

In this work we study the presence of expert units in pre-trained Transformer models (TMs) and how they affect a model's performance. We define expert units as neurons that can classify a concept with a given average precision, where each concept is represented by a binary dataset of sentences that do or do not contain it. Leveraging the OneSec dataset (Scarlini et al., 2019), we compile a dataset of 1641 concepts that allows diverse expert units in TMs to be discovered. We show that expert units matter in several ways: (1) The presence of expert units correlates (r^2 = 0.833) with the generalization power of TMs, which allows TMs to be ranked without fine-tuning on suites of downstream tasks. We further propose an empirical method for deciding how accurate such experts should be in order to evaluate generalization. (2) The overlap of top experts between concepts provides a sensible way to quantify concept co-learning, which can be used for the explainability of unknown concepts. (3) We show how to self-condition off-the-shelf pre-trained language models to generate text containing a given concept by forcing the top experts to be active, without re-training the model or using additional parameters.
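The two core operations described above, ranking units by how well their responses classify a concept (via average precision) and self-conditioning by forcing the top experts to be active, can be sketched with toy data. This is a minimal illustration, not the paper's implementation: the activation matrix, the function names, and the clamped value are all hypothetical, and a real TM would supply pooled unit responses per sentence.

```python
import numpy as np

def average_precision(scores, labels):
    """AP of `scores` as a ranking of binary `labels` (1 = concept present)."""
    order = np.argsort(-scores)
    hits = labels[order]
    cum_hits = np.cumsum(hits)
    precision_at_k = cum_hits / (np.arange(len(hits)) + 1)
    return float((precision_at_k * hits).sum() / hits.sum())

def rank_experts(activations, labels):
    """Score every unit (column) by how well it separates concept sentences.

    activations: (n_sentences, n_units) per-sentence unit responses
    labels:      (n_sentences,) binary concept indicators
    Returns unit indices sorted from most to least expert, and the APs.
    """
    aps = np.array([average_precision(activations[:, u], labels)
                    for u in range(activations.shape[1])])
    return np.argsort(-aps), aps

def force_experts(activations, expert_ids, value):
    """Self-conditioning sketch: clamp the chosen experts to a fixed value."""
    out = activations.copy()
    out[:, expert_ids] = value
    return out

# Toy demo: unit 0 fires strongly on concept sentences, unit 1 does not.
labels = np.array([1, 1, 1, 0, 0, 0])
acts = np.array([[5.1,  0.2],
                 [4.8, -1.0],
                 [5.5,  0.7],
                 [0.1,  0.9],
                 [-0.3, 0.0],
                 [0.2,  1.3]])
order, aps = rank_experts(acts, labels)
print(order[0], aps[0])  # → 0 1.0 (unit 0 is a perfect expert for this concept)
conditioned = force_experts(acts, order[:1], 10.0)
```

In the paper's setting, `force_experts` would correspond to fixing the top experts' activations during generation so the language model produces text containing the concept, with no re-training or extra parameters.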

Related research:

- Self-conditioning pre-trained language models (09/30/2021)
- MoEfication: Conditional Computation of Transformer Models for Efficient Inference (10/05/2021)
- Cross-Modal Concept Learning and Inference for Vision-Language Models (07/28/2023)
- A Repairable System Supported by Two Spare Units and Serviced by Two Types of Repairers (08/04/2019)
- Learning from both experts and data (10/20/2019)
- Interactive Concept Learning for Uncovering Latent Themes in Large Text Collections (05/08/2023)
- Concept-based explainability for an EEG transformer model (07/24/2023)
