
Finding Skill Neurons in Pre-trained Transformer-based Language Models

Transformer-based pre-trained language models have demonstrated superior performance on various natural language processing tasks. However, it remains unclear how the skills required to handle these tasks are distributed among model parameters. In this paper, we find that after prompt tuning for specific tasks, the activations of some neurons within pre-trained Transformers are highly predictive of the task labels. We dub these neurons skill neurons and confirm that they encode task-specific skills by finding that: (1) Skill neurons are crucial for handling tasks. The performance of a pre-trained Transformer on a task drops significantly when the corresponding skill neurons are perturbed. (2) Skill neurons are task-specific. Similar tasks tend to have similar distributions of skill neurons. Furthermore, we demonstrate that the skill neurons are most likely generated in pre-training rather than fine-tuning by showing that the skill neurons found with prompt tuning are also crucial for other fine-tuning methods that freeze neuron weights, such as adapter-based tuning and BitFit. We also explore applications of skill neurons, including accelerating Transformers with network pruning and building better transferability indicators. These findings may promote further research on understanding Transformers. The source code can be obtained from https://github.com/THU-KEG/Skill-Neuron.


1 Introduction

Pre-trained language models (PLMs), mostly based on the Transformer architecture (Vaswani et al., 2017), have achieved remarkable performance on broad and diverse natural language processing (NLP) tasks (Han et al., 2021). However, it remains unclear how the skills required to handle these tasks are distributed among model parameters. Are there specific neurons within pre-trained Transformers encoding these skills? Progress on this problem may help to understand the working mechanisms of pre-trained Transformers (Zeiler and Fergus, 2014; Karpathy et al., 2015; Bau et al., 2020; Suau et al., 2020), intervene in model behaviors (Bau et al., 2018; Mitchell et al., 2021), and improve model efficiency (Dalvi et al., 2020; Zhang et al., 2021).

Figure 1: Histogram of activation of a neuron within RoBERTa on positive-label (blue) and negative-label (orange) sentences in SST-2 validation set.

Prompt tuning (Li and Liang, 2021; Lester et al., 2021) prepends some trainable embeddings, i.e., soft prompts, to the inputs and adapts PLMs to handle tasks by tuning only the soft prompts while freezing all the PLM parameters. It has attracted wide attention recently as a promising parameter-efficient fine-tuning method (Su et al., 2021; Liu et al., 2022). In this paper, we find that after prompt tuning for a task, the activations on soft prompts of some neurons within pre-trained Transformers are highly predictive for the task. For instance, Figure 1 shows the activation distribution of a specific neuron within RoBERTa (Liu et al., 2019b). This neuron's activation is highly predictive of the labels of SST-2 (Socher et al., 2013), an established sentiment analysis dataset. When the input sentences express positive sentiments, the activations on soft prompts of this neuron tend to be much higher than when they express negative sentiments. This suggests that the neuron may encode the skill of distinguishing sentiments.

We dub these special neurons skill neurons and develop a simple and effective method to find them for classification tasks via prompt tuning. For a binary classification task, we first calculate each neuron's empirical mean activation on a soft prompt token over the training set and use it as this neuron's baseline activation. If the neuron's activation for an input sample is higher than the baseline, we regard it as predicting one label, and vice versa. We aggregate the prediction accuracies of multiple soft prompts on the validation set into the neuron's predictivity score. The neurons with the highest predictivity scores are identified as skill neurons. For multi-class classification tasks, we decompose them into multiple binary classification subtasks and aggregate the skill neurons of the subtasks as the skill neurons of the multi-class task.

We confirm that the skill neurons encode task-specific skills with a series of experimental findings: (1) Skill neurons generally and stably emerge. For all the investigated tasks and random trials, we can consistently find skill neurons whose predictivities are close to prompt-tuning performance. (2) Skill neurons are crucial for handling tasks. When we perturb skill neurons by adding random noise to their activations, the performance on the corresponding tasks drops much more significantly than when random neurons are perturbed. (3) Skill neurons are task-specific. Similar tasks exhibit similar predictivity rankings of skill neurons, and the skill neurons of same-type tasks are more important for handling a task than those of different-type tasks. (4) Skill neurons do not arise from shallow word selectivity. The skill neurons typically do not selectively activate on keywords related to the task, and their predictivities are not significantly influenced by the label words used in prompt tuning.

After showing that skill neurons encode skills, we further demonstrate that skill neurons are most likely generated in pre-training rather than manufactured by the fine-tuning process of prompt tuning. This is concluded from: (1) Even for randomly generated prompts and untuned hard prompts, the skill neurons still exhibit much better predictivity than random guessing. (2) Skill neurons are also crucial for other fine-tuning methods that freeze neuron weights. The performance of models trained with adapter-based tuning (Houlsby et al., 2019) and BitFit (Ben-Zaken et al., 2022) drops significantly when the skill neurons found with prompt tuning are perturbed.

Moreover, we explore the practical applications of skill neurons. First, we apply skill neurons to network pruning (Anwar et al., 2017; Dalvi et al., 2020), which aims at removing redundant parameters to reduce memory cost and accelerate inference. Experiments show that by keeping only the top skill neurons active, we can prune the pre-trained Transformer to a fraction of its original parameters and achieve a considerable inference speedup. Then we explore building better prompt transferability indicators following Su et al. (2021). We improve their overlapping-rate-of-activated-neurons metric by taking only skill neurons into account, which achieves significantly better performance.

To summarize, our contributions are four-fold: (1) We observe the existence of skill neurons, the special neurons within pre-trained Transformers, which are highly predictive for specific tasks, and develop a method to find them via prompt tuning. (2) We empirically confirm that skill neurons do encode the skills required to handle tasks. (3) We show skill neurons are generated in pre-training rather than fine-tuning. (4) We preliminarily explore the applications of skill neurons. We hope these findings could facilitate future research on understanding the mechanism of PLMs.

2 Preliminary

We introduce the basic knowledge about prompt tuning (§ 2.1), the definition of investigated neurons (§ 2.2), and the investigation setup (§ 2.3).

2.1 Prompt Tuning

Prompt tuning (PT), or soft prompting, is a recently-developed parameter-efficient fine-tuning method, which has attracted wide attention with its capability to effectively adapt PLMs to downstream tasks (Li and Liang, 2021; Lester et al., 2021) and query inner knowledge of PLMs (Qin and Eisner, 2021; Zhong et al., 2021). PT prepends some soft prompts into the input sequences to prompt the PLM to decode the desired label words of the training task in the same way as the pre-training objective. For each task, a verbalizer function (Schick and Schütze, 2021) is used to map the specific label words to the labels of the task. Each soft prompt is a virtual token, which is essentially a trainable embedding. During prompt tuning, only the parameters in soft prompts are tuned, and all the PLM’s original parameters are frozen.

Formally, given an input sequence with tokens $s = (t_1, t_2, \ldots, t_n)$, prompt tuning prepends $l$ randomly initialized soft prompts $p_1, p_2, \ldots, p_l$ before them, where $p_j \in \mathbb{R}^{d}$ and $d$ is the input dimension of the PLM. Taking the PLMs pre-trained with the masked language modeling objective (Devlin et al., 2019) as an example, a special [MASK] token is prepended, and the prompt tuning objective is to maximize the likelihood of filling the desired label word $w_y$ into it:

$$\max_{p_1, \ldots, p_l} \sum_{(s, y) \in \mathcal{D}_{\mathrm{train}}} \log P\big(\texttt{[MASK]} = w_y \,\big|\, p_1, \ldots, p_l, \texttt{[MASK]}, t_1, \ldots, t_n\big). \tag{1}$$
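To make this setup concrete, below is a minimal PyTorch-style sketch of the objective, assuming a RoBERTa masked language model from Hugging Face Transformers; the prompt length, learning rate, and label word are illustrative placeholders, not the paper's actual hyperparameters.

```python
import torch
import torch.nn.functional as F
from transformers import RobertaTokenizer, RobertaForMaskedLM

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base")
for p in model.parameters():                      # freeze all PLM parameters
    p.requires_grad = False

prompt_len = 16                                    # illustrative prompt length
hidden = model.config.hidden_size
soft_prompts = torch.nn.Parameter(0.02 * torch.randn(prompt_len, hidden))

def prompt_tuning_loss(sentence, label_word):
    """Eq. (1): maximize the probability of the label word at the [MASK] position."""
    ids = tokenizer(tokenizer.mask_token + " " + sentence, return_tensors="pt").input_ids
    tok_emb = model.roberta.embeddings.word_embeddings(ids)        # (1, n, d)
    inputs_embeds = torch.cat([soft_prompts.unsqueeze(0), tok_emb], dim=1)
    logits = model(inputs_embeds=inputs_embeds).logits             # (1, l+n, |V|)
    mask_pos = prompt_len + (ids[0] == tokenizer.mask_token_id).nonzero()[0, 0]
    label_id = tokenizer.convert_tokens_to_ids(label_word)
    return F.cross_entropy(logits[:, mask_pos], torch.tensor([label_id]))

optimizer = torch.optim.Adam([soft_prompts], lr=1e-3)  # only the prompts are tuned
loss = prompt_tuning_loss("a gripping and heartfelt film .", "Ġgreat")
loss.backward()
optimizer.step()
```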

Some initial prompt tuning works (Qin and Eisner, 2021; Zhong et al., 2021) regard soft prompts as the relaxation of natural language hard prompts, which are initially designed to query inner factual knowledge of PLMs (Petroni et al., 2019; Jiang et al., 2020). Su et al. (2021) hypothesize that soft prompts work by stimulating PLMs’ inner abilities. Inspired by these, we observe the inner activations of PLMs and find skill neurons.

2.2 Neurons in Transformers

Transformer (Vaswani et al., 2017) is the state-of-the-art NLP model architecture, which is used by the majority of PLMs (Devlin et al., 2019; Liu et al., 2019b; Brown et al., 2020; Raffel et al., 2020). A pre-trained Transformer model is typically stacked with multiple identical Transformer layers. Each Transformer layer consists of a self-attention module and a feed-forward network (FFN), among which the FFN carries two-thirds of the parameters. Previous work has highlighted the importance of FFN (Press et al., 2020; Dong et al., 2021) and found FFN encodes rich information (Suau et al., 2020; Geva et al., 2021; Dai et al., 2021). Inspired by these, we study the neurons and activations within FFN.

Formally, the FFN in a Transformer layer is:

$$\mathrm{FFN}(\boldsymbol{x}) = f\!\left(\boldsymbol{x}\boldsymbol{W}_1 + \boldsymbol{b}_1\right)\boldsymbol{W}_2 + \boldsymbol{b}_2, \tag{2}$$

where $\boldsymbol{x} \in \mathbb{R}^{d}$ is the hidden embedding of a token, $f$ is the activation function, $\boldsymbol{W}_1 \in \mathbb{R}^{d \times d_m}$ and $\boldsymbol{W}_2 \in \mathbb{R}^{d_m \times d}$ are trainable matrices, and $\boldsymbol{b}_1, \boldsymbol{b}_2$ are biases.

For simplicity, let $\boldsymbol{a} = f\!\left(\boldsymbol{x}\boldsymbol{W}_1 + \boldsymbol{b}_1\right)$. We regard $a_k$, the $k$-th element of $\boldsymbol{a}$, as the activation of the $k$-th neuron on input $\boldsymbol{x}$. It represents the importance of $\boldsymbol{w}^{(1)}_k$ and $\boldsymbol{w}^{(2)}_k$, the $k$-th column vector of $\boldsymbol{W}_1$ and the $k$-th row vector of $\boldsymbol{W}_2$, respectively. Hence we define $\boldsymbol{w}^{(1)}_k$ and $\boldsymbol{w}^{(2)}_k$ as the weights of the $k$-th neuron in this layer.
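As a minimal sketch of this definition (the dimensions below are illustrative RoBERTa-base-like sizes, and GELU is one possible choice of $f$):

```python
import torch

# illustrative sizes; f is the FFN activation function (e.g., GELU)
d, d_m = 768, 3072
W1, b1 = torch.randn(d, d_m), torch.zeros(d_m)
W2, b2 = torch.randn(d_m, d), torch.zeros(d)
f = torch.nn.GELU()

x = torch.randn(d)                        # hidden embedding of one token
a = f(x @ W1 + b1)                        # activations of all d_m neurons in this layer
k = 42
activation_k = a[k]                       # activation of the k-th neuron on x
neuron_k_weights = (W1[:, k], W2[k, :])   # the k-th neuron's weights
ffn_out = a @ W2 + b2                     # Eq. (2): FFN(x)
print(activation_k.item(), ffn_out.shape)
```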

Although they study essentially the same parameters as we do, Dai et al. (2021) and Zhang et al. (2021) use the term neuron to denote what we call activations. Some other works (Dalvi et al., 2019; Durrani et al., 2020; Hennigen et al., 2020; Antverg and Belinkov, 2022) define a dimension in contextualized representations as a neuron. Since we study how skills are distributed among model parameters rather than input-dependent representations, we adopt the neuron definition given in this section.

2.3 Investigation Setup

To comprehensively investigate the skill neuron phenomenon, we use RoBERTa (Liu et al., 2019b), a widely-used Transformer model pre-trained with the masked language modeling objective (Devlin et al., 2019), and conduct experiments on seven tasks of three types: (1) Sentiment Analysis, including SST-2 (Socher et al., 2013), IMDB (Maas et al., 2011), and TweetEval (Tweet) (Barbieri et al., 2020); (2) Natural Language Inference, including MNLI (Williams et al., 2018) and QNLI (Wang et al., 2019); (3) Topic Classification, including AG News and DBpedia (Zhang et al., 2015). Details about the tasks and the prompt tuning implementations are given in appendices A and B, respectively.

3 Finding Skill Neurons

We use a simple and effective method to find skill neurons for a given pre-trained Transformer $\mathcal{M}$.

3.1 Binary Classification Task

We first introduce how to find skill neurons for binary classification tasks. Let $T$ be a binary classification task with dataset $\mathcal{D}$, which is divided into a training set $\mathcal{D}_{\mathrm{train}}$, a development set $\mathcal{D}_{\mathrm{dev}}$, and a test set $\mathcal{D}_{\mathrm{test}}$. The $i$-th sample consists of an input $x_i$ and its label $y_i \in \{0, 1\}$.

For a specific neuron $\mathcal{N}$ within $\mathcal{M}$, let $a(\mathcal{N}, p, x)$ be its activation on token $p$ given the input sentence $x$. We first do prompt tuning on $T$ with $\mathcal{M}$ and get a group of soft prompts $P = \{p_1, \ldots, p_l\}$. Given a soft prompt $p_j$, we calculate the baseline activation $a_{\mathrm{bsl}}(\mathcal{N}, p_j)$ of $\mathcal{N}$ on $p_j$ over the training set as follows:

$$a_{\mathrm{bsl}}(\mathcal{N}, p_j) = \frac{1}{|\mathcal{D}_{\mathrm{train}}|} \sum_{(x_i, y_i) \in \mathcal{D}_{\mathrm{train}}} a(\mathcal{N}, p_j, x_i). \tag{3}$$

Intuitively, we can regard the neuron as predicting the positive label for the input sentence $x_i$ when $a(\mathcal{N}, p_j, x_i) > a_{\mathrm{bsl}}(\mathcal{N}, p_j)$. Hence the prediction accuracy over the development set is:

$$\mathrm{Acc}(\mathcal{N}, p_j) = \frac{1}{|\mathcal{D}_{\mathrm{dev}}|} \sum_{(x_i, y_i) \in \mathcal{D}_{\mathrm{dev}}} \mathbb{1}\!\left[\,\mathbb{1}\!\left[a(\mathcal{N}, p_j, x_i) > a_{\mathrm{bsl}}(\mathcal{N}, p_j)\right] = y_i\,\right], \tag{4}$$

where $\mathbb{1}[\cdot]$ is the indicator function, which evaluates to $1$ iff the inner condition holds and to $0$ otherwise.

The above way only considers positive correlations between the labels and neuronal activations, which is also the case in previous work (Geva et al., 2021; Dai et al., 2021). However, strong negative correlations also suggest that information about the skill is encoded in the neuron. Conceptually, this is similar to the fact that inhibitory neurons in brains also contribute to certain functions (Rudy et al., 2011). Hence we define the predictivity of $\mathcal{N}$ on soft prompt token $p_j$ as:

$$\mathrm{Pred}(\mathcal{N}, p_j) = \max\big(\mathrm{Acc}(\mathcal{N}, p_j),\; 1 - \mathrm{Acc}(\mathcal{N}, p_j)\big). \tag{5}$$

For each group of soft prompts $P$, the predictivity of $\mathcal{N}$ on it is defined as the predictivity of the best soft prompt token, i.e., $\mathrm{Pred}_P(\mathcal{N}) = \max_{p_j \in P} \mathrm{Pred}(\mathcal{N}, p_j)$. Considering that skill neurons shall be consistently predictive, we conduct $r$ random trials of prompt tuning and get $r$ groups of prompts $P_1, \ldots, P_r$. The overall predictivity of neuron $\mathcal{N}$ is defined as:

$$\mathrm{Pred}(\mathcal{N}) = \frac{1}{r} \sum_{i=1}^{r} \mathrm{Pred}_{P_i}(\mathcal{N}). \tag{6}$$

Then we sort all the neurons within $\mathcal{M}$ in descending order of their overall predictivities and use the top-ranked neurons as the skill neurons in experiments. Appendix G discusses some potential design choices considered in finding skill neurons.
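A minimal NumPy sketch of Eqs. (3) through (6), assuming the neuron's activations on a soft-prompt token have already been extracted into arrays (all array shapes and names are illustrative):

```python
import numpy as np

def predictivity(acts_train, acts_dev, labels_dev):
    """Predictivity of one neuron on one soft-prompt token.
    acts_*: 1-D activation arrays; labels_dev contains 0/1 labels."""
    a_bsl = acts_train.mean()                      # Eq. (3): baseline activation
    preds = (acts_dev > a_bsl).astype(int)         # predict positive iff above baseline
    acc = (preds == labels_dev).mean()             # Eq. (4)
    return max(acc, 1.0 - acc)                     # Eq. (5): keep negative correlations

def overall_predictivity(acts_train, acts_dev, labels_dev):
    """Eq. (6): average over trials of the best prompt token's predictivity.
    acts_*: arrays of shape (num_trials, num_prompt_tokens, num_samples)."""
    per_trial = [
        max(predictivity(acts_train[i, j], acts_dev[i, j], labels_dev)
            for j in range(acts_train.shape[1]))
        for i in range(acts_train.shape[0])
    ]
    return float(np.mean(per_trial))

# toy example: 3 trials, 4 prompt tokens, 100 train / 50 dev samples per neuron
rng = np.random.default_rng(0)
tr = rng.normal(size=(3, 4, 100))
dv = rng.normal(size=(3, 4, 50))
y = rng.integers(0, 2, size=50)
print(overall_predictivity(tr, dv, y))
```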

Figure 2: Distribution of activations of two neurons on a soft prompt for samples in MNLI validation set. Dashed lines indicate baseline activations of the two neurons.

3.2 Multi-class Classification Task

To find skill neurons for a multi-class classification task, we first decompose it into multiple binary classification subtasks. Then we find skill neurons by ranking the neurons with their predictivities on the decomposed subtasks, in a similar way as introduced in § 3.1 but using the soft prompts of the original task instead of the subtasks. The skill neurons of the multi-class classification task consist of equal numbers of subtask skill neurons. For instance, the MNLI (Williams et al., 2018) task requires classifying the relationships between sentence pairs into Entailment, Neutral, and Contradiction. We decompose it into two subtasks: the first is to classify Entailment vs. Contradiction samples, and the second is to classify Neutral vs. Non-neutral samples. If we need the top-$N$ skill neurons of MNLI, we retrieve the top-$N/2$ unique skill neurons of each of the two subtasks. Figure 2 shows the activation distributions of the top skill neurons within RoBERTa for the two subtasks. The samples of the three labels form three distinguishable clusters, which suggests the effectiveness of this skill-neuron-finding method. More details about how we decompose the investigated tasks are given in appendix A.
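A small sketch of this decomposition-and-merging step for MNLI-style tasks; the alternating merge below is one straightforward way to take equal numbers of unique top neurons from each subtask and is not necessarily the paper's exact procedure:

```python
import numpy as np

def merge_skill_neurons(pred_sub1, pred_sub2, n):
    """Pick n unique skill neurons for the original task by alternating between
    the two subtasks' predictivity rankings (equal contribution from each)."""
    order1 = [int(i) for i in np.argsort(-pred_sub1)]
    order2 = [int(i) for i in np.argsort(-pred_sub2)]
    chosen = []
    for cand1, cand2 in zip(order1, order2):
        for cand in (cand1, cand2):
            if cand not in chosen:
                chosen.append(cand)
            if len(chosen) == n:
                return chosen
    return chosen

# toy example: predictivity scores of 10 neurons on the two MNLI subtasks
# (Entailment vs. Contradiction, and Neutral vs. Non-neutral)
rng = np.random.default_rng(0)
print(merge_skill_neurons(rng.random(10), rng.random(10), n=4))
```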

Table 1: Accuracies (%) of prompt tuning and of the top skill neurons on the seven tasks (SST-2, IMDB, Tweet, MNLI, QNLI, AG News, DBpedia), along with standard deviations over random trials. For the binary classification tasks, the skill neuron performance is the predictivity of the top-1 skill neuron. For multi-class classification tasks, the skill neuron performance is obtained by training a logistic regression model taking only the activations of the top-1 neurons of the decomposed subtasks as inputs.
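As a sketch of the multi-class evaluation described in the caption, the snippet below fits a logistic regression on the activations of the two subtasks' top-1 neurons; the dataset here is a randomly generated stand-in, so the sizes and values are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# stand-ins for the top-1 neuron activations of the two decomposed subtasks
acts_train = rng.normal(size=(1000, 2))
labels_train = rng.integers(0, 3, size=1000)   # e.g., Entailment / Neutral / Contradiction
acts_test = rng.normal(size=(200, 2))
labels_test = rng.integers(0, 3, size=200)

clf = LogisticRegression(max_iter=1000).fit(acts_train, labels_train)
print("accuracy:", clf.score(acts_test, labels_test))
```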

4 Do Skill Neurons Encode Skills?

We explore whether skill neurons really encode task-specific skills with a series of experiments.

4.1 Skill Neurons Generally and Stably Emerge

We first confirm that the skill neuron phenomenon is general and stable for various NLP tasks.

Generality.

To explore whether we can generally find highly-predictive skill neurons for various tasks, we apply the skill-neuron-finding method of § 3 to the seven NLP tasks introduced in § 2.3. The performances of the found top-predictivity skill neurons and of prompt tuning are shown in Table 1. For all the tasks, we can find skill neurons achieving performance comparable to prompt tuning, which demonstrates that specific skill neurons generally exist in pre-trained Transformers for various tasks.

Figure 3: Histogram of neuron’s predictivity for IMDB. Error bars indicate s.e.m. over random trials.

Stability.

To rule out the possibility that the skill neurons arise from mere randomness and to confirm the stability of this phenomenon, we conduct random trials (with different data orders and prompt initializations) to find skill neurons for all the tasks. Figure 3 shows the distribution of neuron predictivities within RoBERTa for the IMDB task. Distributions for the other tasks are shown in appendix C. We can see that our method can stably find substantial numbers of skill neurons with high predictivities via prompts. Previous methods use average (Dai et al., 2021) and maximum (Suau et al., 2020) activations on input tokens instead of activations on prompts to find selective neurons, shown as the "Avg." and "Max." results in Figure 3, respectively. The experimental results indicate that previous methods hardly find highly-predictive neurons, which suggests that prompt tuning is crucial for finding skill neurons. We encourage future work to explore why prompt tuning helps here.

4.2 Skill Neurons are Crucial for Handling Tasks

Figure 4: Accuracy on Tweet drops along with the neuron perturbation rate. Error bars indicate s.e.m. over random trials. The perturbations are conducted in descending orders of neurons’ predictivities for different tasks or in random order (the “Random” curve).

A natural hypothesis is that if the skill neurons really encode skills, they shall be more important for PLMs in handling the corresponding tasks. To verify this, we perturb the skill neurons and check whether the PLM's performance drops more than when random neurons are perturbed. Specifically, the perturbation adds Gaussian noise to the neurons' activations (Arora et al., 2018) so that the neurons cannot function properly, and we then observe the PLM's prompt-tuning performance.
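One way such a perturbation could be implemented is with a forward hook on the FFN's intermediate activation, as in the hedged PyTorch sketch below; the module path follows Hugging Face's RoBERTa layout, and the noise scale and perturbed neuron indices are illustrative (the paper's exact noise parameters are not reproduced here).

```python
import torch
from transformers import RobertaForMaskedLM

model = RobertaForMaskedLM.from_pretrained("roberta-base").eval()

def make_perturb_hook(neuron_ids, std=0.1):
    """Add Gaussian noise to the selected neurons' activations (std is illustrative)."""
    def hook(module, inputs, output):
        noise = std * torch.randn(output.shape[:-1] + (len(neuron_ids),))
        output[..., neuron_ids] = output[..., neuron_ids] + noise
        return output
    return hook

# e.g., perturb neurons 10 and 20 of the FFN in Transformer layer 5
ffn_intermediate = model.roberta.encoder.layer[5].intermediate
handle = ffn_intermediate.register_forward_hook(make_perturb_hook([10, 20]))
# ... run the usual prompt-tuning evaluation here ...
handle.remove()
```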

The perturbation results on the Tweet task are shown in Figure 4, from which we observe that when we perturb the top skill neurons of this task, the PLM's performance drops much more significantly than when we perturb neurons in random order. This indicates that the highly-predictive skill neurons are indeed crucial for handling tasks and supports that skill neurons encode skills. Perturbation results on the other tasks are shown in § D.1, and they all exhibit similar phenomena.

4.3 Skill Neurons are Task-specific

We further study whether skill neurons are task-specific, i.e., do skill neurons encode task-specific high-level skills like distinguishing sentiments for sentiment analysis, or do they just encode some task-general low-level skills like recognizing parts of speech, which are also helpful for handling tasks.

First, if skill neurons are task-specific, we should find similar skill neurons for similar tasks. To verify this, we rank neurons in descending order of their predictivities for the different tasks and compute Spearman's rank correlations (Spearman, 1987) between the orders of different tasks. The results averaged over all the layers of RoBERTa are shown in Figure 5. We can see that the correlations between similar tasks of the same type are obviously higher, which confirms that similar tasks have similar skill neurons. The layer-wise correlations are shown in appendix E, from which we can see that skill neurons tend to be more task-specific in higher layers, which is consistent with previous probing findings (Liu et al., 2019a).
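A short sketch of this comparison using SciPy, with random stand-in predictivity scores in place of the real per-neuron values:

```python
import numpy as np
from scipy.stats import spearmanr

# pred[task]: one predictivity score per neuron (random stand-ins here)
rng = np.random.default_rng(0)
pred = {t: rng.random(3072) for t in ["SST-2", "IMDB", "MNLI"]}

for a in pred:
    for b in pred:
        rho, _ = spearmanr(pred[a], pred[b])
        print(f"{a} vs {b}: rho = {rho:+.2f}")
```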

Figure 5: Spearman’s rank correlations between the neuron predictivity orders of different tasks. Results are averaged over all the layers.

Moreover, if skill neurons are task-specific, the skill neurons of same-type tasks should be more important for handling a specific task. This is already supported by Figure 4, which shows that the accuracy on Tweet drops much more significantly when we perturb neurons in the predictivity orders of same-type tasks (SST-2, IMDB). To quantify this effect and show the phenomenon comprehensively across all tasks, we define the neuronal importance of a source task to an evaluation task as the area between the accuracy curve obtained by perturbing neurons in the predictivity order of the source task and the curve obtained by perturbing them in random order. For instance, in Figure 4, the neuronal importance of SST-2 to Tweet is the area between the blue curve and the gray curve. The overall neuronal importances are shown in Figure 6, from which we can see that the skill neurons of same-type tasks are obviously more important, which again strongly supports that the found skill neurons encode task-specific skills.
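A minimal sketch of this area-between-curves definition, using trapezoidal integration over toy accuracy curves (all numbers are illustrative):

```python
import numpy as np

def neuronal_importance(acc_source, acc_random, perturb_rates):
    """Area between the two accuracy-vs-perturbation-rate curves.
    acc_*: accuracies at each perturbation rate; larger area = more important."""
    return float(np.trapz(acc_random - acc_source, perturb_rates))

# toy illustration: the source-task order hurts accuracy faster than random order
rates = np.linspace(0.0, 1.0, 11)
acc_random = 0.9 - 0.3 * rates
acc_source = 0.9 - 0.6 * rates
print(neuronal_importance(acc_source, acc_random, rates))  # positive area
```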

4.4 Skill Neurons are not from Word Selectivity

Previous works (Dai et al., 2021; Suau et al., 2020) show that neurons in Transformers may selectively activate on some words or concepts. To confirm that skill neurons encode skills, we show that skill neurons do not arise from such selectivities.

Figure 6: Neuronal importances of different task pairs. Results are averaged over random trials. For an evaluation task, the neuronal importances of different source tasks are normalized as z-scores.

Cosine Similarity (Top): AGES, GES, ITIES, ause, UNCH, AGE, ORK, STE, TING, FE
Cosine Similarity (Bottom): sham, Nicol, bogus, Rox, Nay, contro, guy, uneven, arbitrarily, unnatural
Average Activation (Top): starters, village, oster, iddled, af, mafia, aley, tired, dep, ophobic
Average Activation (Bottom): official, repression, illegal, called, ensible, regime, abusers, should, creation, refuse

Table 2: Related words for SST-2's top skill neuron.

We first do case studies on the related words of the top skill neurons, including the words with the top and bottom cosine similarities between their input embeddings and the neuron weight vectors (Dai et al., 2021), and the words with the top and bottom average activations (Suau et al., 2020). The results for SST-2 are shown in Table 2. We can see that these related words do not convey sentiments, which demonstrates that the skill neurons do not arise from keyword selectivity. Results for the other tasks are shown in appendix F.

Furthermore, considering that the prompt tuning method makes predictions by decoding label tokens, we need to check whether skill neurons depend on the label words used. If so, it would indicate that the skill neurons do not encode the skills for handling tasks but merely the skills for selectively decoding some words. We rule out this possibility by finding that, if we use different random words as label words, the resulting predictivity orders of neurons remain highly consistent. Specifically, over all the tasks, the average Spearman's correlation between the neuron predictivity orders obtained with different random label words is 0.87.

5 Where do Skill Neurons Come from?

In § 4, we confirm that skill neurons do encode task-specific skills. Then a natural question is where skill neurons come from, i.e., do skill neurons acquire these skills in pre-training or prompt tuning? We find that skill neurons are most likely generated in pre-training with empirical evidence.

Table 3: Accuracies (%) on the various tasks (SST-2, IMDB, Tweet, MNLI, QNLI, AG News, DBpedia) of the top skill neurons found with random prompts and untuned hard prompts, compared to random guessing and a randomly initialized model. We also report standard deviations over random trials.

We first try to find skill neurons with tuning-free prompts, including random prompts, which are randomly generated embeddings, and human-written hard prompts. The predictivities of the found neurons are shown in Table 3. We can see that even without tuning, we can still find neurons with non-trivial predictivities. Malach et al. (2020) show that randomly initialized neural networks may have predictive subnetworks, so we also compare with randomly initialized models using random prompts. It can be observed that the neurons in random models are predictive to some extent, but their predictivities are far below those of the neurons in pre-trained models. These results imply that the skill neurons are generated in pre-training, and prompt tuning only serves as an effective tool to observe the specificity of these neurons.

Figure 7: BitFit accuracy on IMDB drops along with the neuron perturbation rate. Error bars indicate s.e.m. over random trials. The perturbations are conducted in predictivity orders obtained with prompt tuning.
Figure 8: Average neuronal importance over models trained with adapter-based tuning and BitFit.

To provide stronger evidence, we explore whether the skill neurons found with prompt tuning are also important for other fine-tuning methods with different dynamics. We examine two parameter-efficient fine-tuning methods: adapter-based tuning (Houlsby et al., 2019), which only tunes the additional adapter layers plugged into Transformers, and BitFit (Ben-Zaken et al., 2022), which only tunes the bias vectors. Both tuning methods keep neuron weights fixed, which ensures that the skill neurons are unchanged during tuning. The BitFit model's performance on IMDB when neurons are perturbed in the descending order of predictivities obtained with prompt tuning is shown in Figure 7, and the results for the other tasks and for adapter models are shown in appendix D. We can see that the highly-predictive skill neurons found with prompts are still crucial for models fine-tuned with other methods. To show this effect comprehensively, similar to § 4.3, we visualize the average neuronal importance over models trained with adapter-based tuning and BitFit in Figure 8. The skill neurons found with prompt tuning also exhibit task-specific importance, which again supports that skill neurons are generated in pre-training rather than manufactured by prompt tuning.

6 Application

We further explore the applications of our skill neuron finding. We show two preliminary use cases: network pruning and transferability indicator.

6.1 Network Pruning

First, we apply our skill neuron finding to network pruning (Anwar et al., 2017; Dalvi et al., 2020), which reduces memory cost and accelerates inference by removing redundant parameters from neural networks. Existing works have explored pruning PLMs with weight magnitude (Han et al., 2015; Gordon et al., 2020) and loss attribution (Michel et al., 2019). Here we prune PLMs by keeping only the top skill neurons active for each task and fixing the activations of the remaining frozen neurons at their baseline activations. Since the frozen neurons' contributions are constant, we merge them into the bias terms. We apply this pruning method to the top layers of RoBERTa and reduce the model to a fraction of its original parameters. The performances of prompt tuning on the pruned models and of vanilla prompt tuning on the original model are shown in Table 4. Our pruning based on skill neurons generally performs comparably to vanilla prompt tuning and achieves a considerable inference speedup.
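A hedged sketch of the pruning idea for a single FFN layer: keep only the skill-neuron columns and rows, and fold the frozen neurons' constant baseline activations into the output bias. The shapes, the ReLU stand-in for the activation function, and all values are illustrative.

```python
import torch

def prune_ffn(W1, b1, W2, b2, keep_ids, baseline_acts):
    """Keep only the skill neurons of one FFN layer; the frozen neurons'
    constant baseline activations are merged into the output bias."""
    d_m = W1.shape[1]
    frozen = [k for k in range(d_m) if k not in set(keep_ids)]
    b2_new = b2 + baseline_acts[frozen] @ W2[frozen, :]   # constant contribution
    return W1[:, keep_ids], b1[keep_ids], W2[keep_ids, :], b2_new

# toy sizes; in RoBERTa these would be the FFN weights of one layer
d, d_m = 8, 32
W1, b1 = torch.randn(d, d_m), torch.zeros(d_m)
W2, b2 = torch.randn(d_m, d), torch.zeros(d)
keep_ids = [0, 3, 7]                                      # indices of skill neurons
baseline_acts = torch.randn(d_m)                          # Eq. (3) baselines

W1p, b1p, W2p, b2p = prune_ffn(W1, b1, W2, b2, keep_ids, baseline_acts)
x = torch.randn(d)
pruned_out = torch.relu(x @ W1p + b1p) @ W2p + b2p        # pruned FFN forward pass
```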

Table 4: Accuracies (%) on the various tasks (SST-2, IMDB, Tweet, MNLI, QNLI, AG News, DBpedia) of vanilla prompt tuning and of prompt tuning on the pruned models, along with standard deviations over random trials. We also report the achieved inference speedups on the tasks. Speedups are evaluated on a single CPU since CPUs are widely used for model inference (Mittal et al., 2021).

6.2 Transferability Indicator

Previous works (Su et al., 2021; Vu et al., 2021) explore improving prompt tuning with cross-task prompt transfer. Su et al. (2021) propose that the overlapping rate of activated neurons between soft prompts can serve as a prompt transferability indicator, which correlates well with zero-shot prompt transferability and can help to quantify task similarities and improve prompt transfer. Su et al. (2021) take all neurons into the calculation, but the redundant neurons without task-specific skills may bring noisy signals. Here we take only the top skill neurons of the target tasks into the calculation, which noticeably improves the average Spearman's correlation between the overlapping rate and prompt transferability over our tasks.
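One plausible form of this restricted overlap computation is sketched below; the exact metric definition follows Su et al. (2021) and may differ in details from this simplified version.

```python
import numpy as np

def overlap_rate(act_a, act_b, skill_ids=None):
    """Overlapping rate of activated (activation > 0) neurons between two prompts.
    act_*: per-neuron activations on the soft prompts of two tasks; skill_ids
    optionally restricts the computation to the target task's top skill neurons."""
    if skill_ids is not None:
        act_a, act_b = act_a[skill_ids], act_b[skill_ids]
    on_a, on_b = act_a > 0, act_b > 0
    union = np.logical_or(on_a, on_b).sum()
    return np.logical_and(on_a, on_b).sum() / max(union, 1)

# toy example with random activations and 100 assumed skill-neuron indices
rng = np.random.default_rng(0)
a, b = rng.normal(size=4096), rng.normal(size=4096)
skill = rng.choice(4096, size=100, replace=False)
print(overlap_rate(a, b), overlap_rate(a, b, skill))
```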

7 Related Work

Selective Neurons in Artificial Neural Networks

There have long been findings about selective neurons in artificial neural networks. Many computer vision works (Coates et al., 2012; Le et al., 2013; Zeiler and Fergus, 2014; Agrawal et al., 2014; Zhou et al., 2015; Bau et al., 2020) find that both supervised and unsupervised models can have units that selectively respond to specific visual objects and concepts. Radford et al. (2017) also find neurons corresponding to sentiments in unsupervised long short-term memory networks. Interestingly, there are similar selective neurons in human brains (Barlow, 1972; Quiroga et al., 2005). The widespread emergence of these neuronal selectivities implies that there may be common learning mechanisms among intelligent systems, which is extremely worthwhile to explore in the future.

Bau et al. (2017) and Mu and Andreas (2020) find that selective neurons are more important, which is consistent with our findings. However, Morcos et al. (2018) draw opposite conclusions. We discuss this with experiments in appendix H.

Analyzing Pre-trained Transformers

After the success of Transformer-based PLMs (Devlin et al., 2019; Yang et al., 2019; Raffel et al., 2020), many efforts have been devoted to analyzing how PLMs work, such as probing the knowledge of PLMs (Liu et al., 2019a; Hewitt and Manning, 2019; Petroni et al., 2019) and understanding the behaviors of PLMs’ parameters (Voita et al., 2019; Clark et al., 2019). Among these, some works (Dalvi et al., 2019; Durrani et al., 2020; Antverg and Belinkov, 2022) find that individual neurons capture linguistic properties, but they define neurons as dimensions in contextualized representations. Other works (Suau et al., 2020; Geva et al., 2021; Dai et al., 2021) study the same group of neurons as us and find that some neurons encode specific information like concepts, facts, and word patterns. Inspired by them, we study whether neurons encode high-level skills for handling tasks in this work and demonstrate that we can observe skill neurons with the help of prompts. We believe it is promising to explore whether and how skill neurons collaborate with the neurons encoding information in future works.

8 Conclusion and Future Work

In this paper, we find some special neurons in pre-trained Transformers whose activations on soft prompts are highly predictive of the task labels of inputs. We dub these neurons skill neurons and develop a method to find them via prompt tuning. With extensive experiments, we confirm that skill neurons encode task-specific skills required to handle these tasks and find empirical evidence showing that skill neurons are most likely generated in pre-training rather than fine-tuning. We also demonstrate some practical applications of our skill neuron finding. In the future, we will extend our prompt-based skill neuron finding method to more scenarios, such as covering non-classification tasks and other parameters in Transformers like attention heads. We will also explore more fundamental problems about skill neurons and the working mechanisms of PLMs, including how the skill neurons emerge in pre-training, as well as the relationships between skill neurons and neurons encoding specific information found in previous works.

Limitations

Although we conducted extensive experiments, the exploration scope of this work has some limitations: (1) The experimental analyses are all based on RoBERTa. Whether the skill neuron phenomenon widely exists in other Transformer-based pre-trained language models is unclear, and more exploration is needed to verify it. (2) The datasets used in our experiments are all in English, which limits the linguistic features covered in our analyses, and the evaluation tasks are limited to classification tasks. We choose English only because of its rich resources. Although we intuitively believe the observed phenomena do not depend on the English language, experiments on more diverse languages are needed in future work. (3) Following previous works (Geva et al., 2021; Dai et al., 2021), the analyzed neurons in our work all lie in the feed-forward networks of Transformers. Deeper analyses may require considering other parameters such as the attention heads. We encourage future works to address these limitations and obtain more comprehensive analysis results.

Acknowledgements

This work is supported by the New Generation Artificial Intelligence of China (2020AAA0106501), the Institute for Guo Qiang, Tsinghua University (2019GQB0003), and Huawei Noah’s Ark Lab. We thank anonymous reviewers for their suggestions.

References

  • P. Agrawal, R. B. Girshick, and J. Malik (2014) Analyzing the performance of multilayer neural networks for object recognition. In Proceedings of ECCV, pp. 329–344. External Links: Link Cited by: §7.
  • O. Antverg and Y. Belinkov (2022) On the pitfalls of analyzing individual neurons in language models. In Proceedings of ICLR, External Links: Link Cited by: Appendix G, §2.2, §7.
  • S. Anwar, K. Hwang, and W. Sung (2017) Structured pruning of deep convolutional neural networks. ACM Journal on Emerging Technologies in Computing Systems (JETC) 13 (3), pp. 1–18. External Links: Document Cited by: §1, §6.1.
  • S. Arora, R. Ge, B. Neyshabur, and Y. Zhang (2018) Stronger generalization bounds for deep nets via a compression approach. In Proceedings of ICML, pp. 254–263. External Links: Link Cited by: §4.2.
  • S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives (2007) DBpedia: a nucleus for a web of open data. In Proceedings of ISWC/ASWC, pp. 722–735. External Links: Document Cited by: §A.3.
  • F. Barbieri, J. Camacho-Collados, L. Espinosa Anke, and L. Neves (2020) TweetEval: unified benchmark and comparative evaluation for tweet classification. In Findings of EMNLP, pp. 1644–1650. External Links: Document, Link Cited by: §A.1, §2.3.
  • H. B. Barlow (1972) Single units and sensation: A neuron doctrine for perceptual psychology?. Perception 1 (4), pp. 371–394. External Links: Document Cited by: §7.
  • A. Bau, Y. Belinkov, H. Sajjad, N. Durrani, F. Dalvi, and J. Glass (2018) Identifying and controlling important neurons in neural machine translation. In Proceedings of ICLR, External Links: Link Cited by: §1.
  • D. Bau, B. Zhou, A. Khosla, A. Oliva, and A. Torralba (2017) Network dissection: quantifying interpretability of deep visual representations. Proceedings of CVPR, pp. 3319–3327. External Links: Link Cited by: Appendix H, §7.
  • D. Bau, J. Zhu, H. Strobelt, A. Lapedriza, B. Zhou, and A. Torralba (2020) Understanding the role of individual units in a deep neural network. Proceedings of the National Academy of Sciences 117 (48), pp. 30071–30078. External Links: Document Cited by: §1, §7.
  • E. Ben-Zaken, S. Ravfogel, and Y. Goldberg (2022) BitFit: simple parameter-efficient fine-tuning for transformer-based masked language-models. In Proceedings of ACL, External Links: Link Cited by: §1, §5.
  • T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei (2020) Language models are few-shot learners. In Proceedings of NeurIPS, pp. 1877–1901. External Links: Link Cited by: §2.2.
  • K. Clark, U. Khandelwal, O. Levy, and C. D. Manning (2019) What does BERT look at? an analysis of BERT’s attention. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pp. 276–286. External Links: Link, Document Cited by: §7.
  • A. Coates, A. Karpathy, and A. Ng (2012) Emergence of object-selective features in unsupervised feature learning. In Proceedings of NeurIPS, pp. 2681–2689. External Links: Link Cited by: §7.
  • D. Dai, L. Dong, Y. Hao, Z. Sui, and F. Wei (2021) Knowledge neurons in pretrained transformers. arXiv preprint, arXiv:2104.08696. External Links: Link Cited by: Appendix G, §2.2, §2.2, §3.1, §4.1, §4.4, §4.4, §7, Limitations.
  • F. Dalvi, N. Durrani, H. Sajjad, Y. Belinkov, A. Bau, and J. Glass (2019) What is one grain of sand in the desert? analyzing individual neurons in deep nlp models. In Proceedings of AAAI, pp. 6309–6317. External Links: Document Cited by: Appendix G, §2.2, §7.
  • F. Dalvi, H. Sajjad, N. Durrani, and Y. Belinkov (2020) Analyzing redundancy in pretrained transformer models. In Proceedings of EMNLP, pp. 4908–4926. External Links: Link Cited by: §1, §1, §6.1.
  • J. Devlin, M. Chang, K. Lee, and K. Toutanova (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT, pp. 4171–4186. External Links: Document Cited by: §2.1, §2.2, §2.3, §7.
  • Y. Dong, J. Cordonnier, and A. Loukas (2021) Attention is not all you need: pure attention loses rank doubly exponentially with depth. In Proceedings of ICML, pp. 2793–2803. External Links: Link Cited by: §2.2.
  • N. Durrani, H. Sajjad, F. Dalvi, and Y. Belinkov (2020) Analyzing individual neurons in pre-trained language models. In Proceedings of EMNLP, pp. 4865–4880. External Links: Link Cited by: Appendix G, §2.2, §7.
  • M. Geva, R. Schuster, J. Berant, and O. Levy (2021) Transformer feed-forward layers are key-value memories. In Proceedings of EMNLP, pp. 5484–5495. External Links: Link, Document Cited by: Appendix G, §2.2, §3.1, §7, Limitations.
  • M. A. Gordon, K. Duh, and N. Andrews (2020) Compressing BERT: studying the effects of weight pruning on transfer learning. arXiv preprint arXiv:2002.08307. External Links: Link Cited by: §6.1.
  • S. Han, J. Pool, J. Tran, and W. Dally (2015) Learning both weights and connections for efficient neural network. In Proceedings of NeurIPS, pp. 1135–1143. External Links: Link Cited by: §6.1.
  • X. Han, Z. Zhang, N. Ding, Y. Gu, X. Liu, Y. Huo, J. Qiu, L. Zhang, W. Han, M. Huang, et al. (2021) Pre-trained models: past, present and future. AI Open, pp. 225–250. External Links: Document Cited by: §1.
  • L. T. Hennigen, A. Williams, and R. Cotterell (2020) Intrinsic probing through dimension selection. In Proceedings of EMNLP, pp. 197–216. External Links: Link Cited by: §2.2.
  • J. Hewitt and C. D. Manning (2019) A structural probe for finding syntax in word representations. In Proceedings of NACCL-HLT, pp. 4129–4138. External Links: Document Cited by: §7.
  • N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. de Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly (2019) Parameter-efficient transfer learning for NLP. In Proceedings of ICML, pp. 2790–2799. External Links: Link Cited by: §1, §5.
  • Z. Jiang, F. F. Xu, J. Araki, and G. Neubig (2020) How can we know what language models know?. Transactions of the Association for Computational Linguistics 8, pp. 423–438. External Links: Document, Link Cited by: §2.1.
  • A. Karpathy, J. Johnson, and L. Fei-Fei (2015) Visualizing and understanding recurrent networks. arXiv preprint arXiv:1506.02078. External Links: Link Cited by: §1.
  • D. P. Kingma and J. Ba (2015) Adam: a method for stochastic optimization. In Proceedings of ICLR, External Links: Link Cited by: Appendix B.
  • Q. V. Le, M. Ranzato, R. Monga, M. Devin, G. S. Corrado, K. Chen, J. Dean, and A. Ng (2013) Building high-level features using large scale unsupervised learning. In Proceedings of ICASSP, pp. 8595–8598. External Links: Document Cited by: §7.
  • B. Lester, R. Al-Rfou, and N. Constant (2021) The power of scale for parameter-efficient prompt tuning. In Proceedings of EMNLP, pp. 3045–3059. External Links: Link, Document Cited by: §1, §2.1.
  • Q. Lhoest, A. Villanova del Moral, Y. Jernite, A. Thakur, P. von Platen, S. Patil, J. Chaumond, M. Drame, J. Plu, L. Tunstall, J. Davison, M. Šaško, G. Chhablani, B. Malik, S. Brandeis, T. Le Scao, V. Sanh, C. Xu, N. Patry, A. McMillan-Major, P. Schmid, S. Gugger, C. Delangue, T. Matussière, L. Debut, S. Bekman, P. Cistac, T. Goehringer, V. Mustar, F. Lagunas, A. Rush, and T. Wolf (2021) Datasets: a community library for natural language processing. In Proceedings of EMNLP, pp. 175–184. External Links: Link Cited by: §A.3.
  • X. L. Li and P. Liang (2021) Prefix-tuning: optimizing continuous prompts for generation. In Proceedings of ACL, pp. 4582–4597. External Links: Document, Link Cited by: §1, §2.1.
  • N. F. Liu, M. Gardner, Y. Belinkov, M. E. Peters, and N. A. Smith (2019a) Linguistic knowledge and transferability of contextual representations. In Proceedings of NAACL-HLT, pp. 1073–1094. External Links: Document, Link Cited by: Appendix E, §4.3, §7.
  • X. Liu, K. Ji, Y. Fu, Z. Du, Z. Yang, and J. Tang (2022) P-Tuning v2: prompt tuning can be comparable to fine-tuning universally across scales and tasks. In Proceedings of ACL, External Links: Link Cited by: §1.
  • Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov (2019b) RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692. External Links: Link Cited by: Appendix B, §1, §2.2, §2.3.
  • A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts (2011) Learning word vectors for sentiment analysis. In Proceedings of ACL-HLT, pp. 142–150. External Links: Link Cited by: §A.1, §2.3.
  • E. Malach, G. Yehudai, S. Shalev-shwartz, and O. Shamir (2020) Proving the lottery ticket hypothesis: pruning is all you need. In Proceedings of ICML, Cited by: §5.
  • P. Michel, O. Levy, and G. Neubig (2019) Are sixteen heads really better than one?. In Proceedings of NeurIPS, pp. 14014–14024. External Links: Link Cited by: Appendix G, §6.1.
  • E. Mitchell, C. Lin, A. Bosselut, C. Finn, and C. D. Manning (2021) Fast model editing at scale. In Proceedings of ICLR, External Links: Link Cited by: §1.
  • S. Mittal, P. Rajput, and S. Subramoney (2021) A survey of deep learning on CPUs: opportunities and co-optimizations. IEEE Transactions on Neural Networks and Learning Systems, pp. 1–21. External Links: Document Cited by: Table 4.
  • A. S. Morcos, D. G. Barrett, N. C. Rabinowitz, and M. Botvinick (2018) On the importance of single directions for generalization. In Proceedings of ICLR, External Links: Link Cited by: Appendix H, Figure 15, Appendix H, Appendix H, §7.
  • J. Mu and J. Andreas (2020) Compositional explanations of neurons. In Proceedings of NeurIPS, pp. 17153–17163. External Links: Link Cited by: Appendix H, §7.
  • F. Petroni, T. Rocktäschel, S. Riedel, P. Lewis, A. Bakhtin, Y. Wu, and A. Miller (2019) Language models as knowledge bases?. In Proceedings of EMNLP-IJCNLP, pp. 2463–2473. External Links: Document, Link Cited by: §2.1, §7.
  • O. Press, N. A. Smith, and O. Levy (2020) Improving transformer models by reordering their sublayers. In Proceedings of ACL, pp. 2996–3005. External Links: Link, Document Cited by: §2.2.
  • G. Qin and J. Eisner (2021) Learning how to ask: querying LMs with mixtures of soft prompts. In Proceedings of NAACL-HLT, pp. 5203–5212. External Links: Document, Link Cited by: §2.1, §2.1.
  • R. Q. Quiroga, L. Reddy, G. Kreiman, C. Koch, and I. Fried (2005) Invariant visual representation by single neurons in the human brain. Nature 435 (7045), pp. 1102–1107. External Links: Document Cited by: §7.
  • A. Radford, R. Józefowicz, and I. Sutskever (2017) Learning to generate reviews and discovering sentiment. arXiv preprint arXiv:1704.01444. External Links: Link Cited by: §7.
  • C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu (2020) Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research 21, pp. 1–67. External Links: Link Cited by: §2.2, §7.
  • S. Rosenthal, N. Farra, and P. Nakov (2017) SemEval-2017 task 4: sentiment analysis in Twitter. In Proceedings of SemEval, pp. 502–518. External Links: Link, Document Cited by: §A.1.
  • B. Rudy, G. Fishell, S. Lee, and J. Hjerling-Leffler (2011) Three groups of interneurons account for nearly 100% of neocortical gabaergic neurons. Developmental neurobiology 71 (1), pp. 45–61. External Links: Document Cited by: §3.1.
  • T. Schick and H. Schütze (2021) Exploiting cloze-questions for few-shot text classification and natural language inference. In Proceedings of EACL, pp. 255–269. External Links: Link Cited by: §2.1.
  • R. Socher, A. Perelygin, J. Wu, J. Chuang, C. D. Manning, A. Ng, and C. Potts (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of EMNLP, pp. 1631–1642. External Links: Link Cited by: §A.1, §1, §2.3.
  • C. Spearman (1987) The proof and measurement of association between two things. In Proceedings of AJP, pp. 441–471. External Links: Link Cited by: §4.3.
  • Y. Su, X. Wang, Y. Qin, C. Chan, Y. Lin, Z. Liu, P. Li, J. Li, L. Hou, M. Sun, et al. (2021) On transferability of prompt tuning for natural language understanding. arXiv preprint arXiv:2111.06719. External Links: Link Cited by: §1, §1, §2.1, §6.2.
  • X. Suau, L. Zappella, and N. Apostoloff (2020) Finding experts in transformer models. arXiv preprint arXiv:2005.07647. External Links: Link Cited by: Appendix G, §1, §2.2, §4.1, §4.4, §4.4, §7.
  • A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin (2017) Attention is all you need. In Proceedings of NeurIPS, pp. 5998–6008. External Links: Link Cited by: §1, §2.2.
  • E. Voita, D. Talbot, F. Moiseev, R. Sennrich, and I. Titov (2019) Analyzing multi-head self-attention: specialized heads do the heavy lifting, the rest can be pruned. In Proceedings of NAACL, pp. 5797–5808. External Links: Link, Document Cited by: §7.
  • T. Vu, B. Lester, N. Constant, R. Al-Rfou, and D. Cer (2021) SPoT: better frozen model adaptation through soft prompt transfer. arXiv preprint arxiv:2110.07904. External Links: Link Cited by: §6.2.
  • A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, and S. R. Bowman (2019) GLUE: A multi-task benchmark and analysis platform for natural language understanding. In Proceedings of ICLR, External Links: Link Cited by: §A.2, §A.3, §2.3.
  • A. Williams, N. Nangia, and S. Bowman (2018) A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of NAACL-HLT, pp. 1112–1122. External Links: Document, Link Cited by: §A.2, §2.3, §3.2.
  • T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, et al. (2020) Transformers: state-of-the-art natural language processing. In Proceedings of EMNLP, pp. 38–45. External Links: Link Cited by: Appendix B.
  • Z. Yang, Z. Dai, Y. Yang, J. G. Carbonell, R. Salakhutdinov, and Q. V. Le (2019) XLNet: generalized autoregressive pretraining for language understanding. In Proceedings of NeurIPS, pp. 5754–5764. External Links: Link Cited by: §7.
  • M. D. Zeiler and R. Fergus (2014) Visualizing and understanding convolutional networks. In Proceedings of ECCV, External Links: Link Cited by: §1, §7.
  • X. Zhang, J. Zhao, and Y. LeCun (2015) Character-level convolutional networks for text classification. In Proceedings of NeurIPS, pp. 649–657. External Links: Link Cited by: §A.3, §A.3, §2.3.
  • Z. Zhang, Y. Lin, Z. Liu, P. Li, M. Sun, and J. Zhou (2021) MoEfication: conditional computation of transformer models for efficient inference. arXiv preprint arXiv:2110.01786. External Links: Link Cited by: §1, §2.2.
  • Z. Zhong, D. Friedman, and D. Chen (2021) Factual probing is [MASK]: learning vs. learning to recall. In Proceedings of NAACL, pp. 5017–5033. External Links: Document, Link Cited by: §2.1, §2.1.
  • B. Zhou, A. Khosla, À. Lapedriza, A. Oliva, and A. Torralba (2015) Object detectors emerge in deep scene cnns. In Proceedings of ICLR, External Links: Link Cited by: §7.

Appendices

Appendix A Details about Investigated Tasks

In experiments, we use established public English NLP datasets, which are licensed and intended for research use. These datasets are all created with public texts, and we believe they do not involve personal information and are well anonymized. The details about the datasets are as follows:

a.1 Sentiment Analysis

SST-2 (Socher et al., 2013) requires classifying the sentiments expressed in movie reviews into Positive and Negative.

IMDB (Maas et al., 2011) requires classifying the sentiments expressed in reviews from the Internet Movie Database (https://www.imdb.com) into Positive and Negative.

TweetEval (Barbieri et al., 2020) is a collection of Twitter-specific classification tasks. Here we use its sentiment analysis subtask, which originally comes from SemEval 2017 Task 4 (Rosenthal et al., 2017). It requires recognizing whether a tweet is Positive, Negative, or Neutral. We decompose it into two subtasks: Positive vs. Negative, and Neutral vs. Non-neutral.

a.2 Natural Language Inference

MNLI (Williams et al., 2018) requires recognizing the relationship between sentence pairs as Entailment, Neutral, or Contradiction. We decompose it into two subtasks: Entailment vs. Contradiction, and Neutral vs. Non-neutral.

QNLI (Wang et al., 2019) requires classifying whether a context sentence contains the answer to a question.

a.3 Topic Classification

AG News (Zhang et al., 2015) requires classifying the topics of news articles in the AG's corpus (http://groups.di.unipi.it/~gulli/AG_corpus_of_news_articles.html).

DBpedia (Zhang et al., 2015) requires classifying the topics of articles in DBpedia (Auer et al., 2007).

Since recognizing different topics requires essentially different skills, we use only the two similar labels of the two tasks: Business and Sports in AG News, and Company and Athlete in DBpedia.

We obtain the datasets from Huggingface’s dataset platform (Lhoest et al., 2021). For the datasets included in the GLUE collection (Wang et al., 2019), since we cannot get their test set, we use the released validation set as our test set, random samples from the original training set as our training set, and the other samples as our validation set. The detailed data statistics are shown in Table 5.

Table 5: Data statistics (numbers of training, validation, and test samples) of the used datasets: SST-2, IMDB, Tweet, MNLI, QNLI, AG News, and DBpedia.

Appendix B Implementations Details

We implement the prompt tuning method introduced in § 2.1 with multiple soft prompts. We randomly initialize each soft prompt from a normal distribution with a small standard deviation. We then train the model using Adam (Kingma and Ba, 2015) as the optimizer with a fixed learning rate and batch size, evaluate on the validation set at regular intervals, and early stop the training when the validation accuracy stops improving. We use the label words Negative and Positive for binary classification tasks, and Negative, Neutral, and Positive for multi-class classification tasks. For the random label words experiment in § 4.4, we uniformly sample label words from the vocabulary of RoBERTa (Liu et al., 2019b).

We conduct all experiments on the RoBERTa model and use Huggingface's Transformers library (Wolf et al., 2020) to implement them. We run the experiments on NVIDIA GeForce RTX 2080 Ti and NVIDIA GeForce RTX 3090 GPUs.

Appendix C More Predictivity Distributions

We report the predictivity distribution for IMDB in § 4.1 and show the distributions for the other binary classification tasks in Figure 9. We can see our method can stably find many highly-predictive skill neurons for all the tasks. For the multi-class classification tasks, since the predictivities are for decomposed subtasks, we cannot draw distributions for the original tasks and do not include them in the results here.

Figure 9: Histograms of predictivity on neurons within RoBERTa for various tasks: (a) SST-2, (b) QNLI, (c) DBpedia, (d) AG News. Error bars indicate s.e.m. over random trials.
Figure 10: Histogram of neuron predictivity under different definitions for SST-2. Error bars indicate s.e.m. over random trials.

Appendix D More Neuron Perturbation Results

Here we demonstrate more neuron perturbation experimental results.

d.1 Performance Dropping Trends for Prompt Tuning

In Figure 4, we show the performance dropping trend on the Tweet task. The results on the other tasks are shown in Figure 11.

Figure 11: Accuracies on various tasks, (a) SST-2, (b) IMDB, (c) MNLI, (d) QNLI, (e) AG News, and (f) DBpedia, drop along with the neuron perturbation rates. Error bars indicate s.e.m. over random trials. The perturbations are conducted in descending orders of neurons' predictivities for different tasks or in random order (the "Random" curve).

d.2 Performance Dropping Trends for Adapter-based Tuning

The performance dropping trends of adapter-based tuning models on various tasks are shown in Figure 12.

Figure 12: Adapter-based tuning accuracies on various tasks, (a) SST-2, (b) IMDB, (c) Tweet, (d) MNLI, (e) QNLI, (f) AG News, and (g) DBpedia, drop along with the neuron perturbation rates. Error bars indicate s.e.m. over random trials. The perturbations are conducted in predictivity orders obtained with prompt tuning.

d.3 Performance Dropping Trends for BitFit

The performance dropping trends of BitFit models on various tasks are shown in Figure 13.

Figure 13: BitFit accuracies on various tasks, (a) SST-2, (b) Tweet, (c) MNLI, (d) QNLI, (e) AG News, and (f) DBpedia, drop along with the neuron perturbation rates. Error bars indicate s.e.m. over random trials. The perturbations are conducted in predictivity orders obtained with prompt tuning.

Appendix E Layer-wise Correlations between Neuron Predictivity Orders of Different Tasks

Figure 5 shows the overall Spearman's rank correlations between the neuron predictivity orders of different tasks, averaged over the layers of RoBERTa. Here we further present the layer-wise correlations in Figure 14, from which we can see that the skill neurons become more and more task-specific from the bottom layer to the top layer. This is consistent with the probing findings (Liu et al., 2019a) showing that PLMs tend to learn general skills in the lower layers and specific skills in the higher layers. These results suggest that our neuron-finding method can find both neurons encoding general skills in the lower layers and neurons encoding specific skills in the higher layers, but the found top skill neurons are task-specific in general (Figure 5). In this work, we focus on the task-specific top skill neurons and leave a careful study of the neurons encoding general skills to future work.

Figure 14: Spearman's rank correlations between the neuron predictivity orders of different tasks, shown separately for each of the 12 layers (panels (a) through (l)). Layer 1 is the bottom layer near the inputs, and layer 12 is the top layer near the outputs.

Appendix F More Word Selectivity Results

In Table 2, we show the related words for SST-2. Here we further show the results for the other tasks in Table 6. We can see these related words generally do not convey clues about solving the tasks.


IMDB Cosine Similarity
Top legged, turnout, ladder, heid, flexible, Quite, contrary, runs, Reference, enqu
Bottom qq, qa, Capture, Import, Tripoli, hereby, eus, ,, rip, Lima
Average Activation
Top success, Kund, Sanctuary, Lim, Wave, dele, Crystal, flung, Kerala, .............
Bottom
vation, goodbye, concludes, bye, Congratulations,
Congratulations, Fare, farewell, BY, ceremony,
Tweet Cosine Similarity
Top atican, uras, isman, anan, Luck, Merit, Character, alth, atching, character,
Bottom Register, enzymes, elsen, Registrar, tasting, regist, soils, µ, Chambers, LINE,
Average Activation
Top dh, Titan, utable, exited, iOS, chel, loophole, acious, 520, Harmony,
Bottom spike, unbelievably, Toxic, prov, RIS, resulting, risks, rising, ues, reapp,
MNLI Cosine Similarity
Top
trigger, Pis, deadlines, Launch, mares,
PROGRAM, Congratulations, Success, Congratulations, Gig,
Bottom minim, xt, spoof, dism, avoid, asive, WN, offset, inter, antiqu,
Average Activation
Top nickel, grun, cluded, 91, handled, secure, very, dairy, gent, Roses,
Bottom ayed, disl, ect, wipes, screwed, resistance, aw, ruin, shrinking, spite,
QNLI Cosine Similarity
Top otyp, disemb, sidel, melanch, unint, outwe, umbnails, precedence, unfl, Sym,
Bottom 314, 223, 313, 234, ,, 316, 341, 463, 238, 261,
Average Activation
Top eds, adding, apocalypse, strawberry, apopt, Kid, leaf, Silent, technical,
Bottom entrepreneurial, Econom, Columb, prime, roleum, Trade, rounded, isner, enz, 158,
AG News Cosine Similarity
Top aukee, erity, lambda, ropolitan, roxy, LAN, ylon, incinn, oslav, coni,
Bottom Gross, Villa, Uri, ende, Summary, Gallup, Temp, Rog, RP, Ram,
Average Activation
Top fight, desert, Merge, Mail, Mid, Rankings, istic, **, berries, Pen,
Bottom ETS, 107, Line, 106, observers, Ranked, EB, ido, Bass, alf,
DBpedia Cosine Similarity
Top ming, umbered, hind, utter, pepper, scr, increment, usher, empt, atmospheric,
Bottom Chron, kan, Div, Case, Thread, Role, Crash, Mode, Tank, Apps,
Average Activation
Top Bubble, mailed, Ari, razen, Perspective, ogical, Gin, Disney, icons, Huang,
Bottom Jacob, Boss, Dad, trough, Shiny, carn, Gravity, toolbar, Sword, temple,
Table 6: Related words for various tasks’ top skill neurons.

Appendix G Discussions on Neuron-Finding Design Choices

In this section, we discuss some other potential design choices that could be used to find important skill neurons, to give more background on why we finally chose the method described in § 3 and to inspire future work.

Perturbation-based neuron finding.

A natural way to define the importance of a neuron (for a task) is to perturb the neuron and see how this influences the predictions. Perturbation-based methods have been used in previous analysis works (Michel et al., 2019), and we also adopt them in our analytical experiments. However, like many other neuron-level analysis works (Dalvi et al., 2019; Durrani et al., 2020; Antverg and Belinkov, 2022; Suau et al., 2020; Geva et al., 2021; Dai et al., 2021), we cannot directly use this method to locate important neurons because of efficiency: perturbing every individual neuron is unaffordable.

Is prompt tuning necessary?

This work starts from an interesting empirical finding, i.e., the skill neuron phenomenon. This finding is based on prompt tuning. In § 4 and Figure 3, we show that previous methods without prompt tuning cannot well locate the skill neurons. Since we focus on confirming the finding and exploring the properties of skill neurons, we conduct all the experiments based on prompt tuning and do not explore whether it is necessary. Intuitively, as our experiments suggest that the emergence of skill neurons does not depend on prompt tuning but is mostly an intrinsic property for pre-trained Transformer-based language models, we believe prompt tuning may not be the only way to locate skill neurons. We will explore other methods without prompt tuning in future works, which may bring some benefits, like improving overall efficiency.

Other ways to define neuron’s predictivity.

In § 3.1, we define the predictivity of a neuron (1) using the maximum over prompt tokens and (2) considering both the positive and negative correlations. These two choices are made based on preliminary experiments. Figure 10 shows an example, from which we can see that when defining a neuron's predictivity using the mean values over prompt tokens or considering only the positive correlations, the predictivities are significantly underestimated compared with the default definition in § 3.1.

Appendix H Experiments following Morcos et al. (2018)

Some previous works (Bau et al., 2017; Mu and Andreas, 2020) suggest that selective neurons contribute more to model accuracies. In § 4, we also find that perturbing selective skill neurons leads to a larger performance drop. However, Morcos et al. (2018) draw the opposite conclusion and find that selective and non-selective neurons are similarly important. This raises the question of why these conclusions are inconsistent.

We find that, besides the experimental setups, the main difference between Morcos et al. (2018) and our work lies in the definition of neuronal selectivity: Morcos et al. (2018) define a "selectivity index", while we use the predictivity score introduced in § 3. To check whether these different definitions lead to the inconsistent results, we run experiments under our setup and also perturb neurons in descending order of their "selectivity index". The results are shown in Figure 15. We can see that when using the "selectivity index", the found neurons are indeed not more important than random neurons, as reported by Morcos et al. (2018). However, our predictivity metric finds significantly more important neurons for all the tasks.
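For reference, here is a hedged sketch of a class-conditional selectivity index in the spirit of Morcos et al. (2018); this is a simplified reimplementation under our assumptions, not their exact code, and the toy data are random stand-ins.

```python
import numpy as np

def selectivity_index(acts, labels):
    """Class-conditional selectivity: (mu_max - mu_rest) / (mu_max + mu_rest),
    computed over the per-class mean activations of one neuron."""
    classes = np.unique(labels)
    means = np.array([acts[labels == c].mean() for c in classes])
    mu_max = means.max()
    mu_rest = np.delete(means, means.argmax()).mean()
    return (mu_max - mu_rest) / (mu_max + mu_rest + 1e-12)

# toy example: one neuron's (non-negative) activations on labeled samples
rng = np.random.default_rng(0)
acts = rng.random(1000)
labels = rng.integers(0, 2, size=1000)
print(selectivity_index(acts, labels))
```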

Figure 15: Prompt tuning accuracies on various tasks, (a) SST-2, (b) IMDB, (c) Tweet, (d) MNLI, (e) QNLI, (f) AG News, and (g) DBpedia, drop along with the neuron perturbation rates. Error bars indicate s.e.m. over random trials. The perturbations are conducted in descending predictivity orders (Ours), random orders (Random), and descending "selectivity index" (Morcos et al., 2018) orders (Selectivity Index).