What shapes feature representations? Exploring datasets, architectures, and training

06/22/2020
by Katherine L. Hermann, et al.

In naturalistic learning problems, a model's input contains a wide range of features, some useful for the task at hand, and others not. Of the useful features, which ones does the model use? Of the task-irrelevant features, which ones does the model represent? Answers to these questions are important for understanding the basis of models' decisions, for example to ensure they are equitable and unbiased, as well as for building new models that learn versatile, adaptable representations useful beyond their original training task. We study these questions using synthetic datasets in which the task-relevance of different input features can be controlled directly. We find that when two features redundantly predict the label, the model preferentially represents one, and its preference reflects what was most linearly decodable from the untrained model. Over training, task-relevant features are enhanced, and task-irrelevant features are partially suppressed. Interestingly, in some cases, an easier, weakly predictive feature can suppress a more strongly predictive, but harder one. Additionally, models trained to recognize both easy and hard features learn representations most similar to models that use only the easy feature. Further, easy features lead to more consistent representations across model runs than do hard features. Finally, models have more in common with an untrained model than with models trained on a different task. Our results highlight the complex processes that determine which features a model represents.
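
The notion of a feature being "linearly decodable from the untrained model" can be illustrated with a small probe experiment. The sketch below is not the authors' setup: the two synthetic features (a threshold on one input dimension as the "easy" feature, an XOR of two dimensions as the "hard" one), the one-layer random ReLU network, the least-squares probe, and the names `linear_decoding_accuracy` and `untrained_decodability` are all assumptions chosen for illustration.

```python
import numpy as np


def linear_decoding_accuracy(reps, labels, train_frac=0.8):
    """Fit a least-squares linear probe on the first part of the data and
    return its classification accuracy on the held-out remainder."""
    n_train = int(train_frac * len(reps))
    # Append a constant column so the probe can learn a bias term.
    X = np.hstack([reps, np.ones((len(reps), 1))])
    y = 2.0 * labels - 1.0  # map {0, 1} labels to {-1, +1} regression targets
    w, *_ = np.linalg.lstsq(X[:n_train], y[:n_train], rcond=None)
    preds = (X[n_train:] @ w) > 0
    return float(np.mean(preds == labels[n_train:].astype(bool)))


def untrained_decodability(seed=0, n=2500, dim=8, hidden=64):
    """Probe two hypothetical features from a randomly initialised,
    never-trained one-layer ReLU network."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, dim))

    # Assumed "easy" feature: the sign of a single input dimension.
    easy = (x[:, 0] > 0).astype(int)
    # Assumed "hard" feature: XOR of the signs of two other dimensions.
    hard = ((x[:, 1] > 0) ^ (x[:, 2] > 0)).astype(int)

    # Untrained model: random weights, no gradient steps.
    W = rng.standard_normal((dim, hidden)) / np.sqrt(dim)
    b = 0.1 * rng.standard_normal(hidden)
    reps = np.maximum(x @ W + b, 0.0)

    return {
        "easy": linear_decoding_accuracy(reps, easy),
        "hard": linear_decoding_accuracy(reps, hard),
    }


if __name__ == "__main__":
    print(untrained_decodability(seed=0))
```

Comparing the two held-out accuracies gives a rough measure of which feature the untrained representation already exposes to a linear readout. Repeating the probe on the same network after training would be one way to look for the enhancement and suppression effects the abstract describes.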


