Learning Horn Envelopes via Queries from Large Language Models

05/20/2023
by Sophie Blum, et al.

We investigate an approach for extracting knowledge from trained neural networks based on Angluin's exact learning model with membership and equivalence queries to an oracle. In this approach, the oracle is a trained neural network. We consider Angluin's classical algorithm for learning Horn theories and study the changes needed to make it applicable to learning from neural networks. In particular, trained neural networks may not behave as Horn oracles, meaning that their underlying target theory may not be Horn. We propose a new algorithm that aims to extract the “tightest Horn approximation” of the target theory. It is guaranteed to terminate in exponential time in the worst case, and in polynomial time if the target has polynomially many non-Horn examples. To showcase the applicability of the approach, we perform experiments on pre-trained language models and extract rules that expose occupation-based gender biases.
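
To make the notion of a "tightest Horn approximation" concrete, the following minimal sketch relies on a standard fact: the models of a Horn theory are closed under componentwise intersection, so the Horn envelope of a target can be obtained by closing its set of models under intersection. This is a brute-force illustration, not the query-based algorithm proposed in the paper. The names `toy_oracle`, `models_of`, and `horn_envelope_models` are hypothetical, and `toy_oracle` is only a stand-in for a membership query to a trained network.

```python
from itertools import product


def models_of(oracle, n):
    """Enumerate all assignments over n variables that the oracle accepts.

    Brute force (2**n membership queries); only feasible for tiny n.
    """
    return {bits for bits in product((0, 1), repeat=n) if oracle(bits)}


def intersect(a, b):
    """Componentwise AND of two assignments."""
    return tuple(x & y for x, y in zip(a, b))


def horn_envelope_models(models):
    """Close a set of models under intersection.

    The result is the model set of the tightest Horn theory
    that is entailed by the target (its Horn envelope).
    """
    closed = set(models)
    frontier = list(closed)
    while frontier:
        m = frontier.pop()
        # Iterate over a snapshot, since `closed` may grow inside the loop.
        for other in list(closed):
            meet = intersect(m, other)
            if meet not in closed:
                closed.add(meet)
                frontier.append(meet)
    return closed


def toy_oracle(bits):
    """Hypothetical non-Horn target: the disjunction x OR y."""
    x, y = bits
    return bool(x or y)


target = models_of(toy_oracle, 2)
envelope = horn_envelope_models(target)
added = envelope - target  # assignments the envelope accepts but the target rejects
print(sorted(target))
print(sorted(envelope))
print(sorted(added))
```

In this toy case the target `x OR y` is not Horn: its models `(0, 1)` and `(1, 0)` intersect to `(0, 0)`, which the target rejects. The envelope therefore has to accept `(0, 0)` as well, and the size of `added` gives a rough sense of how far the target is from being Horn.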

research
04/01/2022

Extracting Rules from Neural Networks with Partial Interpretations

We investigate the problem of extracting rules, expressed in Horn logic,...
research
05/06/2020

On the Learnability of Possibilistic Theories

We investigate learnability of possibilistic theories from entailments i...
research
07/16/2018

Probably approximately correct learning of Horn envelopes from queries

We propose an algorithm for learning the Horn envelope of an arbitrary d...
research
03/10/2020

Cryptanalytic Extraction of Neural Network Models

We argue that the machine learning problem of model extraction is actual...
research
03/09/2021

BERTese: Learning to Speak to BERT

Large pre-trained language models have been shown to encode large amount...
research
11/21/2022

Measuring Harmful Representations in Scandinavian Language Models

Scandinavian countries are perceived as role-models when it comes to gen...
research
11/15/2020

Safety Synthesis Sans Specification

We define the problem of learning a transducer S from a target language ...