Taming AI Bots: Controllability of Neural States in Large Language Models

05/29/2023
by   Stefano Soatto, et al.

We tackle the question of whether an agent can, by suitable choice of prompts, control an AI bot to any state. To that end, we first introduce a formal definition of “meaning” that is amenable to analysis. Then, we characterize “meaningful data” on which large language models (LLMs) are ostensibly trained, and “well-trained LLMs” through conditions that are largely met by today's LLMs. While a well-trained LLM constructs an embedding space of meanings that is Euclidean, meanings themselves do not form a vector (linear) subspace, but rather a quotient space within it. We then characterize the subset of meanings that can be reached by the state of the LLM for some input prompt, and show that a well-trained bot can reach any meaning, albeit with small probability. We then introduce a stronger notion of controllability as almost certain reachability, and show that, when restricted to the space of meanings, an AI bot is controllable. We do so after introducing a functional characterization of attentive AI bots, and finally derive necessary and sufficient conditions for controllability. The fact that AI bots are controllable means that an adversary could steer them towards any state. However, the sampling process can be designed to counteract adverse actions and avoid reaching undesirable regions of state space before their boundary is crossed.
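The distinction the abstract draws between plain reachability (some prompt reaches a state with positive probability) and controllability (some prompt policy reaches it almost surely) can be illustrated on a toy model. The sketch below is not the paper's construction: it simply treats the bot's state as a node in a small finite graph whose transitions are labeled by prompts (state names and prompt labels are invented for illustration) and computes which states are reachable with positive probability under some prompt sequence.

```python
# Toy sketch of prompt-driven reachability, NOT the paper's formal model.
# Each state maps each prompt to the set of possible next states; a state
# is reachable if some prompt sequence can visit it with positive
# probability, which reduces to graph search over prompt-labeled edges.

# transitions[state][prompt] = list of possible successor states
transitions = {
    "neutral": {"a": ["helpful"], "b": ["neutral", "edgy"]},
    "helpful": {"a": ["helpful"], "b": ["neutral"]},
    "edgy":    {"a": ["toxic", "neutral"], "b": ["edgy"]},
    "toxic":   {"a": ["toxic"], "b": ["toxic"]},  # absorbing: no escape
}

def reachable(start):
    """Return the set of states reachable from `start` with positive
    probability under some choice of prompt sequence."""
    seen, frontier = {start}, [start]
    while frontier:
        state = frontier.pop()
        for successors in transitions[state].values():
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return seen

print(reachable("neutral"))  # every state, including "toxic"
print(reachable("toxic"))    # only {"toxic"}: the region is absorbing
```

In this toy graph every state is reachable from "neutral", yet "toxic" is absorbing: once crossed, no prompt leads back out. This mirrors the abstract's closing point: because undesirable regions may be absorbing, a safeguard (here, the sampling process) must act before the boundary is crossed, not after.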


