There is an open debate about whether Reinforcement Learning (RL) is a causal problem. According to Sutton and Barto (1998), the RL problem is to learn some task interactively, and the now-standard solution consists of assigning values to the different states (that is, the values that a certain stochastic process can take). It has been argued that RL is essentially a control problem and that, since the agent in an RL problem performs interventions in a given environment, it is a causal problem (Szepesvari (2018), Sutton et al. (1992)). RL can also help us understand human learning processes and their causal interpretation (Gershman (2015)); in fact, since human beings conceive of their actions as interventions in the world, and actively perceive their environment by predicting the outcomes of such interventions (Clark (2015)), it is tempting to consider RL a causal problem by nature.
The objective of the most commonly used RL algorithms is to find a policy, a map from states to actions that is interpreted as what a rational agent should do if he finds himself in a given state. RL, in its formulation, its optimality criteria, and the algorithms involved, uses operations based on associative information but makes no use of causal operations. Here we argue that RL and causal reasoning are inherently different problems. By presenting an analogy in terms of algebraic structures, we argue that once the different algebras induced by associative and causal information have been established, we cannot mix them, irrespective of the motivation or real-life situation behind the problem.
2 Levels of formulation of a problem
When talking about a problem, one must be careful to distinguish between a real-world situation and the mathematical formulation of that situation. In the problem of learning by interaction, the intuition is that of an intelligent agent manipulating his environment and learning from the consequences of his actions via a reward function. The standard formulation of such problems is a Markov Decision Process, or some variant of it. An optimal policy is what the scientific community has accepted as the solution to the mathematical problem generated by the learning-by-interaction problem, and several algorithms have been proposed to find such a policy.
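As a reminder of the formulation referred to above, a Markov Decision Process and the optimality criterion for a policy are commonly written as follows. The symbols $\mathcal{S}$, $\mathcal{A}$, $P$, $R$, $\gamma$ and $\pi^{*}$ follow common textbook notation and are an assumption here, not notation taken from this text.

```latex
% Common textbook notation (assumed; not taken from the source text).
\[
  \mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma), \qquad
  P(s' \mid s, a) = \Pr(S_{t+1} = s' \mid S_t = s, A_t = a),
\]
\[
  \pi^{*} \in \arg\max_{\pi} \;
  \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} R(S_t, A_t) \right].
\]
```

Every ingredient of this objective is a conditional probability or an expectation, which is the sense in which the formulation is described here as associative.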
Although the intuition behind RL is that of an agent interacting with an environment, this does not mean that the mathematical model of the agent captures the notion of his actions as interventions in the environment; that impression arises only from a linguistic confusion between a real-life situation and a mathematical model. RL and the mathematical tools used in its formulation operate only at the associative level of information; that is, RL can only learn from correlations in data. As Pearl puts it, RL operates only at the first level of causal reasoning and lacks the necessary tools of the upper levels: interventions and counterfactuals.
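The difference between the first level (association) and the second level (intervention) can be illustrated with a small sketch. The model below is hypothetical: the confounder `U`, the variables `X` and `Y`, and all probabilities are assumptions chosen only for illustration. It computes the conditional probability and the interventional probability exactly and shows that they need not agree.

```python
# Hypothetical toy model (all numbers are illustrative assumptions):
# a binary confounder U influences both the action X and the outcome Y.
P_U = {0: 0.5, 1: 0.5}


def p_x_given_u(x, u):
    """P(X=x | U=u): X tends to copy U."""
    p1 = 0.9 if u == 1 else 0.1
    return p1 if x == 1 else 1.0 - p1


def p_y1_given_xu(x, u):
    """P(Y=1 | X=x, U=u)."""
    return 0.2 + 0.1 * x + 0.6 * u


def p_y1_given_x(x):
    """Associative query P(Y=1 | X=x): condition on having observed X=x."""
    num = sum(p_y1_given_xu(x, u) * p_x_given_u(x, u) * P_U[u] for u in P_U)
    den = sum(p_x_given_u(x, u) * P_U[u] for u in P_U)
    return num / den


def p_y1_do_x(x):
    """Interventional query P(Y=1 | do(X=x)): X is set, so U keeps its prior."""
    return sum(p_y1_given_xu(x, u) * P_U[u] for u in P_U)


observed = p_y1_given_x(1)
intervened = p_y1_do_x(1)
print(f"P(Y=1 | X=1)     = {observed:.2f}")
print(f"P(Y=1 | do(X=1)) = {intervened:.2f}")
```

In this sketch the two queries differ because conditioning on `X` also carries information about `U`, whereas setting `X` does not.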
3 Causal and associative algebras
As a simpler case, consider the structures $(\mathbb{Z}, +)$, where $+$ is the usual sum, and $(\mathbb{Z}, \cdot)$, where $\cdot$ is the usual multiplication. It is clear that $(\mathbb{Z}, +)$ and $(\mathbb{Z}, \cdot)$ do not have the same algebraic structure; more specifically, $(\mathbb{Z}, +)$ is an abelian group while $(\mathbb{Z}, \cdot)$ is only a semigroup (Hungerford (1974)); therefore, an equation stated in $(\mathbb{Z}, +)$ cannot be solved using methods valid for $(\mathbb{Z}, \cdot)$. That is, consider the equation:
$$a + x = b,$$
which must be solved for $x$ if $a$ and $b$ are known. Given that $(\mathbb{Z}, +)$ and $(\mathbb{Z}, \cdot)$ are clearly not isomorphic, we cannot attempt to solve for $x$ using any insight provided by knowledge of $(\mathbb{Z}, \cdot)$; even if, on an upper level, we knew that $a$ has the form, say, $a = c \cdot d$, we must solve for $x$ only in the domain of addition. Other examples are the group $\mathbb{Z}_4$, which is not isomorphic to Klein's group $V_4$, or the Hamilton quaternions $\mathbb{H}$, which are an abelian group under addition but not under their multiplication.
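The algebraic point can be checked mechanically. The sketch below takes the integers as the underlying set and models Klein's group as pairs of bits under componentwise addition; both choices, and all function names, are assumptions made for illustration. It shows that an additive equation always has an integer solution while a multiplicative one may not, and that the cyclic group of order four and Klein's group have different element orders, so no isomorphism between them can exist.

```python
from itertools import product


def solve_additive(a, b):
    """Solve a + x = b over the integers; the inverse -a always exists."""
    return b - a


def solve_multiplicative(a, b):
    """Try to solve a * x = b over the integers; None if no integer x works."""
    if a == 0:
        return None  # 0 * x = b has no solution (b != 0) or no unique one (b == 0)
    quotient, remainder = divmod(b, a)
    return quotient if remainder == 0 else None


def element_orders(elements, op, identity):
    """Sorted list of element orders of a finite group given by its operation."""
    orders = []
    for g in elements:
        n, acc = 1, g
        while acc != identity:
            acc = op(acc, g)
            n += 1
        orders.append(n)
    return sorted(orders)


# Cyclic group of order 4: integers modulo 4 under addition.
z4_orders = element_orders(range(4), lambda x, y: (x + y) % 4, 0)

# Klein's group, modelled here as pairs of bits under componentwise addition.
klein = list(product(range(2), repeat=2))
klein_orders = element_orders(
    klein, lambda x, y: ((x[0] + y[0]) % 2, (x[1] + y[1]) % 2), (0, 0)
)

print(solve_additive(3, 7))        # 4
print(solve_multiplicative(3, 7))  # None
print(z4_orders, klein_orders)     # [1, 2, 4, 4] [1, 2, 2, 2]
```

An isomorphism preserves element orders, so the mismatch between the two lists is enough to rule one out.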
Considering the manipulationist notion of causation (Woodward (2003)), which contains both Pearl's Structural Causal Models (Pearl (2009)) and Spirtes' causation (Spirtes et al. (2000)), we recognize two fundamental aspects: an implicit order and the presence of a context. On the other hand, when only associative information is considered, there is no distinguishable order even in the case of stochastically dependent variables; notice that any joint distribution $P(X, Y)$ can be expressed either as $P(X \mid Y)\,P(Y)$ or as $P(Y \mid X)\,P(X)$.
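The symmetry of the two factorisations can be verified numerically. The joint distribution below is hypothetical, with numbers chosen only for illustration; the sketch rebuilds the same joint from either conditional, so the joint alone does not single out an order between the two variables.

```python
# Hypothetical joint distribution P(X, Y) over two binary variables
# (numbers chosen for illustration only). Keys are (x, y) pairs.
joint = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.40}


def marginal(axis):
    """Marginal distribution of X (axis=0) or of Y (axis=1)."""
    out = {}
    for xy, p in joint.items():
        out[xy[axis]] = out.get(xy[axis], 0.0) + p
    return out


p_x, p_y = marginal(0), marginal(1)


def cond_y_given_x(y, x):
    """P(Y=y | X=x)."""
    return joint[(x, y)] / p_x[x]


def cond_x_given_y(x, y):
    """P(X=x | Y=y)."""
    return joint[(x, y)] / p_y[y]


# Factorisation "X first": P(Y | X) P(X).
via_x = {(x, y): cond_y_given_x(y, x) * p_x[x] for (x, y) in joint}
# Factorisation "Y first": P(X | Y) P(Y).
via_y = {(x, y): cond_x_given_y(x, y) * p_y[y] for (x, y) in joint}

same = all(abs(via_x[k] - via_y[k]) < 1e-12 for k in joint)
print(same)
```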
Let two operators represent the associative algebra and the causal algebra, respectively; i.e., two variables that are correlated are related through the associative operator, while a variable or event that causes another is related through the causal operator. Even more specifically, the causal relation should be written as
4 Reinforcement Learning
a relation between the two variables, with parameters that determine it unambiguously. In particular, in RL we must find
where , and a function which depends on the state and reward of the system only through associative operations (e.g., the function). This is our main argument: since the underlying algebra is the associative one, and such an algebra cannot be isomorphic to the causal algebra because of the lack of order, we cannot use causal tools to solve the RL problem, and therefore RL is not a causal problem. That is,
are different problems, which must be solved with their respective tools. That said, current reinforcement learning problems cannot be considered causal if their mathematical formulation relies only on associative tools.
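To make concrete what "only associative operations" can look like in practice, here is a minimal tabular Q-learning sketch. The two-state environment, the function names, and the hyperparameters are hypothetical and chosen only for illustration; the algorithm is standard Q-learning, not a method taken from this text. The update consumes only sampled tuples of state, action, reward and next state, combined through a maximum and a running average.

```python
import random


# Hypothetical two-state, two-action environment (illustrative assumption):
# action 1 in state 0 moves to state 1 and pays 1; every other choice
# pays 0 and leads back to state 0.
def step(state, action):
    if state == 0 and action == 1:
        return 1, 1.0
    return 0, 0.0


def q_learning(steps=2000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
    state = 0
    for _ in range(steps):
        # Epsilon-greedy choice based on the current estimates.
        if rng.random() < epsilon:
            action = rng.choice((0, 1))
        else:
            action = max((0, 1), key=lambda a: q[(state, a)])
        next_state, reward = step(state, action)
        # The update uses only the sampled tuple (s, a, r, s'), a max,
        # and a running average: no intervention or counterfactual operator.
        target = reward + gamma * max(q[(next_state, a)] for a in (0, 1))
        q[(state, action)] += alpha * (target - q[(state, action)])
        state = next_state
    return q


q = q_learning()
greedy = {s: max((0, 1), key=lambda a: q[(s, a)]) for s in (0, 1)}
print(greedy[0])
```

Nothing in the update refers to a causal graph or to a do-operator; the learned table is an estimate built from observed transitions.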
We have argued that problems that can be solved at the associative level of information must be solved using the corresponding tools, and that the same applies to causal problems. One must be careful not to mix the languages and frameworks induced by the formulation chosen to model a real-life situation. The classical RL formulation could, in principle, be modified to allow a proper causal formulation; we speculate that the Bellman equations for the $V$ and $Q$ functions could be modified, for a deterministic policy and reward, in the following way:
is the probability distribution induced by a Causal Graphical Model
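For reference, the standard, unmodified Bellman equations for a deterministic policy and reward can be written as below. The notation ($V^{\pi}$, $Q^{\pi}$, $R$, $P$, $\gamma$) is common textbook notation and an assumption here; this is a sketch of the classical starting point only, not of the modified equations.

```latex
% Classical Bellman equations, deterministic policy \pi and reward R
% (common textbook notation, assumed; not the modified form).
\[
  V^{\pi}(s) = R\bigl(s, \pi(s)\bigr)
  + \gamma \sum_{s'} P\bigl(s' \mid s, \pi(s)\bigr)\, V^{\pi}(s'),
\]
\[
  Q^{\pi}(s, a) = R(s, a)
  + \gamma \sum_{s'} P(s' \mid s, a)\, Q^{\pi}\bigl(s', \pi(s')\bigr).
\]
```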
- Clark, A. (2015). Surfing uncertainty: Prediction, action, and the embodied mind. Oxford University Press.
- Gershman, S. J. (2015). Reinforcement learning and causal models. In The Oxford Handbook of Causal Reasoning.
- Hungerford, T. W. (1974). Algebra. Springer-Verlag New York.
- Pearl, J. (2009). Causality: Models, Reasoning and Inference. Cambridge University Press, New York, NY, USA, 2nd edition.
- Spirtes, P., Glymour, C. N., and Scheines, R. (2000). Causation, prediction and search. MIT Press.
- Sutton, R. S. and Barto, A. G. (1998). Reinforcement Learning: An introduction. MIT Press.
- Sutton, R. S., Barto, A. G., and Williams, R. J. (1992). Reinforcement learning is direct adaptive optimal control. IEEE Control Systems Magazine, 12(2):19–22.
- Szepesvari, C. (2018). Causality from the perspective of reinforcement learning. Machine Learning for Causal Inference, Counterfactual Prediction, and Autonomous Action (CausalML) Workshop, ICML.
- Woodward, J. (2003). Making things happen: A theory of causal explanation. Oxford Studies in Philosophy of Science. Oxford University Press.