# Amortized learning of neural causal representations

Causal models can compactly and efficiently encode the data-generating process under all interventions and hence may generalize better under changes in distribution. These models are often represented as Bayesian networks, and learning them scales poorly with the number of variables. Moreover, existing approaches cannot leverage previously learned knowledge to help with learning new causal models. To tackle these challenges, we present a novel algorithm called causal relational networks (CRN) for learning causal models using neural networks. CRNs represent causal models using continuous representations and hence could scale much better with the number of variables. These models also take in previously learned information to facilitate the learning of new causal models. Finally, we propose a decoding-based metric to evaluate causal models with continuous representations. We test our method on synthetic data, achieving high accuracy and quick adaptation to previously unseen causal models.

Comments

Robert R Tucci

Your definition of Bayesian networks is too limited. Bayesian networks can have continuous nodes and also deterministic nodes. Such nodes have been used by B net practitioners since the beginning of B nets. In the continuous node case, one assigns to the node a transition matrix that is a probability density instead of a discrete probability distribution. Andrew Gelman (Columbia Univ.) has been using continuous nodes in his B nets his entire career. As for deterministic nodes, if the node outputs y and the input is x, then the transition probability matrix for the node is \delta(y, f(x)), where \delta is either the Kronecker or the Dirac delta function, and f(\cdot) is a function of x. A delta function is a perfectly legal probability distribution.

So the distinctions you are making are fallacious. Neural nets are Bayesian networks too! They are a very narrow class of B nets in which all of the nodes are deterministic.
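
To make the deterministic-node point concrete, here is a minimal sketch in Python for the discrete (Kronecker delta) case. It builds the transition matrix of a node with input x and output y, where P(y | x) is 1 when y = f(x) and 0 otherwise. The helper name `deterministic_cpt`, the choice f(x) = x mod 2, and the value ranges are assumptions made only for illustration; none of them come from the paper or the comment.

```python
import numpy as np

def deterministic_cpt(f, x_values, y_values):
    """Build P(y | x) = delta(y, f(x)) as a matrix (rows: x, columns: y).

    Hypothetical helper for illustration; assumes f(x) is always in y_values.
    """
    cpt = np.zeros((len(x_values), len(y_values)))
    for i, x in enumerate(x_values):
        j = y_values.index(f(x))  # the single y that receives all the mass
        cpt[i, j] = 1.0
    return cpt

# Illustrative choice: a node that outputs the parity of its input.
x_values = [0, 1, 2, 3]
y_values = [0, 1]
cpt = deterministic_cpt(lambda x: x % 2, x_values, y_values)

# Each row is a valid probability distribution over y.
row_sums = cpt.sum(axis=1)

# Propagate a uniform distribution over x through the node.
p_x = np.full(len(x_values), 1.0 / len(x_values))
p_y = p_x @ cpt
```

Every row of `cpt` sums to one, so the delta-function table behaves like any other conditional probability table when marginals are propagated through the network.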