
Neural Likelihoods via Cumulative Distribution Functions
We leverage neural networks as universal approximators of monotonic functions to build a parameterization of conditional cumulative distribution functions. By a modification of backpropagation as applied both to parameters and outputs, we show that we are able to build black box density estimators which are competitive against recently proposed models, while avoiding assumptions concerning the base distribution in a mixture model. That is, it makes no use of parametric models as building blocks. This approach removes some undesirable degrees of freedom in the design of neural networks for flexible conditional density estimation, while implementation can be easily accomplished by standard algorithms readily available in popular neural network toolboxes.
11/02/2018 ∙ by Pawel Chilinski, et al.
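
The idea of a monotone network used as a conditional CDF can be illustrated with a minimal sketch. This is not the authors' architecture: the layer shape, the parameter names (`log_w`, `log_v`, `log_a`, `u`, `b`, `c`) and the hand-derived derivative are all assumptions made for illustration, and the abstract describes obtaining the density through a modified backpropagation rather than a manual formula. Positive weights on `y` make the output non-decreasing in `y`, and the linear term `a * y` pushes the output to 0 and 1 in the limits.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cdf_and_density(y, x, p):
    """Return F(y | x) and its derivative f(y | x) for a toy monotone network.

    Hypothetical parameterization: weights acting on y are exponentiated
    so they stay positive, which makes z (and hence F) increasing in y.
    """
    w = np.exp(p["log_w"])          # positive input weights on y, shape (K,)
    v = np.exp(p["log_v"])          # positive output weights, shape (K,)
    a = np.exp(p["log_a"])          # positive slope of the linear skip term
    h = np.tanh(np.outer(y, w) + x * p["u"] + p["b"])   # hidden units, (N, K)
    z = a * y + h @ v + p["c"]      # increasing in y, unbounded in both directions
    F = sigmoid(z)                  # CDF value in (0, 1)
    dz_dy = a + (1.0 - h ** 2) @ (v * w)                # chain rule through tanh
    f = F * (1.0 - F) * dz_dy       # density = dF/dy
    return F, f

rng = np.random.default_rng(0)
K = 4
params = {
    "log_w": 0.5 * rng.standard_normal(K),
    "log_v": 0.5 * rng.standard_normal(K),
    "log_a": 0.0,
    "u": rng.standard_normal(K),    # weights on the conditioning input x (unconstrained)
    "b": rng.standard_normal(K),
    "c": 0.0,
}

y_grid = np.linspace(-20.0, 20.0, 20001)
F, f = cdf_and_density(y_grid, x=0.5, p=params)

# Trapezoid rule: the density should integrate to roughly one over a wide grid.
integral = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(y_grid))
print("integral of density:", integral)
```

In a neural network toolbox the derivative `dz_dy` would normally come from automatic differentiation with respect to the input `y`, and the log of `f` would serve as the training objective.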

The Sensitivity of Counterfactual Fairness to Unmeasured Confounding
Causal approaches to fairness have seen substantial recent interest, both from the machine learning community and from wider parties interested in ethical prediction algorithms. In no small part, this has been due to the fact that causal models allow one to simultaneously leverage data and expert knowledge to remove discriminatory effects from predictions. However, one of the primary assumptions in causal modeling is that you know the causal graph. This introduces a new opportunity for bias, caused by misspecifying the causal model. One common way for misspecification to occur is via unmeasured confounding: the true causal effect between variables is partially described by unobserved quantities. In this work we design tools to assess the sensitivity of fairness measures to this confounding for the popular class of nonlinear additive noise models (ANMs). Specifically, we give a procedure for computing the maximum difference between two counterfactually fair predictors, where one has become biased due to confounding. For the case of bivariate confounding our technique can be swiftly computed via a sequence of closed-form updates. For multivariate confounding we give an algorithm that can be efficiently solved via automatic differentiation. We demonstrate our new sensitivity analysis tools in real-world fairness scenarios to assess the bias arising from confounding.
07/01/2019 ∙ by Niki Kilbertus, et al.

Towards Inverse Reinforcement Learning for Limit Order Book Dynamics
Multi-agent learning is a promising method to simulate aggregate competitive behaviour in finance. Learning expert agents' reward functions through their external demonstrations is hence particularly relevant for subsequent design of realistic agent-based simulations. Inverse Reinforcement Learning (IRL) aims at acquiring such reward functions through inference, making it possible to generalize the resulting policy to states not observed in the past. This paper investigates whether IRL can infer such rewards from agents within real financial stochastic environments: limit order books (LOB). We introduce a simple one-level LOB, where the interactions of a number of stochastic agents and an expert trading agent are modelled as a Markov decision process. We consider two cases for the expert's reward: either a simple linear function of state features; or a complex, more realistic nonlinear function. Given the expert agent's demonstrations, we attempt to discover their strategy by modelling their latent reward function using linear and Gaussian process (GP) regressors from previous literature, and our own approach through Bayesian neural networks (BNN). While the three methods can learn the linear case, only the GP-based and our proposed BNN methods are able to discover the nonlinear reward case. Our BNN IRL algorithm outperforms the other two approaches as the number of samples increases. These results illustrate that complex behaviours, induced by nonlinear reward functions amid agent-based stochastic scenarios, can be deduced through inference, encouraging the use of inverse reinforcement learning for opponent-modelling in multi-agent systems.
06/11/2019 ∙ by Jacobo Roa-Vicens, et al.

A Dynamic Edge Exchangeable Model for Sparse Temporal Networks
We propose a dynamic edge exchangeable network model that can capture sparse connections observed in real temporal networks, in contrast to existing models which are dense. The model achieved superior link prediction accuracy on multiple data sets when compared to a dynamic variant of the blockmodel, and is able to extract interpretable time-varying community structures from the data. In addition to sparsity, the model accounts for the effect of social influence on vertices' future behaviours. Compared to the dynamic blockmodels, our model has a smaller latent space. The compact latent space requires a smaller number of parameters to be estimated in variational inference and results in a computationally friendly inference algorithm.
10/11/2017 ∙ by Yin Cheng Ng, et al.

Counterfactual Fairness
Machine learning can impact people with legal or ethical consequences when it is used to automate decisions in areas such as insurance, lending, hiring, and predictive policing. In many of these scenarios, previous decisions have been made that are unfairly biased against certain subpopulations, for example those of a particular race, gender, or sexual orientation. Since this past data may be biased, machine learning predictors must account for this to avoid perpetuating or creating discriminatory practices. In this paper, we develop a framework for modeling fairness using tools from causal inference. Our definition of counterfactual fairness captures the intuition that a decision is fair towards an individual if it is the same in (a) the actual world and (b) a counterfactual world where the individual belonged to a different demographic group. We demonstrate our framework on a real-world problem of fair prediction of success in law school.
03/20/2017 ∙ by Matt J. Kusner, et al.
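
The intuition in (a) and (b) is usually written as a condition on counterfactual distributions. The notation below is assumed for illustration and does not appear in the abstract: A is the protected attribute, X the remaining observed features, U the latent background variables of the causal model, and \hat{Y}_{A \leftarrow a}(U) the prediction that would result if A were set to a. A predictor \hat{Y} is then called counterfactually fair if, for every context X = x, A = a, every outcome y, and every attainable value a' of A,

\[
P\big(\hat{Y}_{A \leftarrow a}(U) = y \mid X = x, A = a\big)
= P\big(\hat{Y}_{A \leftarrow a'}(U) = y \mid X = x, A = a\big).
\]

Both sides condition on the same observed individual; only the intervened value of A differs.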

Scaling Factorial Hidden Markov Models: Stochastic Variational Inference without Messages
Factorial Hidden Markov Models (FHMMs) are powerful models for sequential data but they do not scale well with long sequences. We propose a scalable inference and learning algorithm for FHMMs that draws on ideas from the stochastic variational inference, neural network and copula literatures. Unlike existing approaches, the proposed algorithm requires no message passing procedure among latent variables and can be distributed to a network of computers to speed up learning. Our experiments corroborate that the proposed algorithm does not introduce further approximation bias compared to the proven structured mean-field algorithm, and achieves better performance with long sequences and large FHMMs.
08/12/2016 ∙ by Yin Cheng Ng, et al.

Observational-Interventional Priors for Dose-Response Learning
Controlled interventions provide the most direct source of information for learning causal effects. In particular, a dose-response curve can be learned by varying the treatment level and observing the corresponding outcomes. However, interventions can be expensive and time-consuming. Observational data, where the treatment is not controlled by a known mechanism, is sometimes available. Under some strong assumptions, observational data allows for the estimation of dose-response curves. Estimating such curves nonparametrically is hard: sample sizes for controlled interventions may be small, while in the observational case a large number of measured confounders may need to be marginalized. In this paper, we introduce a hierarchical Gaussian process prior that constructs a distribution over the dose-response curve by learning from observational data, and reshapes the distribution with a nonparametric affine transform learned from controlled interventions. This function composition from different sources is shown to speed up learning, which we demonstrate with a thorough sensitivity analysis and an application to modeling the effect of therapy on cognitive skills of premature infants.
05/05/2016 ∙ by Ricardo Silva, et al.

Bayesian Inference in Cumulative Distribution Fields
One approach for constructing copula functions is by multiplication. Given that products of cumulative distribution functions (CDFs) are also CDFs, an adjustment to this multiplication will result in a copula model, as discussed by Liebscher (J Mult Analysis, 2008). Parameterizing models via products of CDFs has some advantages, both from the copula perspective (e.g., it is well-defined for any dimensionality) and from general multivariate analysis (e.g., it provides models where small dimensional marginal distributions can be easily read off from the parameters). Independently, Huang and Frey (J Mach Learn Res, 2011) showed the connection between certain sparse graphical models and products of CDFs, as well as message-passing (dynamic programming) schemes for computing the likelihood function of such models. Such schemes allow models to be estimated with likelihood-based methods. We discuss and demonstrate MCMC approaches for estimating such models in a Bayesian context, their application in copula modeling, and how message-passing can be strongly simplified. Importantly, our view of message-passing opens up possibilities to scaling up such methods, given that even dynamic programming is not a scalable solution for calculating likelihood functions in many models.
11/09/2015 ∙ by Ricardo Silva, et al.
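
The statement that a product of CDFs is again a CDF follows from a standard probability argument, sketched here with assumed notation (X_1, X_2, F_1, F_2 are illustrative and not taken from the abstract). If X_1 and X_2 are independent random variables with CDFs F_1 and F_2, then their maximum has the product as its CDF:

\[
P\big(\max(X_1, X_2) \le x\big) = P(X_1 \le x,\; X_2 \le x) = P(X_1 \le x)\, P(X_2 \le x) = F_1(x)\, F_2(x).
\]

The same argument applies to random vectors with the maximum taken componentwise. The margins of such a product are in general not uniform, which is presumably why an adjustment to the multiplication is needed before the result qualifies as a copula.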

Learning Instrumental Variables with Non-Gaussianity Assumptions: Theoretical Limitations and Practical Algorithms
Learning a causal effect from observational data is not straightforward, as this is not possible without further assumptions. If hidden common causes between treatment X and outcome Y cannot be blocked by other measurements, one possibility is to use an instrumental variable. In principle, it is possible under some assumptions to discover whether a variable is structurally instrumental to a target causal effect X → Y, but current frameworks are somewhat lacking on how general these assumptions can be. An instrumental variable discovery problem is challenging, as no variable can be tested as an instrument in isolation but only in groups, but different variables might require different conditions to be considered an instrument. Moreover, identification constraints might be hard to detect statistically. In this paper, we give a theoretical characterization of instrumental variable discovery, highlighting identifiability problems and solutions, the need for non-Gaussianity assumptions, and how they fit within existing methods.
11/09/2015 ∙ by Ricardo Silva, et al.

Gaussian Process Structural Equation Models with Latent Variables
In a variety of disciplines such as social sciences, psychology, medicine and economics, the recorded data are considered to be noisy measurements of latent variables connected by some causal structure. This corresponds to a family of graphical models known as the structural equation model with latent variables. While linear non-Gaussian variants have been well-studied, inference in nonparametric structural equation models is still underdeveloped. We introduce a sparse Gaussian process parameterization that defines a nonlinear structure connecting latent variables, unlike common formulations of Gaussian process latent variable models. The sparse parameterization is given a full Bayesian treatment without compromising Markov chain Monte Carlo efficiency. We compare the stability of the sampling procedure and the predictive ability of the model against the current practice.
08/09/2014 ∙ by Ricardo Silva, et al.

Flexible sampling of discrete data correlations without the marginal distributions
Learning the joint dependence of discrete variables is a fundamental problem in machine learning, with many applications including prediction, clustering and dimensionality reduction. More recently, the framework of copula modeling has gained popularity due to its modular parametrization of joint distributions. Among other properties, copulas provide a recipe for combining flexible models for univariate marginal distributions with parametric families suitable for potentially high dimensional dependence structures. More radically, the extended rank likelihood approach of Hoff (2007) bypasses learning marginal models completely when such information is ancillary to the learning task at hand as in, e.g., standard dimensionality reduction problems or copula parameter estimation. The main idea is to represent data by their observable rank statistics, ignoring any other information from the marginals. Inference is typically done in a Bayesian framework with Gaussian copulas, and it is complicated by the fact that this implies sampling within a space where the number of constraints increases quadratically with the number of data points. The result is slow mixing when using off-the-shelf Gibbs sampling. We present an efficient algorithm based on recent advances on constrained Hamiltonian Markov chain Monte Carlo that is simple to implement and does not require paying for a quadratic cost in sample size.
06/12/2013 ∙ by Alfredo Kalaitzis, et al.