Effect Handling for Composable Program Transformations in Edward2

11/15/2018
by Dave Moore, et al.
Google

Algebraic effects and handlers have emerged in the programming languages community as a convenient, modular abstraction for controlling computational effects. They have found several applications including concurrent programming, meta programming, and more recently, probabilistic programming, as part of Pyro's Poutines library. We investigate the use of effect handlers as a lightweight abstraction for implementing probabilistic programming languages (PPLs). We interpret the existing design of Edward2 as an accidental implementation of an effect-handling mechanism, and extend that design to support nested, composable transformations. We demonstrate that this enables straightforward implementation of sophisticated model transformations and inference algorithms.

1. Introduction

Algebraic effects and handlers have emerged in the programming languages community as a convenient, modular abstraction for controlling computational effects. They have found several applications including concurrent programming, meta programming, and more recently, probabilistic programming, as part of Pyro’s Poutines library (Uber AI Labs, 2017).

We investigate the use of effect handlers as a lightweight abstraction for implementing probabilistic programming languages (PPLs). We interpret the existing design of Edward2 as an accidental implementation of an effect-handling mechanism, and extend that design to support nested, composable transformations. We demonstrate that this enables straightforward implementation of sophisticated model transformations and inference algorithms.

2. Algebraic Effects and Handlers

An effectful operation is an operation that interacts with some (possibly external) handling code in order to execute. For example, suppose that a process wants to access a file. It sends a request to the OS kernel and suspends execution. The kernel checks the request, executes it, and responds with the result of the operation. The process then resumes execution.

Algebraic effects and their handlers (Plotkin and Power, 2001; Plotkin and Pretnar, 2009) extend this request-response idea to computations within a program. Impure behaviour arises from a set of effectful operations, whose concrete implementation is separately given in the form of effect handlers. The programmer chooses how to handle different operations. Consider an example (Pretnar, 2015):

let abc = (print('a'); print('b'); print('c'))
let reverse = handler {print(s; k) ↦ k(); print(s)}
with reverse handle abc

In this program, abc is a computation that prints out the letters ‘a’, ‘b’ and ‘c’, in this order, using three separate calls to the operation print. The handler reverse reverses the order in which print operations are executed: it first resumes the continuation k of the operation, and only then performs the operation itself. The computation with reverse handle abc is the result of executing abc, while handling operations with reverse: a printout of ‘c’, ‘b’ and ‘a’ in this order.

One very useful feature of effect handlers is that they can be nested to combine the ways in which they interpret the computation. In the presence of a handler join, which joins the effect of successive print statements into a single print statement, we can obtain a single reversed printout ‘cba’ by writing with join handle (with reverse handle abc).
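
The following Python sketch emulates this example, with generators standing in for effectful computations and handlers; the encoding (abc, reverse, join, run) is purely illustrative, since Python lacks first-class continuations:

def abc():
  # A computation that performs three print operations.
  yield ('print', 'a')
  yield ('print', 'b')
  yield ('print', 'c')

def reverse(computation):
  # Handle each print by first running the rest of the computation,
  # then performing the operation itself, reversing the order.
  for op in reversed(list(computation)):
    yield op

def join(computation):
  # Join all print payloads into a single print operation.
  yield ('print', ''.join(arg for _, arg in computation))

def run(computation):
  # Top-level handler: interpret print operations with Python's print.
  for name, arg in computation:
    if name == 'print':
      print(arg, end='')
  print()

run(reverse(abc()))        # prints 'cba' via three print operations
run(join(reverse(abc())))  # prints 'cba' via a single print operation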

3. Effects in Probabilistic Programming

The application of effect handlers to probabilistic programming has been previously discussed (Uber AI Labs, 2017; Scibior and Kammar, 2015) but is perhaps not yet widely appreciated. Consider a Beta-Binomial model:

let model(n) =
    let z = sample(beta(1., 1.), 'z')
    let x = sample(binomial(z, n), 'x')
    return x

The insight is to treat sampling statements as operations that can be handled by a separately defined handler. (Handling deterministic operations, though not covered in this paper, can also be desirable, e.g. for local reparameterization gradients (Kingma et al., 2015).) This enables a range of useful program transformations, including (though not limited to):

Conditioning. The condition handler takes a mapping from variable names to their observed values, and changes the respective sampling statements to observe statements:

let condition(name, value) = handler{
    sample(dist, name; k) ↦
        k(observe(dist, value, name))}
with condition('x', data) handle model(n)

Tracing. A tracing handler accumulates the values of all random variables defined by the model, so that with trace handle model(n) obtains a sample for both z and x. Tracing can also be used for program analysis, for example, computing Markov blankets for efficient inference algorithms.
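
A minimal sketch of such a tracing handler, written as an Edward2 interceptor (the mechanism described in Section 4) and applied to the Edward2 Beta-Binomial model defined there; the make_tracer helper is our own illustration rather than part of Edward2's API:

def make_tracer():
  trace = {}  # variable name -> sampled value tensor
  def tracer(rv_constructor, **rv_kwargs):
    rv = rv_constructor(**rv_kwargs)
    trace[rv_kwargs['name']] = rv.value
    return rv
  return trace, tracer

trace, tracer = make_tracer()
with ed.interception(tracer):
  model(10)
# trace now holds a joint sample, e.g. {'z': <Beta draw>, 'x': <Binomial draw>}.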

Density function derivation. Inference algorithms such as Metropolis-Hastings require access to the log joint density log p(z, x). This may be derived using a trace-like handler that accumulates the conditional log densities for each value.
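
In Edward2 (Section 4) this transformation is packaged as ed.make_log_joint_fn; a brief sketch of its use with the Beta-Binomial model, assuming the convention that keyword arguments bind random variables by name (the numeric values below are illustrative):

log_joint_fn = ed.make_log_joint_fn(model)
log_p = log_joint_fn(10, z=0.3, x=7.)  # log p(z=0.3, x=7) with n=10 trials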

Model reparametrization. Reparametrizing a probabilistic model means expressing it in terms of different parameters, and specifying a way to recover the original parameters. A common example, implemented in existing systems such as Pyro (Uber AI Labs, 2017) and Stan (Carpenter et al., 2017), is unconstraining: transforming constrained variables such as the Beta variable z to unconstrained space. This can substantially ease inference and is highly desirable for a system to perform automatically.

Reparametrization can be expressed using effect handlers. Assuming a handler unconstrain (expanded below), writing with unconstrain handle model(n) gives the unconstrained Beta-Binomial model. An elegant aspect of this approach is that it immediately generalizes to other reparameterizations such as non-centering or inverse CDF transformations.

Variational inference. Effect handling can automatically construct a variational family on-the-fly (Wingate and Weber, 2013), e.g., a mean-field variational inference handler may handle sample(dist) by initialising the parameters mu and sigma, and transforming the random draw to sample(normal(mu, sigma)). Separately, a handler may also be used to align a model’s latent variables with samples from a variational model.
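
A hedged sketch of such a handler as an Edward2 interceptor (Section 4): each latent sample statement is replaced by a draw from a learnable Gaussian, ignoring support constraints, which in practice are handled by composing with the unconstraining handler. The make_mean_field helper and its parameter bookkeeping are illustrative, not Edward2 built-ins:

import tensorflow as tf

def make_mean_field(latent_names):
  params = {}  # variable name -> (mu, raw sigma), created on first use
  def mean_field(rv_constructor, **rv_kwargs):
    name = rv_kwargs['name']
    if name not in latent_names:
      # Leave non-latent (e.g. observed) variables alone.
      return rv_constructor(**rv_kwargs)
    if name not in params:
      params[name] = (tf.Variable(0., name=name + '_mu'),
                      tf.Variable(0., name=name + '_sigma_raw'))
    mu, sigma_raw = params[name]
    # Replace the draw with a sample from the learnable Gaussian q.
    return ed.Normal(loc=mu, scale=tf.nn.softplus(sigma_raw), name=name)
  return params, mean_field

params, mean_field = make_mean_field(['z'])
with ed.interception(mean_field):
  model(10)  # the draw for z now comes from q(z); x is generated as before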

3.1. Composing Effect Handlers

Many handlers become much more useful when composed; for example, when reparameterizing a model, or automatically constructing a variational model, we would typically then want to derive the joint density function of the transformed model. Composing effect handlers makes this straightforward. For example, the unnormalised posterior on z is:

let posterior(z) =
  with log_joint(z) handle
    with unconstrain handle
      with condition('x', data) handle model(n)

More generally, composing effect handlers allows sophisticated program transformations; for example, the unconstraining, variational guide, and log-joint handlers enable an almost trivial implementation of ADVI (Kucukelbir et al., 2017).

4. Effect Handling in Edward2

Edward2 is a lightweight framework for probabilistic programming in TensorFlow (Tran et al., 2018). The main abstraction is the RandomVariable (RV): it wraps a probability distribution with a value tensor which reifies a sample from that distribution; constructing an RV implicitly performs a sample operation. Models are typically written as generative processes defining a joint distribution over the values of all random variables:

from tensorflow_probability import edward2 as ed
def model(n):
  z = ed.Beta(1., 1., name='z')
  x = ed.Binomial(total_count=n, probs=z, name='x')
  return x

Edward2 supports program transformations by interception. Running the model in the context of an interceptor overrides the construction of random variables. To implement this, RV constructors are wrapped with a method that checks for an interceptor on a global context stack and, if present, dispatches control.
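
To make the dispatch concrete, here is a minimal sketch of a global interceptor stack and constructor wrapper in the spirit of the description above (illustrative only, not Edward2's actual source; positional arguments are elided for brevity):

import contextlib

_INTERCEPTOR_STACK = []

@contextlib.contextmanager
def interception(interceptor):
  # Activate `interceptor` for the duration of a `with` block.
  _INTERCEPTOR_STACK.append(interceptor)
  try:
    yield
  finally:
    _INTERCEPTOR_STACK.pop()

def interceptable(rv_constructor):
  # Route construction through the innermost active interceptor. The
  # interceptor is popped while it runs, so any RVs it constructs are
  # dispatched to the next enclosing interceptor rather than to itself.
  def wrapped(**rv_kwargs):
    if not _INTERCEPTOR_STACK:
      return rv_constructor(**rv_kwargs)
    interceptor = _INTERCEPTOR_STACK.pop()
    try:
      return interceptor(rv_constructor, **rv_kwargs)
    finally:
      _INTERCEPTOR_STACK.append(interceptor)
  return wrapped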

def condition_interceptor(**values):
  def interceptor(rv_constructor, **rv_kwargs):
    rv_name = rv_kwargs['name']
    # Variables not in `values` get value=None and are sampled as usual.
    rv_kwargs['value'] = values.get(rv_name)
    return rv_constructor(**rv_kwargs)
  return interceptor

with ed.interception(condition_interceptor(z=0.3)):
  x = model(n)

We observe that interceptors are essentially an accidental implementation of effect handlers; more specifically, a restricted form in which the handler accesses its continuation only implicitly. The handler (interceptor) overrides a sample operation with arbitrary computations, potentially including side effects, and ends by invoking an implicit continuation to return a value to the original callsite. Edward2’s original framework did not compose interceptors, but viewing interceptors as effect handlers suggests that composing them can enable sophisticated program transformations.

The semantics of composing interceptors may be understood in terms of effect forwarding. Interceptors may call the rv_constructor directly, in which case the operation is not visible to any higher-level interceptors, or they may explicitly forward the operation by re-wrapping the constructor as interceptable(rv_constructor). They may also invoke other RV constructors, which by default creates wrapped (forwarded) operations.
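
As a concrete sketch of these composition semantics, the following nests a forwarding variant of the conditioning interceptor inside a handler that accumulates conditional log densities, yielding the unnormalised log joint of Section 3.1 (without the unconstraining step). The conditioning and accumulate_log_prob handlers, the illustrative values of n and data, and the exact ed.interceptable spelling are our assumptions rather than Edward2 built-ins:

import tensorflow as tf

def conditioning(**values):
  def interceptor(rv_constructor, **rv_kwargs):
    if rv_kwargs['name'] in values:
      rv_kwargs['value'] = values[rv_kwargs['name']]
    # Forward the op so that enclosing (higher-level) interceptors still see it.
    return ed.interceptable(rv_constructor)(**rv_kwargs)
  return interceptor

log_probs = []
def accumulate_log_prob(rv_constructor, **rv_kwargs):
  # Record each variable's conditional log density at its sampled or observed value.
  rv = rv_constructor(**rv_kwargs)
  log_probs.append(tf.reduce_sum(rv.distribution.log_prob(rv.value)))
  return rv

n, data = 10, 7.  # illustrative number of trials and observed count
with ed.interception(accumulate_log_prob):      # outer: sees forwarded ops
  with ed.interception(conditioning(x=data)):   # inner: fixes x to the data
    model(n)
log_joint = tf.add_n(log_probs)  # log p(z, x=data), the unnormalised posterior over z

Because conditioning forwards each operation, the outer handler observes x with its observed value; calling rv_constructor directly instead, as condition_interceptor above does, would hide x from the log-density computation.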

As an example application requiring nested interceptors, we implement an unconstraining interceptor, using the Bijectors library to handle Jacobian corrections (Dillon et al., 2017):

from tensorflow_probability import bijectors as tfb

def unconstrain(rv_constructor, **rv_kwargs):
  # Construct the original (constrained) variable; this op is not forwarded.
  base_rv = rv_constructor(**rv_kwargs)
  # Bijector from unconstrained reals to the variable's support
  # (e.g. a sigmoid for the Beta variable z).
  bijector = constraining_transform(base_rv)
  # Re-express the variable in unconstrained space; this op is interceptable,
  # so it is what higher-level handlers see.
  unconstrained_rv = ed.TransformedDistribution(
    distribution=base_rv.distribution,
    bijector=tfb.Invert(bijector))
  # Return the constrained value so the rest of the model is unchanged.
  return bijector.forward(unconstrained_rv)

Here the ed.TransformedDistribution constructor invokes an interceptable operation, while the original base_rv constructor is not forwarded, so that the transformed program appears to higher-level handlers (e.g., a log joint density) as containing variables in unconstrained space.

5. Discussion

There is often a gap between theoretical discussions of probabilistic programming and the implementation of practical systems. We believe the emergence of effect handlers as a convergent design pattern in deep PPLs is notable, and hope that highlighting it may lead to interesting connections in both directions between theory and practice. Compared to Pyro’s Poutines (Uber AI Labs, 2017), which also implement effect handling, Edward2’s interception provides substantially similar functionality with a different, somewhat lighter-weight interface. We believe both are interesting points in design space and look forward to exploring their tradeoffs.

We do not claim that effect handling is a complete mechanism for probabilistic programming; for example, it is not obvious how non-local rewrites such as general symbolic algebra on computation graphs (Hoffman et al., 2018) might fit in an effect-handling framework. Understanding the space of program transformations that can be usefully specified as effect handlers is an exciting area of future work.

References