Capacity of Continuous Channels with Memory via Directed Information Neural Estimator

Calculating the capacity (with or without feedback) of channels with memory and continuous alphabets is a challenging task, as it requires optimizing the directed information rate over all channel input distributions. The objective is a multi-letter expression whose analytic solution is known only for a few specific cases. When no analytic solution is present or the channel model is unknown, there is no unified framework for calculating or even approximating capacity. This work proposes a novel capacity estimation algorithm that treats the channel as a "black box", both when feedback is and when it is not present. The algorithm has two main ingredients: (i) a neural distribution transformer (NDT) model that shapes a noise variable into the channel input distribution, from which we are able to sample, and (ii) the directed information neural estimator (DINE) that estimates the communication rate of the current NDT model. These models are trained by an alternating maximization procedure to both estimate the channel capacity and obtain an NDT for the optimal input distribution. The method is demonstrated on the moving average additive Gaussian noise channel, where it is shown that both the capacity and the feedback capacity are estimated without knowledge of the channel transition kernel. The proposed estimation framework opens the door to a myriad of capacity approximation results for continuous-alphabet channels that were inaccessible until now.




I Introduction

Many discrete-time continuous-alphabet communication channels involve correlated noise or inter-symbol interference (ISI). Two predominant communication scenarios over such channels are when feedback from the receiver back to the transmitter is or is not present. The fundamental rates of reliable communication over such channels are, respectively, the feedback (FB) and feedforward (FF) capacity. Starting from the latter, the FF capacity of an $n$-fold point-to-point channel $P_{Y^n|X^n}$, denoted $C_{FF}$, is given by [1]

$C_{FF} = \lim_{n \to \infty} \frac{1}{n} \max_{P_{X^n}} I(X^n; Y^n).$    (1)
In the presence of feedback, the FB capacity is [17]

$C_{FB} = \lim_{n \to \infty} \frac{1}{n} \max_{P_{X^n \| Y^{n-1}}} I(X^n \to Y^n),$    (2)

where

$I(X^n \to Y^n) := \sum_{i=1}^{n} I(X^i; Y_i | Y^{i-1})$    (3)

is the directed information (DI) from the input sequence $X^n$ to the output sequence $Y^n$ [8], and $P_{X^n \| Y^{n-1}} := \prod_{i=1}^{n} P_{X_i | X^{i-1}, Y^{i-1}}$ is the distribution of $X^n$ causally-conditioned on $Y^{n-1}$ (see [21, 24] for further details). Built on (3), for stationary processes, the DI rate is defined as

$I(\mathbf{X} \to \mathbf{Y}) := \lim_{n \to \infty} \frac{1}{n} I(X^n \to Y^n).$    (4)
As proved in [8], when feedback is not present, the optimization in (2), performed over the marginal distributions $P_{X^n}$ (i.e., input distributions that do not depend on the feedback), is equivalent to the optimization in (1). This casts DI as a unifying information measure for representing both FF and FB capacities.

Computing $C_{FF}$ and $C_{FB}$ requires solving a multi-letter optimization problem. Closed-form solutions to this challenging task are known only in several special cases. Common examples for $C_{FF}$ are the Gaussian channel with memory [14] and the ISI Gaussian channel [15]. There are no known extensions of these solutions to the non-Gaussian case. For $C_{FB}$, a solution for the 1st-order moving average additive Gaussian noise (MA(1)-AGN) channel was found [12]. Another closed-form characterization is available for auto-regressive moving-average (ARMA) AGN channels [11]. To the best of our knowledge, these are the only two non-trivial examples of continuous channels with memory whose FB capacity is known in closed form. Furthermore, when the channel model is unknown, there is no efficient method for numerically approximating capacity.

Some recent progress related to capacity computation was made based on deep learning (DL) techniques [9, 19]. In a novel work [9], the mutual information neural estimator (MINE) [2] was used to learn a modulation for a memoryless channel. In [19], a capacity estimator was proposed based on a reinforcement learning algorithm that iteratively estimates and maximizes the DI rate, but only for discrete-alphabet channels with a known channel model.

Inspired by the above, we develop a framework for estimating the FF and FB capacity of arbitrary continuous-alphabet channels, possibly with memory, without knowing the channel model. Our method does not require the channel transition kernel; we only assume a stationary channel whose outputs can be sampled by feeding it with inputs. Central to our method are a new DI neural estimator (DINE), used to evaluate the communication rate, and a neural distribution transformer (NDT), used to simulate input distributions. Together, the DINE and NDT lay the groundwork for our capacity estimation algorithm. In the remainder of this section, we describe DINE, NDT, and their integration into the capacity estimator.

I-A Directed Information Neural Estimation

The estimation of mutual information (MI) from samples using neural networks (NNs) is a recently proposed approach [2, 3]. It is especially effective when the involved random variables (RVs) are continuous. The concept originated from [2], where MINE was proposed. The core idea is to represent the MI using the Donsker-Varadhan (DV) variational formula

$I(X;Y) = \sup_{T} \; \mathbb{E}_{P_{XY}}[T(X,Y)] - \log \mathbb{E}_{P_X \otimes P_Y}\left[e^{T(X,Y)}\right],$    (5)

where $P_{XY}$ is the joint distribution of $(X, Y)$ and $P_X \otimes P_Y$ is the product of its marginals. The supremum is over all measurable functions $T$ for which both expectations are finite. Parameterizing $T$ by an NN and replacing expectations with empirical averages enables gradient ascent optimization to estimate $I(X;Y)$. A variant of MINE that goes through estimating the underlying entropy terms was proposed in [3]. The new estimators were shown empirically to perform extremely well, especially for continuous alphabets.
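As a concrete illustration of the DV formula (not part of the paper's implementation), the following sketch evaluates its right-hand side by Monte Carlo for a jointly Gaussian pair, plugging in the known optimal critic (the log-density ratio), for which the bound is tight and recovers the true MI $-\frac{1}{2}\log(1-\rho^2)$:

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n = 0.5, 200_000

# Jointly Gaussian samples with correlation rho.
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)

def t_star(x, y, rho):
    """Optimal DV critic: the log-density ratio log dP_XY / d(P_X x P_Y)."""
    return (-0.5 * np.log(1.0 - rho**2)
            - (rho**2 * x**2 - 2.0 * rho * x * y + rho**2 * y**2)
              / (2.0 * (1.0 - rho**2)))

# DV bound: E_P[T] - log E_Q[e^T], with Q = P_X x P_Y simulated by shuffling y.
y_shuf = rng.permutation(y)
dv = t_star(x, y, rho).mean() - np.log(np.mean(np.exp(t_star(x, y_shuf, rho))))

true_mi = -0.5 * np.log(1.0 - rho**2)
print(f"DV estimate: {dv:.4f} nats, true MI: {true_mi:.4f} nats")
```

In MINE the closed-form critic above is replaced by a trained NN; DINE further replaces the feedforward critic with a recurrent one.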

Herein, we propose a new estimator for the DI rate $I(\mathbf{X} \to \mathbf{Y})$. The DI is factorized as

$I(X^n \to Y^n) = h(Y^n) - h(Y^n \| X^n),$    (6)

where $h(Y^n)$ is the differential entropy of $Y^n$ and $h(Y^n \| X^n) := \sum_{i=1}^{n} h(Y_i | Y^{i-1}, X^i)$ is the causally-conditional differential entropy. Applying the approach of [3] to the entropy terms, we expand each as a Kullback-Leibler (KL) divergence and a cross-entropy (CE) residual and invoke the DV representation. To account for memory, we derive a formula valid for causally dependent data, which involves RNNs as function approximators (rather than the FF networks used in the independently and identically distributed (i.i.d.) case). Thus, DINE is an RNN-based estimator of the DI rate from $\mathbf{X}$ to $\mathbf{Y}$ based on their samples.

DI estimators were recently presented in [25, 26, 27]. Also, an estimator of the transfer entropy using FF networks was proposed in [16], which upper bounds the DI in the special case of a jointly Markov process with finite memory. DINE is the first RNN-based method and hence assumes no parametric model, discrete alphabets, or Markovity. Further details on the DINE algorithm are given in Subsection II-A.


I-B Neural Distribution Transformer and Capacity Estimation

DINE accounts for one of the two tasks involved in estimating capacity: it estimates the objective of (2). The remaining task is to optimize this objective over input distributions. Generally, sampling from an arbitrary distribution is a complex task. To overcome this, we design a deep generative model of the channel input distribution, namely the NDT. The idea is similar to those used in generative modeling, e.g., generative adversarial networks [23] or variational autoencoders [22]. The NDT maps i.i.d. noise into samples of the channel input distribution. For estimating FB capacity, in addition to the i.i.d. noise, the NDT also receives the channel feedback as input. Together, NDT and DINE form the overall system that estimates the capacity, as shown in Fig. 1.

The capacity estimation algorithm trains the DINE and NDT models together via an alternating maximization procedure. Namely, we iteratively train each model while keeping the (parameters of the) other one fixed. DINE estimates the communication rate of the current NDT input distribution, and the NDT is trained to increase this rate with respect to the fixed DINE model. Proceeding until convergence, this yields the capacity estimate, as well as an NDT generative model for the capacity-achieving input distribution.
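The alternating procedure can be pictured on a toy objective (a hypothetical stand-in for the DINE/NDT models, not the paper's networks): two parameter blocks are updated in turns by gradient ascent, each step holding the other block fixed.

```python
def f(a, b):
    """Toy concave objective standing in for the estimated DI rate."""
    return -(a - 3.0) ** 2 - (b + 1.0) ** 2 - 0.5 * a * b

a, b, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    # Step 1: update the first block (playing the role of DINE), second fixed.
    a += lr * (-2.0 * (a - 3.0) - 0.5 * b)
    # Step 2: update the second block (playing the role of NDT), first fixed.
    b += lr * (-2.0 * (b + 1.0) - 0.5 * a)

print(f"a={a:.4f}, b={b:.4f}, f={f(a, b):.4f}")
```

For a jointly concave objective this coordinate-wise ascent converges to the joint maximizer; for the actual neural models only a local optimum is guaranteed.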

We demonstrate our method on the MA(1)-AGN channel. Both $C_{FF}$ and $C_{FB}$ are estimated with the same algorithm, using the channel as a black box solely to generate samples. The estimation results are compared with the analytic solutions to show the effectiveness of the proposed approach.


Fig. 1: The overall capacity estimation system. NDT generates samples that are fed into the channel. DINE uses these samples to improve its estimation of the communication rate. DINE then supplies gradient for the optimization of NDT.

II Methodology

We give a high-level description of the algorithm and its building blocks. Due to space limitations, full details are deferred to the extended version of this paper. The implementation is available on GitHub.

II-A Directed Information Estimation Method

We propose a new estimator of the DI rate between two correlated stationary processes, termed DINE. Building on [3], we factorize each term in (6) as:

$h(Y^n) = h_{CE}(P_{Y^n}, \widetilde{P}_{Y^n}) - D_{KL}(P_{Y^n} \| \widetilde{P}_{Y^n}),$
$h(Y^n \| X^n) = h_{CE}(P_{Y^n \| X^n}, \widetilde{P}_{Y^n \| X^n}) - D_{KL}(P_{Y^n \| X^n} \| \widetilde{P}_{Y^n \| X^n}),$    (7)

where $h_{CE}$ and $D_{KL}$ are, respectively, the cross entropy (CE) and KL divergence between the true distribution and a reference measure $\widetilde{P}$, which is uniform over the support of the dataset. To simplify notation, we use the shorthands

$D_{Y^n} := D_{KL}(P_{Y^n} \| \widetilde{P}_{Y^n}), \qquad D_{Y^n \| X^n} := D_{KL}(P_{Y^n \| X^n} \| \widetilde{P}_{Y^n \| X^n}).$    (8)
Subtracting the two factorizations in (7) and observing that the difference of CE terms equals the DI at the former time step, we have

$I(X^n \to Y^n) = I(X^{n-1} \to Y^{n-1}) + D_{Y^n \| X^n} - D_{Y^n}.$    (9)

Note that the difference of KL divergences equals the DI increment. For stationary data processes we take the limit and obtain

$I(\mathbf{X} \to \mathbf{Y}) = \lim_{n \to \infty} \left( D_{Y^n \| X^n} - D_{Y^n} \right).$    (10)
Each KL divergence term is expanded by its DV representation [4] as:

$D_{Y^n} = \sup_{T} \; \mathbb{E}_{P_{Y^n}}\left[T(Y^n)\right] - \log \mathbb{E}_{\widetilde{P}_{Y^n}}\left[e^{T(Y^n)}\right],$    (11)

and similarly for $D_{Y^n \| X^n}$. To maximize (11), each DV potential is parametrized by a modified LSTM and expected values are estimated by empirical averages over the dataset $\mathcal{D}_n := \{(x_i, y_i)\}_{i=1}^{n}$. Thus, the optimization objectives are:

$\hat{D}_{Y}(\theta_Y) = \frac{1}{n}\sum_{i=1}^{n} T_{\theta_Y}(y^i) - \log\left(\frac{1}{n}\sum_{i=1}^{n} e^{T_{\theta_Y}(\widetilde{y}^i)}\right),$    (12)

with samples $\widetilde{y}^i$ drawn from the reference measure, and analogously $\hat{D}_{Y \| X}(\theta_{XY})$, where $T_{\theta_Y}$ and $T_{\theta_{XY}}$ are the parametrized potentials.

The estimator is given by:

$\hat{I}(\mathbf{X} \to \mathbf{Y}) = \hat{D}_{Y \| X}(\theta_{XY}) - \hat{D}_{Y}(\theta_Y).$    (13)

By the universal approximation of RNNs [6] and Breiman's theorem [7], the maximizer of (13) approaches the true DI rate $I(\mathbf{X} \to \mathbf{Y})$ as the number of samples grows, provided the neural networks are sufficiently expressive.

input: Samples $\{(x_i, y_i)\}_{i=1}^{n}$ of the process $(\mathbf{X}, \mathbf{Y})$.
output: $\hat{I}(\mathbf{X} \to \mathbf{Y})$, estimated directed information rate.

Initialize network parameters $\theta_Y, \theta_{XY}$.
Step 1, Optimization:
repeat
     Draw a batch of B sequences from the dataset
     Feed the networks with the examples and compute the
  losses $\hat{D}_Y(\theta_Y)$, $\hat{D}_{Y \| X}(\theta_{XY})$.
     Update network parameters by gradient ascent on the losses
until convergence
Step 2, Perform a Monte Carlo estimation over the dataset and subtract the loss evaluations to obtain the estimate: $\hat{I}(\mathbf{X} \to \mathbf{Y}) = \hat{D}_{Y \| X}(\theta_{XY}) - \hat{D}_Y(\theta_Y)$
Algorithm 1 Directed Information Rate Estimation

To capture the time dependencies in the data, we introduce a modified LSTM network for function approximation. The LSTM [5] is an RNN that receives a time series as input and, for each time $t$, performs a recursive non-linear transform to calculate its hidden state $h_t$. We denote the LSTM function by $h_t = \mathrm{LSTM}(x_t, h_{t-1})$. The full characterization of the LSTM is provided in [5].
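For reference, the vanilla LSTM recursion of [5] can be sketched in a few lines of NumPy (random placeholder weights; this is the standard cell, not the modified DINE cell described next):

```python
import numpy as np

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM recursion: gates (i, f, o) and candidate g update the cell
    state c_t and hidden state h_t from input x_t and (h_{t-1}, c_{t-1})."""
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))
    z = W @ x_t + U @ h_prev + b                  # stacked pre-activations
    d = h_prev.shape[0]
    i, f, o, g = z[:d], z[d:2*d], z[2*d:3*d], z[3*d:]
    c_t = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h_t = sigmoid(o) * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(0)
d_in, d = 3, 4                                    # input and hidden sizes
W = rng.normal(scale=0.1, size=(4 * d, d_in))
U = rng.normal(scale=0.1, size=(4 * d, d))
b = np.zeros(4 * d)

# Unroll over a short sequence, as DINE unrolls over the data samples.
h, c = np.zeros(d), np.zeros(d)
for x_t in rng.standard_normal((5, d_in)):
    h, c = lstm_step(x_t, h, c, W, U, b)
print("h_5 =", h)
```

The hidden state is a bounded recursive summary of the past, which is what lets the DV potentials depend on the full history rather than a fixed window.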

We modify the structure of the LSTM so that, at each time step, the cell evaluates the DV potential $T_{\theta_{XY}}$ on both a data sample and a reference sample. A similar modification is introduced for $D_{Y^n}$ by substituting the pairs $(x_i, y_i)$ with $y_i$ and $T_{\theta_{XY}}$ with $T_{\theta_Y}$.
A visualization of a modified LSTM cell (unrolled) is shown in Fig. 2. The LSTM cell's output sequence is fed into a fully-connected layer to obtain the potential values $T_{\theta_Y}$ and $T_{\theta_{XY}}$. As demonstrated by Algorithm 1 and Fig. 3, in each iteration we draw a batch of size B from the dataset, feed the NN with it to acquire the potential evaluations, compute the NN losses, and calculate gradients to update the NN parameters $\theta_Y, \theta_{XY}$.


Fig. 2: The modified LSTM cell unrolled in the DINE architecture. Recursively, at each time $t$, the data and reference samples are mapped to their respective DV potential values.


Fig. 3: End-to-end architecture for the estimation of the DI rate. Each batch of time sequences enters the system, a batch of the same size is sampled from the reference measure, and both enter the NN to compute the DV potentials.

II-B Neural Distribution Transformer

The DINE model is an effective approach for estimating the argument of (2). However, finding the capacity also requires maximizing the DI with respect to the input distribution. For this purpose we present the NDT model, which represents a general channel input distribution. At each time $i$, the NDT maps an i.i.d. noise vector $N_i$ to a channel input $X_i$; when feedback is present, the NDT maps $(N_i, Y_{i-1})$ to $X_i$. Thus, the NDT is represented by an RNN with parameters $\phi$, as shown in Fig. 4. The NDT model generates the channel input, and DINE estimates the DI rate between the channel input and output.


Fig. 4: The NDT. The noise and past channel output (if feedback is applied) are fed into an NN. The last layer performs normalization to obey the power constraint, if needed.
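The power-constraint layer at the NDT output can be sketched as follows (a minimal illustration, with tanh standing in for the trained LSTM/dense stack): scale each batch so its empirical average power equals the budget P.

```python
import numpy as np

def power_normalize(x, P):
    """Scale a batch of channel inputs so its empirical average power is P,
    mimicking the NDT's final power-constraint layer."""
    rms = np.sqrt(np.mean(x ** 2))
    return np.sqrt(P) * x / rms

rng = np.random.default_rng(0)
P = 1.0
noise = rng.standard_normal((64, 10))     # i.i.d. noise fed to the NDT
x = power_normalize(np.tanh(noise), P)    # tanh stands in for the learned map
print("average power:", np.mean(x ** 2))
```

Normalizing over the batch (rather than clipping per symbol) enforces the average power constraint while leaving the shape of the learned distribution free.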

II-C Complete Architecture Layout

Combining the DINE and NDT models into a complete system enables capacity estimation. As shown in Fig. 1, the NDT model is fed with i.i.d. noise and outputs the channel input samples. These samples are fed into the channel to generate its output. The channel input-output samples are then fed to the DINE model, which outputs the estimated DI rate. To estimate the capacity, the DINE and NDT models are trained together. The training scheme, shown in Algorithm 2, is a variant of alternating maximization. It iterates between updating the DINE and NDT parameter sets, where in each iteration the parameters of one model are fixed and those of the other are updated. At the end of training, a long Monte Carlo evaluation is performed in order to estimate the expectations in (13) accurately.

input: Continuous channel, feedback indicator.
output: $\hat{C}$, estimated capacity.

Initialize DINE parameters $\theta_Y, \theta_{XY}$
Initialize NDT parameters $\phi$
if feedback indicator then
     Add feedback to NDT
repeat
     Step 1: Train DINE model
     Generate B sequences of length T of i.i.d. random noise
     Compute $x^T, y^T$ with the NDT and the channel
     Compute the losses $\hat{D}_Y(\theta_Y)$, $\hat{D}_{Y \| X}(\theta_{XY})$
     Update DINE parameters by gradient ascent
     Step 2: Train NDT
     Generate B sequences of length T of i.i.d. random noise
     Compute $x^T, y^T$ with the NDT and the channel
     Compute the objective: $\hat{D}_{Y \| X}(\theta_{XY}) - \hat{D}_Y(\theta_Y)$
     Update NDT parameters $\phi$ by gradient ascent
until convergence
Monte Carlo evaluation of $\hat{C}$
Algorithm 2 Capacity Estimation

Applying this algorithm to channels with memory estimates their capacity without any specific knowledge of the channel's underlying distribution. Next, we demonstrate its effectiveness on continuous-alphabet channels.

III Numerical Results

We demonstrate the performance of Algorithm 2 on the AWGN channel and the first-order MA-AGN channel. The numerical results are compared with the analytic solutions to verify the effectiveness of our method.

III-A AWGN Channel

The power-constrained AWGN channel is investigated as an instance of a memoryless continuous-alphabet channel whose analytic solution is known. The channel model is given by

$Y_i = X_i + Z_i,$    (16)

where $Z_i \sim \mathcal{N}(0, \sigma^2)$ are i.i.d. RVs and $X_i$ is the channel input sequence, bound to the power constraint $P$. Its capacity is given by $C = \frac{1}{2}\log\left(1 + \frac{P}{\sigma^2}\right)$. In our implementation we estimated the capacity for a range of SNR values. The numerical results are compared to the analytic solution in Fig. 5.
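The analytic curve being matched is the classical formula $C = \frac{1}{2}\log(1 + \mathrm{SNR})$ in nats per channel use; a few reference values can be tabulated directly:

```python
import numpy as np

def awgn_capacity(snr):
    """AWGN capacity 0.5 * log(1 + SNR), in nats per channel use."""
    return 0.5 * np.log1p(snr)

# Reference values for comparison against the estimates in Fig. 5.
for s in [0.5, 1.0, 2.0, 10.0]:
    print(f"SNR={s:5.1f}: C={awgn_capacity(s):.4f} nats")
```

Using `log1p` keeps the computation accurate at low SNR, where `1 + SNR` is close to 1.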

Fig. 5: Estimated and analytic capacity of the AWGN channel for various values of SNR.

III-B Gaussian MA(1) Channel

The calculation of the capacity of linear Gaussian channels with memory can be divided into two cases: feedback ($C_{FB}$) and feed-forward ($C_{FF}$) capacity. We focus on the MA(1) Gaussian channel model, which is given by:

$Y_i = X_i + Z_i, \quad Z_i = U_i + \alpha U_{i-1},$    (17)

where $U_i \sim \mathcal{N}(0, 1)$ are i.i.d., $X_i$ is the channel input sequence, bound to the power constraint $P$, and $Y_i$ is the channel output.

III-B1 Feed-forward Capacity

For the LTI Gaussian channel with an input power constraint, $C_{FF}$ can be obtained by applying the water-filling algorithm [14]. We applied Algorithm 2 to estimate $C_{FF}$ and compared it with the water-filling results, shown in Fig. 6.
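The water-filling baseline can be sketched numerically (an illustration under assumed values $\alpha = 0.5$, $P = 1$, unit-variance innovations; not the paper's code): bisect on the water level $\nu$ so the allocated power $\frac{1}{2\pi}\int (\nu - N(\omega))^+ d\omega$ meets the budget, where $N(\omega) = |1 + \alpha e^{-j\omega}|^2$ is the MA(1) noise spectrum.

```python
import numpy as np

def water_filling(N, P, iters=100):
    """Water-filling over noise spectrum samples N (uniform frequency grid):
    find the level nu with mean((nu - N)^+) = P; return capacity in nats/use."""
    lo, hi = N.min(), N.max() + P
    for _ in range(iters):
        nu = 0.5 * (lo + hi)
        if np.mean(np.maximum(nu - N, 0.0)) > P:
            hi = nu
        else:
            lo = nu
    S = np.maximum(nu - N, 0.0)               # power allocated per band
    return 0.5 * np.mean(np.log1p(S / N))

# MA(1) noise spectrum |1 + alpha*exp(-jw)|^2 with unit-variance innovations.
alpha, P = 0.5, 1.0
w = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
N = np.abs(1.0 + alpha * np.exp(-1j * w)) ** 2
c_ma = water_filling(N, P)
print("C_FF ~", c_ma, "nats/use")

# Sanity check: a flat (memoryless) spectrum recovers the AWGN formula.
c_awgn = water_filling(np.ones_like(w), P)
print("AWGN check:", c_awgn, "vs", 0.5 * np.log(2.0))
```

Averaging over the uniform frequency grid approximates the spectral integrals; the flat-spectrum check confirms the routine degenerates to the AWGN capacity.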

Fig. 6: Performance of $C_{FF}$ estimation in the MA(1)-AGN channel.

III-B2 Feedback Capacity

In general, $C_{FB}$ of the ARMA(k) Gaussian channel can be formulated as a dynamic programming problem, which can be solved by an iterative algorithm [11]. For the particular case of (17), $C_{FB}$ is given by $-\log x_0$, where $x_0$ is a root of a 4th-order polynomial equation [12]. We applied Algorithm 2 in the feedback setting to obtain an estimate of $C_{FB}$. The results are compared with the analytic solution in Fig. 7.

Fig. 7: Performance of $C_{FB}$ estimation in the MA(1)-AGN channel.

Fig. 8: Optimization progress of the directed information rate in Algorithm 2 for the feedback setting. The information rates were estimated by a Monte Carlo evaluation of (13).

IV Conclusion and Future Work

We have presented a methodology for estimating the FF and FB capacity using the channel as a "black box". The estimator combines a novel DI neural estimator (DINE) with an NDT model, both based on RNNs. The performance of the estimator is demonstrated on the AWGN and MA(1)-AGN channels, where the estimates agree with the analytic solutions.

We wish to further generalize our information rate estimation method to multi-user communication channels, a field with many unsolved problems, and to establish theoretical guarantees for the estimator. Moreover, information theory (e.g., channel capacity) offers a rigorous mathematical framework in which analytic solutions are known from Shannon theory, making it a natural benchmark for evaluating machine learning approaches.