In recent years, there has been an increased effort towards data-driven code constructions, e.g., [19, 18, 9, 2, 8]. As opposed to traditional methods, data-driven approaches make no assumptions about the source statistics, which leads to significant improvements for complex sources of data such as images. Instead, such methods aim at discovering efficient codes by making use of a (potentially large) pool of data, in conjunction with a learning algorithm. A successful candidate for the latter is the Neural Network (NN), which has shown tremendous potential in many application domains. Many previous works have focused on the point-to-point communication problem under various channels. For example, learning the physical layer representation is studied for single-input single-output (SISO) systems in , multiple-input multiple-output (MIMO) systems in  and orthogonal frequency-division multiplexing (OFDM) systems in . NN-based joint source-channel coding (JSCC) is proposed for images in , and for text in . Though traditional techniques are optimal in asymptotic regimes, it was shown in [19, 18, 9, 2, 8] that in practical scenarios NN-based methods are competitive, and even outperform state-of-the-art methods in some signal-to-noise ratio (SNR) regimes.
In this work, we shift the scope to the network perspective and take a first step towards understanding the benefits and challenges of data-driven approaches for networked communication. More precisely, we look at the problem of multi-casting correlated sources over a network of point-to-point noisy channels. The problem of multi-casting correlated sources has a long history, starting from the seminal work of Slepian and Wolf (S-W) . In the S-W problem, a direct edge exists from each source to the destination. Csiszár showed in  that linear codes are sufficient to achieve the S-W bound. For the scenario where correlated sources have to be communicated over a point-to-point network, the achievable rates were derived in . It has been shown in  that random linear network coding can be used for multi-casting correlated sources over a network, with an error exponent generalizing that of linear S-W coding . However, for arbitrarily correlated sources, random linear network coding requires either maximum a posteriori (MAP) decoders or minimum entropy (ME) decoders. Despite efforts toward reducing the complexity of ME or MAP decoders, finding an ME or MAP solution is in general NP-hard. The decoding complexity of the joint coding scheme proposed in  motivated Ramamoorthy et al.  to seek a separation between source and network coding. Such a separation would allow the use of efficient source codes at the sources, followed by efficient network codes, thus yielding an overall efficient code construction. Unfortunately, such a separation turned out to be impossible in general. Following this, joint source and network coding schemes with low-complexity decoders have been proposed, but they apply only to restricted settings. For example, in  the sources are restricted to be binary, and in  the number of sources is restricted to two.
To the best of our knowledge, there is no existing practical joint source and network coding scheme for multi-casting arbitrarily correlated real-life signals over a point-to-point network of arbitrary topology.
To this end, we propose a novel, application-specific code construction based on NNs, which we refer to as Neural Network Coding, or NNC for short. The NNC scheme has the following main benefits: (a) it makes no assumptions about the statistics of the source, but instead makes use of a seed dataset of examples; (b) it is an end-to-end communication scheme, or equivalently, a joint source and network coding scheme; (c) it has practical decoders; (d) it can be applied to any network topology; and (e) it can be used with various power constraints at each of the source and intermediate nodes. Figure 1(a) illustrates a scenario to which NNC applies, where arbitrarily correlated signals are multi-cast over a communication network with four source nodes and two destination nodes. NNC can find wide application, from the Internet of Things (IoT) to autonomous driving, where correlated signals generated by distributed measurements need to be combined for processing at central processors. In IoT and autonomous driving, signals are task-specific, which an application-specific scheme can exploit for efficiency. Moreover, latency, bandwidth and energy can be extremely constrained in both cases, precluding computationally demanding long block-length source and channel coding techniques, let alone joint multi-casting schemes with ME or MAP decoders .
In NNC, the encoders at the source nodes and intermediate nodes, as well as the decoders at the destination nodes, are NNs, as shown in Figure 1(b). The resulting network code is designed jointly with the encoding and decoding phases of the transmission, where real-valued input signals are mapped into channel inputs, and channel outputs are reconstructed into real-valued signals. The end-to-end NNC scheme can be optimized through training and testing offline over a large data set, and can be readily implemented. Of particular interest for these codes is the power-distortion trade-off they achieve: for a given topology, given power constraints on the nodes and a specified distortion measure, what expected distortion does the code achieve? NNC is reminiscent of the auto-encoder structure. An auto-encoder is an NN trained to minimize the distortion, e.g., the Mean Square Error (MSE), between its output and its input. The end-to-end NN that results from the NNC scheme is similar to such an auto-encoder, with additional constraints imposed by the topology of the physical communication network. Our experimental results show the benefit of a non-linear code construction in this setup. Furthermore, we illustrate through experiments on images that NNC achieves better performance than a separated scheme based on a compression scheme (JPEG ) followed by capacity-achieving network coding. While still in its infancy, we believe that NNC and its variants may pave the way to efficiently exploiting non-linearity in codes, which appears to be an important component in more complex networked settings.
II System model
Throughout the paper, we use a directed graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ to model the communication network. Elements of $\mathcal{V}$ are called nodes and elements $(i, j)$ of $\mathcal{E}$ are called links. Each link $(i, j)$ is assigned an energy constraint $P_{ij}$, which specifies the maximum signal energy that can be sent from node $i$ to node $j$. We consider two disjoint subsets $\mathcal{S}$, $\mathcal{D}$ of $\mathcal{V}$, where each element of $\mathcal{S}$ is called a source node and each element of $\mathcal{D}$ is called a destination node. Let $s$ and $d$ denote $|\mathcal{S}|$ and $|\mathcal{D}|$, respectively. We consider $k$ virtual sources located at the source nodes. Each virtual source $l$ generates a random variable $U_l \in \mathbb{R}$, $l = 1, \dots, k$, according to the joint density function $p_{\mathbf{U}}(u_1, \dots, u_k)$, so that $U_1, \dots, U_k$ may be arbitrarily correlated. The resulting random vector is denoted by $\mathbf{U} = (U_1, \dots, U_k)$. Observe that $k$ may not be equal to $s$. By grouping some of the virtual sources into a single source node in $\mathcal{S}$, this setup encompasses the case in which some of the sources are co-located, or in which some physical sources generate random variables of higher dimension. Thus, when appropriate, we refer to a source node $i \in \mathcal{S}$ to represent the collection of virtual sources co-located at $i$ (cf. Section IV).
We model each link in the network as a set of parallel channels. More precisely, two nodes $i$ and $j$ with $(i, j) \in \mathcal{E}$ are connected via $n_{ij}$ parallel noisy channels. On link $(i, j)$, node $i$ may transmit any signal $\mathbf{X}_{ij} \in \mathbb{R}^{n_{ij}}$, subject to the average power constraint on the link, i.e., $\mathbb{E}[\|\mathbf{X}_{ij}\|^2] \le P_{ij}$. The signal on each link is then corrupted by noise, and node $j$ receives a corrupted signal $\mathbf{Y}_{ij}$. In the special case of independent zero-mean Additive White Gaussian Noise (AWGN) channels, node $j$ receives $\mathbf{Y}_{ij} = \mathbf{X}_{ij} + \mathbf{Z}_{ij}$, where each element of the $n_{ij}$-dimensional vector $\mathbf{Z}_{ij}$ is a zero-mean Gaussian random variable with variance $\sigma_{ij}^2$. Note that this setup models wireless point-to-point communication, where the independent channels are obtained by constructing orthogonal sub-channels from the available bandwidth .
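The AWGN link model can be simulated in a few lines. The following is a minimal NumPy sketch, assuming a batch of channel inputs of shape (batch, number of parallel channels); the helper names `awgn_link` and `average_power`, the noise variance and the toy inputs are hypothetical and not part of the original setup. As an additional assumption, the per-link SNR is taken to be the average signal energy divided by the total noise energy over the parallel channels.

```python
import numpy as np


def awgn_link(x, sigma2, rng):
    """Y_ij = X_ij + Z_ij: add i.i.d. zero-mean Gaussian noise of variance sigma2
    to each of the parallel channels (x has shape (batch, n_ij))."""
    z = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=x.shape)
    return x + z


def average_power(x):
    """Empirical estimate of E[||X_ij||^2], averaged over the batch."""
    return float(np.mean(np.sum(x ** 2, axis=1)))


# Toy usage with hypothetical values: 8 parallel channels, noise variance 0.1.
rng = np.random.default_rng(0)
n_ij, sigma2 = 8, 0.1
x = rng.normal(size=(10_000, n_ij))   # unit-variance inputs, so E||X||^2 = n_ij
y = awgn_link(x, sigma2, rng)
# Assumed SNR definition: signal energy over total noise energy on the link.
snr_db = 10 * np.log10(average_power(x) / (n_ij * sigma2))   # close to 10 dB here
```

Because the noise is added element-wise and independently, each of the parallel channels behaves as a separate scalar AWGN channel, matching the orthogonal sub-channel interpretation.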
We study the multi-cast problem, where information generated at the source nodes must be communicated to all destination nodes. At each destination node $t \in \mathcal{D}$, an estimate $\hat{\mathbf{U}}_t$ of the source $\mathbf{U}$ is reconstructed. The performance of the multi-casting scheme can be evaluated by a tuple of $d$ distortion measures $\big(\Delta_t(\mathbf{U}, \hat{\mathbf{U}}_t)\big)_{t \in \mathcal{D}}$, each defined between the source and its estimate at a destination node $t$.
III Neural Network Coding
In NNC, we design the channel inputs at the source nodes and at the intermediate nodes jointly; this makes NNC a joint source and network coding scheme. Existing joint source and network coding schemes, e.g., [11, 27, 15, 13], assume error-free point-to-point transmission on each link and focus on the network layer operations. The physical layer then relies on a separate channel coding scheme with potentially high latency, since each link is assumed to employ an error-correcting code with a large block length. In contrast, in NNC the signal inputs are transmitted directly over the noisy channels, i.e., there are no underlying physical layer codes. As such, the communication problem described in Section II can be decomposed into three phases, as shown in Figure 1(a): the encoding phase, the transmission phase and the decoding phase. NNC operates in a one-shot manner over all three phases. In the encoding phase, real-valued signals at the source nodes are directly mapped into network codes. The length of a network code $\mathbf{X}_{ij}$ is designed to match the number $n_{ij}$ of independent channels that make up link $(i, j)$, so $\mathbf{X}_{ij}$ can be transmitted concurrently through the noisy channels over link $(i, j)$. In the transmission phase, the network codes $\mathbf{X}_{jl}$ are constructed directly at node $j$ from the incoming noise-corrupted network codes $\{\mathbf{Y}_{ij} : i \in \mathrm{In}(j)\}$, where $\mathrm{In}(j)$ is the set of direct predecessors of $j$. In the decoding phase, each destination node reconstructs the transmitted signals directly from the noise-corrupted network codes it receives. NNC does not involve any long block-length source or channel coding techniques, and is therefore free of their associated latency penalty and computational cost.
Note that by picking a non-linear activation, the resulting joint source and network code is non-linear by design. As mentioned in Section I, non-linearity in codes may be crucial for constructing efficient codes for the problem at hand. We design the network code $\mathbf{X}_{ij}$ from node $i$ to node $j$ by constructing an NN $f_{ij}$ with input dimension $m_i$ and output dimension $n_{ij}$. When $i \notin \mathcal{S}$, $m_i = \sum_{l \in \mathrm{In}(i)} n_{li}$. When $i \in \mathcal{S}$, $m_i$ is the dimension of the signal generated at $i$. During a transmission, if $i \notin \mathcal{S}$, the concatenation of the noise-distorted network codes received at $i$, $(\mathbf{Y}_{li})_{l \in \mathrm{In}(i)}$, is fed into the NN; if $i \in \mathcal{S}$, the signal generated at $i$ is fed into the NN instead. The NN output is the network code $\mathbf{X}_{ij}$ to be transmitted over link $(i, j)$.
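The dimension rule for the inner NNs can be made concrete with a small bookkeeping sketch: each link's inner NN takes either the signal generated at a source node, or the concatenation of all noisy codes a node receives from its direct predecessors, and outputs one value per parallel channel of the link. The sketch assumes the standard butterfly layout (two sources, two relays, two destinations) with hypothetical node names, 16 channels per link purely for illustration, and 392-dimensional source signals as in Section IV; none of these names or values are taken from the paper's code.

```python
# Hypothetical node names for a standard butterfly network (an assumption):
# sources s1, s2; relays r1, r2; destinations t1, t2.
EDGES = [("s1", "t1"), ("s1", "r1"), ("s2", "r1"), ("s2", "t2"),
         ("r1", "r2"), ("r2", "t1"), ("r2", "t2")]
SOURCE_DIMS = {"s1": 392, "s2": 392}   # half of a 28x28 image at each source node


def predecessors(node, edges):
    """In(node): the direct predecessors of a node."""
    return [i for (i, j) in edges if j == node]


def inner_nn_shapes(edges, n, source_dims):
    """Map each link (i, j) to the (input, output) dimensions of its inner NN f_ij.

    n maps each link (i, j) to its number of parallel channels n_ij.
    """
    shapes = {}
    for (i, j) in edges:
        if i in source_dims:   # i is a source node: fed the signal generated at i
            m_i = source_dims[i]
        else:                  # otherwise: fed the concatenation of received codes
            m_i = sum(n[(p, i)] for p in predecessors(i, edges))
        shapes[(i, j)] = (m_i, n[(i, j)])
    return shapes


def decoder_input_dim(t, edges, n):
    """Width of the concatenated noisy codes fed to the decoder at destination t."""
    return sum(n[(p, t)] for p in predecessors(t, edges))


n = {e: 16 for e in EDGES}             # illustrative: 16 parallel channels per link
shapes = inner_nn_shapes(EDGES, n, SOURCE_DIMS)
```

For instance, the relay `r1` has two predecessors, so its inner NN receives 2 × 16 = 32 inputs and emits 16 outputs, one per parallel channel of link `(r1, r2)`.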
Similarly, at each destination node $t \in \mathcal{D}$ we reconstruct the input signal as $\hat{\mathbf{U}}_t$ by decoding the received noise-distorted network codes with an NN. Note that the NNs at the destination nodes are low-complexity decoders, since each layer of an NN is an affine transformation followed by an element-wise non-linear function. We say that the set of functions for constructing and decoding network codes at each node specifies an NNC policy for the communication system if each of them can be represented as an NN. Under an NNC policy, the resulting end-to-end encoding and decoding can be seen as a series of NNs connected via noisy links, as given by the physical network topology. It is convenient to represent these noisy links by NN layers as well, the main difference being that these layers have fixed (non-trainable) parameters corresponding to the channel statistics. Thus, under an NNC policy, we construct an end-to-end NN in which some of the layers have non-trainable parameters. The end-to-end NN has the physical topology of the communication system embedded in it, and contains the NNs used for constructing and decoding network codes as sub-graphs. Henceforth, we refer to these NNs as inner NNs. Overall, there are $s$ input layers and $d$ output layers in the end-to-end NN. Each input layer has width equal to the dimension of the source generated at the corresponding node, and all output layers have width $k$. An illustration of an end-to-end NN is given in Figure 2.
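To make the end-to-end construction concrete, the NumPy sketch below wires randomly initialized (untrained) inner NNs of an assumed butterfly network together with fixed, non-trainable AWGN "layers". It only illustrates the forward pass; the training itself (done in Keras/TensorFlow in the paper) is not reproduced. The node names, the hidden width of 64, the sigmoid output of the decoders (chosen so reconstructions lie in [0, 1]) and all function names are hypothetical choices for illustration.

```python
import numpy as np

# Assumed butterfly layout with hypothetical node names (not from the paper).
EDGES = [("s1", "t1"), ("s1", "r1"), ("s2", "r1"), ("s2", "t2"),
         ("r1", "r2"), ("r2", "t1"), ("r2", "t2")]
SOURCES, DESTS = ("s1", "s2"), ("t1", "t2")
ORDER = ("s1", "s2", "r1", "r2")   # transmitting nodes in topological order


def make_mlp(in_dim, out_dim, rng, hidden=64, sigmoid_out=False):
    """Two-layer fully connected NN with a ReLU hidden layer (random, untrained)."""
    w1 = rng.normal(scale=in_dim ** -0.5, size=(in_dim, hidden))
    w2 = rng.normal(scale=hidden ** -0.5, size=(hidden, out_dim))

    def f(v):
        out = np.maximum(v @ w1, 0.0) @ w2
        return 1.0 / (1.0 + np.exp(-out)) if sigmoid_out else out

    return f


def incoming(node, received):
    """Concatenate the noisy codes received at `node` from all its predecessors."""
    return np.concatenate(
        [received[(src, dst)] for (src, dst) in EDGES if dst == node], axis=1)


def build_policy(n, k, rng):
    """A hypothetical NNC policy: one inner NN per link, one decoder per destination."""
    in_dim = {"s1": k // 2, "s2": k - k // 2, "r1": 2 * n, "r2": n}
    enc = {(i, j): make_mlp(in_dim[i], n, rng) for (i, j) in EDGES}
    dec = {t: make_mlp(2 * n, k, rng, sigmoid_out=True) for t in DESTS}
    return enc, dec


def forward(policy, u, sigma2, rng):
    """Forward pass: inner NNs joined by fixed (non-trainable) AWGN layers."""
    enc, dec = policy
    k = u.shape[1]
    signal = {"s1": u[:, : k // 2], "s2": u[:, k // 2:]}   # top / bottom half
    sent, received = {}, {}
    for i in ORDER:
        v = signal[i] if i in SOURCES else incoming(i, received)
        for (p, q) in EDGES:
            if p == i:
                sent[(p, q)] = enc[(p, q)](v)
                noise = rng.normal(scale=np.sqrt(sigma2), size=sent[(p, q)].shape)
                received[(p, q)] = sent[(p, q)] + noise   # Y_pq = X_pq + Z_pq
    recon = {t: dec[t](incoming(t, received)) for t in DESTS}
    return recon, sent


rng = np.random.default_rng(0)
u = rng.random((5, 784))           # stand-in for a batch of normalized images
policy = build_policy(n=16, k=784, rng=rng)
recon, sent = forward(policy, u, sigma2=0.1, rng=rng)
```

In a trainable implementation, only the weights inside `make_mlp` would be updated; the noise layers have no parameters to learn, mirroring the non-trainable layers described above.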
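With $\mathbf{U}$ partitioned and fed into the input layers, the outputs of the end-to-end NN simulate $(\hat{\mathbf{U}}_t)_{t \in \mathcal{D}}$, the reconstructions at the destination nodes under the current NNC policy. Recall that $\Delta_t(\mathbf{U}, \hat{\mathbf{U}}_t)$ is the distortion measure between the source $\mathbf{U}$ and its estimate $\hat{\mathbf{U}}_t$ at destination node $t$, as defined in Section II. The parameters of the NNC policy are initialized randomly and are trained to minimize

$$\sum_{t \in \mathcal{D}} \mathbb{E}\big[\Delta_t(\mathbf{U}, \hat{\mathbf{U}}_t)\big] + \sum_{(i,j) \in \mathcal{E}} \lambda_{ij}\, \mathbb{E}\big[\|\mathbf{X}_{ij}\|^2\big]$$

over a large data set sampled from $p_{\mathbf{U}}$. Note that this optimization problem is a Lagrangian relaxation of the power-constrained problem of Section II, i.e., of minimizing the distortions subject to $\mathbb{E}[\|\mathbf{X}_{ij}\|^2] \le P_{ij}$ on every link.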
We control the transmission power implicitly by penalizing power in the objective function through the weights $\lambda_{ij} \ge 0$: the larger $\lambda_{ij}$ is, the more the power on link $(i, j)$ is penalized.
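The training objective (the sum over destinations of the expected distortion, plus the λ-weighted expected energy sent on each link) can be estimated on a mini-batch as sketched below. This is a minimal NumPy illustration, assuming reconstructions and transmitted codes are stored in dictionaries keyed by destination and by link; the function names and the use of per-sample MSE as the distortion are hypothetical choices.

```python
import numpy as np


def mse(u, u_hat):
    """Per-sample mean squared error, one possible choice of distortion."""
    return np.mean((u - u_hat) ** 2, axis=1)


def nnc_objective(u, recon, sent, lam, distortion=mse):
    """Mini-batch estimate of
    sum_t E[distortion(U, U_hat_t)] + sum_(i,j) lam_ij * E[||X_ij||^2].

    recon: {t: U_hat_t}, sent: {(i, j): X_ij}, lam: {(i, j): lam_ij};
    all arrays have the batch along axis 0.
    """
    dist = sum(np.mean(distortion(u, u_hat)) for u_hat in recon.values())
    power = sum(lam[e] * np.mean(np.sum(x ** 2, axis=1)) for e, x in sent.items())
    return float(dist + power)


# Toy check: a larger lambda penalizes the same transmitted energy more.
u = np.zeros((2, 4))
recon = {"t1": np.full((2, 4), 0.5)}          # MSE = 0.25 per sample
sent = {("s1", "t1"): np.ones((2, 3))}        # ||X||^2 = 3 per sample
low = nnc_objective(u, recon, sent, {("s1", "t1"): 0.1})    # 0.25 + 0.3
high = nnc_objective(u, recon, sent, {("s1", "t1"): 1.0})   # 0.25 + 3.0
```

Raising a single weight only changes the penalty on that link, which is how individual links can be made "weak" while the distortion term stays shared.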
The parameters of the NNC policy can be trained and tested offline, using a variety of available tools, e.g., , . For the best performance, the hyper-parameters generally need to be optimized as well, as in other applications of NNs; hyper-parameters such as the number of layers and the activation functions of every inner NN can also be tuned and tested offline before implementation. Note that for the simple topology of a single-hop link, NNC reduces to the deep JSCC of  with soft control on the transmission power.
IV Performance Evaluation
We studied the performance of NNC by experimenting with multi-casting an MNIST image  over a butterfly network, as shown in Figures 2 and 3. In this setup, there are two source nodes ($s = 2$) and two destination nodes ($d = 2$). A normalized MNIST image, with pixel values between 0 and 1, is split between the two source nodes, such that each source node observes only one half of the image. In other words, 392 out of the $k = 784$ virtual sources (pixels) are co-located at each source node, where the top 392 pixels are located at the first source node and the rest at the second. Each link in the butterfly network consists of $n$ independent parallel AWGN channels ($n_{ij} = n$ for every link), with zero-mean noise of variance $\sigma^2$. We experimented with different power constraints, resulting in different SNRs over the links. Performance is evaluated by the peak signal-to-noise ratio (pSNR) at each destination node $t$, defined as

$$\mathrm{pSNR}_t = 10 \log_{10} \frac{\mathrm{MAX}^2}{\frac{1}{k}\|\mathbf{U} - \hat{\mathbf{U}}_t\|^2},$$

where $\mathrm{MAX}$ is the largest value of the input pixels. The choice of pSNR as a distortion measure is natural for images . The pSNR is essentially a normalized version of the MSE and can be used to compare performance between inputs with different pixel ranges.
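The pSNR, 10·log10(MAX² / MSE) with the MSE averaged over the k pixels, takes only a few lines to compute. The sketch below is a minimal NumPy version; the function name and the default MAX = 1 (for pixel values normalized to [0, 1]) are assumptions for illustration. It also checks the normalization property mentioned above: rescaling both images and MAX by the same factor leaves the pSNR unchanged.

```python
import numpy as np


def psnr(u, u_hat, max_val=1.0):
    """pSNR in dB: 10*log10(MAX^2 / MSE) between image u and reconstruction u_hat."""
    u, u_hat = np.asarray(u, dtype=float), np.asarray(u_hat, dtype=float)
    mse = np.mean((u - u_hat) ** 2)
    return float(10.0 * np.log10(max_val ** 2 / mse))


# A uniform error of 0.1 on a normalized 784-pixel image gives MSE = 0.01, i.e. 20 dB.
u = np.zeros(784)
u_hat = np.full(784, 0.1)
p_normalized = psnr(u, u_hat)
# Rescaling both images and MAX by 255 (8-bit range) leaves the pSNR unchanged.
p_8bit = psnr(255 * u, 255 * u_hat, max_val=255.0)
```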
In each experiment, we learnt an NNC policy with every inner NN set to be a two-layer fully-connected NN with the ReLU activation function, $\mathrm{ReLU}(x) = \max(0, x)$. Note that these hyper-parameters may not be optimal, but the results still serve as a proof of concept. An NNC policy is trained to minimize

$$\sum_{t \in \mathcal{D}} \mathrm{BCE}(\mathbf{U}, \hat{\mathbf{U}}_t) + \sum_{(i,j) \in \mathcal{E}} \lambda_{ij} \|\mathbf{X}_{ij}\|^2,$$

where $\mathrm{BCE}(\mathbf{U}, \hat{\mathbf{U}}_t)$ is the binary cross entropy between the original image $\mathbf{U}$ and the reconstructed image $\hat{\mathbf{U}}_t$ at destination node $t$. The dependence of $\hat{\mathbf{U}}_t$ and $\mathbf{X}_{ij}$ on $\mathbf{U}$ is omitted in the expression for simplicity. We use the binary cross entropy, defined as

$$\mathrm{BCE}(\mathbf{U}, \hat{\mathbf{U}}_t) = -\frac{1}{k} \sum_{l=1}^{k} \Big[U_l \log \hat{U}_{t,l} + (1 - U_l) \log\big(1 - \hat{U}_{t,l}\big)\Big],$$

in the objective function instead of the pSNR, as an engineering tweak to speed up the training process. Each NNC policy is learnt by training on the 60000 MNIST training images for 150 epochs, and is tested on the 10000 MNIST test images; the training set and the test set are disjoint. We implemented the NN architecture in Keras with the TensorFlow backend. The Adadelta optimizer  is used with a learning rate of 1.0 and a mini-batch size of 32 samples.
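The binary cross entropy between an image and its reconstruction can be written directly in NumPy. The sketch below averages over the k pixels (the 1/k normalization is an assumption; a plain sum would only rescale the loss relative to the power weights) and clips the reconstruction away from 0 and 1 to avoid log(0); the function name and clipping constant are hypothetical. For pixel targets anywhere in [0, 1], not only binary ones, the per-pixel BCE is convex in the reconstruction and minimized when the reconstruction equals the target, which is why it can stand in for the MSE-based pSNR during training.

```python
import numpy as np


def binary_cross_entropy(u, u_hat, eps=1e-7):
    """BCE between target u (values in [0, 1]) and u_hat, averaged over the last axis."""
    u = np.asarray(u, dtype=float)
    u_hat = np.clip(np.asarray(u_hat, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(u * np.log(u_hat) + (1.0 - u) * np.log(1.0 - u_hat), axis=-1)


# For a gray target pixel, the loss is smallest when the reconstruction equals it.
target = np.array([0.3])
candidates = np.linspace(0.05, 0.95, 19)     # 0.05, 0.10, ..., 0.95
losses = [float(binary_cross_entropy(target, np.array([c]))) for c in candidates]
best = float(candidates[int(np.argmin(losses))])
```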
In our experiments, we studied the power-distortion trade-off of NNC under a variety of network conditions. Different network conditions are enforced by different choices of $\lambda_{ij}$: the higher $\lambda_{ij}$ is, the less power is expected to be sent on link $(i, j)$. We call a link in the network “weak” if transmission on the link suffers from a lower SNR compared to other links. Weak links are denoted by dashed arrows in the diagrams. We say a node has a weak transmitter (receiver) if all its outgoing (incoming) links are weak. The nodes and links in the network are “equally strong” if the power sent on every link is penalized with the same $\lambda_{ij}$. We first studied NNC's performance qualitatively in heterogeneous networks, and then quantitatively in homogeneous networks, where we analyzed its power allocation strategy and, through comparison with baselines, demonstrated the benefit of a non-linear network code construction and of a joint coding scheme.
IV-A Heterogeneous Networks
Our first set of experiments studies the performance of NNC under a variety of network conditions. For each network condition, we visualize the power-distortion trade-off in Figures 2 and 3 by showing an instance of test image reconstruction.
Figure 2 shows the transmission of an MNIST image over a butterfly network where all nodes and links are equally strong. In this experiment, the power sent on every link is equally penalized ($\lambda_{ij} = \lambda$ for all $(i, j) \in \mathcal{E}$) with a small penalization weight $\lambda$, resulting in high SNR on all links, so image reconstruction at both destination nodes is successful.
Figure 3(a) shows one network where the top destination node has a weak receiver. Incoming signals to the top receiver suffer from a low SNR. As a result, the top destination node reconstructs the image poorly, while the bottom destination node performs a much better reconstruction.
Figure 3(b) shows another such network, where only the top link is weak. As a result, the reconstruction of the upper half of the image at the top destination node is poorer than that at the bottom destination node. Both destination nodes reconstruct the lower half of the image well, as both links from the bottom source node have high SNR. Note that the middle link carries enough information about both halves of the image to allow the top destination node to partially reconstruct the top half of the image. This results from NNC adapting to the low SNR on the top link by using its power on the other links.
The last example of heterogeneous link conditions is depicted in Figure 3(c). In this network, the top source node has a weak transmitter, so both outgoing links from the top source node suffer from low SNR. As a result, neither destination node reconstructs the upper half of the image well. Note that both destination nodes reconstruct the lower half of the image well, as the SNR on both outgoing links from the bottom source node is high. Furthermore, the bottom destination node performs better on the top half of the image than the top destination node. This might be explained by the bottom destination node being better able to infer the top half of the image from its knowledge of the lower half, as the two halves of the image are correlated.
IV-B Homogeneous Networks
Our second set of experiments studies the power-distortion trade-off of NNC on a homogeneous butterfly network, where all nodes and links are equally strong. Note that since the noise on each link is fixed, this is equivalent to studying the SNR-distortion trade-off. The transmission power per image is implicitly controlled by the value of $\lambda$. As expected, the quality of the reconstruction improves as the transmission power increases, from Figure 4(a) (low power) to Figure 4(b) (medium power), and finally to Figure 4(c) (high power). Note that when the transmission power is forced to be almost zero, as in Figure 4(a), both destination nodes reconstruct the average of the training data. This is essentially the best possible reconstruction, as no information flows from the sources to the destinations. In addition, Table I illustrates the power allocation of NNC under different power budgets. When power is limited, i.e., when $\lambda$ is large, NNC prefers to send information on fewer channels with higher SNR rather than spreading its energy over more channels. Indeed, by not allocating power to some channels, the power budget can be used to improve the quality of the channels that carry information. This is in line with the intuition behind the water-filling algorithm.
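For reference, classical water-filling over parallel Gaussian channels assigns each channel the power max(μ − noise variance, 0), with the water level μ chosen so the powers sum to the budget. The NumPy sketch below finds μ by bisection; the function name and noise values are hypothetical. It uses channels with unequal noise levels, where the "use fewer channels when power is scarce" behaviour appears (with identical noise levels, water-filling splits power evenly), so it illustrates the intuition invoked above rather than NNC's learned allocation.

```python
import numpy as np


def water_filling(noise, budget, iters=100):
    """Classical water-filling over parallel Gaussian channels.

    Allocates p_l = max(mu - noise_l, 0) with sum(p) = budget, where the water
    level mu is found by bisection; noise holds each channel's noise variance.
    """
    noise = np.asarray(noise, dtype=float)
    lo, hi = noise.min(), noise.max() + budget   # sum(p) <= budget at lo, >= at hi
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        if np.maximum(mu - noise, 0.0).sum() > budget:
            hi = mu
        else:
            lo = mu
    return np.maximum(lo - noise, 0.0)


noise = [0.1, 0.5, 1.0]                 # hypothetical per-channel noise variances
p_small = water_filling(noise, 0.2)     # only the least noisy channel gets power
p_large = water_filling(noise, 2.0)     # all three channels receive power
```

With the small budget the water level stays below the second channel's noise floor, so all power goes to one channel; with the larger budget the level rises above every noise floor.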
We compared the performance of NNC with two baseline methods. The first competitor is a linear analog of NNC, the Analog Network Coding scheme (ANC): each node amplifies and forwards the sum of its inputs. All amplification factors are the same, and the destination nodes decode knowing the amplification factor and the network topology. Note that an MNIST image can be sent by NNC in one shot, but has to be sent over the network in $\lceil 392/n \rceil$ transmissions by ANC: there is no compression scheme in the ANC baseline, so each source node can send at most $n$ pixels in a single transmission. Under the ANC baseline, all distortion in the reconstruction comes from the channel noise.
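One ANC transmission can be sketched as follows. The sketch assumes a standard butterfly layout (sources s1, s2; relays r1, r2; destinations t1, t2) and a simple decoder that inverts the known gains and subtracts the directly received half; the paper only states that destinations decode knowing the amplification factor and the topology, so this decoder, the node names and all parameter values are hypothetical. Each source sends one pixel per parallel channel, i.e., n pixels per transmission.

```python
import numpy as np


def anc_butterfly(x1, x2, g, sigma2, rng):
    """One ANC transmission over an assumed butterfly s1, s2 -> r1 -> r2 -> t1, t2.

    x1, x2: the pixels sent by each source in this transmission (one per channel).
    Every node amplifies the sum of its inputs by the same factor g; each link
    adds AWGN of variance sigma2. Returns {destination: (x1_hat, x2_hat)}.
    """
    def awgn(v):
        return v + rng.normal(scale=np.sqrt(sigma2), size=v.shape)

    y_s1_t1, y_s1_r1 = awgn(g * x1), awgn(g * x1)
    y_s2_r1, y_s2_t2 = awgn(g * x2), awgn(g * x2)
    y_r1_r2 = awgn(g * (y_s1_r1 + y_s2_r1))                  # about g^2 (x1 + x2)
    y_r2_t1, y_r2_t2 = awgn(g * y_r1_r2), awgn(g * y_r1_r2)  # about g^3 (x1 + x2)

    # Destinations know g and the topology: invert the gains, then subtract.
    x1_at_t1 = y_s1_t1 / g
    x2_at_t2 = y_s2_t2 / g
    return {
        "t1": (x1_at_t1, y_r2_t1 / g**3 - x1_at_t1),
        "t2": (y_r2_t2 / g**3 - x2_at_t2, x2_at_t2),
    }


rng = np.random.default_rng(1)
n = 16                                   # hypothetical number of channels per link
x1, x2 = rng.random(n), rng.random(n)    # n pixels from each half image
out = anc_butterfly(x1, x2, g=2.0, sigma2=0.01, rng=rng)
```

In this sketch the noise added on the first hops is amplified along the relay path, and, as stated in the text, all reconstruction error comes from channel noise: with zero noise the pixels are recovered exactly.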
Figure 5 compares the performance of NNC and the ANC baseline. Transmission power in ANC is controlled by the amplification factor. The average transmission power and the pSNR at both destination nodes in Figure 5 are averaged over 300 runs over the test set. Note that both schemes perform “symmetrically” at the two destination nodes, reconstructing images with similar pSNR, which is expected since the network is homogeneous. Overall, NNC outperforms the ANC baseline when the transmission power is low, and the ANC baseline outperforms NNC when the transmission power is high. This is consistent with , which shows that ANC is in fact capacity-achieving in the high-SNR regime.
The second competitor, the JPEG baseline, is a scheme that separates source coding from network coding: images are compressed with the JPEG algorithm at the source nodes, and are then transmitted over the network by capacity-achieving network codes on top of error-free channel codes. Under the JPEG baseline, distortion in the reconstruction comes only from compression in source coding, as the transmission is assumed to be error-free. Notice that the JPEG baseline potentially has high latency due to the error-free channel coding, which operates on a large block length.
In our experiments, the JPEG baseline reconstructs high-quality images, but only with impractically high power. With the JPEG baseline, the average pSNR between a reconstructed image and the original image ranges from 17 to 46 dB. However, the minimum power threshold for using the JPEG competitor is tremendously high, as  per image. The need for such high transmission power is explained by the JPEG algorithm barely compressing MNIST images. Before compression, each half MNIST image is 392 bytes. After compression, the average file size of the half images ranges from 350 bytes to 580 bytes, depending on the quality setting. For small images like MNIST, the overhead of the JPEG algorithm is significant, and the same problem exists for other compression schemes such as JPEG2000. Insufficient compression by JPEG is a representative example of how traditional schemes may lack the ability to adapt their rates to different communication scenarios.
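The file sizes quoted above translate into compression ratios as follows, assuming (as the 392-byte figure implies) that each raw pixel is stored as one 8-bit value; the variable names are illustrative. A ratio below 1 means the JPEG file is larger than the raw half image.

```python
half_image_bytes = 28 * 28 // 2        # 392 pixels at 1 byte (8 bits) each
reported_sizes = (350, 580)            # range of average JPEG file sizes, in bytes
ratios = {size: half_image_bytes / size for size in reported_sizes}
for size, ratio in ratios.items():
    print(f"{size} bytes -> compression ratio {ratio:.2f}")
```

Even at the smallest average file size the ratio is only about 1.12, and at higher quality settings the compressed file exceeds the raw image.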
In this paper, we proposed a novel way of constructing network codes using NNs. Our scheme, NNC, provides a practical solution for multi-casting arbitrarily correlated signals over networks of arbitrary topologies in an application-specific manner. NNC can be easily learnt and tested offline, and implemented in a distributed fashion. We examined the performance of NNC under a variety of network conditions through experiments.
Three possible extensions of the problem arise naturally. First, a separate NNC policy can be learnt for each task when multiple tasks exist asynchronously in a system. Thanks to the simplicity of the NNC implementation, multiple functions can be implemented at one node, and signals can carry a piggybacked flag that triggers the task-specific functions for constructing and decoding network codes at each node. Second, NNC can be extended with little effort to the theoretically hard case in which each destination node has to reconstruct only a subset of the sources [28, 24], whereas in the multi-casting problem all source signals must be reconstructed at every destination node. Third, a functional variation of NNC can be applied when the destination nodes are interested in functions of the input rather than the input itself. For example, in the aforementioned experiments, the destination nodes could be interested in classifying which hand-written digit is sent, rather than in reconstructing the image itself. These extensions are not discussed in this paper due to space constraints.
Future work includes testing NNC's performance with a variety of source types, network topologies and channel models. For example, an erasure channel can be readily implemented by a dropout NN layer. Previous studies on point-to-point transmission, such as text transmission in  and transmission over a multi-path channel in , can be extended with the NNC scheme to transmission over a network. An additional direction is extending NNC to elements drawn from a finite field (as opposed to real numbers), which would allow NNC to be used in the digital domain. Previous work on the quantization of NNs, e.g., , may serve as a starting point for this extension.
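A minimal sketch of an erasure link, assuming erasures are modelled by zeroing individual channel uses with a plain multiplicative mask; the helper `erasure_link` and all values are hypothetical. One caveat if a stock layer is used instead: the Keras `Dropout` layer rescales the retained inputs by 1/(1 − rate) and is only active during training, so modelling a channel with it would need to account for both behaviours.

```python
import numpy as np


def erasure_link(x, p_erase, rng):
    """Erase each channel use of X_ij independently with probability p_erase.

    Unlike inverted dropout, surviving entries are NOT rescaled by
    1 / (1 - p_erase), and the mask is applied at test time as well.
    """
    keep = rng.random(x.shape) >= p_erase
    return x * keep


rng = np.random.default_rng(2)
x = np.ones((1000, 16))
y = erasure_link(x, 0.25, rng)
erased_fraction = float(np.mean(y == 0.0))   # roughly 0.25
```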
The authors would like to thank Yushan Su and Alejandro Cohen for their technical help and constructive comments.
- TensorFlow: large-scale machine learning on heterogeneous systems. Software available from tensorflow.org.
- (2019) Deep joint source-channel coding for wireless image transmission. IEEE Transactions on Cognitive Communications and Networking.
- (2015) Keras. https://keras.io
- (2005) Towards practical minimum-entropy universal decoding. In Data Compression Conference, pp. 33–42.
- (2009) Introduction to algorithms. MIT Press.
- (2012) Elements of information theory. John Wiley & Sons.
- (1982) Linear codes for sources and source networks: error exponents, universal coding. IEEE Trans. Inf. Theory 28 (4), pp. 585–592.
- (2018) Deep learning for joint source-channel coding of text. In Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp. 2326–2330.
- OFDM-autoencoder for end-to-end learning of communications systems. In Proc. IEEE Int. Workshop Signal Proc. Adv. Wireless Commun. (SPAWC).
- (2016) Deep learning. MIT Press.
- (2006) A random linear network coding approach to multicast. IEEE Trans. Inf. Theory 52 (10), pp. 4413–4430.
- (2018) Quantization and training of neural networks for efficient integer-arithmetic-only inference. In , pp. 2704–2713.
- (2003) An algebraic approach to network coding. IEEE/ACM Transactions on Networking (TON) 11 (5), pp. 782–795.
- (1998) The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/
- Minimum-cost subgraphs for joint distributed source and network coding. In Proc. NETCOD.
- (2009) Practical source-network decoding. In 2009 6th International Symposium on Wireless Communication Systems, pp. 283–287.
- (2010) Analog network coding in the high-SNR regime. In 2010 Third IEEE International Workshop on Wireless Network Coding, pp. 1–6.
- (2017) An introduction to deep learning for the physical layer. IEEE Transactions on Cognitive Communications and Networking 3 (4), pp. 563–575.
- (Dec. 2016) Learning to communicate: channel auto-encoders, domain specific regularizers, and attention. In Proc. of IEEE Int. Symp. on Signal Processing and Information Technology (ISSPIT), pp. 223–228.
- (1994) Communication systems engineering. Vol. 2, Prentice Hall, New Jersey.
- (2006) Separating distributed source coding from network coding. IEEE/ACM Trans. Netw. 14 (SI), pp. 2785–2795.
- (1973) Noiseless coding of correlated information sources. IEEE Trans. Inf. Theory 19 (4), pp. 471–480.
- (2001) Network information flow-multiple sources. In Proceedings. 2001 IEEE Int. Sym. Inf. Theory, p. 102.
- (2003) Zero-error network coding for acyclic networks. IEEE Trans. Inf. Theory 49 (12), pp. 3129–3139.
- (1992) The JPEG still picture compression standard. IEEE Transactions on Consumer Electronics 38 (1), pp. xviii–xxxiv.
- (1999) Fractal and wavelet image compression techniques. SPIE Optical Engineering Press, Bellingham, Washington.
- (2009) On practical design for joint distributed source and network coding. IEEE Trans. Inf. Theory 55 (4), pp. 1709–1720.
- (2007) The capacity region for multi-source multi-sink network coding. In 2007 IEEE Int. Sym. Inf. Theory, pp. 116–120.
- (2012) ADADELTA: an adaptive learning rate method. arXiv preprint arXiv:1212.5701.