I Introduction
Optical wireless communication (OWC), which exploits terahertz spectra, has been regarded as a promising solution for enabling much higher data rates in fifth generation (5G) communication systems [1]. In OWC, solid-state optical sources such as light-emitting diodes (LEDs) convey information to receivers equipped with photodiodes (PDs) and serve as lighting sources at the same time, by switching optical pulses at a rate too high for human eyes to perceive. Compared to radio frequency (RF) communication, a more cost-effective deployment is possible for OWC by utilizing existing lighting infrastructure and leveraging spectrum that does not require authorized access.
In practical implementations of optical wireless systems, the average intensity of LEDs is controlled to achieve energy savings and safety [1]. In particular, the average intensity of optical pulses in visible light communication (VLC) applications is closely related to the color and luminance of LEDs, which are essential features for satisfying user requirements. Thus, the optical signal modulation and demodulation strategies should be designed to fulfill arbitrary intensity constraints. Various optical modulation techniques have been developed with the objective of improving spectral efficiency [2] and maximizing the minimum distance among constellation points [3, 4]. However, the optimal design with respect to end-to-end error rate performance under generic lighting constraints remains unaddressed. Another critical issue is the optical shot noise induced by the random nature of photon emission in LEDs. The statistics of the optical shot noise depend on the transmitted LED intensity, and thus applying transceiver design methods intended for RF communication results in a performance loss.
To tackle such nontrivial challenges, this article presents deep learning (DL) techniques that identify an efficient optical transceiver pair. In particular, an unsupervised learning framework based on the autoencoder (AE) [5] is investigated to design an optical wireless system. AE techniques have recently been adopted in RF communication designs [6]. However, it is not straightforward to bring the machine learning structure in [6] into the OWC design, since the effect of lighting constraints has not been properly studied. Therefore, additional processing is required to control the behavior of neural networks (NNs) when constructing OWC systems. Since most state-of-the-art DL techniques have focused on unconstrained problems in classification and generative model applications, it is highly challenging to introduce complicated constraints into NNs in general.
This article provides an overview of recent DL approaches for various optical wireless setups such as multicolored systems and on-off keying (OOK) based OWC. Subsequently, a convolutional AE (CAE) structure is proposed for image sensor communication (ISC), where the information is conveyed by spatially separated LED arrays and the receiver is implemented with an optical image sensor. Finally, concluding remarks and implementation challenges for DL-based communication systems are addressed.
II Deep Learning Framework for OWC Systems
II-A AE Basics
Figure 1 shows an AE which consists of an input layer, multiple hidden layers, and an output layer. The objective of the AE is to find efficient encoding and decoding rules for a given training set without any prior knowledge. The AE framework focuses on an accurate reconstruction of input by its output obtained from successive NN computations. To learn the encoding and decoding rules effectively, the dimension of hidden layers of a typical AE first decreases and subsequently increases after a certain layer (e.g., hidden layer 2 in Fig. 1). Thus, the AE can be regarded as a cascade of two consecutive NNs: an encoding network and a decoding network. The output of the encoding network can be interpreted as a codeword, while the decoding network produces a reconstruction from the codeword such that the output becomes similar to the input.
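As a rough illustration of the encoder-decoder cascade described above, the following NumPy sketch passes a batch of inputs through an untrained AE whose layer widths first shrink and then grow. The layer sizes, the ReLU activation, the random initialization, and the mean-square-error cost are illustrative assumptions, not the configuration of any network discussed in this article.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def dense(x, weight, bias):
    """Linear operation of one fully-connected layer: x @ weight + bias."""
    return x @ weight + bias

def init_layer(rng, n_in, n_out):
    # Small random weights and zero biases; a trained AE would learn these.
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

rng = np.random.default_rng(0)
input_dim, hidden_dim, code_dim = 8, 6, 3   # hypothetical sizes

# Encoding network: dimensions shrink down to the codeword.
enc = [init_layer(rng, input_dim, hidden_dim), init_layer(rng, hidden_dim, code_dim)]
# Decoding network: dimensions grow back to the input size.
dec = [init_layer(rng, code_dim, hidden_dim), init_layer(rng, hidden_dim, input_dim)]

def encode(x):
    for w, b in enc:
        x = relu(dense(x, w, b))
    return x

def decode(code):
    (w1, b1), (w2, b2) = dec
    hidden = relu(dense(code, w1, b1))
    return dense(hidden, w2, b2)   # linear output layer for continuous inputs

x = rng.normal(size=(4, input_dim))        # batch of 4 inputs
codeword = encode(x)                       # output of the encoding network
reconstruction = decode(codeword)          # output of the decoding network
mse = np.mean((reconstruction - x) ** 2)   # reconstruction cost before training
```

Training would adjust the weights and biases in `enc` and `dec` to reduce `mse`; that step is omitted here to keep the sketch short.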
Each hidden layer performs a linear operation on its input with a trained set of weight matrices and bias vectors and applies a nonlinear activation to the result of the linear operation to yield the final output. The reconstruction is then attained at the output layer via a similar computation process. The activation is an important feature in DL which introduces nonlinearity to NNs so that complicated input-output relationships can be effectively learned. Popular candidates include the rectified linear unit (ReLU), sigmoid, and softmax [7].
During the training step of the AE, the weight matrices and the bias vectors of the hidden layers and the output layer are trained to minimize a cost function of the AE, which assesses the affinity between the input and the output for successful reconstruction. For continuous-valued inputs, the mean square error is typically adopted as the cost function, whereas the cross-entropy works well in classification applications with binary inputs. The minimization of the AE cost function is, in general, a nonconvex optimization problem for which no closed-form solution is available. To train an NN, most state-of-the-art DL techniques employ a stochastic gradient descent (SGD) algorithm, which is a variant of the gradient descent methods [7]. Once the AE is trained, its performance is evaluated over a test set whose elements are not seen during the training step.
II-B Applications to OWC Systems
Figure 1 illustrates a generic AE framework for OWC systems. The AE-based optical transceiver is implemented with the encoding and the decoding networks, which have been trained in advance. A message index from the message set is first mapped to a vector representation to be processed by multidimensional hidden layers. The representation vector is passed into the encoding network at the transmitter, and an optical controller, in turn, refines the output of the encoding network to produce a feasible optical signal vector for each message. The output dimension of the transmitter, i.e., the length of this signal vector, corresponds to the number of LEDs or the symbol duration. The optical controller can be realized by either a deterministic computation or an additional NN that is trained along with the encoding and the decoding networks.
The optical channel can be characterized by two different types of noise sources: ambient noise and shot noise. The ambient noise is independent of the transmitted signal and is modeled as a zero-mean Gaussian random variable with a fixed variance. On the other hand, the shot noise is induced by the random nature of photon emission in LEDs, and its variance is proportional to the channel input. Thus, the shot noise variance is modeled as growing with the channel input, with a shot noise scaling factor setting the constant of proportionality. Finally, the output obtained from the decoding network at the receiver is an estimate of the transmitted message.
In AE techniques which have been recently presented for RF communication system designs [6], the transmit power constraint of practical RF hardware is fulfilled by a simple deterministic normalization in the two-dimensional (2D) RF signal space. In contrast, lighting constraints in OWC are normally described by polyhedra in a multidimensional space, which forms a more complicated optical signal space than in RF communication. Furthermore, some optical systems, such as an OOK-based OWC system, are subject to nonconvex optical constraints. Therefore, handling the behavior of the encoding network with the simple computations in [6] is not straightforward for OWC systems. As a result, the design of the optical controller is a key challenge for AE-based OWC optimization.
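Returning to the channel model described earlier in this subsection, the sketch below adds ambient noise and signal-dependent shot noise to a vector of nonnegative intensities. The exact variance expression is an assumption made here for illustration: the shot noise variance is taken as `shot_scale * ambient_var * x`, which is linear in the channel input as the text requires. The function and parameter names are hypothetical.

```python
import numpy as np

def optical_channel(x, ambient_var, shot_scale, rng):
    """Add ambient and signal-dependent shot noise to nonnegative intensities x.

    Assumed model (not taken verbatim from the source): the shot noise
    variance is shot_scale * ambient_var * x, i.e. linear in the input.
    """
    x = np.asarray(x, dtype=float)
    ambient = rng.normal(0.0, np.sqrt(ambient_var), size=x.shape)
    shot = rng.normal(0.0, 1.0, size=x.shape) * np.sqrt(shot_scale * ambient_var * x)
    return x + ambient + shot

rng = np.random.default_rng(1)
y_off = optical_channel(np.zeros(100_000), ambient_var=0.01, shot_scale=4.0, rng=rng)
y_on = optical_channel(np.ones(100_000), ambient_var=0.01, shot_scale=4.0, rng=rng)

# An "off" LED only sees ambient noise, while an "on" LED sees both sources,
# so the empirical variance of y_on is noticeably larger than that of y_off.
var_off, var_on = np.var(y_off), np.var(y_on)
```

Under this assumed model the noise seen by the receiver depends on what was sent, which is why a detector designed for signal-independent RF noise is mismatched here.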
III Deep Learning Based OWC Design
In this section, recent technical progress on AE methods for OWC transceiver design problems is presented.
III-A Multicolor Modulation
A VLC system is one example of multicolored OWC. It consists of multicolor LEDs with color chips, which send messages to a receiver with corresponding PDs. Different color filters are utilized at the receiver to separate optical signals according to the corresponding color. Each message is modulated to an optical constellation point in the color signal space, whose dimension equals the number of colors. The optical signal space is specified by constraints on the nonnegativity and the peak intensity for each color dimension. Also, the average intensity of the optical modulation signal should meet the dimming target, which is associated with the color and intensity requirements of users.
The AE framework has been applied to multicolored VLC systems in [8] to identify a reliable message recovery technique. To add dimming support, a deterministic postprocessing computation is carried out at the end of the encoding network. A convex optimization formulation is employed to project the output vector of the encoding network onto the feasible optical space. Based on a closed-form projection solution, a low-complexity implementation is possible for training with numerous samples.
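To make the projection step concrete, the following sketch projects a raw encoder output onto a set defined by nonnegativity, a peak intensity, and a fixed average intensity. This is not the closed-form solution of [8]: it is a swapped-in bisection search on a common shift, and the assumption that the dimming target constrains the mean of a single signal vector is made here only for illustration.

```python
import numpy as np

def project_dimming(v, peak, target, iters=60):
    """Euclidean projection onto {x : 0 <= x <= peak, mean(x) = target}.

    The projection has the form clip(v - shift, 0, peak); the shift is
    found by bisection because the clipped mean is nonincreasing in it.
    """
    v = np.asarray(v, dtype=float)
    if not 0.0 <= target <= peak:
        raise ValueError("target must lie in [0, peak]")
    lo, hi = v.min() - peak, v.max()   # clipped mean is peak at lo and 0 at hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.clip(v - mid, 0.0, peak).mean() > target:
            lo = mid   # shift too small, intensities still too bright
        else:
            hi = mid
    return np.clip(v - 0.5 * (lo + hi), 0.0, peak)

raw = np.array([1.7, -0.3, 0.4, 0.9])            # hypothetical encoder output
x = project_dimming(raw, peak=1.0, target=0.5)   # feasible optical signal
```

Because the operation is deterministic, it can sit at the end of the encoding network as a fixed postprocessing step rather than as a trainable layer.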
The optical channel is modeled by a stochastic noise layer in which a randomly generated Gaussian noise vector is added to the transmitted signal. Then, the decoding network receives the noisy signal as an input and performs a classification task with the softmax output activation. This produces a probability vector where each element characterizes the probability of the corresponding message being transmitted. The cost function is set to the categorical cross-entropy between the input message and the output probability vector for the classification task of the transmitted message. Thus, the training set of the AE consists of a large number of messages and optical noise vectors. In the training step, the AE learns dimming features as well as statistical properties of the optical channels by itself to minimize the classification error. It has been observed that in signal-dependent shot noise channels, the AE method performs better than classical minimum distance maximizing approaches [3] in terms of the average symbol error rate (SER) and effectively mitigates the effects of inter-color interference induced by the imperfection of the receive color filters.
III-B OOK Modulation
In OOK-based OWC, each LED is turned either on or off to convey a binary message. Thus, a message is encoded by a binary optical signal whose average intensity is controlled by adjusting the number of ones in the binary vector. This requires a computationally demanding search for designing constant weight codes (CWCs), which consist of binary codewords with identical Hamming weight. Although the mathematical properties of CWCs have been intensively studied [4], identifying the optimal encoding and decoding rules for CWCs remains open in general configurations.
In [9], a binary AE training approach has been presented for an OOK-based OWC where a message is conveyed through the temporal intensity change of a single LED. Thus, the output dimension of the transmitter indicates the symbol duration, which is the length of the binary codeword. To restrict the number of ones in the binary codeword of each message, a penalty term is added to the AE cost function. The penalty measures the deviation from the target average intensity and is weighted by a positive tradeoff parameter that controls its share of the cost function.
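One way to read this penalized cost is sketched below. The squared-deviation form of the penalty and the names `tradeoff` and `target` are assumptions for illustration; the precise expression used in [9] is not reproduced here.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean categorical cross-entropy for integer message labels."""
    picked = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(picked + 1e-12))

def dimming_penalty(codewords, target):
    """Mean squared gap between each codeword's average intensity and the target."""
    return np.mean((codewords.mean(axis=1) - target) ** 2)

def penalized_cost(probs, labels, codewords, target, tradeoff):
    # Classification cost plus a weighted dimming penalty (assumed form).
    return cross_entropy(probs, labels) + tradeoff * dimming_penalty(codewords, target)

# Three hypothetical codewords of length 4; the last one is too bright
# for a target average intensity of 0.5.
codewords = np.array([[1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 0]], dtype=float)
probs = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.1, 0.8]])
labels = np.array([0, 1, 2])
cost = penalized_cost(probs, labels, codewords, target=0.5, tradeoff=10.0)
```

A larger `tradeoff` pushes training toward codewords that meet the intensity target, at the expense of the classification term.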
Since the binary constraint is nonconvex, deterministic operations including the linear projections in [8] are no longer applicable to OOK systems. A naive approach for generating binary outputs would be to employ a hard binary activation such as a unit step function. However, the gradient of the hard binarization function is zero almost everywhere, which incurs the well-known vanishing gradient problem [7], a notorious issue in training deep NNs that handle discrete variables. Hence, the weights and the biases of the AE are not updated by the SGD algorithm, and training typically gets stuck at poor performance.
To overcome this difficulty, a soft binarization technique has been adopted in [9] which gradually anneals a continuous-valued latent vector into an OOK signal during the training step. At the end of the encoding layer, a parameterized sigmoid function is utilized as the activation function, where a positive parameter determines the slope of the sigmoid. As illustrated in Fig. 2, the parameterized sigmoid function approaches the hard binary activation as this parameter becomes larger. Thus, the hardness of the AE is controlled by adjusting the parameter.
For a moderate value of the parameter, the parameterized sigmoid function has a nonzero gradient, implying that the AE can be efficiently trained via the SGD algorithm. To avoid the vanishing gradient problem with a large value of the parameter, a multistage training strategy has been introduced in [9] which sequentially trains the AE with a different parameter value at each stage. The parameter is gradually incremented at each stage so that the training performance converges to an effective point without the vanishing gradient issue.
Figure 2 depicts the multistage training strategy. At each stage, the weights and biases of the AE are trained with a fixed parameter value using the SGD algorithm until convergence. Upon training completion, the parameter is incremented for training at the next stage. Thus, the SGD algorithm at the current stage warm-starts from the AE trained at the previous stage. This can be viewed as a cascaded fine-tuning strategy for an AE. Finally, the parameter at the last stage becomes sufficiently large that a binary output is produced. The DL approach for OOK-based OWC outperforms traditional minimum Hamming distance maximization designs [4] in terms of the average SER over shot noise channels.
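The annealing idea can be illustrated numerically. In the sketch below the parameterized sigmoid is assumed to take the form 1 / (1 + exp(-slope * z)); the name `slope` and the stage schedule are hypothetical, and the actual SGD training that would run inside each stage is left as a comment.

```python
import numpy as np

def param_sigmoid(z, slope):
    """Sigmoid with adjustable steepness (assumed parameterization)."""
    return 1.0 / (1.0 + np.exp(-slope * z))

def param_sigmoid_grad(z, slope):
    s = param_sigmoid(z, slope)
    return slope * s * (1.0 - s)

z = np.array([-2.0, -0.5, 0.5, 2.0])     # example latent values
hard = (z > 0).astype(float)             # output of a unit step activation

schedule = [1.0, 2.0, 5.0, 10.0, 50.0]   # assumed slope for each training stage
gaps = []
for slope in schedule:
    # In a real multistage run, the AE weights would be trained with SGD
    # here, warm-started from the weights of the previous stage.
    soft = param_sigmoid(z, slope)
    gaps.append(np.max(np.abs(soft - hard)))

grad_moderate = param_sigmoid_grad(2.0, 1.0)   # usable gradient at a small slope
grad_steep = param_sigmoid_grad(2.0, 50.0)     # numerically vanishes at a large slope
```

The gap to the hard binary output shrinks as the slope grows, while the gradient away from zero collapses, which is the reason for raising the slope gradually rather than starting with a large value.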
IV Deep Learning Framework for Image Sensor Communication
The DL approaches reviewed in the previous section were developed for an optical receiver with a single PD, which forms parallel single-input single-output channels. This section proposes an extension of such results to an ISC system where the receiver is realized by an image sensor consisting of multiple PDs. Thus, the ISC can be regarded as a multiple-input multiple-output optical wireless system. The use of the image sensor as an OWC receiver has been intensively studied in recent years, and its standardization has been under way in the Optical Wireless Communications Task Group [10]. As image sensors are able to separate lighting sources spatially, the reliability and the capacity of OWC can be enhanced by exploiting a 2D square array of transmit LEDs and high frame rate image sensors [11].
For OOK systems, the transmitter employs spatial modulation techniques to determine the binary transmit LED intensities. Each message is encoded as a binary matrix, with one entry per LED in the array, that represents a 2D OOK modulation symbol. Decoding the transmitted message relies on the image captured by the image sensor at the receiver. This requires the joint optimization of the 2D OOK modulation rule and the image decoding process over a signal-dependent optical channel.
The feasibility of ISC systems has been investigated in indoor scenarios [12] and outdoor vehicular communication applications [11]. However, the control of the average intensity of the LED arrays, such as dimmable transmitter optimization for VLC, has not been adequately investigated with the target of SER minimization. Since high-resolution cameras are adopted in [11] and [12], a naive image processing technique, which subtracts the current image from the previous one, suffices to detect the on and off symbols conveyed by each LED. By contrast, for a practical low-resolution CMOS image sensor, such an approach would not be possible, as it suffers from LED irradiance spread and lens blur.
Other difficulties arise from the randomness of the ISC channel, in particular the imperfect alignment between the transmit LED array and the receive image sensor, which incurs random rotations in the received image. To address this issue, [12] utilized dummy LEDs, which always emit the same OOK intensity pattern, so that the misalignment can be compensated at the receiver via simple image processing. However, this fails to guarantee arbitrary LED intensity control and results in degraded spectral and energy efficiency. Generally, designing an ISC transceiver which is robust to the random nature of the optical channel is a highly challenging problem, in particular when perfect channel knowledge is not available.
IV-A Convolutional Autoencoder
To overcome the implementation issues in the ISC system, we propose a CAE structure as illustrated in Fig. 3. In the CAE, several hidden layers are implemented with convolutional layers, which have proven powerful in handling a 2D image input [7]. Unlike the one-dimensional (1D) fully-connected layers in Fig. 1, where all the elements of an input vector contribute to a hidden layer output, convolutional layers accept matrices as inputs and apply weight matrices only to adjacent elements to produce a 2D output. As depicted in Fig. 3, this can be viewed as a 2D convolution operation which slides a 2D window filter with the same weights over the input matrix. This computation helps extract spatially correlated features of the image input such as edges and lines. More complicated features can be learned with the aid of multiple convolution filters having different weights, which provide several 2D output matrices. A pooling layer can be added to a convolutional layer to reduce the output dimension by sampling one element over a predefined 2D region. With pooling layers, NNs become robust to minor spatial changes in the input image [7]. Popular choices for the pooling are the maximum and average operations.
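The sliding-window and pooling operations described above can be sketched in a few lines of NumPy. The image, the edge-detecting kernel, and the pooling size below are toy values chosen for illustration and are not the filters of the proposed CAE.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one 2D filter over the image (no padding, stride 1).

    As in most DL libraries, the kernel is not flipped, so this is
    technically a cross-correlation.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature, size):
    """Keep the maximum of each non-overlapping size-by-size region."""
    h = feature.shape[0] // size * size
    w = feature.shape[1] // size * size
    blocks = feature[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

image = np.zeros((6, 6))
image[:, 3:] = 1.0                                   # vertical edge in the middle
edge_kernel = np.array([[-1.0, 1.0], [-1.0, 1.0]])   # responds to dark-to-bright steps
feature = conv2d_valid(image, edge_kernel)           # 5-by-5 feature map
pooled = max_pool(feature, 2)                        # 2-by-2 summary of the feature map
```

The same few weights are reused at every window position, which is what lets a convolutional layer pick up an edge wherever it appears in the image.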
At the encoding network of the CAE, the message is first mapped to a one-hot vector [6], which is a zero vector except for a single nonzero element at the position indexed by the message, and is then processed by several fully-connected layers. To yield a 2D OOK intensity matrix, convolutional layers are applied to the output vector of the fully-connected layers after it is reshaped into a matrix. Each element of the output matrix of the encoding network is mapped to the OOK transmit intensity of one LED. Through the optical channel, which contains the signal-dependent noise as well as random image rotation and blur effects, the receiver obtains 2D images capturing the transmit LED array at each data transmission. The decoding network, which includes multiple convolutional layers followed by fully-connected layers, retrieves the transmitted message from the received image.
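The first steps of this encoding path, the one-hot mapping and the reshaping into a matrix, can be sketched as follows. The message count, the array side length, and the single random weight matrix standing in for the trained fully-connected layers are all hypothetical.

```python
import numpy as np

def one_hot(index, num_messages):
    """Zero vector with a single one at the (0-based) message index."""
    vec = np.zeros(num_messages)
    vec[index] = 1.0
    return vec

num_messages, side = 16, 4   # hypothetical message count and LED array side
rng = np.random.default_rng(0)
# Stand-in for the trained fully-connected layers (random, untrained).
weight = rng.normal(size=(num_messages, side * side))

message = 5
hidden = one_hot(message, num_messages) @ weight   # picks out row 5 of the weights
grid = hidden.reshape(side, side)                  # 2D input for the convolutional layers
```

With a one-hot input, the first linear layer acts as a table lookup, which is consistent with replacing the trained encoder by a lookup table at run time.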
To implement the OOK modulation, the parameterized sigmoid activation is adopted at the end of the encoding network with the aid of the annealing-based multistage training strategy [9]. To control the average intensity of the binary optical signal, a regularization term, which evaluates the deviation of the average intensity from the target intensity matrix, is added to the categorical cross-entropy cost function. Following the accurate mathematical ISC channel model in [13], training samples can be readily generated. To compensate for the random rotation effect, the ISC channel is randomly generated with arbitrarily rotated LED array coordinates. To be specific, a random rotation angle is applied to training samples so that the CAE can efficiently extract the features regarding the random rotation by itself. On the other hand, the trained CAE does not require the rotation angle in the testing step. Thus, the proposed CAE transceiver can be implemented in a practical scenario where channel state information (CSI) is not available in advance.
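The random rotation of LED array coordinates used to generate training samples can be sketched as below. Only the geometric rotation is shown; the array size, the spacing, and the range of angles are assumptions for illustration, and the image formation model of [13] is not reproduced.

```python
import numpy as np

def led_grid(side, spacing):
    """Coordinates of a side-by-side square LED array centred at the origin."""
    offsets = (np.arange(side) - (side - 1) / 2.0) * spacing
    xs, ys = np.meshgrid(offsets, offsets)
    return np.stack([xs.ravel(), ys.ravel()], axis=1)   # shape (side**2, 2)

def rotate(points, angle):
    """Rotate 2D points counterclockwise by angle (radians) about the origin."""
    c, s = np.cos(angle), np.sin(angle)
    rotation = np.array([[c, -s], [s, c]])
    return points @ rotation.T

rng = np.random.default_rng(0)
coords = led_grid(side=4, spacing=0.05)               # hypothetical 4-by-4 array
angles = rng.uniform(-np.pi / 6, np.pi / 6, size=8)   # assumed angle range
rotated = [rotate(coords, a) for a in angles]         # one geometry per training sample
```

Each rotated geometry would then be passed through the channel model to render a received image, so that the network sees many orientations during training and needs no angle estimate at test time.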
IV-B Implementation Details
Table I: Structure of the proposed CAE (filter counts, kernel sizes, and output dimensions are missing from this copy)

Encoding network
Layer              Activation
Fully-connected    ReLU
Fully-connected    ReLU
Convolutional      ReLU
Max-pooling        -
Convolutional      ReLU
Max-pooling        -
Convolutional      Parameterized sigmoid

Decoding network
Layer              Activation
Convolutional      ReLU
Max-pooling        -
Convolutional      ReLU
Max-pooling        -
Fully-connected    ReLU
Fully-connected    Softmax
A square LED array is considered with white LEDs of type RL5W4575 [13]. The inter-LED distance and the distance between the transmitter and the receiver are fixed. The receive image sensor is a GigE monochrome 1/4 inch Sony CCD [13] with square pixels. The lens focal length and the f-number are also fixed. Perfect synchronization is assumed between the transmitter and the receiver. A fixed spectral efficiency, in bits per channel use, is assumed with a correspondingly sized message set, and the shot noise scaling factor is held constant.
The proposed CAE structure is summarized in Table I. At each layer, a batch normalization layer is added for efficient training [14]. One set of samples is employed for training, and another randomly generated set is used in the validation step to find the regularization parameter which achieves a good tradeoff between the validation SER performance and the dimming feasibility. The test performance of the trained CAE is evaluated on a separate set of samples. The Adam algorithm [15] is utilized for training over all stages. To capture the signal-dependent property, two different signal-to-noise ratio (SNR) values are considered in the training step, one for testing in the low SNR regime and the other for testing in the high SNR regime. The CAE training step is an offline process. Once trained, real-time operations of the encoding network are carried out with a lookup table that maps a message to the corresponding OOK-modulated symbol. The calculation of the decoding layer is realized by linear algebraic operations whose complexity is comparable to that of maximum-likelihood (ML) detection.
IV-C Numerical Results
Numerical results for the trained CAE are presented by evaluating the average SER performance over a testing set composed of unseen rotation angles and noise. Figure 4 plots the average SER performance of the proposed CAE transceiver as a function of the SNR. All the elements of the target intensity matrix are set to the same value. Two different scenarios are adopted for the CAE. First, the CAE is trained and tested without the rotation to provide reference performance. Second, an arbitrary rotation angle, drawn from a uniform distribution, is applied in both the training and the testing steps of the CAE. For comparison, the following reference approaches are considered.
Fully-connected AE (FAE): An AE with only fully-connected layers is employed with the same number of layers and dimensions as the proposed CAE structure.

Baseline: The transmitter utilizes randomly generated OOK codewords satisfying the target average intensity. Then, ML decoding is applied at the receiver.
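A simplified version of the baseline can be sketched as follows. Two assumptions are made to keep it short: the channel is an ideal identity mapping (no blur, rotation, or path loss, so each LED is observed directly), and the noise variance for an LED with intensity x is `ambient_var * (1 + shot_scale * x)`. Neither is taken from the simulation setup of this article, and all names are hypothetical.

```python
import numpy as np
from math import comb

def random_cwc(num_messages, length, weight, rng):
    """Draw distinct random binary codewords with a fixed number of ones."""
    if num_messages > comb(length, weight):
        raise ValueError("not enough distinct codewords of this weight")
    book = set()
    while len(book) < num_messages:
        word = np.zeros(length, dtype=int)
        word[rng.choice(length, size=weight, replace=False)] = 1
        book.add(tuple(word))
    return np.array(sorted(book), dtype=float)

def ml_decode(y, codebook, ambient_var, shot_scale):
    """Pick the codeword with the largest Gaussian log-likelihood.

    The variance depends on the candidate codeword, so the log-variance
    term is kept in the metric (assumed signal-dependent noise model).
    """
    var = ambient_var * (1.0 + shot_scale * codebook)
    loglik = -np.sum((y - codebook) ** 2 / (2.0 * var) + 0.5 * np.log(var), axis=1)
    return int(np.argmax(loglik))

rng = np.random.default_rng(0)
codebook = random_cwc(num_messages=16, length=8, weight=4, rng=rng)

sent = 3
noise_var = 1e-4 * (1.0 + 1.0 * codebook[sent])
y = codebook[sent] + rng.normal(size=8) * np.sqrt(noise_var)
decoded = ml_decode(y, codebook, ambient_var=1e-4, shot_scale=1.0)
```

Note that the decoder needs the noise parameters, and in a realistic setting the channel itself, which is the CSI dependence that the learned transceivers avoid.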
It is noted that blind detection is possible for both the CAE and the FAE without CSI at the receiver. In contrast, the baseline technique relies on perfect CSI for ML decoding at the receiver. Thus, for fair comparison, the performance of the baseline method is also evaluated for imperfect CSI cases with different levels of channel estimation error. Figure 4 shows that the proposed CAE outperforms the baseline scheme. The CAE learns an efficient encoding-decoding rule by observing numerous ISC channels during training, whereas the transmitter and the receiver in the baseline approach are developed separately for a given CSI. It is interesting to see that the CAE performs better than the FAE over the entire SNR range. This implies that the 2D convolution operations at the encoder and the decoder of the proposed CAE are powerful for learning 2D OOK spatial modulation rules as well as image decoding strategies for ISC systems. Also, the CAE trained with the random rotation is shown to be robust to the random ISC channel effects, since it provides substantial SER gains over the baseline method with perfect CSI.
V Conclusions and Future Works
This article has introduced DL-based design directions for OWC systems to overcome implementation difficulties stemming from nontrivial lighting constraints and impairments of optical channels. Recent DL approaches for OWC transceiver optimization have been reviewed and their technical contributions have been discussed. Also, the CAE framework has been proposed for designing ISC systems where a receiver with an image sensor captures the image of a transmit LED array for decoding. Numerical results have confirmed that the proposed CAE provides a substantial performance gain over the baseline approaches even without CSI. Some future research directions are summarized in the following.
V-A Dimming-Aware Neural Network Construction
Current AE methods for VLC design have focused on satisfying a specific dimming constraint, which leads to a high computational cost for training multiple AEs to cover all possible dimming values. For practical VLC with arbitrary dimming requirements, dimming-aware AE structures with adaptive dimming control abilities should be investigated. One possible approach is to accept the dimming target as an input feature of an NN. With numerous dimming samples, the trained network will be able to support arbitrary dimming requirements via a single training process.
V-B Extension to Screen Modulation Systems
A screen modulation technique [10], which conveys a message masked by screen images, is an interesting future research topic. In this case, NNs are trained to modulate color, shape, or intensity of pixels while producing the target image via the RGB intensity control of the screen. Furthermore, a quantization layer with multiple quantization levels, which can be regarded as an extension of the binarization technique for the OOK modulation, is essential to train the NNs for selecting a proper symbol among multiple modulation candidates.
V-C Training With Real Measurement Samples and Field Experiments
Existing DL approaches for OWC stem from mathematical channel models. It is questionable whether trained networks work well in a real-world optical environment that includes LED nonlinearity and PD imperfection. Thus, it is necessary to train NNs with a set of measurement samples and validate their performance with field experiments. Recent generative learning techniques such as generative adversarial networks could be exploited to produce high-quality artificial OWC channels based on a small number of measurement samples. The NN is then further trained over these generated samples, and its viability can be verified through field experiments.
V-D Robust DL Techniques for Communication Systems
The performance of current wireless networks relies heavily on perfect knowledge of modulation and coding schemes, CSI, resource scheduling information, and so on. Acquiring such information would be significantly difficult in 5G systems with a massive number of entities, such as large-scale antenna array systems and internet-of-things networks. In these scenarios, developing robust transceivers with imperfect or insufficient prior knowledge is crucial. However, it is not straightforward to handle this issue with existing signal processing methods. Motivated by the results of the proposed ISC systems, the AE technique can be extended to design various wireless systems without CSI. It would be an interesting future work to investigate a DL framework for developing robust wireless networks where no prior information is available.
References
 [1] M. A. Khalighi and M. Uysal, “Survey on free space optical communication: a communication theory perspective,” IEEE Commun. Surveys Tuts., vol. 16, no. 4, pp. 2231–2258, 4th Quart., 2014.
 [2] S. Zhao and X. Ma, “A spectral-efficient transmission scheme for dimmable visible light communication systems,” J. Lightw. Technol., vol. 35, no. 17, pp. 3801–3809, Sept. 2017.
 [3] X. Liang, M. Yuan, J. Wang, Z. Ding, M. Jiang, and C. Zhao, “Constellation design enhancement for color-shift keying modulation of quadrichromatic LEDs in visible light communications,” J. Lightw. Technol., vol. 35, no. 17, pp. 3650–3663, Sept. 2017.
 [4] P. Ostergard, “Classification of binary constant weight codes,” IEEE Trans. Inf. Theory, vol. 56, no. 8, pp. 3779–3785, Aug. 2010.
 [5] Y. Bengio, “Learning deep architectures for AI,” Foundat. Trends Mach. Learn., vol. 2, no. 1, pp. 1–127, 2009.
 [6] T. O’Shea and J. Hoydis, “An introduction to deep learning for the physical layer,” IEEE Trans. Cog. Commun. Netw., vol. 3, no. 4, pp. 563–575, Dec. 2017.
 [7] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, pp. 436–444, May 2015.
 [8] H. Lee, I. Lee, and S. H. Lee, “Deep learning based transceiver design for multicolored VLC systems,” Opt. Express, vol. 26, no. 5, pp. 6222–6238, Feb. 2018.
 [9] H. Lee, I. Lee, T. Q. S. Quek, and S. H. Lee, “Binary signaling design for visible light communication: a deep learning framework,” Opt. Express, vol. 26, no. 14, pp. 18131–18142, July 2018.
 [10] T. Nguyen, A. Islam, T. Yamazato, and Y. M. Jang, “Technical issues on IEEE 802.15.7m image sensor communication standardization,” IEEE Commun. Mag., vol. 56, no. 2, pp. 213–218, Feb. 2018.
 [11] T. Yamazato, I. Takai, H. Okada, T. Fujii, T. Yendo, S. Arai, M. Andoh, T. Harada, K. Yasutomi, K. Kagawa, and S. Kawahito, “Image-sensor-based visible light communication for automotive applications,” IEEE Commun. Mag., vol. 52, no. 7, pp. 88–97, July 2014.
 [12] W. A. Cahyadi, Y. H. Kim, Y. H. Chung, and C.-J. Ahn, “Mobile phone camera-based indoor visible light communications with rotation compensation,” IEEE Photonics J., vol. 8, no. 2, Apr. 2016.
 [13] J. Perez-Ramirez and D. K. Borah, “A single-input multiple-output optical system for mobile communication: modeling and validation,” IEEE Photonics Tech. Lett., vol. 26, no. 4, pp. 368–371, Feb. 2014.
 [14] S. Ioffe and C. Szegedy, “Batch normalization: accelerating deep network training by reducing internal covariate shift,” in Proc. Int. Conf. Mach. Learn. (ICML), pp. 448–456, July 2015.
 [15] D. Kingma and J. Ba, “Adam: a method for stochastic optimization,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2015.