14.1 Introduction
The emerging industrial use cases of sixth-generation (6G) and beyond wireless networks are envisaged to include industrial automation, autonomous vehicles, and smart infrastructure. These applications require significant improvements in data capacity, system latency, and quality-of-service reliability over current 5G networks. In this context, the reconfigurable intelligent surface (RIS) has been identified as a key enabling technology to program the smart radio environment (SRE), increase link quality, and reduce hardware complexity [55, 26]. The RIS is made up of a metasurface (MTS), a two-dimensional (2D) reconfigurable electromagnetic (EM) layer composed of a large periodic array of subwavelength scattering elements (meta-atoms) with specially designed spatial features [28, 51]. Compared to electrically large arrays, the nearly passive meta-atoms offer lower cost and power consumption. The radio-frequency (RF) MTS performs customized transformations, such as beamforming, on a reflected incident wave through modified surface boundary conditions using Huygens’ principle. For example, the MTS shifts the phase of the reflected signal by creating a field discontinuity at the boundary of the surface. The arrangement and subwavelength structure of each meta-atom and, in turn, the array of space- and time-varying meta-atoms determine the MTS aperture field distribution and control the direction and strength of the reflected signal [22].
In conventional wireless communication systems, network optimization has been limited to control at the transmitter and receiver. This paradigm assumes that the wireless fading channel is uncontrollable and is a significant factor limiting performance because of random signal reflections, diffraction, and scattering in the wireless environment. The RIS overcomes many of these fading channel limitations through the ability of the MTS to manipulate waves, achieve arbitrary aperture beamforming, and perform real-time analog spatial signal processing. This has spawned novel MTS-based RF applications such as intelligent beamforming [68], anomalous refraction and reflection [5], frequency-selective and high-impedance surfaces [58], scattering reduction [59], polarization conversion [74], leaky-wave antennas [47], surface wave control [45], beam focusing [48], transmitarray antennas [39], reflectarray antennas [4], and holographic imaging [18]. Initial applications of RIS were limited to wireless communications for interference suppression [68], joint wireless information and power transmission [35], physical layer security [49], and multibeam design [64]. However, more recent works have introduced RIS to radar remote sensing [14, 66, 13] and joint radar-communications systems [67, 10].
In a wireless link, the RIS functions as either an electrically large antenna array at the endpoints or as an amplify-and-forward relay (Fig. 14.1). By actively controlling and optimizing the amplitude/phase of each meta-atom across the aperture, the RIS maximizes the receive signal-to-noise ratio and provides adaptive beamforming to coherently focus the reflected signal on the receiver. Through joint optimization of the wireless channel and endpoints, RIS-assisted links are able to realize the SRE. Each scattering element typically includes an active tuning element, such as a varactor or PIN diode, whose bias voltage is software-controlled to change the EM response of the surface. The bias voltage for each meta-atom is precomputed and modulated by a digital control module employing a field-programmable gate array (FPGA) [24]. Each meta-atom is controlled by tuning its EM properties (susceptibility or impedance), which affects the spectral response of the reflected signal. This aids in producing tailored radiation patterns for diverse functions, such as beam steering, anomalous reflection, focusing, beam splitting, absorption, and direct modulation of the reflected signal.
There are several challenges in the design, fabrication, deployment, and processing of RIS. In applications such as radar and communications that have precise radiation pattern constraints, the RIS design often involves optimization of several complicated and irregular geometry parameters to meet the required resonant frequency, gain, polarization, bandwidth, and size constraints. The conventional design process can be very tedious. Further, after deployment, the processing of RIS signals and optimized beamforming is also challenging because of the high-dimensional nature of the problem arising from the use of many antennas. In this context, machine learning (ML) techniques have recently shown unprecedented performance in problems where it is challenging to develop an accurate mathematical model for feature representation. These methods are now also transforming the above-mentioned tedious approaches to designing RIS and processing its signals. In particular, as a class of ML techniques, deep learning (DL) methods have gained much interest recently for solving many challenging problems such as speech recognition, visual object recognition, rainfall estimation, and language processing [36, 1, 70]. These techniques offer advantages such as low computational complexity while solving optimization-based or combinatorial search problems as well as the ability to extrapolate new features from a limited set of features contained in a training set [36]. Recently, DL for MTS inverse design, wherein a meta-atom design is synthesized from a specific response, has become very popular. This has been applied for semi-automated inverse design of metamaterials [43], MTS [72, 54], and nanophotonic structures [53]. Note that the above ML/DL application to MTS/RIS design is different from using DL to perform signal processing functions in RIS-aided communications (see, e.g., [8] for a survey). In the following, we describe these aspects in detail.
14.1.1 ML/DL for RIS Design
The design and optimization of RIS hardware at the physical layer remains a formidable challenge. To date, RIS/MTS implementations remain quite limited. To realize the promise of RIS-assisted networks and SRE, more robust and automated MTS design techniques are required. Without capable RIS hardware, the benefit of RIS-assisted networks will be significantly reduced due to EM limitations. In general, canonical structures such as V-antennas, loaded dipoles, and split-ring resonators are used to fabricate RIS. However, meta-atoms based on these geometries usually fall short of the desired performance, particularly when anisotropic, broadband, and/or wide-angle responses are required. As a result, traditional MTS design approaches exhibit performance limitations, especially given the complexity of MTS hardware requirements and the increasing functionality required for wireless nodes in next-generation networks.
Designing a user-defined, arbitrary-wavefront RIS or metagrating [41, 32, 25] is a challenging, labor-intensive, and long process. In general, a new MTS design entails numerous rounds of manual tuning and full-wave simulations that iteratively solve Maxwell’s equations until a locally optimized design is achieved [41]. Initial designs are typically based on physical instincts and intuitive arguments. However, the final geometric structure and material characteristics are attained through iterative analyses.
The ML/DL approaches expend computational time and resources up front as a fixed cost to generate training data sets of device geometries and their associated spectral responses, but are efficient during the prediction stage [2]. Deep neural networks are trained to map the nonlinear relationships between meta-atom geometry and spectral response. The power of deep neural networks comes from their multilayered composition, which allows them to learn the relationships between data with multiple levels of abstraction [37]. Once trained, a deep neural network efficiently produces the geometry of a meta-atom given a desired spectral response. The application of deep learning to the inverse design of MTS and nanophotonic structures is still in its early stages, and much more work is required to realize more generalized complex designs, reduce the amount of required training data, and increase efficacy. Nearly all of these works rely on supervised learning techniques for metamaterial performance prediction, which map known input-output pairs based on large sets of training examples. In MTS design, applying such techniques does not result in new shapes that differ from the ones used in training. This severely limits the ability to generate customized MTS patterns. In [25, 22], we introduced the use of generative adversarial networks (GANs) for microwave MTS design, which aids in discovering new shapes of meta-atoms.
14.1.2 ML/DL for RIS Applications
The next-generation millimeter-wave (mmWave) massive multiple-input multiple-output (MIMO) systems require large antenna arrays with a dedicated radio-frequency (RF) chain for each antenna. This results in expensive and large system architectures that consume significant power and processing resources. To reduce the number of RF chains while maintaining sufficient beamforming gains, hybrid analog and digital beamforming architectures were introduced. However, the cost and energy overheads of these systems remain a concern. Recently, RISs have emerged as a feasible solution [19], offering a low-cost and lightweight alternative to complex large arrays in both outdoor and indoor applications, usually with separate operating frequencies or spectral bands (Fig. 14.2).
The RISs reflect the incoming signal by introducing a predetermined phase shift. This phase shift is controlled via external signals by the base station (BS) through a backhaul control link. As a result, the incoming signal from the BS can be manipulated in real time, thereby reflecting the received signal toward the users. Hence, the use of RIS enhances the signal energy received by distant users and expands the coverage of the BS. It is, therefore, necessary to jointly design the beamformer parameters at both the RIS and the BS. This achieves the desired channel conditions, wherein the BS conveys information to multiple users through the RIS [15]. Unlike amplify-and-forward (AF) relay systems, an RIS can have both active and passive components, which provides a flexible configuration: it either uses fewer active transmit modules or fully reflects the received signal as a passive surface. Thus, the RIS is much more energy- and spectrum-efficient [29].
The accuracy of beamformer design strongly relies on knowledge of the channel. RIS-assisted systems include multiple communication links, i.e., a direct channel from the BS to the users and a cascaded channel from the BS to the users through the RIS. This makes the RIS scenario even more challenging than conventional massive MIMO systems. Furthermore, the wireless channel is dynamic and uncertain because of changing RIS configurations. Consequently, there is an inherent uncertainty stemming from the RIS configuration and the channel dynamics. These characteristics make the RIS system design very challenging [38, 15].
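As a minimal illustration of why channel knowledge matters, the sketch below assumes a single-antenna BS, a single-antenna user, perfectly known channels, and continuous, lossless RIS phase shifts (none of which is stated in the text). Under those assumptions, the received power is maximized by co-phasing every cascaded path with the direct path. The names `align_phases` and `channel_gain`, and the Rayleigh-distributed channel draws, are hypothetical choices for illustration only.

```python
import cmath
import math
import random


def align_phases(h_direct, g_bs_ris, f_ris_user):
    """Phase shift per RIS element that co-phases each cascaded path with the direct path."""
    ref = cmath.phase(h_direct)
    return [ref - cmath.phase(g * f) for g, f in zip(g_bs_ris, f_ris_user)]


def channel_gain(h_direct, g_bs_ris, f_ris_user, phases):
    """Power gain of the effective channel: direct link plus the RIS-reflected links."""
    h_eff = h_direct + sum(
        g * f * cmath.exp(1j * t) for g, f, t in zip(g_bs_ris, f_ris_user, phases)
    )
    return abs(h_eff) ** 2


rng = random.Random(0)


def complex_normal():
    """One circularly symmetric complex Gaussian sample with unit variance."""
    return complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) / math.sqrt(2.0)


N = 64  # number of RIS elements (illustrative value)
h_d = complex_normal()                     # direct BS-to-user channel
g = [complex_normal() for _ in range(N)]   # BS-to-RIS channel
f = [complex_normal() for _ in range(N)]   # RIS-to-user channel

theta_opt = align_phases(h_d, g, f)
theta_rand = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(N)]

gain_opt = channel_gain(h_d, g, f, theta_opt)
gain_rand = channel_gain(h_d, g, f, theta_rand)
# With perfect alignment all path magnitudes add coherently.
upper_bound = (abs(h_d) + sum(abs(gn * fn) for gn, fn in zip(g, f))) ** 2
```

In this idealized setting `gain_opt` reaches the coherent-sum bound, whereas random phases fall well short of it. Any error in the estimated cascaded channel perturbs `theta_opt` and erodes that gain, which is the sensitivity the paragraph above describes.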
To address the aforementioned uncertainties and nonlinearities imposed by channel equalization, hardware impairments, and suboptimality of high-dimensional problems, model-free techniques have become common in wireless communications [6]. In this context, DL is particularly powerful in extracting features from the raw data and providing a “meaning” to the input by constructing a model-free data mapping with a huge number of learnable parameters. Furthermore, DL is helpful when modeling the channel characteristics thanks to its data-driven structure. A learning model constructs a nonlinear mapping between the raw input data and the desired output to approximate a problem from a model-free perspective [6]. Thus, its prediction performance is robust against corruptions/imperfections in the wireless channel data. DL learns the feature patterns, which are easily updated for new data and adapted to environmental changes. In the long run, this results in lower computational complexity than model-based optimization [15]. DL-based solutions have significantly reduced run times because of parallel processing capabilities. On the other hand, it is not straightforward to achieve parallel implementations of conventional optimization and signal processing algorithms [7]. These advantages have led to DL superseding optimization-based techniques in the RIS system design for the physical layer of wireless communications [6].
14.1.3 Organization
This chapter provides an overview of recent developments in using ML/DL for designing, deploying, and processing the physical layer of RIS. The rest of the chapter is organized as follows. In the next section, we discuss various ML techniques for inverse RIS design. Then, we introduce various DL techniques for RIS design in Section 14.3 and provide a few case studies in Section 14.4. Next, we focus on DL-aided RIS applications for wireless systems, including signal detection and channel estimation, in Section 14.5. For the more widely used application of RIS beamforming, we discuss various DL frameworks in Section 14.6. We also discuss current challenges in using ML/DL for RIS systems and highlight related future research directions in Section 14.7. We conclude in Section 14.8.
14.2 Inverse RIS Design
Communications-based analysis of RIS without physics-based, EM-compliant models is a major limitation of current research. Until recently, prior works did not consider such realistic RIS implementations. As the parameter spaces of meta-atom geometry and constituent materials have grown, the conventional approaches to achieving the targeted EM response have become more tedious. In this context, learning models have demonstrated the ability to implicitly learn Maxwell’s equations from training data within a constrained design space. ML techniques have witnessed increased use in research to create surrogate models for MTS performance prediction, inverse design, and optimization. For an inverse MTS design problem, the input is an arbitrary design spectrum and the network finds or synthesizes a geometry to closely approximate the desired spectral response (Fig. 14.3).
Major benefits of DL-based RIS design for wireless communications include:

EM-based surrogate models: DL constructs a nonlinear mapping between the raw input data (meta-atom design) and the desired output to approximate the MTS response.

Inverse design: Deep generative models are utilized to learn geometric features from training data and generate new meta-atom designs to achieve the spectral response.

Diverse EM surface representations: DL-based MTS design admits flexible design representations. The input could be either vectors of discrete parameters describing the geometry, material, frequency, and angular design parameters or pixelated images representing the geometry or phases of the meta-atom design. Whereas a fully-connected neural network is well-suited to process the simple designs specified by the former representation, a convolutional network handles images appropriately to yield more complex MTS geometries.
| Algorithm | Frequency | MTS layers | Data | Key features | Drawbacks |
|---|---|---|---|---|---|
| Evolutionary optimization techniques | | | | | |
| GA [2] | GHz | | Parameter Vector | Pixelized meta-atoms with discrete input design space when a contiguous structure is not required | Optimization from scratch for each design; output structures may be too complex to fabricate |
| PSO [2] | GHz | | Binary Matrix (2D) | Swarm-based GO technique for pixelized meta-atom design; outperforms GA for various EM designs | Optimization from scratch for each design with parameter tuning |
| ACO [2] | GHz | | Binary Matrix (3D) | MTS, including 3D structures and wire grid arrays, with discrete design space and a contiguous structure | Optimization from scratch for each design; output structures may be too complex to fabricate |
| Learning methods | | | | | |
| ANN [53] | THz | | Parameter Vector | Performance prediction, inverse design, and optimization of nanophotonic particles | Limited design variables; applicable to only spherical dielectric nanoparticles |
| ANN [30] | THz | | Parameter Vector | Performance prediction and inverse design of metagratings | Limited set of parametric inputs; significant training overload |
| DNN [43] | THz | | Parameter Vector | Inverse design of chiral and multilayer MTS | Design-specific architecture; limited design space |
| CNN [73] | GHz | | Binary Matrix (2D) | Anisotropic digital coding MTS; PSO for beamforming | Significant training overload |
| CNN [57] | GHz | | Binary Matrix (2D) | Hybrid CNN-GA for space-time modulation of programmable MTS; multibeam steering | Binary phase coding limits beamforming performance; limited tunability |
| cDCGAN [41] | THz | | Image Matrix (2D) | Generative inverse design of transmission MTS | Significant training overload; limited to single-layer designs and passive structures |
| cDCGAN [25] | GHz | | Image (2D) | Reflective RF MTS; training set with published meta-atom structures to improve learning | Limited to single layer; post-processing required |
| cDCGAN [22] | GHz | | Image (3D) | Multilayer MTS; RGB-style matrix to represent multiple layers | No active elements; additional validation required |
| cDCGAN [23] | GHz | | Image (3D) | Federated learning for multilayer design | Significant training overload |
| cDCVAE [44] | THz | 1 | Image (2D) | Anisotropic MTS; encodes input into low-dimensional latent space | Significant training overload; post-processing required |
| TOGAN [33] | THz | 1 | Image (2D) | Freeform diffractive metagrating design for select wavelength-deflection angle pairs with topology refinement | Additional optimization required |
| GLOnet [31] | THz | 1 | Image (2D) | Dielectric MTS design without training sets | Limited to single-objective optimization; requires solving Maxwell’s equations inside training loop |
Table 14.1 summarizes prior works on various techniques for RIS inverse design. The non-DL methods typically comprise several evolutionary optimization algorithms, as listed below. The drawback of traditional optimization techniques is that they start from scratch with each new design. This often requires hundreds of additional full-wave simulations per design.
14.2.0.1 Genetic algorithm (GA)
This is an iterative global optimization (GO) algorithm that has been used extensively in the design of pixelated coded MTS. GA is a nature-inspired algorithm that uses binary strings (chromosomes) to represent candidate designs [2]. During the optimization, the GA selects the best subset of design candidates from the previous generation to serve as starting points for mutation and crossover in the next design iteration. Recent GA applications include coding MTS [2], which demonstrate channel response modification, efficient polarization conversion, and phase-graded beam steering.
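The selection, crossover, and mutation loop described above can be sketched on a toy problem. The example below is only illustrative: it replaces the full-wave solver with a closed-form array factor for a hypothetical 16-element, 1-bit (0 or π) coded linear aperture with half-wavelength spacing and normal incidence, and it evolves the coding sequence to steer the reflected beam toward 30°. The population size, mutation rate, and elitism settings are assumptions, not values from the cited works.

```python
import cmath
import math
import random

N_ELEMENTS = 16     # number of 1-bit elements (illustrative)
TARGET_DEG = 30.0   # desired beam direction (illustrative)


def fitness(bits, angle_deg=TARGET_DEG):
    """|array factor| toward angle_deg; a cheap stand-in for a full-wave simulation."""
    psi = math.pi * math.sin(math.radians(angle_deg))  # k*d*sin(theta) with d = lambda/2
    af = sum(cmath.exp(1j * (math.pi * b + psi * n)) for n, b in enumerate(bits))
    return abs(af)


def run_ga(pop_size=40, generations=60, p_mut=0.05, n_elite=4, seed=1):
    rng = random.Random(seed)
    # Each chromosome is a binary string: one phase bit per element.
    pop = [[rng.randint(0, 1) for _ in range(N_ELEMENTS)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]            # selection: keep the better half
        children = [p[:] for p in pop[:n_elite]]  # elitism: best designs survive unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, N_ELEMENTS)    # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]  # mutation
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = run_ga()
```

Because the elite designs are copied forward unchanged, the best fitness never decreases from one generation to the next. In a real design flow each call to `fitness` would be a full-wave simulation, which is why the text stresses the cost of restarting this loop for every new design.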
14.2.0.2 Particle swarm optimization (PSO)
A popular stochastic evolutionary computation technique, PSO is inspired by the movement and intelligence of swarms. Recently, it has been employed for shaping EM waves using pixelized coded metasurfaces [2]. The design procedure using PSO is tied to a full-wave EM solver and is completely automatic. The software yields both microscopic meta-atom designs and the macroscopic aperture coding matrix. By changing the reflection phase difference between cells, this approach has produced designs of functional metasurfaces with circularly and elliptically shaped radiation beams and multibeam patterns. This is useful for achieving customized radiation patterns to enhance link performance in the wireless communication channel. Similar efforts have used a simulated annealing algorithm for the design and optimization of a broadband diffusion MTS using anisotropic elements for scattering reduction. In [73], binary PSO (BPSO) was used to automate the macroscopic layout of both passive and active apertures to realize user-defined dual-beam scattering radiation patterns. For example, this study used BPSO to realize a reflecting MTS with a left-handed circular polarization (LHCP) beam and a right-handed circular polarization (RHCP) beam. The results of this study were experimentally verified. This digital coding approach has been applied to both passive and active RMTS.
14.2.0.3 Ant colony optimization (ACO)
This is another swarm-based algorithm inspired by stigmergy in ant colonies in order to search for optimal solutions to graph-based problems [2]. Here, a number of artificial ants build solutions to an optimization problem and exchange information on their quality using a cooperation scheme similar to that utilized by real ants. In [2], inverse MTS design is performed based on multi-objective lazy ACO (MOLACO) to synthesize 3D nanoantenna geometries with low-loss transmission performance and broad phase tunability. The ACO is generally most useful for a discrete input design space and when a contiguous structure is required.
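The pheromone-based cooperation described above can be illustrated with a deliberately simplified, single-objective ant system. This is not the MOLACO method of [2]. As an assumption for illustration, each ant picks a 0 or π phase bit for every element of a hypothetical 16-element linear aperture, the solution quality is a closed-form array factor rather than a full-wave result, and only the best design found so far deposits pheromone.

```python
import cmath
import math
import random

N_ELEMENTS = 16     # number of 1-bit elements (illustrative)
TARGET_DEG = 30.0   # desired beam direction (illustrative)


def beam_gain(bits, angle_deg=TARGET_DEG):
    """|array factor| toward angle_deg for half-wavelength spacing; toy quality measure."""
    psi = math.pi * math.sin(math.radians(angle_deg))
    return abs(sum(cmath.exp(1j * (math.pi * b + psi * n)) for n, b in enumerate(bits)))


def run_aco(n_ants=20, iterations=50, rho=0.1, seed=2):
    rng = random.Random(seed)
    # tau[n][b]: pheromone on the choice "element n takes bit value b".
    tau = [[1.0, 1.0] for _ in range(N_ELEMENTS)]
    best, best_fit, trace = None, -1.0, []
    for _ in range(iterations):
        for _ in range(n_ants):
            # Each ant builds a design, choosing bits with probability proportional to pheromone.
            bits = [1 if rng.random() < t[1] / (t[0] + t[1]) else 0 for t in tau]
            fit = beam_gain(bits)
            if fit > best_fit:
                best, best_fit = bits, fit
        # Evaporate all trails, then reinforce the choices of the best-so-far design.
        for n, t in enumerate(tau):
            t[0] *= 1.0 - rho
            t[1] *= 1.0 - rho
            t[best[n]] += rho * best_fit / N_ELEMENTS
        trace.append(best_fit)
    return best, trace


best, trace = run_aco()
```

The pheromone table plays the role of the shared memory through which ants exchange information: good bit choices accumulate pheromone and are sampled more often in later iterations, while evaporation keeps early choices from locking in.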
14.3 DL-Based Inverse Design and Optimization
The computational power and time required for evolutionary optimization algorithms grow exponentially with the number of design parameters. This is mitigated by DL-based inverse design for RIS. Prior works have employed a variety of network structures and algorithms based on the availability of data, RIS topology, and desired EM spectral response.
14.3.1 Artificial Neural Network (ANN)
The ANNs were first used to approximate light scattering by multilayer nanoparticles (meta-atoms) [53]. Similar to MTS, nanophotonic particles derive their frequency response from the physical structure and the size of constituent scatterers. Then, [30] used a similar technique for metagratings. Typical inverse design problems require optimization in a high-dimensional space, which involves lengthy calculations, and are typically solved using genetic algorithms or adjoint methods. However, the computational power and time required for GA optimization grow exponentially with the number of design parameters.
The primary application of ANNs in MTS design is performance approximation. The feedforward ANN is trained to be a high-fidelity surrogate model for performance prediction. Using training data consisting of meta-atom physical design parameters as inputs and frequency response as labels, the ANN is trained to approximate a complex physics simulation (such as a finite-element method (FEM), method of moments (MoM), or finite-difference time-domain (FDTD) simulation). Through the training data, the ANN learns to map the scattering function of the meta-atom into a continuous, higher-order space where the derivative is found analytically through backpropagation. In [53], a trained ANN simulated spectral responses orders of magnitude faster than conventional full-wave simulations. This study used a fully-connected ANN consisting of four layers with neurons per layer resulting in 239,500 parameters. The inputs were the thickness of each meta-atom layer (the materials were fixed) and the outputs were the spectrum sampled at points between and nm. The results suggest that the ANN was not simply fitting the data, but rather discovered the underlying structure of the input-to-output mapping to generalize the physics of the system from the training set and solve problems not yet encountered.
A significant drawback of this approach is that the inputs are limited to the thicknesses of the meta-atom layers with fixed materials. This results in a lack of generalizability for the ANN that vastly limits the possible meta-atom design structures. While fixing the input parameters reduces the complexity of the ANN architecture, it limits the design space and the achievable optimal designs. Another drawback of this approach is that [53] required examples using conventional simulation methods to generate training data. However, unlike evolutionary optimization methods such as GA or PSO, simulation of the training dataset is an upfront fixed cost because it only needs to be simulated once and is then leveraged for other designs. Additionally, the simulations for training data generation are highly parallelizable, unlike serial optimization techniques.
Once trained, [53] shows that the ANN solves inverse design problems more quickly than its numerical counterparts because the gradient is found analytically, through backpropagation, rather than numerically. Similar to inverse design, the ANN also optimizes for a desired property by altering the cost function used for the design without retraining the ANN. Their results show that the ANN performs inverse design and optimization more accurately than traditional numerical nonlinear optimization techniques.
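The two uses of an ANN described in this subsection, a forward surrogate and gradient-based inverse design through the trained network, can be sketched at toy scale. The example below is not the architecture of [53]. It assumes a single scalar design parameter, a made-up smooth function `simulator` standing in for a full-wave solver, and a hypothetical one-hidden-layer network `TinyMLP` with hand-written backpropagation. The inverse step keeps the weights fixed and descends on the input, using the analytic derivative of the network output with respect to that input.

```python
import math
import random


def simulator(x):
    """Toy stand-in for a full-wave solver (assumption): design parameter -> response."""
    return 0.5 * (1.0 + math.tanh(4.0 * (x - 0.5)))


class TinyMLP:
    """1 -> hidden (tanh) -> 1 surrogate with hand-written backpropagation."""

    def __init__(self, hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
        self.b1 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
        self.w2 = [rng.gauss(0.0, 0.5) for _ in range(hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(w * x + b) for w, b in zip(self.w1, self.b1)]
        return sum(w * hi for w, hi in zip(self.w2, h)) + self.b2, h

    def d_output_d_input(self, x):
        """Analytic derivative of the prediction with respect to the design input."""
        _, h = self.forward(x)
        return sum(w2 * (1.0 - hi * hi) * w1 for w1, w2, hi in zip(self.w1, self.w2, h))

    def loss(self, data):
        return sum((self.forward(x)[0] - y) ** 2 for x, y in data) / (2 * len(data))

    def train(self, data, lr=0.05, epochs=3000):
        n, hidden = len(data), len(self.w1)
        for _ in range(epochs):  # full-batch gradient descent on the mean squared error
            gw1, gb1, gw2, gb2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
            for x, y in data:
                pred, h = self.forward(x)
                err = (pred - y) / n
                gb2 += err
                for i, hi in enumerate(h):
                    gw2[i] += err * hi
                    dpre = err * self.w2[i] * (1.0 - hi * hi)
                    gw1[i] += dpre * x
                    gb1[i] += dpre
            for i in range(hidden):
                self.w1[i] -= lr * gw1[i]
                self.b1[i] -= lr * gb1[i]
                self.w2[i] -= lr * gw2[i]
            self.b2 -= lr * gb2


def inverse_design(net, target, x0, lr=0.1, steps=300):
    """Hold the weights fixed and descend on the input to match a target response."""
    x = x0
    for _ in range(steps):
        pred, _ = net.forward(x)
        x -= lr * (pred - target) * net.d_output_d_input(x)
        x = min(1.0, max(0.0, x))  # stay inside the range covered by the training data
    return x


data = [(i / 20.0, simulator(i / 20.0)) for i in range(21)]  # "simulated" training set
net = TinyMLP()
loss_before = net.loss(data)
net.train(data)
loss_after = net.loss(data)

x0, target = 0.2, 0.7
x_star = inverse_design(net, target, x0)
```

The training set is generated once and reused: after training, both prediction and inverse design involve only cheap network evaluations. Changing `target`, or replacing the squared-error term in `inverse_design` with another objective, requires no retraining, which mirrors the optimization use case mentioned above.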
14.3.1.1 Deep Neural Networks (DNN)
To model more complex meta-atom structures and increase performance prediction accuracy, DL has been applied to the on-demand design of chiral (a form of anisotropy) MTS [43]. Here, a deep neural network (DNN), an ANN comprising many hidden layers to significantly expand learning and generalization ability, was employed to automatically design and optimize 3D chiral MTS with strong anisotropic spectra at predetermined wavelengths. The network comprised two bidirectional networks that were constructed using a partial stacking technique. This study limited the input design space (and hence the structures obtained) and predicted the reflection spectral response at discrete frequency points for two orthogonal polarizations and the cross-polarization coupling term resulting in a by spectral output vector. By fixing the inputs to be five specific design parameters, this DNN design approach is also limited in its generalizability to other physical structures in the design space. Full-wave simulation was used to generate the training data set for example meta-atoms. The DNN achieved high efficiency and high accuracy for performance prediction and inverse design of anisotropic MTS, where the meta-atom design space is limited.
14.3.2 Convolutional Neural Networks (CNN)
To improve on the lack of generalization and increase performance prediction accuracy, convolutional neural networks (CNNs) are used to design anisotropic digital coding metasurfaces. CNNs are a class of ANNs that use convolution functions to learn hierarchical patterns within data. These models learn generalized patterns across many spatial scales from their input data and are widely used on image data. In [73], a CNN predicted the reflection phase response of binary coded meta-atoms where each meta-atom contains by square subpixels and is mirrored with twofold symmetry. The CNN used in this study is a 101-layer deep residual network, known as ResNet-101. The authors found that other networks with fewer layers resulted in less precise and robust performance predictions. The results show an accuracy of of phase responses with error in the phase. A drawback of this binary coding approach is that a by pixel meta-atom has potential design combinations. This study generated training data by simulating randomized pixel matrices. However, this was fundamentally inefficient, in a manner analogous to GA, because the training data is essentially random and does not contain knowledge of canonical structures. This likely results in significantly more required training data and greater network complexity. Another drawback of this study is that it required full-wave simulation of 70,000 training examples and 10,000 test examples to generate the dataset.
A significant CNN advantage is that the meta-atom shape is directly input into the network rather than shape-specific design parameters. The convolutional filters allow the CNN to learn the physical structure that leads to a given EM response, leading to broader applicability of the model.
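To make the image-based representation concrete, the sketch below builds a binary pixelated meta-atom by mirroring a random quadrant about both axes (one plausible reading of the twofold symmetry mentioned above) and applies a single 3 x 3 convolution to it. The quadrant size `Q = 4` and the fixed Laplacian kernel are hypothetical choices for illustration: the pixel counts used in [73] are not reproduced here, and a real CNN learns its filters from data instead of using a hand-picked one.

```python
import random


def mirrored_meta_atom(quadrant):
    """Mirror a q x q binary quadrant into a 2q x 2q pattern symmetric about both axes."""
    top = [row + row[::-1] for row in quadrant]
    return top + top[::-1]


def conv2d_valid(image, kernel):
    """Plain 2D cross-correlation without padding, as used in CNN convolution layers."""
    k = len(kernel)
    n_rows = len(image) - k + 1
    n_cols = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(k) for j in range(k))
            for c in range(n_cols)
        ]
        for r in range(n_rows)
    ]


rng = random.Random(0)
Q = 4  # hypothetical quadrant size; the full meta-atom is 2Q x 2Q pixels
quadrant = [[rng.randint(0, 1) for _ in range(Q)] for _ in range(Q)]
atom = mirrored_meta_atom(quadrant)

# Only the quadrant pixels are free, so the binary design space has 2**(Q*Q) members.
n_designs = 2 ** (Q * Q)

# Fixed edge-detecting (Laplacian) kernel standing in for one learned filter.
edge_kernel = [[0, -1, 0], [-1, 4, -1], [0, -1, 0]]
feature_map = conv2d_valid(atom, edge_kernel)
```

Even this small hypothetical quadrant admits 65,536 distinct designs, which illustrates how quickly the binary design space grows with pixel count. The filter responds to metal and gap boundaries regardless of where they occur in the pattern, which is the property that lets a CNN work on the shape itself instead of shape-specific parameters.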
In [57], the element phases of a reconfigurable MTS were computed by an 11-layer CNN for multiple beam steering applications. The input was the parameter vector representing the target beam pattern and the output was a matrix that carried the 1-bit codes for a programmable element MTS. This technique reduced the time needed to obtain phase matrices, which produce beam patterns nearly identical to those of conventional methods, to a few milliseconds.
14.3.3 Deep Generative Models (DGMs)
Generative models are unsupervised or semi-supervised learning models that infer a function to describe hidden structure from unlabeled data. Their functions include clustering, density estimation, feature learning, and dimension reduction. Whereas discriminative networks capture the relationship between meta-atom geometry and spectral response from a training set, DGMs focus on learning the properties of meta-atom geometry distributions [41, 25, 33, 31]. Major classes of DGMs (Fig. 14.4) applied to MTS inverse design are as follows.
14.3.3.1 Generative adversarial networks (GANs)
In a GAN system, two ANNs compete to improve each of their models: the generative network learns to create inputs indistinguishable from the training data while the discriminative network learns to identify true data from the output of the generative network. Training GANs involves jointly training a generator network and a discriminator network in a game-theoretic approach to find a local Nash equilibrium. The goal of a generative model is to observe a collection of training examples and learn the underlying probability distribution that generates them. GANs are able to generate new samples from the estimated probability distribution. GANs were initially applied to generate photos, but have since been applied to many domains, including speech and video generation. Very recently, GANs have been applied to generate new MTS hardware designs, including those not explicitly seen in the training dataset or current literature. At the end of a successful training process, GANs are able to produce realistic meta-atom designs, even for very complicated datasets and spectral responses.
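The two-player game described above is commonly written as a minimax objective. The form below is the standard conditional GAN objective, given here only as a reference point: the notation is an assumption ($x$ for a meta-atom geometry, $s$ for the spectral response used as the condition, $z$ for latent noise, $G$ and $D$ for generator and discriminator), and the cited works may add further terms, such as a simulator-based loss.

```latex
\min_{G}\,\max_{D}\;
\mathbb{E}_{(x,s)\sim p_{\mathrm{data}}}\big[\log D(x \mid s)\big]
+ \mathbb{E}_{z\sim p_{z},\; s\sim p_{\mathrm{data}}}\big[\log\big(1 - D(G(z \mid s) \mid s)\big)\big]
```

The discriminator raises the objective by scoring training geometries near 1 and generated ones near 0, while the generator lowers it by producing geometries that the discriminator scores as real.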
In [25], we introduced GANs to microwave MTS design. GANs are promising for low-cost MTS design with complex frequency- and polarization-dependent scattering responses. In [41], an input set of user-defined EM spectra is fed to a GAN that generates candidate patterns to match the on-demand spectra with high fidelity. Here, DNNs are employed to approximate the spectra of the MTS and perform inverse design by generating meta-atom structures that yield user-defined input spectra. Once the model is trained, extensive parameter scans and trial-and-error procedures are bypassed. This conditional deep convolutional GAN (cDCGAN) architecture uses three interconnected CNNs: generator, discriminator, and simulator. The simulator is a pre-trained network that serves as a surrogate model for fast spectral performance prediction. In this study, it is a five-layer CNN with three fully-connected layers at the output. The conditional generator network accepts the desired spectral response and a latent noise vector to output potential meta-atom designs. The discriminator serves to train the generator by evaluating the distance between the distributions of the geometric patterns from the training data and the generator. At the end of successful training, the discriminator is unable to distinguish batches from the generator and the training set. This approach is shown to exhibit high accuracy in the inverse design of meta-atoms.
In [25], a deep convolutional GAN (DCGAN) is employed to generate anisotropic RF meta-atom designs. Using a small set of simulated spectra, the network learned the relationship between the physical structure of meta-atoms and their reflection spectra for vertical and horizontal polarizations. The DCGANs generated meta-atom structures that resembled design features in the training data. To speed up training, the network was fed with parametric variations of twelve published meta-atom designs, simulated with a full-wave EM simulator. Starting out with parametric variations of canonical meta-atom scatterers, the network learned more efficiently than it would have from training with responses of randomized pixel data.
14.3.3.2 Conditional variational autoencoder (cVAE)
As an alternative to GAN approaches, [44] presents a probabilistic DGM that solves the forward and inverse problems simultaneously. It is trained in an end-to-end manner and uses a deep convolutional cVAE (cDCVAE) architecture (Fig. 14.4) comprising an encoder-decoder network structure. The encoder maps the meta-atom structure to a multivariate Gaussian distribution in the latent space, and the conditional decoder network takes the reflection spectra and a latent variable as inputs to generate meta-atom designs (Fig. 14.4).
In [44], the RIS inverse design is modeled in a probabilistic generative manner to investigate the complex structure-performance relationship in an interpretable way and to resolve the one-to-many mapping issue that is intractable in deterministic models. The work developed a semi-supervised learning strategy that allows the model to utilize unlabeled data in addition to labeled data during end-to-end training. The RIS design and spectral response are encoded into a low-dimensional latent space with a predefined prior distribution, from which the latent variables are sampled. The DGM, comprising prediction, recognition, and generation models, serves as a tool to accelerate the design, characterization, and even new discovery of MTSs.
14.3.3.3 Global topology optimization networks (GLOnets)
Recently, GANs have been utilized to learn structural features of topology-optimized (TO) metagratings for inverse design [33, 31]. TO is a method of optimizing a material layout or an array of pixels to maximize system performance given a set of constraints and boundary conditions. Unlike other approaches, the simulation overhead of TO does not increase with the number of RIS units. In [33], freeform diffractive metagratings were designed using a TO-GAN. Here, DGMs were trained on images of periodic, TO metagratings to produce efficient scattering structures with the desired performance over a broad range of frequencies and angles. The network employed training examples for each angle. However, the performance of the best structures was not robust, and additional refinement was needed to meet the desired performance.
In [31], dielectric metasurface optimization was performed using a physics-informed cGAN. Global optimization-based generative networks (GLOnets) search the design space for the globally optimal design. Unlike other GAN approaches, GLOnets seek to fit a narrow-peaked function centered on the optimal solution without a training set. The GLOnet generates a distribution of meta-atoms that samples the global design space and then shifts the distribution toward more optimal designs. Training relies on forward and adjoint EM simulations of the output structures together with backpropagation. In this work, GLOnets are shown to be a successful and computationally efficient global TO technique for MTSs and metagratings.
14.4 Case Studies
We perform two case studies for the design of single-layer and multilayer RIS based on [25] and [22], respectively. The design approach in [22] introduced a cDCGAN-based method for jointly designing several layers of a tensorial RIS. It represented three RIS layers with a red-green-blue (RGB) image matrix: the top-layer meta-atom design is the first channel, the second-layer design is the second channel, and the third-layer design is the third channel, following the conventional RGB image format. The advantages of the cDCGAN are that it trains classifiers in a semi-supervised manner and generates new freeform shapes not previously shown in the literature. However, GANs can be unstable and challenging to train. We validated the designs by simulating their spectra using a full-wave EM solver and comparing the results to the desired spectra.
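The layer-to-channel representation described above can be illustrated with a minimal sketch. It assumes each layer is given as a binary pixel mask on a common grid; the names `stack_layers` and `split_layers` and the mask encoding are hypothetical and are not taken from [22].

```python
from typing import List

Layer = List[List[int]]  # hypothetical binary mask: 1 = metal pixel, 0 = gap


def stack_layers(top: Layer, middle: Layer, bottom: Layer) -> List[Layer]:
    """Stack three equally sized layer masks as channels 0, 1, 2 (R, G, B)."""
    layers = [top, middle, bottom]
    height, width = len(top), len(top[0])
    for layer in layers:
        if len(layer) != height or any(len(row) != width for row in layer):
            raise ValueError("all layers must share the same pixel grid")
    # Copy rows so later edits to the inputs do not alter the stacked image.
    return [[row[:] for row in layer] for layer in layers]


def split_layers(image: List[Layer]) -> List[Layer]:
    """Recover the per-layer masks from a three-channel image."""
    if len(image) != 3:
        raise ValueError("expected exactly three channels")
    return [[row[:] for row in channel] for channel in image]
```

A generator that outputs such a three-channel image therefore proposes all three layers jointly, and `split_layers` hands each channel to the EM solver as a separate layer geometry.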
14.4.1 MTS Characterization Model
Consider a two-dimensional (2D) MTS lying in the xy-plane with the z-axis being the direction of propagation. According to Huygens’ principle, the EM fields created by arbitrary sources in an arbitrary volume are found as the fields created by equivalent surface currents on the volume surface [65]. Therefore, a known incident EM source, such as a plane wave, can be transformed into a desired transmitted or reflected wave using an MTS. The MTS creates the desired aperture field distribution or phase shift by modifying the effective boundary conditions of the EM surface.
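The amplitudes and phases of the transmitted and reflected waves from the MTS are functions of the surface-averaged induced electric and magnetic current densities, $\mathbf{J}_s$ and $\mathbf{M}_s$, respectively. These effective surface current densities induced on the MTS are described by the tangential electric ($\mathbf{E}_t^{\pm}$) and magnetic ($\mathbf{H}_t^{\pm}$) fields on each side of the MTS as
(14.1)  $\mathbf{J}_s = \hat{\mathbf{n}} \times \left(\mathbf{H}_t^{+} - \mathbf{H}_t^{-}\right)$
(14.2)  $\mathbf{M}_s = -\hat{\mathbf{n}} \times \left(\mathbf{E}_t^{+} - \mathbf{E}_t^{-}\right)$
where $\hat{\mathbf{n}}$ is the unit vector normal to the MTS. Examples of passive implementations of Huygens’ MTS (HMS) include reflectionless refraction, perfect anomalous reflection, and arbitrary antenna beamforming [5].
The induced surface currents $\mathbf{J}_s$ and $\mathbf{M}_s$ are related to their respective average tangential fields (applied on a thin slab of polarizable particles) by the spatially varying electric surface impedance $\bar{\bar{Z}}_{se}$ and magnetic surface admittance $\bar{\bar{Y}}_{sm}$,
(14.3)  $\mathbf{E}_{t,\mathrm{avg}} = \tfrac{1}{2}\left(\mathbf{E}_t^{+} + \mathbf{E}_t^{-}\right) = \bar{\bar{Z}}_{se} \cdot \mathbf{J}_s$
(14.4)  $\mathbf{H}_{t,\mathrm{avg}} = \tfrac{1}{2}\left(\mathbf{H}_t^{+} + \mathbf{H}_t^{-}\right) = \bar{\bar{Y}}_{sm} \cdot \mathbf{M}_s$
Substituting (14.1) and (14.2) into (14.3) and (14.4), respectively, yields the following generalized sheet transition conditions (GSTCs) [28, 12] used for describing an MTS:
(14.5)  $\tfrac{1}{2}\left(\mathbf{E}_t^{+} + \mathbf{E}_t^{-}\right) = \bar{\bar{Z}}_{se} \cdot \left[\hat{\mathbf{n}} \times \left(\mathbf{H}_t^{+} - \mathbf{H}_t^{-}\right)\right]$
(14.6)  $\tfrac{1}{2}\left(\mathbf{H}_t^{+} + \mathbf{H}_t^{-}\right) = -\bar{\bar{Y}}_{sm} \cdot \left[\hat{\mathbf{n}} \times \left(\mathbf{E}_t^{+} - \mathbf{E}_t^{-}\right)\right]$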
In the case of a single polarization, the tensor quantities above reduce to scalars [12]. Specifying the desired incident fields and the desired output fields determines the required electric impedance and magnetic admittance at each (spatial) location of the MTS. Bianisotropy is included in these boundary conditions by introducing a tensor magnetoelectric coupling coefficient [5]:
(14.7)  
(14.8) 
The transmission and reflection spectral responses of the MTS are nonlinear functions of the surface impedances at a particular incidence angle and frequency (in GHz) [5, 11]. In this chapter, we fix the incidence angle to broadside incidence, so that the responses depend only on the frequency and the surface parameters; we omit these arguments for simplicity. From here on, we focus only on the transmission response because the design procedure using the reflection response is identical.
The EM wave is also characterized by its polarization. We consider two polarizations, 'x' and 'y', wherein the electric field is parallel to the x- and y-directions, respectively. For an incident wave with a particular polarization, the MTS produces responses in both polarizations. For example, the response in the x (y) polarization when the incident wave is also x-polarized (y-polarized) is the co-polar response. Similarly, the two cross-polar responses are defined for the output polarization orthogonal to that of the incident wave. A multilayer meta-atom consists of multiple layers of different shapes separated by dielectric spacers for structural support. Consider a 3-layer MTS (Fig. 14.5) whose composite response is the superposition of the responses of the individual layers. Our goal is to train the MDGAN to implicitly learn the surface impedance, surface admittance, and magnetoelectric coupling by mapping various design geometries to transmission spectra, and to produce new meta-atom designs for each layer that realize the desired composite co-polar and cross-polar responses.
14.4.2 Training and Design
We evaluated the proposed inverse design approach by implementing our distributed cMDGAN architecture in PyTorch and performing simulations on an NVIDIA Tesla T4 GPU. During training, we included parametric variations of only those meta-atom shapes that have been extensively studied in the literature. Fig. 14.6 lists these shapes and enumerates the variations in the physical parameters used to generate training data. CNNs process matricized data, e.g., a color image composed of three matrices, each of which contains pixel intensities in the red, green, and blue (RGB) color channels. Rather than feeding a three-channel image matrix representing physical RGB colors, as is conventionally done in image recognition, we exploit the three-channel matrix input of the CNN to represent the spatial design of meta-atom scatterers in different layers of a multilayer MTS. Prior works do not employ this technique of representing multiple MTS layers as channels of an image matrix.
In the first case study, we generated single-layer meta-atom designs using cDCGAN. The co-polarization and cross-polarization transmission responses of the resulting meta-atom designs (Fig. 14.7) differed from those of EM simulators by less than a dB. One of the most exciting features of cDCGAN is its ability to discover new geometries not previously found in the literature. This suggests that the model implicitly learned the physical relationships of Maxwell’s equations rather than simply interpolating from past designs.
We perform a second case study for multilayer meta-atom design. Building on this technique, the federated learning approach in [23] employed a conditional multi-discriminator distributed GAN (cMDGAN) (see Fig. 14.4) for multilayer RF MTS discovery (Fig. 14.8). The results show the feasibility of GAN-based approaches for meta-atom discovery.
14.5 Applications
Lately, RIS-aided wireless systems have exploited DL to handle very challenging problems. For instance, signal detection in RIS requires the development of end-to-end learning systems under the effect of the channel and beamformers [34]. The channel needs to be estimated for multiple communication links, i.e., BS-user and BS-RIS-user [9]. Finally, beamformers are designed (by solving complex optimization problems) for the phase shifters at both the BS and the passive elements of the RIS [63]. DL-based techniques are able to handle the huge, multidimensional datasets in all these problems and may also be employed for channel modeling [19], where conventional model-based approaches are not very useful. There have been recent surveys on applying DL [6] and RIS [19] individually to wireless communications. Here, we provide an overview of systems that jointly employ both approaches. In particular, we describe DL techniques (Table 14.2) for three important RIS problems: signal detection, channel estimation, and beamforming. Each of these requires different DL architectures, which have so far included supervised learning (SL), unsupervised learning (UL), reinforcement learning (RL), and federated learning (FL). UL and RL do not require labeling; SL needs a labeled dataset; and FL has a distributed structure for model training. We provide a detailed synopsis of the advantages and shortcomings of each algorithm for these three applications in the subsequent sections.
Consider an RIS-assisted scenario wherein the BS with antennas transmits data symbols by using a baseband precoder . Hence, the downlink transmitted signal becomes . Each user receives the transmitted signal through two components: one through the direct path from the BS and the other through the RIS. The received signal at the th user is given by
(14.9) 
where and denotes the direct channel between the BS and the th user. The vector expresses the RISassisted channel between the RIS and the th user. is a diagonal matrix, i.e., . Here, represents the on/off state of the RIS elements. In practice, the RIS elements cannot be perfectly turned on/off, Hence, can be modeled as for . is the phase shift of the reflective elements. Finally, the channel between the RIS and the BS is represented by .
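As a rough illustration of the diagonal reflection matrix described above, the sketch below builds per-element reflection coefficients with imperfect on/off amplitudes and combines the direct and RIS-reflected paths. The leakage parameters `eps_on` and `eps_off`, the function names, and the omission of conjugate-transpose conventions are assumptions made for illustration only.

```python
import cmath
from typing import List, Sequence


def reflection_coefficients(phases: Sequence[float], states: Sequence[bool],
                            eps_on: float = 0.0, eps_off: float = 0.0) -> List[complex]:
    """Diagonal entries: amplitude (on/off state) times exp(j * phase shift)."""
    if len(phases) != len(states):
        raise ValueError("one on/off state is needed per phase shift")
    coeffs = []
    for phase, is_on in zip(phases, states):
        # Assumed imperfection model: 'on' is slightly below 1, 'off' slightly above 0.
        amplitude = 1.0 - eps_on if is_on else eps_off
        coeffs.append(amplitude * cmath.exp(1j * phase))
    return coeffs


def effective_channel(h_direct: Sequence[complex], h_ris_user: Sequence[complex],
                      h_bs_ris: Sequence[Sequence[complex]],
                      coeffs: Sequence[complex]) -> List[complex]:
    """h_eff[m] = h_direct[m] + sum_l h_bs_ris[m][l] * coeffs[l] * h_ris_user[l]."""
    return [
        h_direct[m] + sum(h_bs_ris[m][l] * coeffs[l] * h_ris_user[l]
                          for l in range(len(coeffs)))
        for m in range(len(h_direct))
    ]
```

With every element switched off and zero leakage, `effective_channel` returns the direct channel alone, which is the situation exploited in the first phase of pilot training.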
In mmWave transmission, the channel can be represented by the SalehValenzuela (SV) model where a geometric channel model is adopted with limited scattering. Hence, we assume that the mmWave channels, i.e., and , include the contributions of , and paths, respectively. Thus, we can represent the channels and as and where and are the complex channel gains and received path angles for the corresponding channels, respectively. and are and steering vectors of the path angles as , where and is the array spacing for the wavelength . Further, the mmWave channel between the BS and the RIS is given by
(14.10) 
where denotes the complex gain and are the angleofdeparture (AOD) and angleofarrival (AOA) angles of the paths, respectively. and are the steering vectors. Let be the cascaded channel matrix between the BS and the th user as where . Then, we can write , for which we have .
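The geometric channel described above can be sketched as a weighted sum of array steering vectors. The example assumes a uniform linear array with element spacing expressed in wavelengths and a normalization of sqrt(N / number of paths); both are common conventions adopted here as assumptions, and the function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def steering_vector(num_elements: int, angle: float, spacing: float = 0.5) -> List[complex]:
    """Unit-norm steering vector of a uniform linear array (spacing in wavelengths)."""
    return [
        cmath.exp(1j * 2 * math.pi * spacing * n * math.sin(angle)) / math.sqrt(num_elements)
        for n in range(num_elements)
    ]


def geometric_channel(num_elements: int, gains: Sequence[complex],
                      angles: Sequence[float], spacing: float = 0.5) -> List[complex]:
    """Limited-scattering channel: scaled sum of gain * steering vector over the paths."""
    num_paths = len(gains)
    scale = math.sqrt(num_elements / num_paths)  # assumed normalization
    channel = [0j] * num_elements
    for gain, angle in zip(gains, angles):
        a = steering_vector(num_elements, angle, spacing)
        for n in range(num_elements):
            channel[n] += scale * gain * a[n]
    return channel
```

For a single unit-gain path at broadside, every entry of the resulting channel equals one, which gives a quick sanity check of the normalization.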
14.5.1 DL-Based Signal Detection in RIS
The signal detection comprises mapping the received symbols under the effect of channel and beamformers to transmit symbols (Fig. 14.9). The signal detection problem can be formulated as
(14.11) 
which requires the knowledge of the channel, i.e, and . Instead, DLbased model accepts the input data , where is the number of collected observations. Then, the DL model is trained to construct a nonlinear mapping between the corrupted data and the clean symbols .
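For comparison with the learned mapping, a conventional detector that does rely on channel knowledge can be sketched as follows. This is a minimum-distance detector for a scalar effective channel and a QPSK alphabet; the alphabet, the scalar model, and the function name are illustrative assumptions and not necessarily the exact formulation of (14.11).

```python
import math

# Hypothetical unit-energy QPSK alphabet used only for illustration.
QPSK = tuple(complex(re, im) / math.sqrt(2) for re in (1, -1) for im in (1, -1))


def min_distance_detect(received: complex, h_eff: complex, constellation=QPSK) -> complex:
    """Return the symbol s minimizing |received - h_eff * s| (needs the channel h_eff)."""
    return min(constellation, key=lambda s: abs(received - h_eff * s))
```

A DL-based detector replaces the explicit use of `h_eff` with a mapping learned from pairs of received and transmitted symbols.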
To leverage DL for signal detection, [34] devised a multilayer perceptron (MLP) that maps the data symbols affected by the channel and the reflecting beamformer to the transmit symbols. The MLP is a feedforward neural network (NN) composed of multiple hidden layers. The framework in [34] uses three fully connected layers. Once the MLP is trained on a dataset composed of received-transmitted data symbol pairs, each user feeds the learning model with a block of received symbols. These blocks account for the effect of the channel and beamformers. The MLP then yields the estimated transmit symbols.
A major advantage of this approach is its simplicity: the learning model estimates the data symbols directly, without a prior stage for channel estimation. Thus, this method helps reduce the cost of channel acquisition. In [34], a bit-error-rate (BER) analysis has shown that DL-based RIS signal detection (DeepRIS) provides a better BER than the minimum mean-squared-error (MMSE) estimator and performance close to that of the maximum likelihood estimator.
However, a few challenges remain before reliable performance is achieved. The training data should be collected under several channel conditions and different beamformer configurations so that the trained model learns the environment well and performs accurately in different scenarios. This is particularly challenging because it requires collecting training data for different user locations. As a result, DL-based signal detection demands a huge training dataset collected under different channel conditions.
14.5.2 DL-Based RIS Channel Estimation
The RIS is composed of a huge number of reflecting elements and, therefore, channel state acquisition is a major task in RISassisted wireless systems. A common approach is to turn on and off each individual RIS element onebyone while also using orthogonal pilot signals to estimate the channel between the BS and the users through RIS. In particular, RIS channel estimation via DL involves constructing a mapping between the received input signals at the user and the channel information of direct and cascaded links (Fig. 14.9). In this way, DLbased techniques reduce the pilot percentage and complexity in channel estimation stage [7].
The SL approach proposed in [9] estimates both direct and cascaded channels via twin convolutional neural networks (CNNs). First, the received pilot signals at the user are collected by sequentially turning on the individual RIS elements. Then, the collected data are used to find the least squares (LS) estimates of the cascaded and direct channels. Both CNNs are trained to map the LS channel estimates to the true channel data. The upshot is that each user estimates its own channels only once and feeds the received pilot data (LS estimate) to the trained CNN models. The CNNs have a higher tolerance than an MLP against channel data uncertainties and imperfections (such as switching mismatch) of the RIS elements.
When the model training is conducted at the user with huge datasets as in [9], the system may lack sufficient computational capability. This is overcome by FLbased training [7], where the learning model updates are computed at the devices (nodes) and aggregated at the BS (central server) (Fig. 3), thereby eliminating the transmission of raw data. FL significantly reduces the transmission overhead since the size of the datasets is usually larger than the size of the learning model, and its performance improves as the number of users increases [7, 42]. Furthermore, instead of using two CNNs as in [9], a single CNN in [7] jointly estimates both cascaded and direct channels.
Although FL reduces the transmission overhead during model training, its training performance is upper bounded by that of centralized model training, i.e., training the model with the whole dataset at once. Therefore, the prediction performance of FL is usually poorer than that of centralized learning (CL). In Fig. 14.10, the CL and FL frameworks are compared with MMSE and LS estimation. We note that FL performs slightly worse than CL in high-SNR regimes. Despite this, FL significantly reduces the transmission overhead, e.g., an approximately tenfold reduction in the number of transmitted symbols [7]. The performance of FL improves as the number of users or edge devices increases because this reduces the variance of the model updates aggregated at the BS. The diversity of the users' local datasets also affects the training and prediction performance, and better performance is obtained if the local datasets are close to uniform.
Both SL and FLbased channel estimation techniques suffer from high channel training overhead. In this context, compressive channel estimation with deep denoising neural networks (DDNNs) is very effective [40]. It employs a hybrid passive/active RIS architecture, where the active RIS elements are used for uplink pilot training and passive ones for reflecting the signal from the BS to the users. Once the BS collects the compressed received pilot measurements, complete channel matrix is recovered through sparse reconstruction algorithms such as orthogonal matching pursuit (OMP). Then, DDNN is used to improve the channel estimation accuracy by exploiting the correlation between the real and imaginary parts of the mmWave channel in angulardelay domain. During training, the input is the OMPreconstructed channel matrix and the output is the noise, i.e., the difference between the OMP estimate and the ground truth channel data. This method leverages both CS and DL yielding a performance better than using these techniques individually. The major drawback is the additional hardware complexity introduced by the active RIS elements. Furthermore, OMP algorithm is used in place of the raw received pilot measurements for constructing the input. This requires repeated execution of the OMP algorithm thereby increasing the prediction complexity over the DL methods in [9] and [7].
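The residual-learning setup described above, in which the network is trained to output the noise rather than the channel, can be sketched as follows. The channel is treated as a flat list of complex entries, and the split into real and imaginary planes is an assumed input layout; the function names are hypothetical and the OMP step itself is not shown.

```python
from typing import List, Sequence


def residual_target(omp_estimate: Sequence[complex],
                    true_channel: Sequence[complex]) -> List[complex]:
    """Training label: the noise, i.e., OMP estimate minus ground-truth channel."""
    return [est - true for est, true in zip(omp_estimate, true_channel)]


def denoise(omp_estimate: Sequence[complex],
            predicted_noise: Sequence[complex]) -> List[complex]:
    """Inference: subtract the network's predicted noise from the OMP estimate."""
    return [est - noise for est, noise in zip(omp_estimate, predicted_noise)]


def to_real_planes(channel: Sequence[complex]) -> List[List[float]]:
    """Assumed network input layout: one real plane and one imaginary plane."""
    return [[c.real for c in channel], [c.imag for c in channel]]
```

If the network predicted the noise perfectly, `denoise` would return the ground-truth channel exactly; in practice the quality of the denoised estimate is limited by the prediction error.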
Consider the downlink scenario where the BS transmits the orthogonal pilot signals , one at a single coherence time , with and . Hence, the total number of channel uses to estimate the direct channel is . The received signal at the th user can be given by
(14.12) 
where is the pilot signal matrix while and are row vectors and . We assume that the pilot training has two phases: direct channel estimation (i.e., ) and cascaded channel estimation (i.e., ). In phase I, we assume that all of the RIS elements are turned off, i.e., , by using the BS backhaul link. Note that turning off the RIS elements does not affect the direct and cascaded channels, since they do not depend on the reflect beamformer, as seen in (14.12). Then, the received baseband signal at the th user becomes
(14.13) 
Here, the direct channel is selected as the label of the deep network with the corresponding input data of .
Once , being the estimated channel, is obtained, in the second phase of the training stage, the cascaded channel can be estimated. This can be achieved via two approaches. In the first approach, pilot signals are transmitted when each of the RIS elements is turned on one by one. In this case, the BS sends a request to RIS via the microcontroller device in the backhaul link to turn on a single RIS element at a time. For the th frame, the reflect beamforming vector becomes where and the received signal from the cascaded channel at the th user becomes
(14.14) 
where and are row vectors. In (14.14), represents the th column of as . Then the leastsquares (LS) estimate of becomes
(14.15) 
By using , (14.15) can be solved for . Then, we can construct the estimated cascaded matrix as .
Then, the deep network accepts the received signals as input at the preamble stage. As a result, the inputoutput pairs become and for direct and cascaded channel estimation, respectively.
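The LS step used to build the network inputs can be sketched for the noiseless case with orthogonal pilots. The sketch assumes a square DFT pilot matrix X (so that X X^H is a scaled identity) and a received row vector y = h^H X; these choices and all function names are illustrative assumptions.

```python
import cmath
import math
from typing import List, Sequence


def dft_pilots(size: int) -> List[List[complex]]:
    """Orthogonal pilot matrix: rows are antennas, columns are pilot slots."""
    return [[cmath.exp(-2j * math.pi * row * col / size) for col in range(size)]
            for row in range(size)]


def receive(h: Sequence[complex], pilots: Sequence[Sequence[complex]]) -> List[complex]:
    """Noiseless received row vector y = h^H X."""
    num_slots = len(pilots[0])
    return [sum(h[r].conjugate() * pilots[r][c] for r in range(len(h)))
            for c in range(num_slots)]


def ls_estimate(y: Sequence[complex], pilots: Sequence[Sequence[complex]]) -> List[complex]:
    """LS estimate of h, valid when X X^H = (number of slots) * I."""
    num_slots = len(pilots[0])
    return [
        (sum(y[c] * pilots[r][c].conjugate() for c in range(num_slots)) / num_slots).conjugate()
        for r in range(len(pilots))
    ]
```

The same routine can be reused per RIS element in the second phase: with one element switched on at a time, subtracting the direct-channel contribution from the received pilots leaves the corresponding column of the cascaded channel to be estimated.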
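Now, let us consider model training via CL for channel estimation, wherein the training is performed by collecting the local datasets from the users. Once the BS has collected the whole dataset $\mathcal{D}$, the training is performed by solving the following problem
(14.16)  $\min_{\boldsymbol{\theta}} \; \frac{1}{D} \sum_{i=1}^{D} \mathcal{L}\big(f(\mathcal{X}_i \mid \boldsymbol{\theta}), \mathcal{Y}_i\big)$
where $D = |\mathcal{D}|$ is the number of training samples, $(\mathcal{X}_i, \mathcal{Y}_i)$ is the $i$th input-label pair, $\boldsymbol{\theta}$ collects the learnable parameters, and $\mathcal{L}(\cdot)$ denotes the loss function defined as
(14.17)  $\mathcal{L}\big(f(\mathcal{X}_i \mid \boldsymbol{\theta}), \mathcal{Y}_i\big) = \big\| f(\mathcal{X}_i \mid \boldsymbol{\theta}) - \mathcal{Y}_i \big\|_F^2$
which is the MSE between the label data and the prediction of the CNN, $f(\mathcal{X}_i \mid \boldsymbol{\theta})$.
On the other hand, in FL, the local datasets are preserved at the users and not transmitted to the BS. Hence, FL-based model training is performed at the user side as
(14.18)  $\min_{\boldsymbol{\theta}} \; \frac{1}{D_k} \sum_{i=1}^{D_k} \mathcal{L}\big(f(\mathcal{X}_i^{(k)} \mid \boldsymbol{\theta}), \mathcal{Y}_i^{(k)}\big)$
where $D_k = |\mathcal{D}_k|$ is the size of the local dataset $\mathcal{D}_k$ of the $k$th user. Notice that the FL-based model training in (14.18) is solved at the user while the CL problem in (14.16) is handled at the BS. To efficiently solve (14.16) and (14.18), gradient descent (GD) is employed and the problems are solved iteratively. In CL, the gradient is computed over the whole dataset as $\mathbf{g}(\boldsymbol{\theta}_t) = \nabla_{\boldsymbol{\theta}} \frac{1}{D} \sum_{i=1}^{D} \mathcal{L}\big(f(\mathcal{X}_i \mid \boldsymbol{\theta}_t), \mathcal{Y}_i\big)$ and the parameter update is performed as
(14.19)  $\boldsymbol{\theta}_{t+1} = \boldsymbol{\theta}_t - \eta \, \mathbf{g}(\boldsymbol{\theta}_t)$
where $\eta$ is the learning rate.
In FL, each user computes the gradients individually as $\mathbf{g}_k(\boldsymbol{\theta}_t) = \nabla_{\boldsymbol{\theta}} \frac{1}{D_k} \sum_{i=1}^{D_k} \mathcal{L}\big(f(\mathcal{X}_i^{(k)} \mid \boldsymbol{\theta}_t), \mathcal{Y}_i^{(k)}\big)$ to solve (14.18), then sends them to the BS, where the model parameters are updated as
(14.20)  $\boldsymbol{\theta}_{t+1} = \boldsymbol{\theta}_t - \eta \, \frac{1}{K} \sum_{k=1}^{K} \mathbf{g}_k(\boldsymbol{\theta}_t)$
with $K$ denoting the number of users. Once the model is trained, each user can feed its received pilot signals to the CNN to predict its channel data.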
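The difference between the centralized and federated updates can be made concrete with a small sketch. It replaces the CNN with a linear model trained under an MSE loss, which is an assumption made purely to keep the example short; `cl_step` pools all data before computing one gradient, while `fl_step` averages gradients computed locally by each user.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (input features, scalar label)


def mse_gradient(theta: Sequence[float], data: Sequence[Sample]) -> List[float]:
    """Gradient of the mean squared error of a linear model over one dataset."""
    grad = [0.0] * len(theta)
    for x, y in data:
        error = sum(t * xi for t, xi in zip(theta, x)) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * error * xj / len(data)
    return grad


def cl_step(theta, datasets: Sequence[Sequence[Sample]], lr: float) -> List[float]:
    """Centralized update: the server pools every local dataset first."""
    pooled = [sample for local in datasets for sample in local]
    grad = mse_gradient(theta, pooled)
    return [t - lr * g for t, g in zip(theta, grad)]


def fl_step(theta, datasets: Sequence[Sequence[Sample]], lr: float) -> List[float]:
    """Federated update: only local gradients reach the server, never raw data."""
    local_grads = [mse_gradient(theta, local) for local in datasets]
    avg = [sum(g[j] for g in local_grads) / len(local_grads) for j in range(len(theta))]
    return [t - lr * a for t, a in zip(theta, avg)]
```

When all users hold equally sized datasets, the two steps coincide; with unequal sizes, the plain average in `fl_step` weights users equally rather than samples, which is one source of the performance gap between FL and CL.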
Learning Scheme | NN Architecture | Benefits | Drawbacks

Signal detection
SL [34] | MLP with three fully connected layers | No need for a channel estimation algorithm | Still needs beamformer design; requires huge datasets and deeper NN architectures

Channel estimation
SL [9] | Twin CNNs with convolutional and fully connected layers | Each user estimates its own channel with the trained model | Data collection requires channel training by turning each RIS element on/off
FL [7] | A single CNN with convolutional and fully connected layers | Less transmission overhead for training; a single CNN estimates both cascaded and direct channels | Performance depends on the number of users and the diversity of the local datasets
SL [40] | DDNN with convolutional layers | Leverages both compressed sensing (CS) and DL methods | Requires active RIS elements; high prediction complexity arising from CS algorithms

Beamforming
SL [63] | MLP | Reduced pilot training overhead | Requires active RIS elements for channel training
UL [17] | MLP with five fully connected layers | Reduced complexity at the model training stage | Implicitly needs the reflect beamformers as labels
RL [62] | DQN | Provides standalone operation since RL does not require labels like SL | Longer training; active RIS elements needed for channel acquisition
RL [15] | DDPG with actor and critic networks | Better performance than DQN | A large number of NN parameters is involved
RL [20] | DDPG with actor and critic networks | Accelerated learning with the aid of optimization, which shrinks the search space | Additional optimization tools needed
FL [42] | MLP | Less transmission overhead during model training | RIS must be connected to the PS

Secure beamforming
RL [69] | DQN | Robust against eavesdropping | High model training complexity

Energy-efficient beamforming
RL [38] | DQN | Energy-efficient and robust against channel uncertainty | RIS beamforming only

Indoor beamforming
SL [29] | MLP | Reduces the hardware complexity of multiple BSs and improves RSS for indoor environments | Learning model performance relies on room conditions
14.6 DL-Aided Beamforming for RIS Applications
Beamforming in RISbased communications has diverse applications such as RISonly beamforming (passive), BSRIS beamforming (active/passive), secure beamforming (eavesdroppers included), energyefficient beamforming, and indoor RIS beamforming. There are specific DL challenges and solutions to each one of these problems.
In general, the beamforming design problem in RISassisted scenario maximizes the spectral efficiency of the system as
(14.21) 
where is the total transmit power and denotes the set of discrete phaseshifts. Also, we define the signaltointerferenceplusnoise ratio (SINR) as , wherein only RISreflected channel is assumed.
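The spectral-efficiency objective and the discrete phase-shift constraint can be illustrated with a short sketch. It assumes the usual definition of SINR (desired signal power over interference plus noise power) and a uniform phase codebook with 2^bits levels; the function names and argument layout are hypothetical.

```python
import math
from typing import Sequence


def _gain(h: Sequence[complex], f: Sequence[complex]) -> float:
    """Power |h^H f|^2 delivered by precoder f over effective channel h."""
    return abs(sum(hc.conjugate() * fc for hc, fc in zip(h, f))) ** 2


def sinr(k: int, eff_channels, precoders, noise_var: float) -> float:
    """SINR of user k given every user's effective channel and precoder."""
    signal = _gain(eff_channels[k], precoders[k])
    interference = sum(_gain(eff_channels[k], precoders[j])
                       for j in range(len(precoders)) if j != k)
    return signal / (interference + noise_var)


def sum_rate(eff_channels, precoders, noise_var: float) -> float:
    """Spectral efficiency in bit/s/Hz: sum over users of log2(1 + SINR)."""
    return sum(math.log2(1.0 + sinr(k, eff_channels, precoders, noise_var))
               for k in range(len(eff_channels)))


def quantize_phase(phase: float, bits: int) -> float:
    """Map a phase to the nearest level of a uniform 2**bits-level codebook."""
    levels = 2 ** bits
    step = 2 * math.pi / levels
    return (round(phase / step) % levels) * step
```

An optimizer or learning model would search over quantized RIS phases and BS precoders to maximize `sum_rate`, subject to the transmit power budget.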
14.6.1 Beamforming at the RIS
RIS beamforming requires the passive elements to continuously and reliably reflect the BS signal to the users. Here, the MLP architecture [63] is helpful in designing the reflect beamforming weights using active RIS elements [40]. These elements are randomly distributed across the RIS. They are used for pilot training, after which compressed channel estimation is carried out using OMP. During data collection, the reflect beamforming weights are optimized by using the estimated channel data. Finally, a training dataset is constructed with the channel data and reflect beamformers as the input-output pairs for an SL framework. Note that the active RIS elements present shortcomings similar to those in [40]. However, the method in [63] excels by leveraging DL for designing the beamformers.
The labeling process in [63] demands solving an optimization problem for each channel instance in the training data generation stage. One possible way to mitigate this is to use label-free techniques, such as UL. The UL approach in [17] for reflect beamforming design employs an MLP with five fully connected layers. The network maps the vectorized cascaded and direct channel data at the input to an output comprising the phase values of the reflect beamformers. The loss function is selected as the negative of the norm of the channel vector, which may seem like an unsupervised approach because it does not minimize the error between a label and the learning model prediction. However, this technique yields the phase information at the output uniquely for each training sample. Consequently, the beamformers implicitly behave like labels in the training process. In conventional UL, the training data are clustered into smaller sets without prior knowledge about the “meaning” of each cluster. In [17], however, the output of the NN is a design parameter, i.e., the reflect beamformer phases, so the network effectively takes on the complexity of beamformer optimization for each input.
In order to eliminate the expensive labeling process of SL-based techniques, [62] employed RL to design the reflect beamformers for single-antenna users and a single-antenna BS. RL is a promising approach that directly yields the output by optimizing the objective function of the learning model. First, the channel state is estimated by using two orthogonal pilot signals. An action vector is selected either by exploitation (using the prior experience of the learning model) or by exploration (using a predefined codebook). After the achievable rate is computed from the environment for the selected action vector, a reward or penalty is imposed by comparing the achievable rate with a threshold. Upon reward calculation, a deep Q-network (DQN) (Fig. 14.12) updates the map from the input state (channel data) to the output action (an action vector composed of reflect beamformer weights). This process is repeated for several input states until the learning model converges. While RL is not an RIS-specific technique, it is particularly useful in lowering the labeling overhead compared to SL architectures based on RNN or CNN models, which require labeled datasets. The RL algorithm learns the reflect beamformer weights by optimizing the achievable rate. Thus, RL presents a solution for online learning schemes, where the model effectively adapts to changes in the propagation environment. However, RL techniques have longer training times than SL approaches because the reward mechanism and discrete action spaces make it difficult to reach the global optimum. The label-free process also implies that RL usually performs slightly worse than SL.
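The exploration/exploitation choice and the threshold-based reward described above can be sketched with a tabular stand-in for the DQN. The value table, the reward values of +1 and -1, and the function names are assumptions for illustration; an actual DQN would replace the table with a neural network that maps channel states to action values.

```python
import random
from typing import List, Sequence


def select_action(values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Epsilon-greedy choice over a codebook of candidate beamformers."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))  # exploration: random codebook entry
    # Exploitation: the entry with the highest learned value.
    return max(range(len(values)), key=lambda i: values[i])


def rate_reward(rate: float, threshold: float) -> float:
    """Reward if the achievable rate exceeds the threshold, penalty otherwise."""
    return 1.0 if rate > threshold else -1.0


def update_value(values: List[float], action: int, reward: float, lr: float = 0.1) -> None:
    """Move the stored value of the chosen action toward the observed reward."""
    values[action] += lr * (reward - values[action])
```

Repeating select, evaluate, and update over many channel states gradually concentrates the policy on codebook entries that meet the rate threshold.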
To accelerate the training stage by the use of continuous action spaces, a deep deterministic policy gradient (DDPG) (Fig. 14.12) was introduced in [15]. Here, actorcritic network architectures are used to compute actions and target values, respectively. First, the learning stage is initialized by the use of input state excited by cascaded and direct channels. Given the state information, a deep policy network (DPN) (actor) constructs the actions (reflection beamformer phases). Here, the DPN provides a continuous action space that converges faster than the DQN architecture in [62]. The action vector is used by the critic network architecture to estimate the received signaltonoiseratio (SNR) as objective. This SNR then yields the target beamformer vector under the learning policy. Using the gradient of DPN, the network parameters are updated and the next state is constructed as the combination of the received SNR and the reflecting beamformers. This process is repeated until it converges. An additional benefit of this approach is that it outperforms fixedpoint iteration (FPI) algorithms used to solve reflect beamforming optimization. Moreover, the continuous action space representation with DPN in DDPG provides robustness of the learning model against changes in channel data. However, multiple NN architectures (actor and critic networks) increase the number of learning parameters and aggravate model update requirements for each architecture.
The model initialization in both DQN and DDPG may force the learning models to start far from the optimum point during the early stages of learning. This leads to slow convergence and poor reward performance. In order to accelerate the learning process, [20] devised a joint learning and optimization technique. The key idea is to use DDPG to search for the optimal action for each decision epoch during training. Then, a feasible beamformer vector is found via optimization in a convex-approximation setting. This reduces the search space of the DDPG algorithm and shortens training times.
Even if RL is a labelfree approach that reduces the overhead during training data generation, training approaches in [15, 62, 20] demand expensive transmission overhead to be trained on huge datasets. This is mitigated in FL techniques. The FL approach in [42] learns the RIS reflect beamformers by training an MLP by computing the model updates at each user with the local dataset. The model updates are aggregated in a parameter server (PS), which is connected to the RIS. The MLP input is the cascaded channel information and the output labels are RIS beamformer weights. The federated architecture lowers the transmission overhead during training. However, it is assumed that the PS is connected to the RIS. The simple architecture of the RIS could make this infeasible. It is more practical to access the PS via BS for model training.
14.6.2 Secure Beamforming
Physical layer security in wireless systems is largely achieved through signal processing techniques, such as cooperative relaying and cooperative jamming. Hardware complexity is a major issue in these methods. Low-cost, less complex RIS-based systems have the potential to mitigate these problems. The RL-based secure beamforming in [69] maximizes the secrecy rate by jointly designing the beamformers at the RIS and BS to serve multiple legitimate users in the presence of eavesdroppers. The RL algorithm accepts as states the channel information of all users, the secrecy rate, and the transmission rate. Similar to [15], the action vector comprises the beamformers at the BS and RIS. The reward function is designed based on the secrecy rate of the users. A DQN is trained to learn the beamformers by maximizing the secrecy rate while guaranteeing the quality-of-service requirements. The model training takes place at the BS, which is responsible for collecting the environment information (channel data) and making decisions for secure beamforming. This scheme is more realistic and reliable than those of [62, 15], which ignore the effect of eavesdroppers. The learning model includes high-dimensional state and action information, such as the channels of all users and the beamformers of the BS and RIS. This may necessitate more computing resources for training than non-secure RIS [62, 15] and conventional SL techniques [9, 63].
14.6.3 Energy-Efficient Beamforming
The RIS configuration changes dynamically with the network status. It is very demanding for the BS to re-optimize the transmit power every time the on/off status of the RIS elements is updated. This could be addressed by accounting for energy efficiency in the beamformer design problem. In [38], a self-powered RIS scenario maximizes the energy efficiency by optimizing the transmit power and the RIS beamformer phases. In this DQN-based RL approach, the BS learns the outcome of the system performance while updating the model parameters. Thus, the BS makes decisions to allocate the radio resources by relying only on the estimated channel information. The states of the RL framework are the estimated channels from the users and the energy level of the RIS. Meanwhile, the action vector includes the transmit power, the RIS beamformer phases, and the on/off status of the RIS elements. The learning policy is based on the reward, which is selected as the energy efficiency of the overall system. However, this work considers only RIS beamforming and ignores beamforming at the BS.
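As a rough illustration of such a reward, energy efficiency can be written as the achievable rate divided by the total consumed power. The power model below (a static term plus a fixed cost per active RIS element) and all numerical values are assumptions made for this sketch and are not taken from [38].

```python
import numpy as np

def energy_efficiency(rate_bps_hz, tx_power_w, on_mask,
                      element_power_w=0.01, static_power_w=1.0):
    """Achievable rate divided by total consumed power (bit/s/Hz per watt).

    on_mask is a boolean vector holding the on/off status of the RIS elements.
    """
    active_elements = np.count_nonzero(on_mask)
    total_power = tx_power_w + static_power_w + element_power_w * active_elements
    return rate_bps_hz / total_power

all_on = np.ones(16, dtype=bool)
half_on = np.arange(16) < 8

reward_all_on = energy_efficiency(5.0, tx_power_w=1.0, on_mask=all_on)
reward_half_on = energy_efficiency(5.0, tx_power_w=1.0, on_mask=half_on)
```

For the same achieved rate, switching elements off raises the reward, which is why the on/off status belongs in the action vector alongside the transmit power and the phases.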
14.6.4 Beamforming for Indoor RIS
Different from the above scenarios, [29] addresses the RIS beamformer design problem in an indoor communications scenario to increase the received signal strength (RSS) (see Fig. 14.2). This is particularly useful from the perspective of low hardware complexity because it eliminates the deployment of multiple BSs to improve RSS. The MLP architecture in [29] accepts a two-dimensional user position vector and yields the RIS beamformer phases at the output. Since the channel data is not employed as input, the network does not have to deal with severe environmental fluctuations. However, the learning model is trained on specific room environments and may perform poorly for different room conditions or a different obstacle distribution in the same room. This is mitigated in RL-based solutions, which are highly adaptive to different environments [62, 15].
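The input-output mapping of such a network can be sketched as a forward pass from a 2D position to one phase per RIS element. The layer sizes, the ReLU activations, the scaled-sigmoid output, and the random (untrained) weights are assumptions for illustration only; they do not reproduce the architecture of [29].

```python
import numpy as np

def init_mlp(layer_sizes, rng):
    """Random weights and zero biases for a fully connected network."""
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def predict_phases(params, position):
    """Map a 2D user position to RIS phase shifts in the interval (0, 2*pi)."""
    x = np.asarray(position, dtype=float)
    for w, b in params[:-1]:
        x = np.maximum(0.0, x @ w + b)          # ReLU hidden layers
    w, b = params[-1]
    logits = x @ w + b
    return 2 * np.pi / (1 + np.exp(-logits))    # sigmoid scaled to a phase

rng = np.random.default_rng(0)
num_elements = 64                               # hypothetical RIS size
params = init_mlp([2, 32, 32, num_elements], rng)

phases = predict_phases(params, position=[1.5, 3.0])
reflection = np.exp(1j * phases)                # unit-modulus RIS coefficients
```

Because the input is only the position, the same trained weights implicitly encode one room layout, which is the reason for the limited generalization noted above.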
14.7 Challenges and Future Outlook
The techniques for RIS inverse design and processing are constantly evolving. Major challenges include reduction of training cost, gathering of labeled data, effective handling of system imperfections, and better data representations.
14.7.1 Design
New approaches are needed to increase the computational efficiency and reduce the amount of training required for DL-based RIS design. As discussed below, reducing the design time and achieving full EM compliance remain major challenges.
14.7.1.1 Hybrid physics-based models
Hybrid models, where the training set is supplemented by physics-based analytical models, reduce the amount of required training data and increase learning efficiency. Analytical RF circuit-based models are available to predict the performance of several canonical meta-atom designs. To speed up the training data generation, these analytical circuit-based models could be used to supplement the training data set and reduce iterations of time-consuming full-wave EM simulations. It may also be feasible to create innovative DL design and optimization architectures that utilize physics-based analytical models within the ANN architecture. Another method to reduce the amount of required training data for multilayer MTS designs is to use T-matrix data to analytically cascade MTS designs from single-layer training data.
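The T-matrix cascade mentioned above can be sketched for two-port scattering data as follows. The sketch assumes single-mode propagation between layers (no evanescent or higher-order coupling), nonzero transmission for every layer, and the transfer-matrix convention stated in the docstring; the function names are hypothetical.

```python
import numpy as np

def s_to_t(s):
    """Convert a 2x2 S-matrix to a transfer matrix.

    Convention assumed here: [b1, a1]^T = T @ [a2, b2]^T, valid for S21 != 0.
    """
    s11, s12, s21, s22 = s[0, 0], s[0, 1], s[1, 0], s[1, 1]
    det = s11 * s22 - s12 * s21
    return np.array([[-det / s21, s11 / s21],
                     [-s22 / s21, 1.0 / s21]])

def t_to_s(t):
    """Inverse of s_to_t under the same convention."""
    t11, t12, t21, t22 = t[0, 0], t[0, 1], t[1, 0], t[1, 1]
    det = t11 * t22 - t12 * t21
    return np.array([[t12 / t22, det / t22],
                     [1.0 / t22, -t21 / t22]])

def cascade(s_layers):
    """S-matrix of layers stacked in order, obtained by multiplying T-matrices."""
    t_total = np.eye(2, dtype=complex)
    for s in s_layers:
        t_total = t_total @ s_to_t(s)
    return t_to_s(t_total)

def delay_line(theta):
    """Matched, lossless spacer with electrical length theta (radians)."""
    phase = np.exp(-1j * theta)
    return np.array([[0.0, phase], [phase, 0.0]])

# Two spacers in series behave like one spacer with the summed electrical length.
stack = cascade([delay_line(0.4), delay_line(0.7)])
```

In a hybrid workflow, `s_layers` would hold single-layer responses predicted by a trained surrogate or an analytical circuit model, so that multilayer responses do not each require a separate full-wave simulation.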
14.7.1.2 Other learning techniques
TL may also be used to expedite and improve the learning of a new task by using the weights and biases of a previously trained neural network as the initialization for the new ANN. Since all ANNs for meta-atom performance prediction and inverse design are implicitly learning Maxwell’s equations, it is sensible that a network trained for one meta-atom design or frequency band can be scaled and transferred to a related design. DQNs have also been studied to increase the efficiency of MTS holograms and automated multilayer RIS design.
14.7.1.3 Improved data representation
More complex input data structures and representations are increasingly studied for DL-based RIS. While this article focused on discrete input parameters and image data structures as RIS design representations, graphical and sequential data structures have recently been proposed as alternatives. The graphical model has been used to represent EM systems with near-field coupling (as in coupled resonators). In this arrangement, graph nodes contain resonator attributes, such as material, geometry, and location, and graph edges represent the near-field coupling factors. These graphical data structures are processed using graph neural networks (GNNs). While yet to be extensively explored in the domain of MTS, GNNs have been applied to model a broad range of physical systems. GNNs have the potential to handle additional complexities to jointly optimize RIS design and operation in wireless communication networks. Sequential data structures are another representation that is yet to be extensively explored in the context of MTS. They are useful for representing time-sequence data in dynamic EM systems (as in RIS filters) and are learned using recurrent neural networks (RNNs). In other domains, such as natural language processing (NLP), sequential data is often learned using RNNs, which are ANNs that use forward or backward connections to enable a memory of internal states between successive passes through the network. As dynamic operation of RIS becomes increasingly important in the development of wireless networks and SRE, it is likely that RNNs will become increasingly useful to model dynamic RIS.
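The graph representation described above can be illustrated with a single message-passing step, in which each resonator combines its own attributes with the coupling-weighted attributes of its neighbours. The attribute layout, the coupling values, the layer form, and the random weights are hypothetical choices for this sketch rather than a published GNN architecture for MTS.

```python
import numpy as np

def message_passing_layer(node_features, coupling, w_self, w_neighbor):
    """One graph-convolution step followed by a ReLU.

    node_features: (num_nodes, num_attributes) resonator attributes.
    coupling: (num_nodes, num_nodes) edge weights (near-field coupling factors).
    """
    aggregated = coupling @ node_features       # sum of coupling-weighted neighbours
    return np.maximum(0.0, node_features @ w_self + aggregated @ w_neighbor)

# Three resonators with assumed attributes [permittivity, size_mm, x_mm, y_mm].
node_features = np.array([[4.4, 2.0, 0.0, 0.0],
                          [4.4, 2.5, 3.0, 0.0],
                          [2.2, 1.5, 9.0, 9.0]])

# Symmetric coupling factors; the third resonator is too far away to couple.
coupling = np.array([[0.0, 0.3, 0.0],
                     [0.3, 0.0, 0.0],
                     [0.0, 0.0, 0.0]])

rng = np.random.default_rng(0)
w_self = rng.standard_normal((4, 8))
w_neighbor = rng.standard_normal((4, 8))

embeddings = message_passing_layer(node_features, coupling, w_self, w_neighbor)
```

Stacking several such layers lets information propagate across multiple coupled resonators, and the same weights apply to graphs of any size.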
14.7.1.4 Deep reinforcement learning (DRL)
Similar to evolutionary optimization techniques, RL is an area of ML concerned with how software agents ought to take actions in an environment in order to maximize a notion of cumulative reward through trial and error. Without the use of labeled training data, RL algorithms learn system dynamics through exploration to maximize a reward function. DRL algorithms, such as DQNs, have produced ML advances in a broad range of applications including robotics, strategy games, NLP, and computer vision. To date, DQNs have been studied to increase the efficiency of MTS holograms and automated multilayer MTS design. However, research using DRL for MTS design and optimization is very limited and further research is needed to develop these techniques for MTS applications. DRL-based networks hold the promise of automated self-learning RISs in SRE that are able to adapt and optimize themselves for dynamic RF environments and modulations.
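The trial-and-error learning described above rests on a temporal-difference update of action values. The sketch below shows the tabular Q-learning form of that update together with epsilon-greedy exploration; a DQN replaces the table with a neural network but regresses toward the same target. The state and action counts, the learning rate, and the discount factor are arbitrary values chosen for illustration.

```python
import numpy as np

def q_update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Move Q(s, a) toward the target r + gamma * max_a' Q(s', a') in place."""
    target = reward + gamma * np.max(q_table[next_state])
    q_table[state, action] += alpha * (target - q_table[state, action])
    return q_table

def epsilon_greedy(q_table, state, epsilon, rng):
    """Explore a random action with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return int(rng.integers(q_table.shape[1]))
    return int(np.argmax(q_table[state]))

# Toy problem: 2 states, 2 actions (e.g., two candidate surface configurations).
q = np.zeros((2, 2))
q_update(q, state=0, action=1, reward=1.0, next_state=1)
q_update(q, state=1, action=0, reward=2.0, next_state=0)

greedy_action = epsilon_greedy(q, state=0, epsilon=0.0, rng=np.random.default_rng(0))
```

No labels are involved: the only supervision is the scalar reward returned by the environment after each action.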
14.7.2 Applications
Several challenges remain for DL architectures to reach their full potential in realizing significant performance gains and efficiency for RIS-assisted wireless systems. Given that RIS is an emerging technology, large sets of real data are not yet available. Moreover, model training consumes considerable time and resources, including parallel processing and storage. Further, to achieve commercial viability of DL-based RIS-aided communications, dynamic adaptation to changes in the environment is crucial. Finally, new RIS-specific implementation challenges have also been identified within emerging technologies such as terahertz communications, cell-free massive MIMO, drone operations, and open radio access network (RAN).
14.7.3 Channel Modeling
Channel modeling is a challenging task in communication systems, especially with a large number of antennas, because of the complexity of the system architecture. To provide reliable channel modeling performance, DL can help construct a data-driven model based on field measurements. In this case, SL schemes can be used to construct the channel model as a relationship between the input and output of a learning model [9]. Thus, DL-based methods are expected to become more frequently used in RIS-assisted wireless networks for channel modeling.
14.7.3.1 Data Collection
Massive data collection hampers successful performance of DL-based techniques for all wireless communications tasks: signal detection, channel estimation, and beamformer design. Signal detection requires collection and storage of transmit and receive data symbols for different channel conditions. The prerequisites for channel estimation and beamforming are even more tedious because of the additional labeling process. This is especially difficult to overcome in online scenarios. Unlike SL, the label-free structure of RL is particularly helpful, but at the cost of longer training times. It is possible to relax the data collection requirements by realizing the propagation environment in a numerical electromagnetic (EM) simulation tool [7] and then using the more realistically simulated data. This is helpful in constructing the training dataset offline, but chances of failure remain in a real-world scenario. Very recently, public datasets for the channel estimation problem in RIS-aided communications were made available in the 2021 IEEE Signal Processing Cup competition.
14.7.3.2 Model Training
The models are usually trained offline prior to their online deployment at a PS connected to the BS. In addition, the model training complexity increases with the number of RIS elements and the number of RISs deployed between the users and the BS. This introduces huge transmission overhead for model training. FL has the potential to reduce this cost and enable communication-efficient model training (see, e.g., Fig. 14.10). Here, combining the label-free structure of RL and the communications efficiency of FL, i.e., federated reinforcement learning, could be the next step.
14.7.3.3 Environment Adaptation and Robustness
The behavior of the channel affects all DL-based tasks including channel estimation, beamforming, user scheduling, power allocation, and antenna selection/switching. Addressing the trade-off between the bias and the variance of the model output is essential for robust performance. This is usually achieved using validation data so that the learning model neither overfits nor underfits the training data. Nonetheless, this does not generalize the learning performance to different environments. Moreover, the current DL architectures for wireless systems remain environment-specific because the input data space of their learning model is limited. As a result, the performance degrades significantly when the learning model is fed with input from an unlearned/uncovered data space. To cover larger data spaces and provide robust performance against changes in the environment, wider and deeper learning models are required. But the current DL architectures for wireless communications comprise fewer than a million neurons and are composed of only a few layers (Table 1) [7]. The giant learning models for image recognition or natural language processing consist of millions to billions of parameters, e.g., VGG (138 million), AlexNet (60 million), and GPT-3 (175 billion). Clearly, going wider and deeper in designing the learning models is of great interest for future DL-based RIS-aided systems.
14.8 Summary
We surveyed DL-based techniques for designing RIS hardware to be deployed for future wireless communications. When the design space and scale of the RIS arrays increase, learning-based architectures outperform evolutionary optimization techniques for both surrogate performance modeling and inverse design. The DL inverse design is flexible in admitting a variety of RIS unit structures. The DGMs are the most useful because of their ability to generate new designs not previously seen in the published literature. While active research and techniques in this area are still evolving, DL is a promising solution for the inverse design of RIS.
We also investigated DL architectures for RIS-assisted wireless systems for key applications of signal detection, channel estimation, and beamforming. We extensively discussed various learning schemes and model architectures, such as SL, UL, FL, and RL for RIS applications. SL exhibits better performance than UL and RL because of label usage. UL and RL are label-free schemes that involve lower complexity during training data generation. However, UL still involves an optimization stage for each data instance. Among all, RL is the most promising technique because of its standalone operation and the consequent ability to adapt to environmental changes at the cost of longer training times.
FL reduces the transmission overhead significantly and can be integrated with the other learning methods. The combination of FL and RL-based learning policies not only enables communication-efficient model training but also provides environmental adaptation. Major research challenges include data collection, model training, and environment adaptation. These should be addressed simultaneously to provide a reliable DL architecture for the next-generation RIS-assisted wireless systems. Specifically, the combination of FL and RL should be supported by huge datasets and massive neural networks so that a robust DL architecture is achieved.
Acknowledgement
The authors warmly acknowledge valuable contributions of Dr. John A. Hodge (Amazon) for the inverse design portion of this chapter, when he was a graduate student at Virginia Tech.
Bibliography
[1] Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828, 2013.
[2] Sawyer D Campbell, David Sell, Ronald P Jenkins, Eric B Whiting, Jonathan A Fan, and Douglas H Werner. Review of numerical optimization techniques for meta-device design. Optical Materials Express, 9(4):1842–1863, 2019.
[3] Hou-Tong Chen. Interference theory of metamaterial perfect absorbers. Optics Express, 20(7):7165–7172, 2012.
[4] Hou-Tong Chen, Antoinette J Taylor, and Nanfang Yu. A review of metasurfaces: Physics and applications. Reports on Progress in Physics, 79(7):076401, 2016.
[5] Michael Chen, Minseok Kim, Alex MH Wong, and George V Eleftheriades. Huygens’ metasurfaces from microwaves to optics: A review. Nanophotonics, 7(6):1207–1231, 2018.
[6] Linglong Dai, Ruicheng Jiao, Fumiyuki Adachi, H Vincent Poor, and Lajos Hanzo. Deep learning for wireless communications: An emerging interdisciplinary paradigm. IEEE Wireless Communications, 27(4):133–139, 2020.
[7] Ahmet M. Elbir and Sinem Coleri. Federated learning for channel estimation in conventional and RIS-assisted massive MIMO. IEEE Transactions on Wireless Communications, 21(6):4255–4268, 2021.
[8] Ahmet M Elbir and Kumar Vijay Mishra. A survey of deep learning architectures for intelligent reflecting surfaces. arXiv preprint arXiv:2009.02540, 2020.
[9] Ahmet M Elbir, Anastasios Papazafeiropoulos, Pandelis Kourtessis, and Symeon Chatzinotas. Deep channel learning for large intelligent surfaces aided mmWave massive MIMO systems. IEEE Wireless Communications Letters, 9(9):1447–1451, 2020.
[10] Ahmet M. Elbir, Kumar Vijay Mishra, M. R. Bhavani Shankar, and Symeon Chatzinotas. The rise of intelligent reflecting surfaces in integrated sensing and communications paradigms. arXiv preprint arXiv:2204.07265, 2022.
[11] Ariel Epstein and George V Eleftheriades. Arbitrary power-conserving field transformations with passive lossless omega-type bianisotropic metasurfaces. IEEE Transactions on Antennas and Propagation, 64(9):3880–3895, 2016.
[12] Ariel Epstein and George V Eleftheriades. Huygens' metasurfaces via the equivalence principle: Design and applications. Journal of the Optical Society of America B, 33(2):A31–A50, 2016.
[13] Zahra Esmaeilbeig, Kumar Vijay Mishra, Arian Eamaz, and Mojtaba Soltanalian. Cramér–Rao lower bound optimization for hidden moving target sensing via multi-IRS-aided radar. arXiv preprint arXiv:2210.05812, 2022.
[14] Zahra Esmaeilbeig, Kumar Vijay Mishra, and Mojtaba Soltanalian. IRS-aided radar: Enhanced target parameter estimation via intelligent reflecting surfaces. In IEEE Sensor Array and Multichannel Signal Processing Workshop, pages 286–290, 2022.
[15] Keming Feng, Qisheng Wang, Xiao Li, and Chao-Kai Wen. Deep reinforcement learning based intelligent reflecting surface optimization for MISO communication systems. IEEE Wireless Communications Letters, 9(5):745–749, 2020.
[16] Bryan H Fong, Joseph S Colburn, John J Ottusch, John L Visher, and Daniel F Sievenpiper. Scalar and tensor holographic artificial impedance surfaces. IEEE Transactions on Antennas and Propagation, 58(10):3212–3221, 2010.
[17] Jiabao Gao, Caijun Zhong, Xiaoming Chen, Hai Lin, and Zhaoyang Zhang. Unsupervised learning for passive beamforming. IEEE Communications Letters, 24(5):1052–1056, 2020.
[18] Stanislav B Glybovski, Sergei A Tretyakov, Pavel A Belov, Yuri S Kivshar, and Constantin R Simovski. Metasurfaces: From microwaves to visible. Physics Reports, 634:1–72, 2016.
[19] Shimin Gong, Xiao Lu, Dinh Thai Hoang, Dusit Niyato, Lei Shu, Dong In Kim, and Ying-Chang Liang. Towards smart wireless communications via intelligent reflecting surfaces: A contemporary survey. IEEE Communications Surveys & Tutorials, 22(4):2283–2314, 2020.
[20] Shimin Gong, Jiaye Lin, Beichen Ding, Dusit Niyato, Dong In Kim, and Mohsen Guizani. When optimization meets machine learning: The case of IRS-assisted wireless networks. IEEE Network, 36(2):190–198, 2022.
[21] John A Hodge, Theodore Anthony, and Amir I Zaghloul. Enhancement of the dipole antenna using a capacitively loaded loop (CLL) structure. In IEEE International Symposium on Antennas and Propagation and USNC-URSI Radio Science Meeting, pages 1544–1545, 2014.
[22] John A Hodge, Kumar Vijay Mishra, and Amir I Zaghloul. Joint multi-layer GAN-based design of tensorial RF metasurfaces. In IEEE International Workshop on Machine Learning for Signal Processing, pages 1–6, 2019.
[23] John A Hodge, Kumar Vijay Mishra, and Amir I Zaghloul. Multi-discriminator distributed generative model for multi-layer RF metasurface discovery. In IEEE Global Conference on Signal and Information Processing, pages 1–5, 2019.
[24] John A Hodge, Kumar Vijay Mishra, and Amir I Zaghloul. Reconfigurable metasurfaces for index modulation in 5G wireless communications. In IEEE International Applied Computational Electromagnetics Society Symposium, pages 1–2, 2019.
[25] John A Hodge, Kumar Vijay Mishra, and Amir I Zaghloul. RF metasurface array design using deep convolutional generative adversarial networks. In IEEE International Symposium on Phased Array Systems and Technology, pages 1–6, 2019.
[26] John A Hodge, Kumar Vijay Mishra, and Amir I Zaghloul. Intelligent time-varying metasurface transceiver for index modulation in 6G wireless networks. IEEE Antennas and Wireless Propagation Letters, 19(11):1891–1895, 2020.
[27] John A Hodge, Kumar Vijay Mishra, and Amir I Zaghloul. Deep inverse design of reconfigurable metasurfaces for future communications. arXiv preprint arXiv:2101.09131, 2021.
[28] Christopher L Holloway, Edward F Kuester, Joshua A Gordon, John O’Hara, Jim Booth, and David R Smith. An overview of the theory and applications of metasurfaces: The two-dimensional equivalents of metamaterials. IEEE Antennas and Propagation Magazine, 54(2):10–35, 2012.
[29] Chongwen Huang, George C Alexandropoulos, Chau Yuen, and Mérouane Debbah. Indoor signal focusing with deep learning designed reconfigurable intelligent surfaces. In IEEE International Workshop on Signal Processing Advances in Wireless Communications, pages 1–5, 2019.
[30] Sandeep Inampudi and Hossein Mosallaei. Neural network based design of metagratings. Applied Physics Letters, 112(24):241102, 2018.
[31] Jiaqi Jiang and Jonathan A Fan. Global optimization of dielectric metasurfaces using a physics-driven neural network. Nano Letters, 19(8):5366–5372, 2019.
[32] Jiaqi Jiang, David Sell, Stephan Hoyer, Jason Hickey, Jianji Yang, and Jonathan A Fan. Data-driven metasurface discovery. arXiv preprint arXiv:1811.12436, 2018.
[33] Jiaqi Jiang, David Sell, Stephan Hoyer, Jason Hickey, Jianji Yang, and Jonathan A Fan. Free-form diffractive metagrating design based on generative adversarial networks. ACS Nano, 13(8):8872–8878, 2019.
[34] Saud Khan, Salman Durrani, and Xiangyun Zhou. Transfer learning based detection for intelligent reflecting surface aided communications. In IEEE Annual International Symposium on Personal, Indoor and Mobile Radio Communications, pages 13–16, 2021.
[35] Chandan Kumar, Salil Kashyap, Rimalapudi Sarvendranath, and Supreet Kumar Sharma. On the feasibility of wireless energy transfer based on low complexity antenna selection and passive IRS beamforming. IEEE Transactions on Communications, 70(8):5663–5678, 2022.
[36] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
[37] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436, 2015.
[38] Gilsoo Lee, Minchae Jung, Ali Taleb Zadeh Kasgari, Walid Saad, and Mehdi Bennis. Deep reinforcement learning for energy-efficient networking with reconfigurable intelligent surfaces. In IEEE International Conference on Communications, pages 1–6, 2020.
[39] Haipeng Li, Guangming Wang, He-Xiu Xu, Tong Cai, and Jiangang Liang. X-band phase-gradient metasurface for high-gain lens antenna application. IEEE Transactions on Antennas and Propagation, 63(11):5144–5149, 2015.
[40] Shicong Liu, Zhen Gao, Jun Zhang, Marco Di Renzo, and Mohamed-Slim Alouini. Deep denoising neural network assisted compressive channel estimation for mmWave intelligent reflecting surfaces. IEEE Transactions on Vehicular Technology, 69(8):9223–9228, 2020.
[41] Zhaocheng Liu, Dayu Zhu, Sean P Rodrigues, Kyu-Tae Lee, and Wenshan Cai. Generative model for the inverse design of metasurfaces. Nano Letters, 18(10):6570–6576, 2018.
[42] Donghui Ma, Lixin Li, Huan Ren, Dawei Wang, Xu Li, and Zhu Han. Distributed rate optimization for intelligent reflecting surface with federated learning. In IEEE International Conference on Communications Workshops, pages 1–6, 2020.
[43] Wei Ma, Feng Cheng, and Yongmin Liu. Deep-learning-enabled on-demand design of chiral metamaterials. ACS Nano, 12(6):6326–6334, 2018.
[44] Wei Ma, Feng Cheng, Yihao Xu, Qinlong Wen, and Yongmin Liu. Probabilistic representation and inverse design of metamaterials based on a deep generative model with semi-supervised learning strategy. Advanced Materials, 31(35):1901111, 2019.
[45] Stefano Maci, Gabriele Minatti, Massimiliano Casaletti, and Marko Bosiljevac. Metasurfing: Addressing waves on impenetrable metasurfaces. IEEE Antennas and Wireless Propagation Letters, 10:1499–1502, 2011.
[46] Mario Mencagli, Enrica Martini, and Stefano Maci. Surface wave dispersion for anisotropic metasurfaces constituted by elliptical patches. IEEE Transactions on Antennas and Propagation, 63(7):2992–3003, 2015.
[47] Gabriele Minatti, Marco Faenzi, Enrica Martini, Francesco Caminita, Paolo De Vita, David González-Ovejero, Marco Sabbadini, and Stefano Maci. Modulated metasurface antennas for space: Synthesis analysis and realizations. IEEE Transactions on Antennas and Propagation, 63(4):1288–1300, 2015.
[48] Kumar Vijay Mishra, John A Hodge, and Amir I Zaghloul. Reconfigurable metasurfaces for radar and communications systems. In URSI Asia-Pacific Radio Science Conference, pages 1–4, 2019.
[49] Kumar Vijay Mishra, Arpan Chattopadhyay, Siddharth Sankar Acharjee, and Athina P Petropulu. OptM3Sec: Optimizing multicast IRS-aided multiantenna DFRC secrecy channel with multiple eavesdroppers. In IEEE International Conference on Acoustics, Speech and Signal Processing, pages 9037–9041, 2022.
[50] Quang Nguyen and Amir I Zaghloul. Impedance matching metamaterials composed of ELC and NB-SRR. In IEEE Antennas and Propagation Society International Symposium, pages 1–2, 2018.
[51] Quang Nguyen, K V Mishra, and Amir I Zaghloul. Retrieval of polarizability matrix for metamaterials. In IEEE International Conference on Microwaves, Communications, Antennas and Electronic Systems, pages 1–5, 2019.
[52] Amagoia Tellechea Pereda, Francesco Caminita, Enrica Martini, Iñigo Ederra, Juan Carlos Iriarte, Ramón Gonzalo, and Stefano Maci. Dual circularly polarized broadside beam metasurface antenna. IEEE Transactions on Antennas and Propagation, 64(7):2944–2953, 2016.
[53] John Peurifoy, Yichen Shen, Li Jing, Yi Yang, Fidel Cano-Renteria, Brendan G DeLacy, John D Joannopoulos, Max Tegmark, and Marin Soljačić. Nanophotonic particle simulation and inverse design using artificial neural networks. Science Advances, 4(6):eaar4206, 2018.
[54] Tianshuo Qiu, Xin Shi, Jiafu Wang, Yongfeng Li, Shaobo Qu, Qiang Cheng, Tiejun Cui, and Sai Sui. Deep learning: A rapid and efficient route to automatic metasurface design. Advanced Science, 2019.
[55] Marco Di Renzo, Merouane Debbah, Dinh-Thuy Phan-Huy, Alessio Zappone, Mohamed-Slim Alouini, Chau Yuen, Vincenzo Sciancalepore, George C Alexandropoulos, Jakob Hoydis, Haris Gacanin, et al. Smart radio environments empowered by reconfigurable AI meta-surfaces: An idea whose time has come. EURASIP Journal on Wireless Communications and Networking, 2019(1):1–20, 2019.
[56] D Schurig, J J Mock, and D R Smith. Electric-field-coupled resonators for negative permittivity metamaterials. Applied Physics Letters, 88(4):041109, 2006.
[57] Tao Shan, Xiaotian Pan, Maokun Li, Shenheng Xu, and Fan Yang. Coding programmable metasurfaces based on deep learning techniques. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 10(1):114–125, 2020.
[58] Dan Sievenpiper, Lijun Zhang, Romulo FJ Broas, Nicholas G Alexopolous, Eli Yablonovitch, et al. High-impedance electromagnetic surfaces with a forbidden frequency band. IEEE Transactions on Microwave Theory and Techniques, 47(11):2059–2074, 1999.
[59] Jianxun Su, Yao Lu, Hui Zhang, Zengrui Li, Yaoqing Lamar Yang, Yongxing Che, and Kainan Qi. Ultra-wideband, wide angle and polarization-insensitive specular reflection reduction by metasurface based on parameter-adjustable meta-atoms. Scientific Reports, 7:42283, 2017.
[60] Pei Su, Yongjiu Zhao, Shengli Jia, Wenwen Shi, and Hongli Wang. An ultra-wideband and polarization-independent metasurface for RCS reduction. Scientific Reports, 6:20387, 2016.
[61] Shulin Sun, Qiong He, Shiyi Xiao, Qin Xu, Xin Li, and Lei Zhou. Gradient-index meta-surfaces as a bridge linking propagating waves and surface waves. Nature Materials, 11(5):426–431, 2012.
[62] Abdelrahman Taha, Yu Zhang, Faris B Mismar, and Ahmed Alkhateeb. Deep reinforcement learning for intelligent reflecting surfaces: Towards standalone operation. In IEEE International Workshop on Signal Processing Advances in Wireless Communications, pages 1–5, 2020.
[63] Abdelrahman Taha, Muhammad Alrabeiah, and Ahmed Alkhateeb. Enabling large intelligent surfaces with compressive sensing and deep learning. IEEE Access, 9:44304–44321, 2021.
[64] Nariman Torkzaban and Mohammad A Amir Khojastepour. Shaping mmWave wireless channel via multi-beam design using reconfigurable intelligent surfaces. In IEEE Global Communications Conference Workshops, pages 1–6, 2021.
[65] SA Tretyakov. Metasurfaces for general transformations of electromagnetic fields. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 373(2049):20140362, 2015.
[66] Zhaolin Wang, Xidong Mu, and Yuanwei Liu. STARS enabled integrated sensing and communications. arXiv preprint arXiv:2207.10748, 2022.
[67] Tong Wei, Linlong Wu, Kumar Vijay Mishra, and MR Bhavani Shankar. IRS-aided wideband dual-function radar-communications with quantized phase-shifts. In IEEE Sensor Array and Multichannel Signal Processing Workshop, pages 465–469, 2022.
[68] Qingqing Wu and Rui Zhang. Towards smart and reconfigurable environment: Intelligent reflecting surface aided wireless network. IEEE Communications Magazine, 58(1):106–112, 2019.
[69] Helin Yang, Zehui Xiong, Jun Zhao, Dusit Niyato, Liang Xiao, and Qingqing Wu. Deep reinforcement learning based intelligent reflecting surface for secure wireless communications. IEEE Transactions on Wireless Communications, 20(1):375–388, 2021.
[70] D. Yu and L. Deng. Deep learning and its applications to signal and information processing [Exploratory DSP]. IEEE Signal Processing Magazine, 28(1):145–154, Jan 2011. ISSN 1053-5888. doi: 10.1109/MSP.2010.939038.
[71] Nanfang Yu, Patrice Genevet, Mikhail A Kats, Francesco Aieta, Jean-Philippe Tetienne, Federico Capasso, and Zeno Gaburro. Light propagation with phase discontinuities: Generalized laws of reflection and refraction. Science, page 1210713, 2011.
[72] Qian Zhang, Che Liu, Xiang Wan, Lei Zhang, Shuo Liu, Yan Yang, and Tie Jun Cui. Machine-learning designs of anisotropic digital coding metasurfaces. Advanced Theory and Simulations, page 1800132, 2018.
[73] Qian Zhang, Che Liu, Xiang Wan, Lei Zhang, Shuo Liu, Yan Yang, and Tie Jun Cui. Machine-learning designs of anisotropic digital coding metasurfaces. Advanced Theory and Simulations, 2(2):1800132, 2019.
[74] HL Zhu, SW Cheung, Kwok Lun Chung, and Tong I Yuk. Linear-to-circular polarization conversion using metasurface. IEEE Transactions on Antennas and Propagation, 61(9):4615–4623, 2013.