Deep learning approaches for fast radio signal prediction

by Ozan Ozyegen, et al.

The aim of this work is the prediction of power coverage in a dense urban environment given building and transmitter locations. Conventionally, ray-tracing is regarded as the most accurate method to predict energy distribution patterns in an area in the presence of diverse radio propagation phenomena. However, ray-tracing simulations are time-consuming and require extensive computational resources. We propose deep neural network models that learn from ray-tracing results and predict the power coverage dynamically from building and transmitter properties. The proposed UNET model with strided convolutions and inception modules provides highly accurate results that are close to the ray-tracing output on 32×32 frames. This model will allow practitioners to search for the best transmitter locations effectively and reduce the design time significantly.



1 Introduction

Predicting power coverage in urban areas requires ray-tracing software to determine how radio signals propagate and are distributed over an area. Obstacles in the line of sight between any point and the transmitter can attenuate the propagated energy. Buildings and other objects can also cause interference, reflection and refraction, which lead to complex energy coverage patterns that are difficult to determine accurately without ray-tracing. However, ray-tracing requires large computational resources and a long simulation time, as it applies propagation equations at every pixel.

During the design phase of radio transmitter placement, one objective of ray-tracing is to determine a location for the transmitter. When placed in an optimal location, the transmitter power can reach a maximum number of points in the area, which reduces the number of transmitters needed to provide maximum coverage and therefore the cost. Ray-tracing simulation time increases proportionally to the region size and may extend the design phase considerably. In complex urban environments, it is not intuitive to determine whether a candidate location is optimal, even with prior expertise in the field.

In this study, we propose to apply deep learning to learn from ray-tracing results and predict the power received at every point in a given area with one transmitter. Once the model is trained, the duration of the design phase can be reduced drastically since predicting the power coverage given a transmitter location in a communications environment can be done instantaneously. Specifically, we can determine a set of candidate optimal transmitter locations as a coarse approximation by comparing power coverage map predictions. Designers can then apply ray-tracing as a final fine-tuning step in the design process to choose the best location from the candidate set.
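The candidate-selection step described above can be sketched as follows. This is a minimal illustration rather than the procedure used in this work: `predict_coverage` is a hypothetical stand-in for a trained model (here a toy distance-based decay), and the threshold and `top_k` values are illustrative assumptions.

```python
import numpy as np

def predict_coverage(buildings, tx):
    """Hypothetical stand-in for a trained model: toy distance-based decay (dBm)."""
    h, w = buildings.shape
    rows, cols = np.mgrid[0:h, 0:w]
    dist = np.hypot(rows - tx[0], cols - tx[1])
    power = -40.0 - 20.0 * np.log10(1.0 + dist)  # toy path loss, not ray-tracing
    power[buildings == 1] = np.nan               # no coverage inside buildings
    return power

def rank_candidates(buildings, candidates, threshold_dbm=-65.0, top_k=2):
    """Score each candidate by the fraction of open pixels above the threshold."""
    open_mask = buildings == 0
    scores = []
    for tx in candidates:
        power = predict_coverage(buildings, tx)
        covered = np.sum(power[open_mask] > threshold_dbm)
        scores.append((covered / open_mask.sum(), tx))
    scores.sort(key=lambda s: s[0], reverse=True)  # best coverage first
    return scores[:top_k]

# Illustrative usage on a 32x32 map with one small building block.
buildings = np.zeros((32, 32), dtype=int)
buildings[10:14, 10:14] = 1
ranked = rank_candidates(buildings, [(0, 0), (16, 16), (31, 31)])
```

The shortlisted locations in `ranked` would then be passed to the ray-tracing software for the final fine-tuning step.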

Our proposed model works in 2D and is fed with the building layout and the transmitter location as two separate binary input layers. The model is trained to produce the power value at every point in the region. The underlying task considered in this paper is similar to semantic segmentation, with the difference that the predicted values in segmentation are discrete and specify the class of the object in every pixel, while in our problem the values are real numbers representing the signal power. We propose to use Convolutional Neural Networks (CNNs), which have been shown to learn efficiently from images, and UNET (Ronneberger et al., 2015), which is an extension of CNNs and a popular model in image segmentation. The encoder module of the UNET is used to capture the relation between building/transmitter locations and the power coverage, while the decoder expands feature maps from the encoder into a full-resolution power coverage map. We also propose improvements to the basic UNET by replacing the pooling layers with strided convolutions and combining the UNET with inception modules as introduced by GoogLeNet (Szegedy et al., 2015). These modules allow increasing the network depth and width while keeping the computational cost constant (Zhao et al., 2017).

The rest of this paper is organized as follows: Section 2 provides a background on models used for semantic segmentation and power prediction. We describe our methodology in Section 3, which is then followed up by the experimental results in Section 4. Finally, we provide discussion on our findings and the concluding remarks along with the future work in Section 5.

2 Background

We consider an isotropic antenna and an urban region with buildings. The power received at any location varies between the peak power at the transmitter and -100 dBm, which we consider as the noise floor below which signals cannot be detected due to background noise. The role of our model is to predict the power value in every pixel. We propose to use a CNN as a parametric function that maps the transmitter and building locations to power values in dBm in a 2D grid.

A CNN is a deep learning architecture that is constructed from a set of layers. The core building blocks of a CNN are the convolutional layers that convolve the input with a kernel, then apply an activation function, typically a rectified linear unit (ReLU) or sigmoid function. CNN design consists of defining the number and sequence of layers and the size of kernels. CNNs have wide applications ranging from image classification (Simonyan and Zisserman, 2014; Szegedy et al., 2015) and object detection (Girshick et al., 2015; He et al., 2015, 2016) to image retrieval systems (Radenović et al., 2016, 2018). Semantic image segmentation is one of the applications where CNNs were applied successfully (Ciresan et al., 2012; Gupta et al., 2014; Pinheiro and Collobert, 2014). Semantic image segmentation is a task in which we label specific regions of an image according to what object is being shown. This task is commonly referred to as dense prediction (Long et al., 2015) as it is the progression from coarse to fine inference that makes a prediction at every pixel.

When used for classification, the last layers of a CNN consist of one or more fully connected layers that predict the image label. The fully convolutional network (FCN) was introduced by Long et al. (2015) and is a special CNN where the final fully connected layers are replaced with convolutional layers. In this way, an FCN can be trained end-to-end, pixels-to-pixels, which is very suitable for the task of semantic segmentation. Long et al. (2015) proposed using well-studied classification networks as the encoder along with a decoder module with transposed convolutional layers to upsample the coarse feature maps into a full-resolution segmentation map. However, it was challenging to produce fine-grained segmentation from the low-resolution encoder output. The authors mitigated this issue by upsampling the encoded representation in stages and by adding skip connections from earlier layers. Skip connections provide the necessary higher-resolution details to reconstruct accurate segmentation boundaries.

Ronneberger et al. (2015) took the idea of the FCN one step further and proposed the symmetric encoder-decoder structure called UNET. This architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. Skip connections directly connect opposing contracting and expanding convolutional layers to provide more detail in the segmentation result. Variants of the UNET were proposed in further studies, such as the 3D UNET (Çiçek et al., 2016), short skip connections with residual blocks (Drozdzal et al., 2016) and dense blocks (Jégou et al., 2017). The UNET architecture was used in many applications such as medical image analysis (Litjens et al., 2017; Milletari et al., 2016), image-to-image translation (Yi et al., 2017) and super resolution (Lim et al., 2017). Levie et al. (2019) proposed an implementation of the UNET to estimate the path loss function in radio propagation; however, the authors relied on the Dominant Path Model (DPM), which does not consider the paths with small energy contribution. It was not clear from the results of that research whether the proposed RadioUNET was able to capture multiple reflections in complex urban environments as in real ray-tracing outputs.

3 Methodology

In this section, we first describe our single region dataset and the preprocessing steps undertaken to obtain a multi-region dataset. Then, we present the models and their different components. Finally, we specify the details of the model training such as hyperparameters and the choice of evaluation metric.

3.1 Dataset

Our fixed-region dataset consists of a set of simulated radio propagation scenarios in a fixed urban region in downtown Ottawa, generated using ray-tracing software (REMCOM, 2020). We have 10,000 samples, where each sample is a coverage heatmap for a unique base station transmitter location. The transmitter was assumed to use an isotropic antenna producing omnidirectional radio waves. The height and power of the antenna were set to 6 metres and 50 watts, respectively. The grid has a size of pixels where each pixel can be considered of size 3 metre square. Each point is represented by its x and y coordinates and the resulting power coverage, expressed in dBm. Measurements are only available for pixels where no building exists. A sample from the original dataset is shown in Fig. 1. White blocks in the left grid represent buildings and other urban objects where no coverage is measured.

Figure 1: Sample Transmitter Coverage Scenario

3.1.1 The multi-region dataset

As we aim to solve the power coverage estimation problem for urban areas, we need to train the prediction model on different urban environments. However, the fixed-region dataset contains only a single environment. Thus, we selected smaller frames from the fixed-region dataset to create a new multi-region dataset. This process is performed as follows. First, for every sample in the fixed-region dataset, a sliding window is moved through the sample with a stride of . We only consider frames that include the transmitter and building blocks, since we aim to estimate the power coverage of the transmitter in an urban environment. Some frames contain a high rate of reflections coming from buildings outside the frame itself. These are problematic cases since, during training, the model only receives the 32×32 frame and cannot be informed about the reflections coming from outside the frame. Fortunately, most of these cases happen when the transmitter is near the edges of the frame. Thus, we applied a padding of 5 pixels from all sides, and all frames where the transmitter is near the edges are removed from the multi-region dataset.
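The frame extraction described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `extract_frames` and the `stride=8` default are hypothetical placeholders (the stride used in this work is not reproduced here), while the 32×32 frame size and the 5-pixel margin follow the text.

```python
import numpy as np

def extract_frames(buildings, power, tx, frame=32, stride=8, margin=5):
    """Yield (row, col, building_frame, power_frame) for each valid window.

    A window is kept only if the transmitter lies at least `margin` pixels
    away from every edge of the window and the window contains a building.
    """
    h, w = buildings.shape
    for r in range(0, h - frame + 1, stride):
        for c in range(0, w - frame + 1, stride):
            tr, tc = tx[0] - r, tx[1] - c  # transmitter in frame coordinates
            inside = (margin <= tr < frame - margin) and (margin <= tc < frame - margin)
            b = buildings[r:r + frame, c:c + frame]
            if inside and b.any():
                yield r, c, b, power[r:r + frame, c:c + frame]

# Illustrative usage on a toy 64x64 region with one building block.
buildings = np.zeros((64, 64), dtype=int)
buildings[20:30, 20:30] = 1
power = np.full((64, 64), -80.0)  # placeholder coverage values in dBm
frames = list(extract_frames(buildings, power, tx=(40, 40)))
```

Each yielded tuple keeps the upper-left corner of the window, which is useful later for a location-based train/test split.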

After the samples are generated, we apply a minimum threshold of , since the coverage estimations are uninteresting below this value. Finally, we apply min-max normalization over the power coverage and scale the samples between . A sample frame from the generated dataset is shown in Fig. 2.
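A minimal sketch of the thresholding and min-max normalization step is given below. The numeric values are assumptions for illustration only: the floor of -100 dBm, the peak of -20 dBm and the [0, 1] target range are hypothetical choices, and the function names are not taken from this work.

```python
import numpy as np

def normalize_power(power_dbm, floor_dbm, peak_dbm):
    """Clip values to [floor, peak], then min-max scale to [0, 1] (assumed range)."""
    clipped = np.clip(power_dbm, floor_dbm, peak_dbm)
    return (clipped - floor_dbm) / (peak_dbm - floor_dbm)

def denormalize_power(scaled, floor_dbm, peak_dbm):
    """Invert the scaling to recover values in dBm."""
    return scaled * (peak_dbm - floor_dbm) + floor_dbm

# Illustrative usage with hypothetical floor and peak values.
frame_dbm = np.array([-130.0, -100.0, -60.0, -20.0])
scaled = normalize_power(frame_dbm, floor_dbm=-100.0, peak_dbm=-20.0)
restored = denormalize_power(scaled, floor_dbm=-100.0, peak_dbm=-20.0)
```

Values below the floor are mapped to zero, so the inverse transform returns the floor for them rather than the original value.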

Figure 2: A sample from the dataset.
The blue dot shows the transmitter location.

The dataset is divided into train and test sets based on the frames’ original location on the fixed-region map. We consider the upper-left corner of each frame as the reference point. The training set contains all frames whose upper-left corner -coordinate is below 60. Test set frames are above the -coordinate of 80. We leave a gap of 20 between the train and test set regions to prevent overlapping. At the end of this process, the dataset is split into 390,969 training samples and 117,666 test samples.
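The location-based split can be sketched as a simple rule on the reference coordinate of each frame. The function name `assign_split` is hypothetical, and the coordinate passed in is whichever axis the split is defined on; frames falling in the gap are discarded.

```python
def assign_split(corner_coord, train_max=60, test_min=80):
    """Assign a frame to a split from the coordinate of its upper-left corner."""
    if corner_coord < train_max:
        return "train"
    if corner_coord > test_min:
        return "test"
    return None  # inside the gap between the two regions: dropped

# Illustrative usage on a few corner coordinates.
splits = [assign_split(c) for c in (10, 59, 60, 70, 80, 95)]
```

Keeping a gap between the two regions means a test frame cannot share pixels with a training frame as long as the gap is not smaller than the overlap that two neighbouring windows could otherwise have.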

3.1.2 Feature engineering

The dataset consists of two sets of input features: transmitter location and urban environment. We tested four different approaches to feed these features to the models. In the first approach, both the transmitter location and the environment are provided in the same input channel. Consequently, the input is a 2D image, where the pixel values are set to one and two for building and transmitter locations, respectively. In the second scenario, we divide the input into two separate channels. Both channels are binary 2D images. The first encodes building locations with ones (i.e., a white colour), while in the second a one corresponds to the transmitter location. In the third and fourth scenarios, the urban environment is represented by the same 2D image, but the transmitter is represented by a 2D image where the value of each pixel is a function of its distance from the transmitter. We used Euclidean distance and inverse square distance as the distance metrics in the third and fourth scenarios, respectively. In all cases, the default pixel value is set to zero.
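The four input encodings can be sketched as follows. This is an illustrative reading of the description, not the original preprocessing code: the function name `encode_inputs` is hypothetical, and the `1 + d²` denominator in the fourth scenario is an assumption made here to avoid division by zero at the transmitter pixel.

```python
import numpy as np

def encode_inputs(buildings, tx, scenario):
    """Return an (H, W, C) input array for one of the four scenarios."""
    h, w = buildings.shape
    b = buildings.astype(np.float32)
    if scenario == 1:  # one channel: 1 = building, 2 = transmitter
        x = b.copy()
        x[tx] = 2.0
        return x[..., None]
    if scenario == 2:  # two binary channels
        t = np.zeros((h, w), dtype=np.float32)
        t[tx] = 1.0
        return np.stack([b, t], axis=-1)
    rows, cols = np.mgrid[0:h, 0:w]
    dist = np.hypot(rows - tx[0], cols - tx[1]).astype(np.float32)
    if scenario == 3:  # Euclidean distance to the transmitter
        return np.stack([b, dist], axis=-1)
    if scenario == 4:  # inverse square distance (assumed form: 1 / (1 + d^2))
        return np.stack([b, 1.0 / (1.0 + dist ** 2)], axis=-1)
    raise ValueError("scenario must be 1, 2, 3 or 4")

# Illustrative usage on a 32x32 frame.
buildings = np.zeros((32, 32), dtype=int)
buildings[4:8, 4:8] = 1
tx = (16, 20)
x1, x2, x3, x4 = (encode_inputs(buildings, tx, s) for s in (1, 2, 3, 4))
```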

We performed preliminary analysis with all four input scenarios and observed that the second scenario provides the best performance in terms of the Mean Absolute Error (MAE). Thus, we retain these input features in the remaining experiments of this paper.

3.2 Deep learning models

We trained two CNN models for the power coverage prediction, namely, vanilla CNN and UNET. Below we provide a brief description of these models along with the configuration we adopted in our analysis.

3.2.1 Baseline CNN

We consider a vanilla CNN model as a baseline prediction model. The model consists of 24 2D convolutional layers with kernels. Each layer has 32 filters, ReLU activations and "same" padding so that the output size of the layer is the same as the input size. Fig. 3 demonstrates the adopted CNN architecture.
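As a sanity check on the size of this baseline, the number of trainable parameters can be counted by hand. The sketch below assumes a two-channel input, 23 layers with 32 filters followed by a final layer with a single output filter, the same kernel size in every layer, and a bias per filter; these details are assumptions made here, although the resulting counts are close to the values reported in Table 1.

```python
def conv_params(c_in, c_out, k):
    """Weights plus biases of a 2D convolution with a k x k kernel."""
    return c_in * c_out * k * k + c_out

def baseline_cnn_params(k, n_layers=24, filters=32, in_ch=2, out_ch=1):
    """Parameter count of a plain stack of n_layers convolutions (assumed layout)."""
    total = conv_params(in_ch, filters, k)                       # first layer
    total += (n_layers - 2) * conv_params(filters, filters, k)   # hidden layers
    total += conv_params(filters, out_ch, k)                     # output layer
    return total

counts = {k: baseline_cnn_params(k) for k in (3, 5, 7)}
# counts == {3: 204353, 5: 566337, 7: 1109313}
```

The count grows with the square of the kernel size, which is why the 7×7 variant has roughly five times as many parameters as the 3×3 variant.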

Figure 3: Baseline CNN model architecture

3.2.2 UNET

UNET is a special type of CNN which was originally developed for biomedical image segmentation (Ronneberger et al., 2015). The encoder-decoder architecture of the UNET is illustrated in Fig. 4. The left side of the U shape is called the encoder. It mainly consists of convolutions, each followed by a ReLU, and max pooling layers with a stride of two for downsampling. The number of feature channels is doubled at each downsampling step.

Figure 4: UNET Model Structure

The decoder on the right-hand side of the network uses transposed convolutions for upsampling to increase the output size to the same size as the input. Each decoder step consists of an up-convolution that halves the number of feature channels, a concatenation with the corresponding encoder layer's output, and two convolutions, each followed by a ReLU activation. The last layer is a convolution with a single output channel. The network has 23 convolutional layers in total.
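The layer count and model size can be checked with a short calculation. The sketch below assumes the layout of the original U-Net (a base width of 64 channels, four downsampling steps, 2×2 up-convolutions and a final 1×1 convolution) together with a two-channel input and a bias per filter. These details are assumptions made here, but under them the totals agree with the 23 layers stated above and with the parameter counts in Table 1.

```python
def conv_params(c_in, c_out, k):
    """Weights plus biases of a 2D (or transposed) convolution with a k x k kernel."""
    return c_in * c_out * k * k + c_out

def unet_summary(k=3, in_ch=2, base=64, depth=4):
    """Return (number of convolutional layers, number of parameters)."""
    widths = [base * 2 ** i for i in range(depth + 1)]  # 64, 128, ..., 1024
    n_convs, params, ch = 0, 0, in_ch
    for w in widths:  # encoder levels and bottleneck: two convolutions each
        params += conv_params(ch, w, k) + conv_params(w, w, k)
        n_convs += 2
        ch = w
    for w in reversed(widths[:-1]):  # decoder levels
        params += conv_params(ch, w, 2)      # up-convolution halves the channels
        params += conv_params(2 * w, w, k)   # after concatenation with the skip
        params += conv_params(w, w, k)
        n_convs += 3
        ch = w
    params += conv_params(ch, 1, 1)  # final 1x1 convolution, single channel
    n_convs += 1
    return n_convs, params

summary = {k: unet_summary(k) for k in (3, 5, 7)}
# summary[3] == (23, 31031169)
```

Most of the parameters sit in the deepest levels, where the channel count reaches 1024, which explains the large gap in size between the UNET and the baseline CNN.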

One of the main problems using CNNs for segmentation is the pooling layers. When such layers are used, they aggregate the nearby pixels, therefore the spatial information is lost. UNET solves this issue by an encoder-decoder architecture and skip connections. Each encoder layer has a skip connection with its corresponding decoder layer to allow the decoder network to retrieve fine-grained details from the encoder.

3.3 Model enhancements

The performances of the vanilla CNN and UNET models can be enhanced by incorporating special layers such as inception and strided convolutions that are customized for the power coverage prediction task. Below we briefly describe these specific modules.

3.3.1 Inception layers

Inception was an important development for the task of object detection and classification. Before the inception concept, large networks were constructed by stacking several convolutional layers on top of each other, which makes the model prone to overfitting and computationally expensive. An image consists of several parts, and these parts can have different sizes in different pictures. Therefore, selecting an appropriate kernel size is a difficult task. The inception structure suggested using several kernel sizes operating on the same level (Szegedy et al., 2015). In this work, we use three parallel kernel sizes in the inception structure depicted in Fig. 5. The output of the previous layer passes through the three parallel kernels. The outputs are concatenated along the channel dimension before being passed to the next layer. A 1×1 kernel is computationally cheaper than larger kernels and is typically employed to change the number of channels.
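The idea of parallel kernels followed by channel-wise concatenation can be illustrated with a small NumPy sketch. It is a toy version with random weights, not the layer used in our models: the helper names, the number of filters per branch and the kernel sizes (1, 3, 5) are illustrative assumptions.

```python
import numpy as np

def conv2d_same(x, weights):
    """Naive 'same' cross-correlation. x: (H, W, C_in); weights: (k, k, C_in, C_out)."""
    k = weights.shape[0]
    p = k // 2  # zero padding that preserves the spatial size for odd k
    h, w, _ = x.shape
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    out = np.zeros((h, w, weights.shape[-1]))
    for i in range(h):
        for j in range(w):
            patch = xp[i:i + k, j:j + k, :]
            out[i, j] = np.tensordot(patch, weights, axes=([0, 1, 2], [0, 1, 2]))
    return out

def inception_block(x, kernel_sizes=(1, 3, 5), filters=4, seed=0):
    """Apply parallel convolutions with ReLU and concatenate along channels."""
    rng = np.random.default_rng(seed)
    branches = []
    for k in kernel_sizes:
        wts = rng.normal(size=(k, k, x.shape[-1], filters))  # random toy weights
        branches.append(np.maximum(conv2d_same(x, wts), 0.0))
    return np.concatenate(branches, axis=-1)

# Illustrative usage: a two-channel 32x32 input gives 3 * 4 = 12 output channels.
x = np.random.default_rng(1).normal(size=(32, 32, 2))
y = inception_block(x)
```

Because every branch uses "same" padding, the spatial size is unchanged and only the channel dimension grows with the number of branches.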

Figure 5: The proposed inception layer structure

3.3.2 Strided convolutions

The basic UNET model uses max pooling layers to downsample images. However, Springenberg et al. (2014) showed that replacing max pooling layers with strided convolutions can improve the accuracy of the model with the same depth and width. This improvement comes with a cost as max pooling is a fixed operation in which the model does not learn any new parameters which makes it computationally cheaper. However, the same procedure with strided convolutions requires the model to learn new parameters and thus increases the computational cost.
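The trade-off can be made concrete with a single-channel NumPy sketch. Both operations below halve the spatial size, but max pooling has no trainable parameters, whereas a strided convolution with a k×k kernel adds k·k·C_in·C_out weights plus C_out biases per layer. The helper names and the averaging kernel are illustrative assumptions.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an (H, W) array with even H and W."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def strided_conv(x, kernel, stride=2):
    """Single-channel strided cross-correlation with zero padding of k // 2."""
    k = kernel.shape[0]
    xp = np.pad(x, k // 2)
    h, w = x.shape
    out = np.zeros((h // stride, w // stride))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = xp[i * stride:i * stride + k, j * stride:j * stride + k]
            out[i, j] = np.sum(patch * kernel)
    return out

# Illustrative usage: both paths map a 32x32 frame to 16x16.
x = np.arange(32 * 32, dtype=float).reshape(32, 32)
pooled = max_pool_2x2(x)
kernel = np.ones((3, 3)) / 9.0  # stand-in for a learned 3x3 kernel
downsampled = strided_conv(x, kernel)
```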

3.4 Model training

We perform supervised learning on the multi-region dataset of urban environments to estimate the power coverage at each location based on a given transmitter location and environment. The MAE is used as the accuracy metric on the normalized power values. After training, MAE scores are denormalized and reported because they are easier to interpret. The MAE shows the average error in dB in a given frame. In all models, the Adam optimizer (Kingma and Ba, 2014) is used with an initial learning rate of 0.001 and exponential decay rates of 0.9 and 0.999 for β₁ and β₂, respectively. The models are trained with mini-batches of 128 samples for 40 epochs. Early stopping with a patience of three epochs is applied, monitoring the test loss.
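Because min-max normalization is an affine map, an MAE computed on normalized values converts to dB by multiplying with the width of the original power range. A minimal sketch is shown below, assuming values scaled to [0, 1]; the floor and peak values and the function names are hypothetical, and the optional mask illustrates how building pixels could be excluded.

```python
import numpy as np

def mae(pred, target, mask=None):
    """Mean absolute error, optionally restricted to pixels where mask is True."""
    err = np.abs(np.asarray(pred, dtype=float) - np.asarray(target, dtype=float))
    if mask is not None:
        err = err[np.asarray(mask, dtype=bool)]
    return float(err.mean())

def denormalize_mae(mae_norm, floor_dbm, peak_dbm):
    """Convert an MAE on [0, 1]-scaled values to dB; the offset cancels out."""
    return mae_norm * (peak_dbm - floor_dbm)

# Illustrative usage with a hypothetical 100 dB power range.
pred = np.array([0.50, 0.25, 0.90])
target = np.array([0.40, 0.35, 0.10])
open_pixels = np.array([True, True, False])  # last pixel is a building
mae_norm = mae(pred, target, mask=open_pixels)
mae_db = denormalize_mae(mae_norm, floor_dbm=-100.0, peak_dbm=0.0)
```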

4 Results

We perform an extensive numerical study with CNN and UNET models to assess their effectiveness for the power coverage prediction task. We first compare the performances of these models for different hyperparameter settings. Then, we explore the impact of kernel sizes on prediction performance. Finally, we demonstrate the performance gain obtained through inception and strided convolutions. All the numerical analyses are performed on an NVIDIA TESLA P40 GPU with 24 GB of RAM.

4.1 Comparing model performances

The performance of the UNET and the CNN models with different kernel sizes over five repeats is reported in Table 1, which shows model parameters, MAE and training run time. We observe that a kernel size of 5 leads to the best performance for both CNN and UNET models. The best overall performance is obtained by the UNET model with a kernel size of 5, which leads to a 5.2% improvement in terms of average MAE compared to its CNN counterpart (2.69 vs 2.55), though it has a significantly longer run time (0.52 hours vs 2.4 hours). We note that the benefits of UNET are amplified for individual cases, e.g., MAE improvements are 13.6% (2.64 vs 2.28) and 11.6% (2.77 vs 2.45) over min and max MAE values. Additionally, we observe that UNET models involve a significantly larger number of model parameters and, on average, the UNET model run times are approximately two to three times greater than those of the CNN models. We perform the rest of the numerical experiments using the UNET models as they provide better overall predictions.
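The relative improvements quoted above follow from the usual formula 100 × (baseline − improved) / baseline, applied to the kernel-size-5 MAE values. The helper name below is illustrative.

```python
def relative_improvement(baseline, improved):
    """Percentage reduction of `improved` relative to `baseline`."""
    return 100.0 * (baseline - improved) / baseline

# CNN vs UNET with a kernel size of 5, using the MAE values quoted in the text.
avg_gain = round(relative_improvement(2.69, 2.55), 1)  # average MAE
min_gain = round(relative_improvement(2.64, 2.28), 1)  # min MAE
max_gain = round(relative_improvement(2.77, 2.45), 1)  # max MAE
```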

Model   Kernel size   MAE (dB)                    Time (hr)   # of parameters
                      min     max     average
CNN     3             2.67    2.73    2.71        0.88        204.35K
CNN     5             2.64    2.77    2.69        0.52        566.33K
CNN     7             2.82    35.71   13.84       0.51        1.10M
UNET    3             2.25    3.92    2.97        1.13        31.03M
UNET    5             2.28    2.45    2.55        2.40        81.23M
UNET    7             2.36    4.17    3.25        4.31        156.54M
Table 1: The baseline CNN and UNET model performances reported for 5 repeats

4.2 The impact of kernel size on the UNET performance

We next provide visual demonstrations of model predictions for different kernel sizes over a sample region map. Fig. 10 illustrates the UNET model predictions for the same sample with three different kernels. We observe that a kernel size of 5×5 performs best with an MAE of 3.57 dB, while a smaller kernel of 3×3 and a larger kernel of 7×7 yield less accurate predictions. For all cases, the power coverage is close to the ground truth around the transmitter located at coordinate (8,28). All three models are able to capture the effect of buildings on decreasing the incoming power by casting a radio shadow behind buildings. However, the difference in accuracy can be clearly observed near the building block located at coordinate . Given the small dimension of the building block and the transmitter location, the ground truth shows no significant power attenuation around this block. The prediction with the 5×5 kernel is the closest to the truth, which suggests it is the optimal kernel. A smaller kernel of 3×3 considers only close pixels, which may explain the fuzziness observed around the small building block in Fig. 10(b). A larger kernel of 7×7 considers a much broader area around every pixel, and this produces power distribution patterns that are not consistent with the ground truth in Fig. 10(d).

(a) Region map
(b) UNET model with Kernel 3×3 prediction
(c) UNET model with Kernel 5×5 prediction
(d) UNET model with Kernel 7×7 prediction
Figure 10: UNET predictions for different kernel sizes

4.3 Performance of the enhanced UNET

We trained improved versions of the UNET that include inception modules and strided convolutions. The results obtained with the different variants are reported in Table 2. Replacing max-pooling layers with strided convolutions and a kernel size of 3 decreased the average MAE from 2.97 dB (see Table 1) to 2.39 dB. The second improvement, which is based on introducing inception layers to the UNET, is also beneficial compared with the basic UNET, as it decreases the average MAE to 2.64 dB with a kernel size set of (3,5,7) (see Table 2). Incorporating both strided convolutions and inception modules in the UNET leads to a lower average MAE of 2.20 dB, obtained with a kernel size set of (1,5,7). This model provides the best performance for the power coverage prediction.

Model                        Kernel size   MAE (dB)                    Time (hr)   # of parameters
                                           min     max     average
UNET + Strided               3             2.28    2.77    2.39        2.25        34.16M
UNET + Inception             (1,3,5)       2.25    3.98    3.40        4.27        130.22M
UNET + Inception             (1,5,7)       2.27    3.96    3.39        2.73        130.22M
UNET + Inception             (3,5,7)       2.27    3.39    2.64        3.95        130.22M
UNET + Strided + Inception   (1,3,5)       2.24    3.84    2.78        2.69        90.88M
UNET + Strided + Inception   (1,5,7)       2.18    2.22    2.20        5.08        174.44M
UNET + Strided + Inception   (3,5,7)       2.22    2.31    2.26        5.77        191.16M
Table 2: Performance of the UNET model variants reported for 5 repeats

(a) Region map
(b) UNET with kernel size of 3×3
(c) UNET with inception kernel size of (1,5,7)
(d) UNET with strided convolution and inception kernel size of (1,5,7)
Figure 15: Performance of the enhanced UNET model

A sample environment is presented in Fig. 15(a), showing the buildings and the transmitter location, along with the predictions obtained with three UNET variants on this region for performance comparison. The three UNET models include a simple UNET with a kernel size of 3×3, a UNET model with inception and kernel sizes of (1,5,7), and a UNET model with strided convolutions and inception with kernel sizes of (1,5,7). The predictions are shown in Fig. 15. We observe that the MAE of the model with both inception and strided convolutions is lower than that of the other two models. The difference can be observed visually by comparing the predicted power distribution to the ground truth. Although all three models are able to predict the power coverage around the transmitter and over the open space on the left side of the image, we clearly see differences in the radio shadow cast by buildings on the right-hand side of the image. While the UNET with a single kernel size of 3×3 predicts irregular patterns, the UNET with strided convolutions and inception with kernels (1,5,7) produces patterns that are much closer to the ground truth. This comparison emphasizes the advantage of multiple kernels that are able to capture complex details at different distances from each pixel.

5 Discussions and Conclusions

In this paper we propose two deep neural network architectures, namely, the CNN and the UNET, to predict the power coverage in an urban environment. The CNN is fully convolutional with no pooling layers, while the UNET has an encoder-decoder structure with skip connections. Both structures produce an output with the same size as the input. The dataset consists of 10,000 power coverage maps produced by ray tracing over a fixed urban region. To create enough samples for training and testing, frames of size 32×32 pixels were generated using a sliding window approach. The transmitter and building locations are given as two separate binary matrices to the CNN and UNET models.

The predictions are generated as a single matrix of power values estimated in each pixel. Our numerical analyses indicate that the CNNs performed worse than the UNET. To improve the performance of the UNET, max pooling layers were replaced by strided convolutions and inception modules with multiple kernel sizes were included. The results show that this network upgrade had a positive impact on the prediction performance. A comparison between fixed-size kernels and multiple parallel kernels in inception modules shows that parallel kernels are more effective, as they enable capturing more details in radio propagation patterns and produce a higher accuracy in power coverage compared to fixed-size kernels.

Accurate prediction of power coverage values has important business implications. Commonly used ray-tracing software is proprietary and costly to use in practice (REMCOM, 2020). Our models can be used to substantially reduce the reliance on the ray-tracing software. For instance, the software can be employed to generate an initial sample of coverage heatmaps for a given region. Then, CNN and UNET models can be trained on this sample and used for predicting the power coverage values within the region as well as for other regions. To the best of our knowledge, there are only a few other studies that consider the power coverage prediction task and a recent study by Levie et al. (2019) is closest to our work. It is important to note that while we experimented with Levie et al. (2019)’s RadioUNET, we did not obtain reasonable predictive accuracy for our problem. Therefore, RadioUNET predictions were not presented in our study.

There are certain limitations of our work. We conduct our numerical study using a single data source and a single-region dataset. We expect our models to perform similarly for other datasets that are generated with similar ray-tracing rules; however, it is left for future research to test the further generalizability of our models with more data. In addition, we only experiment with 32×32 frames, mainly due to lack of available data. On the other hand, we note that our CNN and UNET models can be configured for larger frame sizes, e.g., by increasing the complexity through more convolutional layers and deeper inception layers. Other future work includes extending the proposed approaches to predict the impact of other objects such as trees or moving vehicles on the power coverage. In fact, the current model architecture is well suited to including additional input layers and associating each layer with a different type of object.


This research is enabled by the dataset supplied by Communications Research Centre Canada.


  • Çiçek et al. (2016) Özgün Çiçek, Ahmed Abdulkadir, Soeren S Lienkamp, Thomas Brox, and Olaf Ronneberger. 3D U-Net: learning dense volumetric segmentation from sparse annotation. In International conference on medical image computing and computer-assisted intervention, pages 424–432. Springer, 2016.
  • Ciresan et al. (2012) Dan Ciresan, Alessandro Giusti, Luca M Gambardella, and Jürgen Schmidhuber. Deep neural networks segment neuronal membranes in electron microscopy images. In Advances in neural information processing systems, pages 2843–2851, 2012.
  • Drozdzal et al. (2016) Michal Drozdzal, Eugene Vorontsov, Gabriel Chartrand, Samuel Kadoury, and Chris Pal. The importance of skip connections in biomedical image segmentation. In Deep Learning and Data Labeling for Medical Applications, pages 179–187. Springer, 2016.
  • Girshick et al. (2015) Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Region-based convolutional networks for accurate object detection and segmentation. IEEE transactions on pattern analysis and machine intelligence, 38(1):142–158, 2015.
  • Gupta et al. (2014) Saurabh Gupta, Ross Girshick, Pablo Arbeláez, and Jitendra Malik. Learning rich features from RGB-D images for object detection and segmentation. In European conference on computer vision, pages 345–360. Springer, 2014.
  • He et al. (2015) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE transactions on pattern analysis and machine intelligence, 37(9):1904–1916, 2015.
  • He et al. (2016) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  • Jégou et al. (2017) Simon Jégou, Michal Drozdzal, David Vazquez, Adriana Romero, and Yoshua Bengio. The one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pages 11–19, 2017.
  • Kingma and Ba (2014) Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • Levie et al. (2019) Ron Levie, Çağkan Yapar, Gitta Kutyniok, and Giuseppe Caire. RadioUNet: Fast radio map estimation with convolutional neural networks. arXiv preprint arXiv:1911.09002, 2019.
  • Lim et al. (2017) Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pages 136–144, 2017.
  • Litjens et al. (2017) Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen Awm Van Der Laak, Bram Van Ginneken, and Clara I Sánchez. A survey on deep learning in medical image analysis. Medical image analysis, 42:60–88, 2017.
  • Long et al. (2015) Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3431–3440, 2015.
  • Milletari et al. (2016) Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In 2016 Fourth International Conference on 3D Vision (3DV), pages 565–571. IEEE, 2016.
  • Pinheiro and Collobert (2014) Pedro HO Pinheiro and Ronan Collobert. Recurrent convolutional neural networks for scene labeling. In 31st International Conference on Machine Learning (ICML), 2014.
  • Radenović et al. (2016) Filip Radenović, Giorgos Tolias, and Ondřej Chum. CNN image retrieval learns from bow: Unsupervised fine-tuning with hard examples. In European conference on computer vision, pages 3–20. Springer, 2016.
  • Radenović et al. (2018) Filip Radenović, Giorgos Tolias, and Ondřej Chum. Fine-tuning CNN image retrieval with no human annotation. IEEE transactions on pattern analysis and machine intelligence, 41(7):1655–1668, 2018.
  • REMCOM (2020) REMCOM. Wireless insite, v3.3., 2020.
  • Ronneberger et al. (2015) Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. CoRR, abs/1505.04597, 2015.
  • Simonyan and Zisserman (2014) Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  • Springenberg et al. (2014) Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
  • Szegedy et al. (2015) Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9, 2015.
  • Yi et al. (2017) Zili Yi, Hao Zhang, Ping Tan, and Minglun Gong. Dualgan: Unsupervised dual learning for image-to-image translation. In Proceedings of the IEEE international conference on computer vision, pages 2849–2857, 2017.
  • Zhao et al. (2017) Bo Zhao, Jiashi Feng, Xiao Wu, and Shuicheng Yan. A survey on deep learning-based fine-grained object classification and semantic segmentation. International Journal of Automation and Computing, 14(2):119–135, 2017.