Understanding radio channel propagation conditions and spatial consistency is important for designing mobile networks, especially when moving to millimeter waves and 5G applications. Many research and standardization efforts are directed by the ITU-R [1] and 3GPP [2] organizations as well as by telecommunication companies. The main purpose of radio channel modeling is to build reliable models for radio equipment testing and validation, wireless network planning, and compatibility studies.
A dual environment LOS/NLOS boundary approximation has already been introduced in [3] for a Manhattan grid with regular street geometry, while maintaining spatial consistency of the visibility state distribution. Digital building models of Manhattan have been used in [4] to model real-world urban scenarios with trigonometric dual environment boundaries.
In the present work, real-world LOS/NLOS visibility statistics are based on about 1000 base station locations obtained from the US Federal Communications Commission (FCC) antenna tower database for Manhattan and San Francisco, supplemented by terrain elevation and building height data. More general statistical characteristics of the LOS/NLOS probability are estimated, including dual environment models with boundaries approximated by the support vector classification (SVC) method. This allows the approximate probability to be applied to a broader class of urban visibility conditions, including mixed high and low building densities.
The paper is structured as follows. In Section II, the LOS/NLOS probability models accepted by ITU-R and 3GPP are briefly reviewed, including spatial consistency requirements. Dual environment boundary approximations based on trigonometric series and SVC methods are given in Section III, followed by a statistical estimation of approximation accuracy in comparison with the available deterministic visibility model. Finally, conclusions are drawn.
II LOS/NLOS Visibility State Probability
II-A LOS Probability Models
The most commonly used LOS state probability approximations are based on the 3GPP [2] and ITU-R [1] 3D channel models representing different propagation scenarios. The LOS probability models proposed by 3GPP and ITU-R can be approximated by the following distance dependence laws for urban (UMa) and rural (RMa) macrocell areas:

$$P_{\mathrm{LOS}}(d) = \begin{cases} \min\left(d_1/d,\, 1\right)\left(1 - e^{-d/d_2}\right) + e^{-d/d_2}, & \text{UMa}, \\ \min\left(1,\; e^{-(d - d_1)/d_2}\right), & \text{RMa}, \end{cases} \qquad (1)$$

where $d$ is the distance in meters and $d_1$, $d_2$ are empirical data fit coefficients. Similar LOS/NLOS probability models are suggested in other proposals [5, 6] and are discussed in more detail in [7]. A characteristic feature of such models is the existence of a direct LOS visibility region around the base station up to distance $d_1$ for the urban environment.
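As a concrete illustration of these distance laws, the following minimal sketch evaluates the UMa and RMa LOS probabilities with the default coefficients published in ITU-R M.2135 and 3GPP TR 36.873 (18 m and 63 m for UMa, 10 m and 1000 m for RMa). The function names and the parameter names `d1` and `d2` are assumptions made for this example, and the user-height correction term of the 3D-UMa model in TR 36.873 is omitted.

```python
import math

def p_los_uma(d, d1=18.0, d2=63.0):
    """UMa LOS probability: min(d1/d, 1) * (1 - exp(-d/d2)) + exp(-d/d2)."""
    if d <= 0.0:
        return 1.0  # at the base station itself the link is treated as LOS
    w = math.exp(-d / d2)
    return min(d1 / d, 1.0) * (1.0 - w) + w

def p_los_rma(d, d1=10.0, d2=1000.0):
    """RMa LOS probability: 1 up to d1, then exp(-(d - d1) / d2)."""
    return 1.0 if d <= d1 else math.exp(-(d - d1) / d2)

# Tabulate both models at a few distances (meters).
for d in (10.0, 18.0, 100.0, 500.0):
    print(f"d = {d:5.0f} m  UMa = {p_los_uma(d):.3f}  RMa = {p_los_rma(d):.3f}")
```

Both functions return exactly 1 inside the guaranteed LOS region (up to `d1`), which is the property the fitted variants discussed later may lose.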
II-B Dual Environment LOS Probability Models
To better represent real propagation environments, which usually contain nonuniform building blocks of varying height, a combined dual environment model that approximates the LOS probability over distance using two probability functions was proposed in [3, 4]. The composite LOS probability is characterized by two separate regions with different LOS probabilities and can be estimated as
where function indicates density of buildings in zone 1 characterized by LOS probability at distance from the base station, while denotes the shortest distance to the boundary of zone 1.
Considering a continuous normal distribution of blocking obstacle heights with mean height , the transition function
separating two regions can be expressed by complementary cumulative distribution function as
A more detailed account of the dual LOS environments is given in [3, 4].
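To illustrate the idea of a composite probability with a smooth transition between two zones, here is a minimal sketch. It is not the model of [3, 4]: the convex blend, the use of a signed distance to the zone 1 boundary as the argument of the transition, and the names `transition`, `p_los_dual`, `s` and `sigma` are all assumptions made for this example. Only the use of a Gaussian complementary cumulative distribution function as the transition follows the text.

```python
import math

def transition(s, sigma):
    """Gaussian complementary CDF Q(s / sigma): close to 1 for s << 0, close to 0 for s >> 0."""
    return 0.5 * math.erfc(s / (sigma * math.sqrt(2.0)))

def p_los_dual(p1, p2, s, sigma=10.0):
    """Blend the zone 1 and zone 2 LOS probabilities (assumed convex combination).

    p1, p2 : LOS probabilities of zone 1 and zone 2 at the evaluated distance
    s      : assumed signed distance to the zone 1 boundary in meters
             (negative inside zone 1, positive outside)
    sigma  : assumed width of the transition in meters
    """
    f = transition(s, sigma)
    return f * p1 + (1.0 - f) * p2

# Deep inside zone 1, on the boundary, and deep inside zone 2.
for s in (-100.0, 0.0, 100.0):
    print(s, round(p_los_dual(0.8, 0.2, s), 3))
```

The blend reduces to the zone 1 probability far inside the boundary and to the zone 2 probability far outside, with an even mix exactly on the boundary.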
The geometry of the boundary dividing the site coverage area into different visibility environments remains an open question. To be usable in a channel model, the boundary should be sufficiently smooth and independent of the particular local obstacle distribution, so that it can represent a generalized typical environment. The next section discusses two possible approximations of such generalized LOS/NLOS boundaries.
III LOS/NLOS Boundary Approximations
Here we present two methods for generalizing LOS/NLOS boundaries obtained from deterministic line-of-sight models that take into account terrain elevation and building heights. After line-of-sight areas are predicted from base station antenna locations within a given radius, the following two approximations are suggested.
III-A Trigonometric Series Approximation
We define LOS/NLOS boundary vector as a parametric equation with radius over polar angle around base station location :
which minimizes discrete least squares error at angular points along the LOS/NLOS boundary (Fig. 1 left):
For the numerical solution of this minimization problem, the Levenberg-Marquardt algorithm is used as implemented in Python’s SciPy library [9]. To make LOS zones compact around the central base station points, the weight coefficients were optimized as error penalties, equal for all points within the NLOS zone, based on the following condition:
For optimization, a fixed number of boundary points and a variable length of the trigonometric series were used.
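The fitting step can be sketched as follows, assuming the boundary radius is written as a truncated trigonometric series $r(\varphi) = a_0 + \sum_{k=1}^{N} (a_k \cos k\varphi + b_k \sin k\varphi)$ and fitted by weighted least squares with SciPy's Levenberg-Marquardt solver. The helper names `trig_radius` and `fit_boundary`, the coefficient layout, and the uniform default weights are assumptions of this example; the weight condition used in the paper is not reproduced. The boundary samples are synthetic.

```python
import numpy as np
from scipy.optimize import least_squares

def trig_radius(coeffs, phi):
    """Evaluate r(phi) = a0 + sum_k (a_k cos(k phi) + b_k sin(k phi)).

    coeffs is laid out as [a0, a_1..a_N, b_1..b_N].
    """
    n = (len(coeffs) - 1) // 2
    k = np.arange(1, n + 1)[:, None]      # harmonic indices, shape (N, 1)
    a, b = coeffs[1:n + 1], coeffs[n + 1:]
    return coeffs[0] + a @ np.cos(k * phi) + b @ np.sin(k * phi)

def fit_boundary(phi, r, order, weights=None):
    """Weighted least-squares fit of a trigonometric series of the given order."""
    w = np.ones_like(r) if weights is None else np.asarray(weights, dtype=float)

    def residuals(c):
        # sqrt(w) so that the summed squares equal sum_i w_i * (r_model - r_i)^2
        return np.sqrt(w) * (trig_radius(c, phi) - r)

    x0 = np.zeros(2 * order + 1)
    x0[0] = r.mean()                      # start from a circle of mean radius
    return least_squares(residuals, x0, method="lm").x

# Synthetic boundary sampled at 72 angles around the base station.
phi = np.linspace(0.0, 2.0 * np.pi, 72, endpoint=False)
r = 100.0 + 20.0 * np.cos(2.0 * phi) + 5.0 * np.sin(phi)
c = fit_boundary(phi, r, order=3)
print(np.round(c, 3))
```

Because the model is linear in its coefficients, the same fit could be obtained with a direct linear solve; the iterative solver is used here only to mirror the method named in the text.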
III-B Support Vector Classification Method
As an alternative method for LOS/NLOS boundary approximation, support vector classification (SVC), namely $\nu$-SVC [10], has been chosen for generating generalized boundaries from a labeled dataset of points whose visibility conditions are estimated geometrically from elevation and building height data. We used a rectangular mesh with a 2 m step, with points evenly distributed over the analysis area around the base station. The $\nu$-SVC classification method reduces to minimizing an objective function for a hyperplane with normal $\mathbf{w}$ and bias $b$:

$$\min_{\mathbf{w},\, b,\, \boldsymbol{\xi},\, \rho}\; \frac{1}{2}\|\mathbf{w}\|^2 - \nu\rho + \frac{1}{n}\sum_{i=1}^{n}\xi_i \quad \text{subject to} \quad y_i\left(\mathbf{w}\cdot\phi(\mathbf{x}_i) + b\right) \ge \rho - \xi_i,\;\; \xi_i \ge 0,\;\; \rho \ge 0,$$

where $y_i \in \{-1, +1\}$ are the LOS/NLOS labels of the $n$ mesh points $\mathbf{x}_i$ and $\phi$ is the kernel feature map, with the hyper-parameter $\nu$ representing an upper bound on the fraction of margin errors, $\rho$ denoting the lower bound on the functional margin $y_i\left(\mathbf{w}\cdot\phi(\mathbf{x}_i) + b\right)$, and $\xi_i$ being slack variables. The geometry of the hyperplane in 2D space used to model LOS/NLOS boundaries is shown in Fig. 1 (right). The nonlinear decision function is constructed as a linear combination of support vectors based on the Gaussian kernel $K(\mathbf{x}, \mathbf{x}') = \exp\left(-\gamma\|\mathbf{x} - \mathbf{x}'\|^2\right)$. The parameter $\gamma$ controls the smoothness of the approximated boundary and can be expressed via the spatial deviation $\sigma$ as $\gamma = 1/(2\sigma^2)$. Taking into account that the decorrelation distance due to shadowing may reach up to 50 m, the generalized boundary has to exhibit lower spatial variation; therefore, $\gamma$ for the simulations was chosen according to the corresponding spatial deviation $\sigma$. To improve the performance of SVC classification, ensemble learning by bootstrap aggregating [11] has been used as implemented in Python’s scikit-learn library [12]. The number of ensemble estimators during SVC optimization was varied between 10 and 40.
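A minimal sketch of this classification step with scikit-learn is given next. It trains a bagged ensemble of `NuSVC` classifiers with a Gaussian (RBF) kernel on a labeled mesh and uses the standard relation $\gamma = 1/(2\sigma^2)$ between the kernel parameter and a spatial deviation. The mesh, the circular LOS region, and the values `sigma = 50.0`, `nu = 0.1` and `max_samples = 0.5` are assumptions chosen only to make the example run; they are not the settings of the paper.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import NuSVC

sigma = 50.0                       # assumed spatial deviation in meters
gamma = 1.0 / (2.0 * sigma ** 2)   # RBF kernel: exp(-gamma * ||x - x'||^2)

# Synthetic labeled mesh (10 m step here to keep the example fast):
# points within 120 m of the base station are labeled LOS (1), others NLOS (0).
xs = np.arange(-200.0, 200.0, 10.0)
X = np.array([(x, y) for x in xs for y in xs])
y = (np.hypot(X[:, 0], X[:, 1]) < 120.0).astype(int)

# Bootstrap aggregating: each estimator is trained on a random half of the mesh.
model = BaggingClassifier(
    NuSVC(nu=0.1, kernel="rbf", gamma=gamma),
    n_estimators=10,
    max_samples=0.5,
    random_state=0,
)
model.fit(X, y)

accuracy = model.score(X, y)
print(f"training accuracy: {accuracy:.3f}")
print("label at the base station:", model.predict([[0.0, 0.0]])[0])
```

A larger `sigma` (smaller `gamma`) yields a smoother decision boundary, which is the lever the text describes for keeping the generalized boundary free of local detail.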
III-C Statistical Results of LOS Probability Approximations
About 1000 base station locations in Manhattan and San Francisco have been used to generate deterministic line-of-sight coverage around base stations within a 500 m radius, taking into account base station antenna heights, terrain elevation data, and building heights. Antenna tower data were obtained from the US FCC Antenna Structure Registration database [13]. A digital elevation model (DEM) of 1/3 arc-second resolution available from the US Geological Survey (USGS) [14] was used for terrain modeling. Building heights were extracted from building footprint datasets provided by the open data initiatives of New York [15] and San Francisco [16]. For visibility calculations, the DEM raster was resampled to 10 m resolution and combined with height information extracted from the vector-type building footprints. The resolution of the combined surface raster was set to 2 m, which is the final resolution of all visibility predictions presented here.
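The deterministic visibility computation itself is not detailed in the text. As a rough illustration of how a line-of-sight test on a combined surface raster can work, the sketch below samples the straight ray between the antenna and a receiver cell and checks whether any intermediate surface cell rises above it. The function name `is_los`, the nearest-cell sampling, and the 1.5 m receiver height are assumptions of this example, not the model used in the paper.

```python
import numpy as np

def is_los(surface, tx, tx_h, rx, rx_h=1.5):
    """Return True if the receiver cell is visible from the transmitter.

    surface : 2D array of surface heights in meters (terrain plus buildings)
    tx, rx  : (row, col) raster cells of transmitter and receiver
    tx_h    : antenna height above the surface at tx, in meters
    rx_h    : assumed receiver height above the surface at rx, in meters
    """
    (r0, c0), (r1, c1) = tx, rx
    n = int(max(abs(r1 - r0), abs(c1 - c0)))   # number of steps along the ray
    z0 = surface[r0, c0] + tx_h
    z1 = surface[r1, c1] + rx_h
    for i in range(1, n):                      # intermediate cells only
        t = i / n
        r = int(round(r0 + t * (r1 - r0)))
        c = int(round(c0 + t * (c1 - c0)))
        if surface[r, c] > z0 + t * (z1 - z0): # obstacle rises above the ray
            return False
    return True

# Flat 50 x 50 raster with one 40 m building halfway along the path.
surface = np.zeros((50, 50))
surface[25, 20] = 40.0
print(is_los(surface, tx=(25, 0), tx_h=30.0, rx=(25, 40)))   # blocked path
print(is_los(surface, tx=(25, 0), tx_h=30.0, rx=(10, 40)))   # path avoiding the building
```

Applying such a test to every cell within the analysis radius gives a binary LOS mask, from which the LOS fraction per distance ring can be counted.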
A typical selection of LOS estimation results for different visibility conditions is shown in Fig. 2. The top row shows sites with about 10% directly visible LOS locations within the site coverage, the middle row corresponds to a LOS fraction of 30%, and the bottom row contains sites with 50% or more open locations.
Then, for each base station location, the dependence of LOS probability on distance is estimated over the whole cell area and approximated by several methods: (i) a single exponential with default parameter values as in the 3GPP model (1); (ii) a single exponential with fitted parameters; (iii) the 3GPP exponential with fitted parameters and an optimized minimum LOS distance; and (iv) dual environment boundary approximations by trigonometric series and the SVC method. The difference between the deterministic LOS probability and the probability estimated by the various model approximations is shown in Fig. 3 for a single site, whose 2D coverage is shown in the center of Fig. 2 with 30% direct LOS visibility. A visual comparison shows that all approximations start at a LOS probability equal to 1 at the base station location, except when all 3GPP parameters are optimized. Although this approximation follows the deterministic LOS probability most closely, it does not satisfy the 3GPP requirement of an immediate line-of-sight area in the close vicinity of the base station.
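The comparison of default and fitted exponential models can be sketched as follows, using the RMSE between a reference LOS curve and each model as the accuracy measure. The example assumes the UMa-type law min(d1/d, 1)(1 - exp(-d/d2)) + exp(-d/d2), takes the decay distance `d2` as the only free parameter of the partially fitted variant, and uses a synthetic reference curve in place of a deterministic visibility prediction; which parameters are free in each variant of the paper is not reproduced here.

```python
import numpy as np
from scipy.optimize import curve_fit

def p_los(d, d1, d2):
    """UMa-type LOS probability with minimum LOS distance d1 and decay distance d2."""
    w = np.exp(-d / d2)
    return np.minimum(d1 / d, 1.0) * (1.0 - w) + w

def rmse(p_ref, p_model):
    """Root mean square error between reference and model probabilities."""
    return float(np.sqrt(np.mean((p_ref - p_model) ** 2)))

d = np.arange(2.0, 500.0, 2.0)            # distances in meters, 2 m raster step
p_det = p_los(d, 30.0, 120.0)             # synthetic stand-in for a deterministic curve

# Default 3GPP coefficients, no fitting.
rmse_default = rmse(p_det, p_los(d, 18.0, 63.0))

# Fit only d2, keeping the default minimum LOS distance.
(d2_only,), _ = curve_fit(lambda x, d2: p_los(x, 18.0, d2), d, p_det,
                          p0=[63.0], bounds=(1.0, 5000.0))
rmse_d2 = rmse(p_det, p_los(d, 18.0, d2_only))

# Fit both d1 and d2.
(d1_fit, d2_fit), _ = curve_fit(p_los, d, p_det, p0=[18.0, 63.0],
                                bounds=([1.0, 1.0], [500.0, 5000.0]))
rmse_full = rmse(p_det, p_los(d, d1_fit, d2_fit))

print(rmse_default, rmse_d2, rmse_full)
```

The bounds keep both distances positive during the fit, which avoids overflow in the exponential and preserves a LOS probability of 1 at the base station.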
For this specific site represented by Fig. 3 LOS probability approximations, a 2D plot of LOS/NLOS boundary approximations by trigonometric series and SVC classification model is depicted in Fig. 4. For this specific case least squares trigonometric minimization problem resulted in series order and weights , while SVC optimization resulted in and number of ensemble estimators 20. The approximation RMSE errors for trigonometric series and SVC are, respectively, 0.18 and 0.17. These errors belong to the worst end of LOS/NLOS boundary approximation statistics where mixed LOS/NLOS conditions predominate. The staircase look of SVC approximation is due to rasterized sampling of the whole analysis area by 2 m size pixels. For comparative purposes, in the same figure, red contour denotes 40 m buffered zone indicating deterministic LOS/NLOS boundary which is over-complex and too specific for particular urban environment in order to be used for generalized LOS probability models.
Approximation errors for each method depend on the visibility conditions, that is, on the fraction of directly visible locations. Nevertheless, the cumulative distribution function (CDF) of RMSE over all base stations clearly shows the advantage of the dual environment approximations, as shown in Fig. 5. The mean RMSE is 0.020 for the optimized 3GPP exponential model, 0.018 if the minimum LOS distance is included, and 0.010 for dual environment models approximated by trigonometric series or the SVC method; in all cases it is lower than in the previous example.
A more thorough picture of LOS probability approximation accuracy can be obtained by spreading the RMSE of the different methods over a range of LOS visibility conditions. In this case, the best approximation results with the lowest RMSE are grouped along the extra axis of direct LOS visibility, as shown in Fig. 6. Here, for each base station, the most accurate approximation is selected and the resulting RMSE distribution is divided into five quantiles, which are stretched along the LOS visibility axis. While the traditional single exponential 3GPP model works well (RMSE below 0.02) in mostly NLOS cases with a LOS fraction below 0.17, the rest of the visibility range results in RMSE between 0.02 and 0.2. For the other two approximations, the optimized 3GPP model and especially the dual environment model, a significant proportion of the mid-range LOS fractions between 0.2 and 0.6 is covered by approximation RMSE below 0.02.
Although most methods tend to work best in dense urban NLOS conditions, the dual environment approximations tend to be more suitable for intermediate visibility conditions, where large portions of the coverage area, up to 40-60%, are open.
IV Conclusions
The statistical LOS probability results gathered for urban areas in San Francisco and Manhattan support the possibility of using dual environment model approximations with boundaries based on trigonometric series or SVC classification. Such approximations are especially advantageous at higher percentages of directly visible areas within the base station coverage. These LOS conditions indicate nonuniform LOS environments in which both high density and low density urban regions are present. Dual environment approximations could be used in combination with the single exponential LOS probability models commonly applied to homogeneous environments such as urban, suburban, or rural. In this case, the added complexity of the dual environment model yields a more accurate simulation model while maintaining the generality and spatial consistency required by wireless channel models.
References
[1] ITU-R, “Guidelines for evaluation of radio interface technologies for IMT-Advanced,” Tech. Rep. M.2135-1, 2009.
[2] 3GPP TR 36.873, “Study on 3D channel model for LTE,” Tech. Rep. Version 12.7.0, Dec. 2017.
[3] R. Aleksiejunas, A. Cesiul, and K. Svirskas, “Spatially consistent LOS/NLOS model for time-varying MIMO channels,” in 2018 Baltic URSI Symposium (URSI), May 2018, pp. 61–64.
[4] R. Aleksiejunas, A. Cesiul, and K. Svirskas, “Statistical LOS/NLOS Channel Model for Simulations of Next Generation 3GPP Networks,” Elektronika ir Elektrotechnika, vol. 24, no. 5, pp. 74–79, Oct. 2018.
[5] Y. Wang, J. Xu, and L. Jiang, “Challenges of System-Level Simulations and Performance Evaluation for 5G Wireless Networks,” IEEE Access, vol. 2, pp. 1553–1561, 2014.
[6] F. Ademaj, M. Taranetz, and M. Rupp, “3GPP 3D MIMO channel model: a holistic implementation guideline for open source simulation tools,” EURASIP Journal on Wireless Communications and Networking, vol. 2016, no. 1, p. 55, Feb. 2016.
[7] Aalto University et al., “5G Channel Model for bands up to 100 GHz,” Tech. Rep. V.2.3, Oct. 2016. [Online]. Available: http://www.5gworkshops.com/5GCM.html
[8] L. Reichel, G. S. Ammar, and W. B. Gragg, “Discrete Least Squares Approximation by Trigonometric Polynomials,” Mathematics of Computation, vol. 57, no. 195, pp. 273–289, 1991.
[9] E. Jones, T. Oliphant, P. Peterson et al., “SciPy: Open source scientific tools for Python,” 2001–. [Online]. Available: https://www.scipy.org/
[10] B. Schölkopf, A. J. Smola, R. C. Williamson, and P. L. Bartlett, “New Support Vector Algorithms,” Neural Computation, vol. 12, no. 5, pp. 1207–1245, May 2000.
[11] G. Louppe and P. Geurts, “Ensembles on Random Patches,” in Machine Learning and Knowledge Discovery in Databases. Berlin, Heidelberg: Springer Berlin Heidelberg, vol. 7523.
[12] F. Pedregosa et al., “Scikit-learn: Machine Learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[13] U.S. Federal Communications Commission, “Antenna Structure Registration,” 2020. [Online]. Available: https://www.fcc.gov/wireless/systems-utilities/antenna-structure-registration
[14] U.S. Geological Survey, “1/3rd arc-second Digital Elevation Models (DEMs) - USGS National Map 3DEP Downloadable Data Collection,” 2017. [Online]. Available: https://www.sciencebase.gov/catalog/item/4f70aa9fe4b058caae3f8de5
[15] City of New York, “Building Footprints (NYC OpenData),” 2019. [Online]. Available: https://catalog.data.gov/dataset/building-footprints-92723
[16] City and County of San Francisco, “Building Footprints (DataSF),” 2019. [Online]. Available: https://data.sfgov.org/Geographic-Locations-and-Boundaries/Building-Footprints/ynuv-fyni