A Tale of Three Datasets: Towards Characterizing Mobile Broadband Access in the United States

02/15/2021 ∙ by Tarun Mangla, et al. ∙ 0

Understanding and improving mobile broadband deployment is critical to bridging the digital divide and targeting future investments. Yet accurately mapping mobile coverage is challenging. In 2019, the Federal Communications Commission (FCC) released a report on the progress of mobile broadband deployment in the United States. This report received a significant amount of criticism, with claims that cellular coverage, mainly available through Long-Term Evolution (LTE), was over-reported in some areas, especially those that are rural and/or tribal [12]. We evaluate the validity of this criticism using a quantitative analysis of both the dataset on which the FCC based its report and a crowdsourced LTE coverage dataset. Our analysis focuses on the state of New Mexico, a region characterized by a diverse mix of demographics and geography and by poor broadband access. We also performed a controlled measurement campaign in northern New Mexico during May 2019. Our findings reveal significant disagreement between the crowdsourced dataset and the FCC dataset regarding the presence of LTE coverage in rural and tribal census blocks, with the FCC dataset reporting higher coverage than the crowdsourced dataset. Interestingly, both the FCC and the crowdsourced data report higher coverage compared to our on-the-ground measurements. Based on these findings, we discuss our recommendations for improved LTE coverage measurements, whose importance has only increased in the COVID-19 era of performing work and school from home, especially in rural and tribal areas.

1. Introduction

Affordable, quality Internet access is critical for full participation in the 21st century economy, education system, and government (Roberts2017:Rural). Mobile broadband can be achieved through commercial Long-Term Evolution (LTE) cellular networks, which are a proven means of expanding this access (ITU2017), but are often concentrated in urban areas and leave economically marginalized and sparsely populated areas underserved (FCC2019:Broadband). The U.S. Federal Communications Commission (FCC) incentivizes LTE operators serving rural areas (FCC2017:ConnectAmericaFund; prieger2017) and maintains transparency by releasing maps from each operator showing geographic areas of coverage (fccLTEData). Recently, third parties have challenged the veracity of these maps, claiming that they over-represent true coverage and thus may discourage much-needed investments.

Most of these claims, however, are either focused on limited areas where a few dedicated researchers can collect controlled coverage measurements (e.g., through wardriving), or are mainly qualitative in nature (ms2019:ruralbroadband; challengesRWA; RWA2018tmob). As dependence on mobile broadband connectivity increases, especially in the face of the COVID-19 pandemic, mechanisms that quantitatively validate FCC coverage datasets at scale are becoming acutely necessary to evaluate and direct resources in Internet access deployment efforts (Pew2019:Mobile; lutu2020characterization). This is an issue of technology and technology policy, with equity and fairness implications for society.

An increasingly widespread approach to measuring coverage at scale is crowdsourcing, wherein users of the LTE network contribute to coverage measurements. The FCC has recently advocated for the use of crowdsourcing to validate coverage data reported by operators (fccRecommendations). In this context, we take a data-driven, empirical approach in this work, comparing coverage from a representative crowdsourced dataset with the FCC data. More specifically, our analysis is guided by the following questions: (i) how consistent are existing LTE coverage datasets, and (ii) where and how do their coverage estimations differ, and what trends are present?

We specifically consider a crowdsourced coverage estimate from Skyhook, a commercial location service provider that uses a variety of positioning tools to offer precise geolocation. We select Skyhook because it crowdsources cellular coverage measurements from end-user applications that subscribe to its location services. Such incidental crowdsourcing can potentially provide richer coverage data compared to a voluntary form of crowdsourcing where a user has to explicitly commit to contributing coverage data. We examine this by comparing the Skyhook measurements with those of OpenCellID, an open but voluntary crowdsourced dataset (opencellid). As will be shown in Section 3.1, we find that the density of the crowdsourced datasets varies significantly by the methodology of data collection, especially in rural areas. In the regions we studied, incidental crowdsourcing (Skyhook) gathered up to 11.1x as many cell IDs as voluntary crowdsourcing (OpenCellID).

Using Skyhook as an extensive crowdsourced dataset, we quantify how widely and where the crowdsourced coverage data differs from the FCC data. We specifically focus on the state of New Mexico [fn1], selected for its mix of demographics, diverse geographic landscape, and our partnership with community stakeholders within the state. We compare coverage at the level of census blocks [fn2], which are further grouped into urban, rural, and tribal [fn3] categories. We find that the FCC and Skyhook LTE datasets disagree markedly in rural census blocks, with the FCC data claiming higher coverage than Skyhook. A major concern in interpreting this comparison is that coverage disagreement may simply result from a lack of data points in the crowdsourced dataset. To confirm the availability of users to provide data points, we check for the presence of alternate cellular technologies (e.g., 2G or 3G) within these census blocks and observe a significant number (up to 9% in tribal rural areas; see Table 5) where such alternates are present, providing evidence that users do visit those blocks but cannot access LTE. These results, similar to a recent study on fixed broadband (major2020no), suggest a need for incorporating mechanisms to validate the operator-submitted data into the FCC’s LTE access measurement methodology, especially in rural and tribal areas.

[fn1] Our methodology is not specific to New Mexico and can be easily extended to other regions in the U.S.
[fn2] We use the FCC methodology wherein a census block is considered covered if the centroid is covered (centroidMethodology).
[fn3] Tribal areas have consistently experienced the lowest broadband coverage rates in the United States for the past decade (FCC2019:Broadband).

Finally, we compare both FCC and Skyhook coverage maps to our own controlled coverage measurements collected from a northern section of New Mexico. Interestingly, we find that both FCC and Skyhook datasets report higher coverage relative to our controlled measurements with the former showing a higher degree (by up to 26.7%) of over-reporting than the latter. Understanding the causes of these inconsistencies is important for effectively using crowdsourced data to measure LTE coverage, especially as crowdsourcing is increasingly viewed as preferable to provider reports. We conclude with recommendations for improving LTE coverage measurements, whose importance has only increased in the COVID-19 era of performing work and school from home.

Data Set | Points of Collection | Format | Methodology
FCC | Polygon overlay | Shapefile | Operator-reported with Form 477
Skyhook | Cell signal point | CSV | Incidental crowdsourcing
Author Controlled Measurements | Cell signal point | CSV | Wardriving
Table 1. Summary of coverage data sets.

2. Background and Datasets

In this section, we first provide an overview of the LTE network architecture. This is followed by a description of the LTE coverage datasets compared in our analysis. These datasets are summarized in Table 1. We also note the limitations associated with each data collection methodology.

Figure 1. LTE operators by census block coverage based on FCC data.
Figure 2. Map of author wardriving areas in New Mexico.

2.1. LTE Network Architecture

Internet access in an LTE network is available through base stations (known as eNodeBs) operated by the network provider. User equipment (UE), such as smartphones, tablets, or LTE modems, connects to the eNodeB over the radio link. The eNodeB is connected to a centralized cellular core known as the Evolved Packet Core (EPC). This connection is typically through a wired link forming a middle-mile connection. The EPC consists of several network elements including a Packet Data Network Gateway (PGW), which is the connecting node between an end-user device and the public Internet. Thus, LTE broadband access depends on multiple factors including radio coverage, middle-mile capacity, and interconnection links with other networks (e.g., transit providers, content providers) in the public Internet. However, the focus of this article is on understanding the last-mile LTE connectivity characterized by the radio coverage of the eNodeB.

An eNodeB controls a single cell site and consists of several radio transceivers, or cells, mounted on a raised structure such as a mast or a tower. The radio cells use directional antennas, where each antenna provides coverage in a smaller geographical area using one frequency band. Each radio cell can be identified through a globally unique number called a cell identifier (or cell ID), which is also visible to an end-user device in range of the cell. The cell ID enables aggregation of connectivity and signal strength information from multiple UEs connected to the same cell, which can then be used to estimate the geolocation of a cell along with its coverage (see Section 2.3).

2.2. FCC Dataset

The FCC LTE broadband dataset consists of coverage maps in shapefile format that depict geospatial LTE network deployment for each cellular operator in the U.S. The FCC compiles this dataset semi-annually from operators through Form 477. Every operator that owns cellular network facilities must participate in this data collection. The operators submit shapefiles containing detailed network information in the form of geo-polygons along with the frequency band used in the polygon and the minimum advertised upload and download speeds. The methodology used for obtaining these polygons is proprietary to each operator. Ultimately, the FCC publishes only a coverage map that represents coverage as a binary indicator: in any location, cellular service is either available through an operator, or it is not.

We use the binary coverage shapefiles, available on the FCC’s website, from June 2019 [fn4]. Figure 1 shows the eight LTE network operators present in the state of New Mexico (NM) and the percentage of total census blocks in NM covered by each operator. Note that we use one of the FCC methodologies to report mobile broadband access, wherein a census block is considered covered if the centroid of the census block is covered (centroidMethodology). In this paper, we limit our analysis to the top four cellular operators due to their significantly greater prevalence in NM; these operators are also the top four cellular operators in the United States more broadly.

[fn4] At the time of this analysis, data from December 2019 was also available on the FCC website. However, we use data from June 2019 as the other two datasets in our analysis were collected around this period.

Limitations: These coverage maps are generated using predictive models that are proprietary to the operator (gaoReport) and not generally reproducible. Furthermore, the publicly available dataset consists of binary coverage and lacks any performance-related data [fn5].

[fn5] The FCC has only recently (beginning December 2019) started providing speed data along with coverage information.

2.3. Skyhook Dataset

Skyhook is a location service provider that uses a variety of positioning tools, including a database of cell locations, to offer precise geolocation to subscribed applications (skyhook). Through apps that subscribe to Skyhook’s location services, user devices report back network information, which is gathered into anonymous logs and used to further improve the localization service. Through a data access agreement we are able to view the cell location database consisting of a list of unique cell IDs along with the cell technology (e.g., 3G vs LTE), estimated location, and the estimated coverage. The database was originally constructed through extensive wardriving but is now managed and updated using measurements gathered by devices using the Skyhook API for localization. The device measurements with the same cell ID are combined to estimate the cell location and coverage in the following manner:

Cell location estimation: A grid-based methodology similar to that proposed by Nurmi et al. (nurmi2010grid) is used to estimate the cell tower location. Specifically, Skyhook divides the geographic area into grid squares and groups measurements in the same square to obtain a central measure of the square’s signal strength. This is done to reduce the bias due to large numbers of measurements coming from the same area (e.g., a popular gathering place). A weighted average of the signal strength is then used to estimate the cell location.
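The grid-based idea described above can be illustrated with a minimal sketch. This is not Skyhook's implementation, which is not public: the function name, the grid size in degrees, the use of the median as the per-square central measure, and the use of linear received power as the weight are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from statistics import median


def estimate_cell_location(measurements, grid_deg=0.001):
    """Estimate a cell's (lat, lon) from (lat, lon, rssi_dbm) readings of one cell ID.

    grid_deg is a hypothetical grid size; the weighting scheme is an assumption.
    """
    # Group readings by the grid square they fall in.
    squares = defaultdict(list)
    for lat, lon, rssi_dbm in measurements:
        key = (math.floor(lat / grid_deg), math.floor(lon / grid_deg))
        squares[key].append(rssi_dbm)
    if not squares:
        raise ValueError("no measurements for this cell")

    weight_sum = lat_sum = lon_sum = 0.0
    for (row, col), readings in squares.items():
        # One representative value per square, however many readings it holds,
        # so a crowded square cannot dominate the estimate.
        square_rssi = median(readings)
        # Assumed weighting: linear received power (mW) derived from dBm.
        weight = 10 ** (square_rssi / 10.0)
        # Use the centre of the square as its position.
        lat_sum += weight * (row + 0.5) * grid_deg
        lon_sum += weight * (col + 0.5) * grid_deg
        weight_sum += weight
    return lat_sum / weight_sum, lon_sum / weight_sum
```

With this scheme, one hundred readings from a single square count exactly as much as one reading from a neighbouring square with the same signal strength, which is the bias reduction the paragraph describes.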

Estimation of cell coverage radius: Skyhook also provides an estimate of the cell’s coverage radius using a proprietary method based on the path-loss gradient (tse2005fundamentals). The path-loss gradient approximates how the wireless signal attenuates as a function of the distance from the transmitter (radio cell in this case). The value of the path-loss gradient depends on several factors such as environment (foliage, buildings), geographic topography, and cell signal frequency. Skyhook estimates the path-loss gradient using field observations of cell signal strength readings along with their distributed geographic locations. Ideally, the signal attenuation varies based on the direction and the distance from the cell. However, to reduce the complexity of coverage estimation, Skyhook’s cell coverage estimation heuristic calculates only one path-loss gradient for a single cell. The path-loss gradient is then used in a set of parameterized equations to estimate the cell coverage radius. The parameters in these equations have been determined with careful research and testing over more than 10 years.
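To make the role of a single per-cell path-loss gradient concrete, the sketch below fits the textbook log-distance model, RSSI(d) = P0 - 10 n log10(d), by least squares and inverts it to obtain a radius. Skyhook's parameterized equations are proprietary, so this is a stand-in rather than their method; the function names, the 1 m reference distance, and the -120 dBm usability threshold are assumptions.

```python
import math


def fit_path_loss(samples):
    """Fit RSSI(d) = p0 - 10 * n * log10(d) to (distance_m, rssi_dbm) samples.

    Returns (p0, n): p0 is the modelled RSSI at 1 m, n is the path-loss gradient.
    """
    xs = [math.log10(d) for d, _ in samples]
    ys = [rssi for _, rssi in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need samples at more than one distance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                  # slope equals -10 * n
    p0 = mean_y - slope * mean_x
    return p0, -slope / 10.0


def coverage_radius(p0, n, threshold_dbm=-120.0):
    """Distance (m) at which the modelled RSSI drops to threshold_dbm (assumed cutoff)."""
    return 10 ** ((p0 - threshold_dbm) / (10.0 * n))
```

Because only one gradient is fitted per cell, the resulting coverage area is a circle; direction-dependent attenuation from terrain or foliage is averaged away, which is the simplification noted above.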

The cell location database is updated regularly with recalculation of cell location and cell coverage radius using the new device measurements that have been collected since the last update. For our analysis, we use the cell location database last updated on June 10, 2019.

Limitations: Since database entries are crowdsourced when a device passes within range of a cell, this dataset is more comprehensive in population centers and along highways, where people are more often present. If there are too few measurements overall, or if measurements are primarily sourced from the same grid section, then the cell location estimate can be inaccurate.

2.4. Targeted Measurement Campaign

To complement these datasets, we performed a targeted measurement campaign collecting coverage information through 120 miles of Rio Arriba county in New Mexico over a period of five days beginning May 28, 2019. Figure 2 shows the locations of ground measurements and the four descriptive area labels we use for this analysis. The North area measurements were taken on highways passing primarily through national forest. The Pueblo area measurements were taken from highways within tribal jurisdiction boundaries. In Santa Clara Pueblo, tribal leadership permitted us to collect additional measurements in residential zones. Finally, the Santa Fe area consists of highway measurements between the pueblos and downtown Santa Fe. While limited in scale, these active measurements provide an important comparison point for coverage and user experience. As described in Section 1, we selected these areas of New Mexico for their mix of tribal and non-tribal demographics; tribal lands tend to have the highest coverage over-statements and the most limited cellular availability within the United States (FCC2019:Broadband).

Our measurements consist of service state and signal strength readings recorded on four Motorola G7 Power (XT1955-5) phones running Android Pie (9.0.0). Service State is a discrete variable indicating whether the phone is connected to a cell. Measurements were collected using the Network Monitor application (networkmonitor). An external GlobalSat BU-353-S4 GPS receiver connected to a Lenovo ThinkPad laptop running Ubuntu gathered geolocation tags that were matched to network measurements by timestamp. Each phone was outfitted with a SIM card from one of the four top cellular operators in the area: Verizon, T-Mobile, AT&T, and Sprint. The phones recorded service state and signal strength every 10 seconds while we drove at highway speeds in most places and at lower speeds in residential areas (Santa Clara Pueblo).
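Matching geolocation tags to network measurements by timestamp can be sketched as a nearest-neighbour lookup in time. The text does not say how ties or gaps were handled, so the nearest-fix rule, the `max_gap_s` tolerance, and the record layouts below are assumptions.

```python
from bisect import bisect_left


def match_by_timestamp(gps_fixes, readings, max_gap_s=5.0):
    """Attach the nearest-in-time GPS fix to each phone reading.

    gps_fixes: list of (t, lat, lon), sorted by t (seconds).
    readings:  list of (t, service_state, rssi_dbm).
    Readings with no fix within max_gap_s (hypothetical tolerance) are dropped.
    """
    times = [t for t, _, _ in gps_fixes]
    tagged = []
    for t, state, rssi in readings:
        i = bisect_left(times, t)
        # The nearest fix is either just before or just after the reading.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(times[k] - t))
        if abs(times[j] - t) <= max_gap_s:
            _, lat, lon = gps_fixes[j]
            tagged.append((t, lat, lon, state, rssi))
    return tagged
```

With readings taken every 10 seconds, a tolerance of a few seconds keeps the position error small even at highway speeds, but the right value depends on the GPS logging rate.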

Limitations: Our wardriving campaign was intensive in terms of human effort, economic cost, and time, making it difficult to scale. The dataset does not capture any temporal variations in coverage as the measurements were collected over a short span of time. It is possible that driving speed or device configuration affects the measurements, e.g., indicating no coverage when a stationary measurement might have detected coverage (fida2018impact). We have no evidence that this occurred, but it might warrant some additional investigation.

3. Analysis

In this section, we first evaluate Skyhook as a representative crowdsourced dataset by comparing it with a popular voluntary crowdsourced dataset from OpenCellID (opencellid). This is followed by a comparison of coverage across the FCC, Skyhook, and our wardriving measurement data. Our comparison is guided by the following questions: (i) what is the degree of coverage agreement across the datasets, and (ii) where and how do their coverage estimations differ?

County classification | Region | County name | Population density (per sq. mile) | Skyhook CIDs (#) | Skyhook % Overlap | OpenCellID CIDs (#) | OpenCellID % Overlap | Common CIDs
Large Metro | Western | Los Angeles, CA | 2,490.3 | 133,484 | 28% | 39,875 | 92% | 36,816
Large Metro | Central | Denver, CO | 4,683.0 | 11,061 | 24% | 3,136 | 86% | 2,689
Large Metro | Eastern | Fulton, GA | 1,994.0 | 27,809 | 22% | 7,225 | 86% | 6,194
Small Metro | Western | Imperial, CA | 43.5 | 1,818 | 17% | 336 | 93% | 311
Small Metro | Central | Doña Ana, NM | 57.1 | 1,870 | 32% | 663 | 89% | 592
Small Metro | Eastern | Bibb, GA | 613.0 | 1,953 | 21% | 464 | 89% | 413
Micropolitan | Western | Tehama, CA | 21.7 | 733 | 17% | 158 | 80% | 126
Micropolitan | Central | Rio Arriba, NM | 6.7 | 333 | 8% | 30 | 87% | 26
Micropolitan | Eastern | Pierce, GA | 61.3 | 164 | 9% | 21 | 67% | 14
Table 2. Characteristics and cell ID (CID) counts in selected counties.

3.1. Comparison of Crowdsourced Datasets

We compare the Skyhook dataset with a publicly available crowdsourced dataset, OpenCellID. Unwired Lab’s OpenCellID project [fn6] provides a publicly available dataset of cell IDs along with their estimated location. The dataset is derived from crowdsourced UE signal strength measurements similar to Skyhook. However, the UE measurements in this case come from users voluntarily installing the OpenCellID application on their smartphone (opencellid) and manually choosing what data to upload. We differentiate this voluntary crowdsourcing method of data collection from Skyhook’s incidental crowdsourcing method, where users of the Skyhook API contribute to the data by default. We specifically compare the number of unique LTE cells and the recentness of the measurements in both datasets. We consider each of these factors to contribute to the overall density of the dataset.

[fn6] OpenCellID Project is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Figure 3. CDF of cell updates in Skyhook (S) and OpenCellID (O).

Methodology: While our coverage comparison will be focused on New Mexico, we analyze our selected crowdsourced data more broadly by considering these datasets within a set of counties of differing population densities across the United States. The counties are selected from three areas of the United States: Western (California), Central (New Mexico and Colorado), and Eastern (Georgia). Within each region, we consider three different kinds of counties as defined by the National Center for Health Statistics’ 2013 Urban-Rural Classification Guide (nchs2013). These are: (i) large metropolitan (large), which contain a population of at least one million and a principal city; (ii) small metropolitan (small), which contain a population of less than 250,000; and (iii) micropolitan (micro), which must have at least one urban cluster of at least 10,000, but a total population of less than 50,000. This enables us to study differences based on population density and geographic region for the crowdsourced datasets. We select three counties of each population category, for a total of nine counties, to compare these two datasets. We describe these counties in Table 2. For each county, we show the 2018 population density estimated from the U.S. Census Bureau’s 2010 census records (pop2010). We first count the number of unique cell IDs that appear in both datasets for each county, as shown in Table 2. The “% Overlap” column in Table 2 shows the percentage of each dataset’s cell IDs that also appear in the other dataset, and the “Common CIDs” column shows the exact number of common cell IDs.
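The “% Overlap” and “Common CIDs” quantities reduce to set operations on the cell IDs of each dataset. The sketch below shows one way to compute them for a single county; the function name and the returned dictionary keys are hypothetical.

```python
def overlap_stats(skyhook_cids, opencellid_cids):
    """Count shared cell IDs and each dataset's share of IDs found in the other."""
    skyhook = set(skyhook_cids)
    opencellid = set(opencellid_cids)
    common = skyhook & opencellid

    def pct(part, whole):
        # Guard against an empty dataset for a county.
        return 100.0 * part / whole if whole else 0.0

    return {
        "common": len(common),
        "skyhook_pct_overlap": pct(len(common), len(skyhook)),
        "opencellid_pct_overlap": pct(len(common), len(opencellid)),
    }
```

As a check against Table 2, a county with 333 Skyhook cells, 30 OpenCellID cells, and 26 in common (the Rio Arriba row) gives 26/333, about 8%, for Skyhook and 26/30, about 87%, for OpenCellID.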

Results: Overall, Skyhook reports a greater number of cells (2.8x to 11.1x as many) for all counties. The difference is particularly pronounced in micro counties. This suggests that relying on volunteers to download an application and offer network measurements may not be the most accurate method for assessing LTE coverage in rural areas. Furthermore, Skyhook includes a majority of the cells that appear in OpenCellID (67% to 93% per county; see Table 2).

We next consider how recently each cell ID record was updated with a new measurement. Figure 3 shows the CDF of the latest measurement date for cells in both the datasets, where cells are split into those located in urban and rural census blocks. Almost 60% of the cells in Skyhook were last updated in the month of June 2019, but the most recent update in OpenCellID was in February 2019. Furthermore, cells in rural census blocks were updated less recently than urban census blocks in OpenCellID, while the difference is negligible in the Skyhook dataset. This suggests that the Skyhook dataset is updated more regularly than OpenCellID, thus making it more likely to represent any changes in the network infrastructure.

3.2. Comparison of Coverage

Census block type | Total census blocks | Verizon (FCC / Skyhook) | T-Mobile (FCC / Skyhook) | AT&T (FCC / Skyhook) | Sprint (FCC / Skyhook)
Non-Tribal Rural | 93,680 | 89% / 77% | 94% / 86% | 85% / 79% | 39% / 49%
Non-Tribal Urban | 41,872 | 100% / 100% | 100% / 100% | 99% / 99% | 96% / 99%
Tribal Rural | 30,588 | 93% / 80% | 92% / 63% | 78% / 73% | 27% / 41%
Tribal Urban | 2,469 | 100% / 99% | 95% / 94% | 93% / 94% | 75% / 88%
All | 168,609 | 93% / 84% | 95% / 85% | 88% / 83% | 52% / 61%
Table 3. Percentage of total census blocks covered according to FCC and Skyhook.

3.2.1. Coverage comparison between the FCC and Skyhook

We first compare a coverage shapefile generated from Skyhook cell locations and estimated coverage ranges with the FCC map for each operator.

Block type | Total blocks | Verizon | T-Mobile | AT&T | Sprint
Non-Tribal Rural | 93,680 | 14,013 | 9,025 | 8,705 | 1,355
Non-Tribal Urban | 41,872 | 0 | 0 | 213 | 25
Tribal Rural | 30,588 | 5,109 | 9,150 | 3,004 | 230
Tribal Urban | 2,469 | 4 | 14 | 4 | 0
Table 4. Number of census blocks where there is coverage according to FCC but no coverage according to Skyhook.

Methodology: We consider coverage at the census block level for this comparison. In addition to reporting coverage shapefiles, the FCC reports coverage at a census block level and considers a census block as covered if the centroid of the census block falls within a covered region (centroidMethodology). We generate a similar census block level coverage map per operator using Skyhook’s estimated coverage. To do so, we first obtain the coverage shapefile for each operator using each cell’s estimated location and coverage radius. Then we use the FCC centroid methodology to generate the Skyhook LTE coverage map at the census block level. We use the Python GeoPandas 0.8.2 library for the associated spatial operations (geopandas). We group census blocks into four categories: Non-Tribal Urban, Non-Tribal Rural, Tribal Urban, and Tribal Rural. This is done to explore whether the degree of agreement of the two datasets varies across these dimensions. We use the U.S. Census Bureau’s classification of urban and rural blocks and its boundary definitions of tribal jurisdiction for this categorization (urbanRural). In this analysis we consider census blocks as tribal if they overlap with any tribal boundaries. We also tried alternative tribal labeling schemes, such as classifying a census block as tribal only if its centroid lies within a tribal boundary; the results remain qualitatively similar and do not affect the findings presented here.
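The centroid rule can be sketched without any GIS dependency by treating each Skyhook cell as a circle around its estimated location. The analysis itself used GeoPandas on shapefiles; the version below is only a simplified stand-in using great-circle distance, and the function names and input layouts are assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def covered_blocks(centroids, cells):
    """Return IDs of census blocks whose centroid lies inside any cell's circle.

    centroids: {block_id: (lat, lon)}
    cells:     list of (lat, lon, radius_m) for one operator
    """
    return {
        block_id
        for block_id, (lat, lon) in centroids.items()
        if any(haversine_m(lat, lon, c_lat, c_lon) <= radius
               for c_lat, c_lon, radius in cells)
    }


def coverage_share(centroids, cells):
    """Percentage of census blocks counted as covered under the centroid rule."""
    if not centroids:
        return 0.0
    return 100.0 * len(covered_blocks(centroids, cells)) / len(centroids)
```

Given a set of FCC-covered block IDs for the same operator, the blocks covered by the FCC but not by Skyhook are simply the set difference between the two results. For state-wide data a spatial index (as GeoPandas provides) would be needed, since this sketch compares every centroid with every cell.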

Results: Table 3 shows the percentage of total census blocks covered by each cellular operator, according to the FCC and Skyhook data, broken down by census block type. Among the four operators, T-Mobile covers the greatest number of census blocks based on both FCC and Skyhook data, while Sprint covers the fewest. All four cellular operators have relatively higher coverage for both tribal and non-tribal urban census blocks. However, all operators except Verizon offer their lowest coverage in tribal rural areas. For some operators, the differences between non-tribal rural and tribal rural coverage in Table 3 are as great as 23 percentage points (based on Skyhook data) and 12 percentage points (based on FCC data).

(a) Verizon
(b) Sprint
Figure 4. Comparison of LTE coverage maps of New Mexico. Yellow blocks are covered in the FCC map but not in Skyhook; purple blocks are covered in the Skyhook map but not the FCC. Green blocks are covered in both, and pink blocks are covered in neither.

The extent of LTE coverage differs between the two datasets. For three out of four providers, Skyhook shows lower coverage than the FCC, particularly in the rural census blocks. For instance, the FCC T-Mobile data shows coverage in 92% of tribal rural blocks, whereas Skyhook shows coverage in only 63% of such blocks (Table 3). On the other hand, Skyhook shows a higher number of census blocks covered than the FCC for Sprint. The higher coverage in the case of Sprint could be due to multiple reasons, including: (i) differences in the propagation models used by Skyhook and Sprint to estimate coverage, with the former’s models being more generous than the latter’s, and (ii) the Skyhook data is collected across time and Sprint may have discontinued or temporarily disabled some of the cells, which is challenging to detect from the crowdsourced data.

Figure 4 visually compares the LTE coverage maps from the FCC and the Skyhook datasets for Verizon and Sprint. We more deeply examine the discrepancy mapped in yellow in Figure 4(a). Table 4 shows the number of census blocks where there is coverage according to the FCC but none according to Skyhook for each operator. Coverage claims in both tribal and non-tribal rural census blocks disagree the most. The number of such blocks is particularly high for Verizon (19,126 blocks overall, summing the rows of Table 4) and T-Mobile (18,189 blocks overall). There are two possible reasons for this disagreement: either network operators lack adequate infrastructure in rural areas but tend to overestimate coverage when reporting it to the FCC, or Skyhook is missing data points from rural census blocks where fewer people carry UEs. The latter case would lead to either some LTE cells not being detected or an inaccurate characterization of cell coverage due to fewer measurements.

Block type | Verizon | T-Mobile | AT&T | Sprint
Non-Tribal Rural | 528 (1%) | 2,575 (3%) | 5,342 (6%) | 19 (<1%)
Non-Tribal Urban | 0 (0%) | 0 (0%) | 213 (1%) | 0 (0%)
Tribal Rural | 2,655 (9%) | 2,565 (8%) | 2,166 (7%) | 0 (0%)
Tribal Urban | 0 (0%) | 0 (0%) | 4 (<1%) | 0 (0%)
Table 5. Number of census blocks with LTE coverage according to the FCC, but only 3G coverage according to Skyhook. The numbers in parentheses report the same data as a percentage of total census blocks of the corresponding type.

To understand which of these potential reasons for disagreement is more likely, we check whether Skyhook shows 3G coverage for these census blocks (where the FCC reports LTE coverage but Skyhook does not). If Skyhook reports 3G coverage in these blocks, this suggests that users have contributed to the Skyhook dataset in these census blocks, and therefore that LTE coverage would have been detected if it existed. Note that a more accurate way would have been to directly consider the location of end-user measurements connected using 3G technology and analyze whether they fall within LTE coverage areas in the FCC data. However, we did not have access to these end-user measurements due to Skyhook’s privacy policy. Instead, we consider the 3G coverage maps as a reasonable approximation for our analysis and generate a 3G coverage map at the census block level for these areas in the same manner as described previously for LTE. The number of census blocks that show only 3G coverage according to Skyhook is presented in Table 5. We observe a significant number of census blocks where Skyhook detects 3G coverage, indicating that the FCC LTE coverage claims may be overstated in these areas. The share of such blocks is greater for tribal rural areas (up to 9%), thus indicating a higher mismatch of the two datasets in tribal rural areas.
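The 3G check amounts to filtering the disputed blocks by a third set. A minimal sketch, assuming each coverage map has already been reduced to a set of covered census block IDs for one operator (the function and argument names are hypothetical):

```python
def lte_claimed_but_3g_only(fcc_lte, skyhook_lte, skyhook_3g):
    """Blocks the FCC reports as LTE-covered where Skyhook sees 3G but no LTE.

    Each argument is an iterable of census block IDs for a single operator.
    """
    # Blocks in dispute: LTE according to the FCC, none according to Skyhook.
    disputed = set(fcc_lte) - set(skyhook_lte)
    # Keep those where Skyhook does show 3G, i.e. devices were observed there.
    return disputed & set(skyhook_3g)
```

Dividing the size of the result by the number of blocks of a given type yields percentages of the kind reported in parentheses in Table 5.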

3.2.2. Active measurements compared to FCC and Skyhook coverage

In this section, we compare our own active measurements with the coverage maps from the FCC and Skyhook described in Section 3.2.1. We focus now on the geographic region around Santa Clara Pueblo, which lies north of Santa Fe (see Figure 2), a region with a mix of urban, rural, and tribal population blocks.

Methodology: We use the Service State readings collected in our measurements for this analysis (see Section 2.4). We also collected information about the connected cell’s technology (e.g. LTE) and the geolocation of the measurements. This information is used to infer whether LTE coverage exists at a location. We consider LTE to be available if the Service State shows IN_SERVICE to indicate an active connection, and if the associated cell is an LTE cell. We term this the active LTE coverage. We then compare the FCC and Skyhook coverage with the active LTE coverage to see whether the datasets agree. Note that we use the coverage shapefiles for both Skyhook and the FCC in this comparison instead of the census block centroid approach in Section 3.2.1. This allows us to compare coverage more precisely for a location, especially if a census block is only partially covered.
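The active LTE coverage rule and the row-normalized confusion matrices it feeds can be sketched as follows. The `IN_SERVICE` value comes from the text; the `"LTE"` technology label, the function names, and the input layout are assumptions for illustration.

```python
from collections import Counter


def is_active_lte(service_state, cell_technology):
    """Active LTE coverage: an in-service connection on an LTE cell."""
    return service_state == "IN_SERVICE" and cell_technology == "LTE"


def confusion_rows(samples):
    """Row-normalized confusion matrix for one operator and one coverage map.

    samples: iterable of (active_covered, map_covered) booleans, one pair per
    measurement location. Returns, for each active outcome, the number of
    measurements and the percentage the map labels NC (not covered) or C (covered).
    """
    counts = Counter(samples)
    rows = {}
    for active in (False, True):
        nc = counts[(active, False)]
        c = counts[(active, True)]
        total = nc + c
        rows[active] = {
            "total": total,
            "NC": 100.0 * nc / total if total else 0.0,
            "C": 100.0 * c / total if total else 0.0,
        }
    return rows
```

The row for `False` (no active coverage) is the one of interest for over-reporting: its "C" entry is the share of locations without measured LTE that the map nonetheless marks as covered.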

Operator | Active measurement | Total | FCC: NC | FCC: C | Skyhook: NC | Skyhook: C
(a) Verizon | No Coverage (NC) | 266 | 19% | 81% | 32% | 68%
(a) Verizon | Coverage (C) | 1,440 | 0% | 100% | 5% | 95%
(b) T-Mobile | No Coverage (NC) | 324 | 6% | 94% | 21% | 79%
(b) T-Mobile | Coverage (C) | 1,361 | 0% | 100% | 5% | 95%
(c) AT&T | No Coverage (NC) | 568 | 25% | 75% | 53% | 48%
(c) AT&T | Coverage (C) | 1,095 | 2% | 98% | 7% | 93%
(d) Sprint | No Coverage (NC) | 231 | 96% | 4% | 99% | 2%
(d) Sprint | Coverage (C) | 1,122 | 21% | 79% | 20% | 80%
Table 6. Confusion matrices comparing active measurement coverage with FCC and Skyhook. Total denotes the number of active measurements in each category.

Results: Table 6 shows the confusion matrices that compare active LTE coverage with reported coverage from the FCC and Skyhook maps. Both maps show coverage at locations where our measurements did not. In the case of Verizon, 81% of the measurements with no coverage are from locations reported as covered by the FCC. This over-reporting is lowest for Sprint and highest for T-Mobile.

We also observe significant disagreement (up to 79% of our no-coverage measurements, for T-Mobile) between Skyhook coverage and our measurements. Two possibilities may cause this: i) a paucity of Skyhook UE signal strength readings available for cell location and coverage radius estimation, or ii) error in the cell propagation model itself, possibly due to variations in environmental conditions such as terrain. In either case, Skyhook agrees better with our measurements than the FCC in reporting areas with no LTE coverage. For example, in the case of AT&T, 75% of our measurements with no LTE coverage belong to areas reported as covered by the FCC, compared to 48% by Skyhook.
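The percentages in Table 6 are row-normalized: each row is conditioned on the active measurement outcome, and the over-reporting rate is the share of no-coverage measurements that a map reports as covered. A minimal sketch of that computation follows; the function name and the `(active, reported)` pair layout are assumptions for illustration, and the input is toy data, not the measurements from this study.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def confusion_rows(pairs: Iterable[Tuple[bool, bool]]) -> Dict[str, Dict[str, float]]:
    """Row-normalize (active, reported) pairs into percentages.

    Rows are the active-measurement outcome and columns are the map's
    claim, so the NC and C entries of each row sum to 100 before rounding.
    """
    counts = Counter(pairs)
    table = {}
    for active, row in ((False, "NC"), (True, "C")):
        total = counts[(active, False)] + counts[(active, True)]
        table[row] = {
            "total": total,
            "NC": 100.0 * counts[(active, False)] / total if total else 0.0,
            "C": 100.0 * counts[(active, True)] / total if total else 0.0,
        }
    return table


# Toy input (not the paper's data): 4 no-coverage and 6 coverage readings.
toy = [(False, True)] * 3 + [(False, False)] * 1 + [(True, True)] * 6
result = confusion_rows(toy)
over_reporting = result["NC"]["C"]   # share of NC readings the map calls covered
```

Because each row is normalized independently, rounding the two entries separately can make a row sum to 99% or 101%, as in the AT&T and Sprint Skyhook columns of Table 6.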

4. Recommendations

In this section, we discuss some of the implications of our experience collecting and analyzing coverage data, recommendations based on our findings, and directions for future work.

Recommendations for the FCC: Our findings make a case for including mechanisms that validate ISP-reported coverage data, especially in rural and tribal regions. Given the scale of cellular networks, crowdsourcing coverage measurements is a viable approach to validate access as opposed to controlled measurements. Within crowdsourcing, we suggest leveraging incidental rather than voluntary approaches, possibly working with third-party services that collect network measurements as part of their service process (as in the case of Skyhook).

In addition, crowdsourcing alone may not be sufficient for determining coverage in some cases. Even with the more complete datasets provided through incidental crowdsourcing, rural areas tended to receive significantly fewer measurements per tower. In such cases, mechanisms need to be developed to precisely determine areas of greatest disagreement using sparse crowdsourced datasets. Resources can then be focused to target data collection in these areas instead of a blanket approach measuring coverage everywhere.

Recommendations for crowdsourced data collection: We find some shortcomings in the existing crowdsourced datasets. First, existing datasets only report areas with positive coverage, i.e., areas where coverage is observed. This makes it difficult to distinguish areas that lack coverage from areas for which no measurements were gathered. Recording areas that lack a usable signal can enable stronger conclusions from crowdsourced data.

Second, we note that even crowdsourced datasets are prone to overestimating coverage, potentially due to errors in cell location and coverage estimation. Research efforts that effectively utilize knowledge of cellular network design are needed for an accurate characterization of coverage from crowdsourced measurements. For instance, existing cell location estimation techniques localize cells independently (see Section 2.3) and are prone to errors when there are few end-user measurements (li2017identifying). Instead, one can utilize the fact that a single physical tower in an LTE network hosts multiple cells. Thus, algorithms that jointly localize cells whose end-user measurements are in physical proximity may provide higher accuracy even with fewer end-user measurements. Similarly, alternative data sources can be considered for localizing cell infrastructure, such as using geo-imagery data to identify physical towers or directly obtaining infrastructure data from the entities that build and manage physical cell towers (usually different from cellular ISPs).
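One way to picture the joint localization idea is the sketch below. It is only an illustration under several assumptions, not a method proposed or evaluated in this paper: positions are planar coordinates in meters, each cell is first estimated with a signal-weighted centroid (a deliberately simple stand-in for a real estimator), cells are grouped greedily when their independent estimates fall within a hypothetical `radius_m` of a group's first member, and each group then shares one estimate computed from its pooled readings.

```python
from math import hypot
from typing import Dict, List, Tuple

Reading = Tuple[float, float, float]   # (x_m, y_m, weight), with weight > 0


def weighted_centroid(readings: List[Reading]) -> Tuple[float, float]:
    """Weighted mean position of a set of readings."""
    w_sum = sum(w for _, _, w in readings)
    x = sum(x * w for x, _, w in readings) / w_sum
    y = sum(y * w for _, y, w in readings) / w_sum
    return x, y


def localize_jointly(cells: Dict[str, List[Reading]],
                     radius_m: float) -> Dict[str, Tuple[float, float]]:
    """Pool readings of nearby cells and give each group one shared estimate."""
    # Step 1: independent estimate per cell.
    solo = {cid: weighted_centroid(r) for cid, r in cells.items()}

    # Step 2: greedy grouping around the first cell of each group.
    groups: List[List[str]] = []
    for cid in sorted(cells):
        for group in groups:
            sx, sy = solo[group[0]]
            if hypot(solo[cid][0] - sx, solo[cid][1] - sy) <= radius_m:
                group.append(cid)
                break
        else:
            groups.append([cid])

    # Step 3: one pooled estimate per group, shared by all its cells.
    out = {}
    for group in groups:
        pooled = [r for cid in group for r in cells[cid]]
        tower = weighted_centroid(pooled)
        for cid in group:
            out[cid] = tower
    return out


# Toy example (not real data): two sectors facing opposite directions from a
# tower at the origin, plus one unrelated cell far away.
toy_cells = {
    "A": [(100.0, 0.0, 1.0), (200.0, 0.0, 1.0)],
    "B": [(-100.0, 0.0, 1.0), (-200.0, 0.0, 1.0)],
    "C": [(5000.0, 5000.0, 1.0)],
}
estimates = localize_jointly(toy_cells, radius_m=500.0)
```

In the toy example, the independent estimates for sectors A and B are each biased 150 m away from the tower in the direction the sector faces, while the pooled estimate falls at the origin. A real implementation would need a principled grouping criterion and a propagation-aware estimator.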

Measuring access beyond binary coverage: While the focus of this work is on understanding coverage, we recognize that a binary notion of coverage alone does not necessarily indicate the existence of usable LTE connectivity. Various other factors, such as low signal strength or poor middle-mile connectivity, can impact end-user experience in a “covered” area. Thus, future coverage measurement efforts need to augment coverage reports with measurements of performance to provide models that are more aligned with user experiences. Measuring such performance metrics poses a greater challenge because end-user experience depends on a myriad of factors beyond just last-mile link quality. We believe that efforts that lead to increased community awareness (e.g., workshops in public libraries, community meetings) of the importance of measuring mobile coverage are the way to tackle this problem.

Finally, we also note that access and adoption are different, and there are issues beyond access that might also warrant measurement and consideration as accountability measures for operators. Our collection of ground-truth datasets involved five days of driving through Rio Arriba County in northern New Mexico. In preparation for the trip, we worked to obtain SIM cards that would enable us to access the networks of the four major U.S. LTE operators. This was surprisingly difficult; over the course of a month leading up to the measurement campaign, we spent a collective 24 hours in various operator kiosks and stores in three states in order to obtain four SIM cards (one for each major operator). At one of the stores in Santa Fe, we encountered a woman who had to drive an hour from Las Vegas, NM to address some of the issues she was having with her mobile service operator that were preventing her from using her data plan. While these anecdotal experiences mirror the qualitative claims of coverage overestimation, they do introduce a new set of issues that need to be taken into account to effectively reduce the barriers of Internet access for rural communities.

5. Conclusion

In this paper, we quantitatively examine the LTE coverage disagreement among existing datasets collected using different methodologies. We find that existing datasets display the most divergence when compared with each other in rural and tribal areas. We discuss our findings with respect to their implications for telecommunications policy. We also identify several future research directions for the computing community, including: mechanisms to augment existing datasets to precisely determine areas where more concerted measurement efforts are needed, improved coverage estimation models especially for areas with a lower density of crowdsourced measurements, and accurate and scalable measurement of access beyond a binary notion of coverage.

Acknowledgements

This work is funded in part by National Science Foundation Smart and Connected Communities grant NSF-1831698.

References