Discovering the IPv6 Network Periphery

01/23/2020 ∙ by Erik C. Rye, et al. ∙ Naval Postgraduate School

We consider the problem of discovering the IPv6 network periphery, i.e., the last hop router connecting endhosts in the IPv6 Internet. Finding the IPv6 periphery using active probing is challenging due to the IPv6 address space size, wide variety of provider addressing and subnetting schemes, and incomplete topology traces. As such, existing topology mapping systems can miss the large footprint of the IPv6 periphery, disadvantaging applications ranging from IPv6 census studies to geolocation and network resilience. We introduce "edgy," an approach to explicitly discover the IPv6 network periphery, and use it to find more than 64M IPv6 periphery router addresses and more than 87M links to these last hops, several orders of magnitude more than in currently available IPv6 topologies. Further, only 0.2% of the addresses edgy discovers appear in existing IPv6 hitlists.


1. Introduction

Among the unique properties inherent to IPv6’s large address space size are ephemeral and dynamic addressing, allocation sparsity and diversity, and a lack of address translation. These well-known properties complicate efforts to map the infrastructure topology of the IPv6 Internet. While previous research has tackled problems of target selection, speed, and response rate-limiting in active IPv6 topology probing (imc18beholder, ), the IPv6 periphery (the last hop routed infrastructure connecting end hosts) is challenging to discover and difficult to discern.

Discovery of the IPv6 periphery is important not only to the completeness of network topology mapping, but also as a crucial supporting basis for many applications. For instance, IPv6 adoption (Czyz:2014:MIA:2740070.2626295, ; Zander:2018:WYI:3185332.3158374, ; pujol2017understanding, ), census (Plonka:2015:TSC:2815675.2815678, ), and reliability and outage studies (Luckie:2017:IRO:3098822.3098858, ) all depend in part on a complete and accurate map of the IPv6 topology inclusive of the periphery, while understanding provider address allocation policies and utilization also requires completeness (Foremski:2016:EUS:2987443.2987445, ; v6exhaust-ic16, ). Similarly, work on IPv4 to IPv6 network congruence (Dhamdhere:2012:MDI:2398776.2398832, ; livadariu2015leveraging, ) and IPv6 geolocation (imc13dns, ) can utilize IPv6 topologies. Further, our work illuminates potential security and privacy vulnerabilities inherent in the way today’s IPv6 periphery is deployed (czyz2016, ; rye2019eui64, ).

We present “edgy,” a new technique to explicitly discover the IPv6 periphery. In contrast to IPv6 scanning (Murdock:2017:TGI:3131365.3131405, ; rfc7707, ), passive collection (Plonka:2015:TSC:2815675.2815678, ), or hitlists (Gasser:2018:CEU:3278532.3278564, ), which, by construction, target endhosts, edgy is specifically designed to find last hop routers and subnetworks in the IPv6 Internet. Our contributions include:

  1. Edgy, an algorithm to discover, identify, and enumerate the IPv6 periphery.

  2. Active measurement using edgy to find 64.8M last hop router addresses and 87.1M edges to these last hops from a single vantage.

  3. Discovery of periphery addresses that are 99.8% disjoint from current IPv6 hitlists (Gasser:2018:CEU:3278532.3278564, ) and orders of magnitude larger than existing IPv6 topology snapshots (caida-topov6, ), suggesting that edgy is complementary to these prior approaches.

  4. Discovery of 16M EUI-64 last hop addresses, suggesting a potential security and privacy vulnerability.

2. Background and Related Work

In this work, we define the “periphery” not to be servers or network endhosts, but rather the last hop router connecting endhosts. Whereas significant prior work has developed techniques for IPv6 endhost discovery (Murdock:2017:TGI:3131365.3131405, ; rfc7707, ; Gasser:2018:CEU:3278532.3278564, ), comparatively little work has explored the IPv6 periphery.

The large address space in IPv6 removes the need for address translation; thus, while many IPv4 hosts are connected via NATs (rfc2663, ), the IPv6 periphery typically extends into customer premises. Indeed, in IPv6, the Customer Premises Equipment (CPE) is a router, implying that in conjunction with the rapid increase in IPv6 adoption (Czyz:2014:MIA:2740070.2626295, ; Zander:2018:WYI:3185332.3158374, ), the IPv6 periphery is considerably larger than in IPv4, especially for residential networks.

Figure 1. Common IPv6 architecture: an IPv6 subnet is assigned to the link between the provider and last hop CPE routers. There is no NAT or private addressing; a separate distinct routed IPv6 subnet is assigned to devices attached to the last hop CPE.

Figure 1 shows an example of the IPv6 periphery we attempt to discover. Here, the point-to-point subnet between the provider and the CPE is assigned a public IPv6 prefix, while the subnet on the other side of the CPE (e.g., in the customer’s home) is also a publicly-routed prefix. While this example shows a common residential IPv6 architecture, similar designs exist for the enterprise periphery.

Consider an IPv6 traceroute to a random address within a provider’s globally advertised BGP prefix, such as is routinely performed by existing production topology mapping systems (caida-ark, ). The traceroute (Figure 2): i) is unlikely to hit the prefix allocated to a customer CPE or her network; ii) is even less likely to reach a host within the customer’s network; and iii) does not illuminate the scope, characteristics, or breadth of subnets within the prefix. When a traceroute does not reach its target destination it is ambiguous: does the last responsive hop belong to the core of the network, or the periphery?

Passive techniques suffer similar problems in revealing the network periphery. For instance, BGP, by design, aggregates routes such that the aggregate visible in a looking glass does not reveal the subnets within. And, while there has been significant prior work in characterizing the IPv6 address space, it primarily focuses on endhosts. For example, Plonka and Berger examine and analyze the addresses and behaviors of IPv6 clients connecting to a large CDN (Plonka:2015:TSC:2815675.2815678, ). However, this passive collection of client requests alone does not reveal the network periphery on the path to those clients.

traceroute to 2a03:4980:2b6:9624:8643:b70f:adae:4f40
    . . .
 5  2001:7f8:1::a502:4904:1  16.862 ms
 6  2a03:4980::6:0:2  25.948 ms
 7  2a03:4980::b:0:5  39.560 ms
 8  *
 9  *
 
Figure 2. Randomly chosen trace targets (where no host exists) are unlikely to discover subnets within a prefix, or to elicit a response. It is thus ambiguous whether hop 7 is a periphery address in this example, even though the trace reaches into the destination /32.

3. Methodology

Our work seeks to perform active probing in a way that elicits responses from the last hop IPv6 periphery, rather than network core infrastructure, servers or other endhosts. Enumerating last hop router addresses (e.g., CPE) and inferring networks beyond the last hops are the principal goals of edgy.

Edgy is divided into an initialization stage, followed by active probing that proceeds in rounds. Results from one round of probing are used to inform the probing in subsequent rounds. This section describes edgy; the complete algorithm is given in Appendix A.

3.1. Edgy

Because of the massive size of the IPv6 address space, edgy relies on an input set of “seed traces” to focus and guide its discovery. Thus, the ability of edgy to discover the network periphery depends strongly on the input seed traces it uses. These seed traces could come from anywhere; in §3.2 we describe two specific realistic seed traces we utilize: i) BGP guided; and ii) hitlist guided.

Algorithm 1 describes edgy’s initialization stage. Edgy iterates through the input seed and examines the last responsive hop in each trace. It maintains the set of targets that, when used as the traceroute destination, had a given last hop. Edgy then finds unique last hops, i.e., those that were only discovered by probing to destinations within a single /48 prefix. The intuition is to find candidate /48 prefixes that are likely to be subnetted, and hence contain periphery routers. By contrast, if there are two or more probes to different /48s that elicit the same last hop, those /48s are less likely to be subnetted.
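The initialization step described above can be sketched as follows. This is a minimal illustration rather than the authors' Algorithm 1: the function names `slash48` and `candidate_prefixes` and the `(target, last_hop)` input format are assumptions made for the example, and the addresses come from the 2001:db8::/32 documentation prefix.

```python
import ipaddress
from collections import defaultdict


def slash48(addr: str) -> ipaddress.IPv6Network:
    """Return the /48 prefix that contains addr."""
    return ipaddress.ip_network(f"{addr}/48", strict=False)


def candidate_prefixes(seed_traces):
    """Select candidate /48s from seed traces.

    seed_traces: iterable of (target, last_responsive_hop) address strings
    (a hypothetical input format).
    Returns the /48s whose last hop was elicited only by targets inside
    that single /48, i.e. the "unique" last hops.
    """
    target_48s_by_last_hop = defaultdict(set)
    for target, last_hop in seed_traces:
        target_48s_by_last_hop[last_hop].add(slash48(target))
    return {
        next(iter(prefixes))
        for prefixes in target_48s_by_last_hop.values()
        if len(prefixes) == 1
    }


traces = [
    ("2001:db8:1:1::1", "2001:db8:ffff::a"),  # last hop a: seen for one /48 only
    ("2001:db8:1:2::1", "2001:db8:ffff::a"),
    ("2001:db8:2::1", "2001:db8:ffff::b"),    # last hop b: seen for two /48s
    ("2001:db8:3::1", "2001:db8:ffff::b"),
]
candidates = candidate_prefixes(traces)
```

In this toy input only 2001:db8:1::/48 survives, because the last hop shared by the other two /48s suggests a common upstream router rather than per-prefix subnetting.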

These candidate target /48 prefixes are fed to Algorithm 2. Algorithm 2 probes targets within the input prefixes at progressively finer granularities until a stopping condition (a discovery metric ) is reached. A random Interface IDentifier (IID) (the 64 least significant bits in an IPv6 address) for each target subnet is used as the trace destination. Figure 3 depicts an illustration of edgy’s first round behavior running against an example /48 belonging to Cox.

The first subnet discovery round probes different /56 prefixes and serves as a coarse filter to determine which candidate /48s exhibit an appreciable amount of subnetting and merit further probing. /56s were chosen as the first stage because (bcop-prefix, ) recommends them as a potential subnet size to allocate to residential customers; therefore, if a /48 is allocated entirely to residential customers with /56s, the initial probing round should discover them all. We note, however, that these prefix delegation boundaries are not mandatory, and it is impossible to know a priori what prefix delegation strategy a provider has chosen. If the number of distinct last hops found during a probing round exceeds the threshold , we further subdivide responsive prefixes for additional probing in the next round. Figure 4 depicts an illustration of edgy’s second round behavior, again for the Cox /48.
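Target generation for a probing round might look like the sketch below. It is an illustration under stated assumptions, not the authors' Algorithm 2: `round_targets` is a hypothetical helper, the stopping threshold is omitted, and the prefix is taken from the 2001:db8::/32 documentation range.

```python
import ipaddress
import random


def round_targets(prefix: str, subnet_len: int, rng: random.Random):
    """Yield one probe target per subnet of length subnet_len inside prefix.

    Each target is the subnet's network address combined with a random
    64-bit IID, so subnet_len must be at most 64.
    """
    net = ipaddress.ip_network(prefix)
    for subnet in net.subnets(new_prefix=subnet_len):
        iid = rng.getrandbits(64)
        yield ipaddress.IPv6Address(int(subnet.network_address) | iid)


rng = random.Random(0)  # seeded only to make the example repeatable
first_round = list(round_targets("2001:db8:1::/48", 56, rng))   # one target per /56
second_round = list(round_targets("2001:db8:1::/48", 60, rng))  # one target per /60
```

A /48 yields 256 targets at /56 granularity and 4,096 at /60, which is why later rounds are restricted to prefixes that already showed many distinct last hops.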

It has been shown that the IPv6 Internet contains aliased networks, where every address within a prefix is responsive despite no actual host being present at that address. We remove any last hops that are equal to the probe target, as well as remove networks and addresses known to be aliases in the publicly curated list from Gasser et al. (Gasser:2018:CEU:3278532.3278564, ). In addition, we remove replies from non-routable prefixes; we observe site- and link-local addresses that fall into this category, as well as IPv4-in-IPv6 addresses and replies that appear to be spoofed.
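A simplified version of this filtering is sketched below. The function `keep_last_hop` and the `aliased_prefixes` argument are hypothetical names; the alias list would in practice come from the curated list cited above, and the check for IPv4-in-IPv6 addresses here covers only the IPv4-mapped form (::ffff:0:0/96), which is an assumption. Detection of spoofed replies is not attempted.

```python
import ipaddress


def keep_last_hop(target: str, last_hop: str, aliased_prefixes) -> bool:
    """Return True if last_hop should be kept as a candidate periphery address."""
    t = ipaddress.IPv6Address(target)
    h = ipaddress.IPv6Address(last_hop)
    if h == t:
        return False  # reply came from the probed address itself
    if h.is_link_local or h.is_site_local:
        return False  # non-routable source address
    if h.ipv4_mapped is not None:
        return False  # IPv4-mapped IPv6 address
    if any(h in prefix for prefix in aliased_prefixes):
        return False  # inside a known aliased network
    return True


aliases = [ipaddress.ip_network("2001:db8:a::/48")]  # placeholder alias list
```

For example, `keep_last_hop("2001:db8:1::1", "fe80::1", aliases)` is rejected as link-local, while a reply from 2001:db8:2::1 is kept.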

Note that, during testing, we initially explored more rudimentary periphery discovery mechanisms. For instance, intuitively, a binary-tree discovery process that bisects prefixes and probes each half would programmatically explore subnets. Unfortunately, such a straightforward approach performs poorly as providers do not allocate subnets uniformly. In this case, a core router can falsely appear as the common last hop for destinations in a common prefix, even when significant subnetting is present.

Figure 3. Edgy sends probes to each /56 in a target /48 in the first round. Green represents /64s, yellow /60s, and red /56s allocated in a Cox /48 prefix.
Figure 4. In the second round, probes are sent to each /60. New addresses are discovered in the upper half of this address space, but not in the lower half.

3.2. Edgy Input

Edgy takes as input a seed set of traces. These seed traces are created from running traceroutes to corresponding seed targets. We consider two realistic potential seed target lists: BGP-informed and hitlist-informed. The BGP-informed targets assume no prior knowledge other than global BGP advertisements. Since BGP routes are readily available from looking glasses, this scenario is easily replicated by anyone and models what CAIDA uses to inform their probing. In our experiments, we utilize publicly available BGP-informed seed traces collected as part of an August 2018 effort to uniformly probe every /48 in the IPv6 Internet (v6exhaust-ic16, ; caida-routed48, ). Herein, we term this trace seed the passive seed.

Second, we consider a target set informed by prior knowledge in the form of passive traces, server logs, or hitlists. In our experiments, we utilize a publicly available IPv6 hitlist (Gasser:2018:CEU:3278532.3278564, ) that was used to generate a seed set of hitlist-informed traces (imc18beholder, ). Herein, we term this trace seed the composite seed.

3.3. Limitations

There are several potential complications that edgy may encounter, and corresponding limitations of our approach and evaluation. First, during probing, we depend on receiving a response from the penultimate traceroute hop along the data path to a destination. However, the last responsive hop may instead be a different router due to filtering, loss, or rate-limiting, i.e., if the last hop remains anonymous. This case does not cause false inferences of periphery addresses, but instead causes edgy to terminate probing of a prefix prematurely.

Second, we do not have ground-truth in order to determine whether the periphery we discover is indeed the last hop before a destination endhost. While various, and at times conflicting, guidance exists regarding the size of delegated prefixes (rfc3177, ; rfc6177, ; rfc7381, ), /64 prefixes are (historically) recommended and used as individual customer IPv6 prefixes (rfc3177, ; rfc6177, ), so discovery of unique /64s is strongly indicative of discovering the periphery. Additionally, the periphery addresses we find are frequently formed using EUI-64 addresses where we can infer the device type based on the encoded MAC address (see §4.5). These MAC addresses specifically point to CPE. Further, we examine several metrics of “edginess” to better understand the results in §4.3.

3.4. Probing

Probing consists of sending TTL-limited ICMP6 packets; we used the high-speed randomized yarrp topology prober (imc16yarrp, ) due to the large number of traces required during edgy’s exploration, as well as to minimize the potential for ICMP6 rate limiting (which is mandated and common in IPv6 (imc18beholder, )).

We use ICMP6 probes as these packets are designed for diagnostics and therefore are less intrusive than UDP probes. Further, we send at a conservative rate while yarrp, by design, randomizes its probing in order to minimize network impact. Last, we follow established best practices for performing active topology probing: we coordinated with the network administrators of the vantage point prior to our experiments and hosted an informative web page on the vantage point itself describing the experiment and providing opt-out instructions. We received no opt-out requests during this work.

4. Results

In Sept. and Oct. 2019, we ran edgy from a well-connected vantage point in Europe. Edgy used yarrp version 0.6 at less than 10,000pps with the neighborhood TTL setting to reduce load on routers within five hops of the vantage point.

4.1. Passive Seed Results

Initializing edgy with the passive seed data yielded 130,447 candidate /48 prefixes. Following Algorithm 2, edgy traces to a random IID in each of the 256 constituent /56 subnets in each /48, for a total of 33,394,432 distinct traces.

The first probing round discovers 4.6M unique, non-aliased last hop IPv6 addresses residing in 33,831 distinct /48 prefixes (Table 1) by probing to targets in 130,447 /48 prefixes. Often, the last hop address is not in the target prefix itself but in a different prefix belonging to the same AS, and many target prefixes have last hops in the same /48. The density of discovered last hop addresses across target prefixes is non-uniform: nearly 75% of the targeted /48 prefixes discover 16 or fewer distinct last hops. The distribution of the prefixes in which the last hops reside is also highly non-uniform. Of the 33,831 /48s in which last hop addresses reside, 11,064 were responsible for only a single last hop address. This is likely indicative of a /48 allocation to an end site. On the other end of the spectrum, a single /48 (2001:1970:4000::/48) contained over 200,000 unique last hop addresses. 2001:1970:4000::/48 was the last hop prefix in traces to 1,008 distinct /48 target prefixes, the most extreme example of many target /48s mapping to a single last hop prefix.

Because a /48 prefix entirely subnetted into /52s should exhibit 16 distinct last hops, we choose empirically as a baseline indication in the first probing round that more granular subnetting than strictly into /52s is in place.

Passive seed
Round    Prefixes Probed  Unique Last Hops  Unique Last Hop /48s  Cum. Unique Last Hops
1 (/56)          130,447         4,619,692                33,831              4,619,692
2 (/60)           34,520        12,228,916                26,082             13,410,601
3 (/62)           12,014        14,770,061                11,675             24,832,391
4 (/64)            2,641        15,326,298                 7,833             37,169,357

Composite seed
Round    Prefixes Probed  Unique Last Hops  Unique Last Hop /48s  Cum. Unique Last Hops
1 (/56)          111,670         9,217,137                89,269              9,217,137
2 (/60)           67,107        11,021,329                74,302             11,365,910
3 (/62)            4,462         5,428,992                19,942             15,569,221
4 (/64)            1,531        15,340,591                32,718             29,248,703

Table 1. Passive and Composite Seed Routable Address Discovery by Round

In the second discovery round, we probe to each /60 within each of the target prefixes, sending 4,096 probes to each of 34,520 /48s. Edgy discovers significantly more unique last hop addresses in the second round, as the probing is focused on known address-producing target subnetworks identified in the first round of probing. We find 12.2M non-aliased, unique last hop addresses, which are then used to generate the third round targets. We select target prefixes that produce more than distinct last hop addresses because they serve as a potential indicator that subnetting smaller than /56 networks may be present in the target /48 network. We find 12,014 /48s that meet this criterion, and generate targets in each /62 network within each of the target /48s, using a random IID. This produces 196.8M individual target addresses.

The /62 probing round, while probing % of the original target prefixes, is focused on those with fine-grained subnetting and helps to infer the provider’s subnetting strategy. As the IETF now discourages, but does not forbid, /64 or more-specific subnetting (rfc6177, ), we are interested in non-conformant delegations but must balance inferring this behavior with probing load. Because subnetting generally occurs on nybble boundaries (rfc6177, ), by probing /62s, we are able to detect when target prefixes are subnetted beyond /60s, which is an indication that perhaps the operator is allocating /64 subnets. The /62 probing round produced 14.7M unique last hop addresses, which serve as input to the final round.

The final round is designed to enumerate last hop addresses for /64 subnets in target /48 networks. Edgy selects any prefix with prefix-unique last hops within a /60 (because we probe each /62, each /60 contains four targets). We surmise that four prefix-unique last hops is an indication that either the operator subnets at the /62 level, or is assigning /64 networks as the interior network to their customers in violation of best current practice. The final /64 probing round discovers 15.3M distinct IPv6 addresses through exhaustive probing of 2,641 /48 target prefixes.
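The per-round probe counts for the passive seed follow directly from the prefix counts in Table 1 and the number of subnets of each size in a /48. The short check below reproduces that arithmetic; `subnets_per_48` is a helper name introduced only for this example.

```python
def subnets_per_48(subnet_len: int) -> int:
    """Number of subnets of length subnet_len contained in one /48."""
    return 2 ** (subnet_len - 48)


# (target /48 prefixes probed, subnet length) for each passive seed round
rounds = [(130_447, 56), (34_520, 60), (12_014, 62), (2_641, 64)]

targets_per_round = [n * subnets_per_48(length) for n, length in rounds]
total_targets = sum(targets_per_round)
```

The first and third entries evaluate to 33,394,432 and 196,837,376, matching the totals stated in the text, and the four rounds sum to roughly 544.7M targets, consistent with the 544M figure used in the comparison with CAIDA in §4.7.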

Cumulatively, probing targets using the passive seed data discovers more than 37M distinct IPv6 last hop router addresses across all four active probing rounds. Table 1 quantifies the address discovery during each probing round. 52,301 Autonomous Systems (ASes) are represented in the last hop addresses, corresponding to 143 countries, as reported by Team Cymru’s IP to ASN service (cymru2008ip, ). Figures 5 and 6 summarize the ASes and countries that produced the largest number of periphery last hop addresses.

Figure 5. Top 10 Address Producing Autonomous Systems
Figure 6. Top 10 Address Producing Countries

4.2. Composite Seed Results

We replicate the experiment described in §4.1 seeded with the composite seed traces (from (imc18beholder, )). Table 1 shows the per-round results for both the passive and composite seeds. Algorithm 1 on this input seed yielded 111,670 target /48 prefixes, about 20k fewer than the passive seed. However, the initial /56 probing round discovered nearly twice as many unique last hop addresses as the passive seed data. The composite seed led to almost double the number of target prefixes in the /60 round as compared to the passive seed, but discovered nearly 1M fewer last hops. As a result, only 4,462 /48 target prefixes were probed in the /62 probing round, discovering 5.4M last hops from 19,942 /48 prefixes. Finally, 1,531 target /48s were exhaustively probed in each /64 in the final probing round, about 1% of the original /48 prefixes selected from the seed data. The /64 probing round discovered over 15M unique last hops, indicating that the 1,500 target /48s each contributed about 10,000 unique addresses on average. We attribute the differences between the passive and composite seed data results to differences in how the original source data was collected. For example, the passive seed data was derived from a uniform sweep of the advertised IPv6 space, while the composite seed data was derived from a measurement campaign aimed at networks known to be dense in customers.

In total, the composite seed data probing discovers over 29M unique last hop router addresses over the course of the experiment. Nearly half of those addresses are found in the /64 probing round, during which edgy exhaustively probes all of the /64s in 1,531 /48 target prefixes. This suggests that a small number of prefixes have fine-grained subnetting, and that substantial periphery topology can be gained by probing a carefully selected set of target prefixes. Figures 5 and 6 display the top ten ASes and countries from which we obtain last hops for both the passive and composite seed experiments. 141 countries contribute to the total, as do 3,578 unique ASNs.

4.3. Edginess Metrics

To better understand the extent to which edgy discovers IPv6 periphery infrastructure, we introduce three metrics of “edginess.” The first coarse metric is simply the fraction of traces with a last hop within the same Autonomous System (AS) as the target destination. Clearly, this condition does not imply that the last hop is truly an interface of the periphery router. However, it provides a rudimentary measure of whether traces are making it into the target network’s AS. In contrast, a trace to a non-existent network will be dropped at an earlier hop in a default-free network.

We compare edgy’s results against a day’s worth of CAIDA’s IPv6 Ark traceroute results from 105 different vantage points on Oct 1, 2019 (caida-topov6, ). Across nearly 17M traceroutes performed on that day, 1.7M (10%) produced a response from the target destination. However, of those 1.7M traceroutes that reached the destination, 86.2% were from probing the ::1 address, while 13.3% came from destinations known to be aliased, i.e., a fake reply. Unsurprisingly, less than 0.5% of the probes to random targets reached the destination.

40.2% of the CAIDA traces elicit a response from a last hop address that belongs to a BGP prefix originated by the same AS as the target address. In contrast, 87.1% of edgy’s traces reach the target AS. While these results cannot be directly compared (edgy performs an order of magnitude more traces than CAIDA), they do demonstrate that the probing performed by edgy is in fact largely reaching the target network, if not the periphery.

Our second edginess metric is a more granular measure of how deep into the target network, and hence how close to the periphery, traces traverse. For each trace, we find the number of most significant bits (MSBs) that match between the target and the last hop response, i.e., the netmask of the most specific IPv6 prefix that encompasses the target and last hop. As before, this metric does not provide a definitive measure of reaching the periphery. Indeed, we empirically observe many networks that use very different IPv6 prefixes for the last hop point-to-point subnetwork as compared to the customer’s prefix. However, the basis of this metric is that hierarchical routing implies more matching MSBs the closer the trace gets to the target.
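The matching-MSB metric can be computed with a single XOR, as in the sketch below. The function name `matching_msbs` is an assumption for illustration; the example addresses are the target and hop 7 from the trace in Figure 2.

```python
import ipaddress


def matching_msbs(target: str, last_hop: str) -> int:
    """Length of the longest prefix shared by the target and the last hop."""
    a = int(ipaddress.IPv6Address(target))
    b = int(ipaddress.IPv6Address(last_hop))
    # The highest set bit of a XOR b marks the first position where they differ.
    return 128 - (a ^ b).bit_length()


shared = matching_msbs(
    "2a03:4980:2b6:9624:8643:b70f:adae:4f40",  # traceroute target from Figure 2
    "2a03:4980::b:0:5",                        # hop 7 from Figure 2
)
```

For the Figure 2 trace the two addresses share 38 leading bits: the hop lies inside the destination /32 but well short of the target's /48.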

Figure 7. Size of prefix encompassing both target and last hop IPv6 addresses

Figure 7 shows the distribution of matching bits across the traceroutes from both CAIDA and edgy. Whereas the median size of the matching prefix is a /13 for CAIDA, it is nearly a /32 for edgy. The target and last hop share the same /48 for more than 5% of the edgy traces, but just 2% of the CAIDA traces. Thus, again, we see edgy’s probing reaching more of the network periphery.

Finally, we seek to determine how many of our newly-discovered addresses are indeed periphery addresses, and therefore do not appear as an intermediate hop in traceroutes to other target addresses. In the passive seed probing first round, 0.9% of discovered last hop addresses to a target appear as an intermediate hop to another target. In the second round, the same is true of 21% of last hops, 23% in the third round, and 4% in the fourth probing round. However, closer examination indicates that these numbers are skewed by providers that cycle periphery prefixes. For example, in the second round, 1.6M of the 2.5M addresses seen both as a last and an intermediate hop are located in ASN8881, which we observe cycling customer prefixes on a daily basis. This often causes traces to appear to “bounce” between two (or more) different addresses at relatively high hop counts. Sorting by the time the response was received shows that a single IPv6 address was responsible for high hop count responses until after a distinct point at which a second address becomes responsive. This erroneously causes the address that was not responsible for the highest hop count response to appear as if it is an intermediate hop for a target.
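The third check can be approximated with two sets, as sketched below. This is a simplification under stated assumptions: `last_hops_seen_as_intermediate` is a hypothetical helper, each trace is assumed to be a list of responsive hop addresses ordered from the vantage point outward, and timing (which the text uses to explain the prefix-cycling artifact) is ignored.

```python
def last_hops_seen_as_intermediate(traces):
    """Return last hop addresses that also appear as a non-final hop.

    traces: iterable of hop lists, each ordered from the vantage point to
    the last responsive hop (a hypothetical input format).
    """
    last_hops = set()
    intermediate_hops = set()
    for hops in traces:
        if not hops:
            continue
        last_hops.add(hops[-1])
        intermediate_hops.update(hops[:-1])
    return last_hops & intermediate_hops


example = [
    ["2001:db8::1", "2001:db8:1::a"],                   # ends at ...1::a
    ["2001:db8::1", "2001:db8:1::a", "2001:db8:2::b"],  # passes through ...1::a
]
overlap = last_hops_seen_as_intermediate(example)
```

Here 2001:db8:1::a is flagged because it terminates one trace and is traversed by another; per the discussion above, such overlaps may reflect prefix cycling or provider infrastructure rather than a true periphery router.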

We also observe a second class of IPv6 address that appears both as a last hop address and as an intermediate address in traces to other targets. These addresses appear as the last hop for a large number of target networks that are most likely unallocated by the provider; these addresses typically have low-entropy IIDs (e.g., ::1 or ::2) and are likely to be provider infrastructure. These last hop addresses also appear on the path to addresses that appear to be CPE, based on the high entropy or EUI-64 last hop returned when they are an intermediate hop.

4.4. Consolidated Results and Seed Data Comparison

Although both probing campaigns began with approximately the same number of target /48 prefixes in the first probing round (130,447 and 111,670 in the passive and composite seeds, respectively), only 9,684 of these /48s are common between the two data sets. The number of common target prefixes decreases at each round, reaching 177 in the final /64 probing round. Only 1.6M (2.5%) last hop IPv6 addresses are present in both data sets. These results show that edgy is sensitive to seed input, and suggest that using additional seed data sources may prove fruitful.

Of the top ten ASNs, only four are common between the two data sets: ASNs 852, 8881, 45899, and 45609. Of the top ten countries, however, six are common: Germany, Vietnam, Canada, Brazil, India, and Japan, with Germany ranking first in both. While the US is the second-leading producer of last hop addresses in the passive seed data with 6.9M unique last hops, it is fourteenth in the composite data with only 357,877 addresses. Last, we consider the type of AS owning the last hops using CAIDA’s AS type classification (caida-classification, ). By this classification, edgy’s results come overwhelmingly from transit/access networks (99.9%) rather than content or enterprise ASes. This matches our intent for edgy to focus on IPv6 periphery discovery.

4.5. EUI-64 Addresses

Previous studies, e.g., (Gasser:2018:CEU:3278532.3278564, ; imc18beholder, ) identified the presence of many EUI-64 addresses in IPv6 traceroutes, where the host identifier in the IPv6 address is a deterministic function of the interface’s Media Access Control (MAC) address. Our study similarly found a significant fraction of EUI-64 addresses, despite the introduction of privacy extensions for Stateless Address Autoconfiguration (SLAAC) addresses in 2007 (narten2007privacy, ). We discover slightly more than 16M EUI-64 last hop addresses, identifiable from the ff:fe at byte positions 4 and 5 in an IID, using the passive seed data, or approximately 42% of the total last hops. However, only 5.4M (34%) of the MAC addresses in these 16M last hops are unique. Figure 8 displays the CDF of the number of appearances MAC addresses make in unique EUI-64 last hop addresses. 65% of the MAC addresses appear in only one EUI-64 IPv6 address; 30% appear in two to ten last hop addresses, and three MACs appear in more than 10,000 different last hops.
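Detecting an EUI-64 IID and recovering the embedded MAC address can be sketched as follows. The function name `eui64_mac` is hypothetical and the example address is illustrative; the bit manipulation follows the standard modified EUI-64 construction, in which ff:fe is inserted between the two halves of the MAC and the universal/local bit of the first byte is inverted.

```python
import ipaddress


def eui64_mac(addr: str):
    """Return the MAC address embedded in an EUI-64 IID, or None if absent."""
    iid = int(ipaddress.IPv6Address(addr)) & ((1 << 64) - 1)
    b = iid.to_bytes(8, "big")
    if b[3:5] != b"\xff\xfe":  # ff:fe at byte positions 4 and 5 (1-indexed)
        return None
    # Undo the universal/local bit flip and drop the ff:fe filler bytes.
    mac = bytes([b[0] ^ 0x02]) + b[1:3] + b[5:8]
    return ":".join(f"{octet:02x}" for octet in mac)


mac = eui64_mac("2001:db8::211:22ff:fe33:4455")  # illustrative address
```

Counting how often each returned MAC occurs across the discovered last hops gives the frequency distribution discussed above. Note that a random IID can contain ff:fe at these positions by chance, so this test alone may produce a small number of false positives.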

The discrepancy between unique EUI-64 last hop addresses and MAC addresses appears to have two root causes. The first is provider prefix rotation. Although 3.5M of the 5.4M unique MAC addresses observed appear in only one last hop address, 1.9M appear multiple times. Of these, the vast majority appear in only a few addresses in the same /48, suggesting that the provider periodically rotates the remaining 16 bits of the network address portion (zwangstrennung, ). We observe some providers rotating the prefix delegated to their customers on a daily basis, and further examination of forced prefix cycling is a topic of future work. The second cause of the disparity between the number of MAC addresses and EUI-64 last hop addresses is what we believe to be MAC address reuse.

For instance, the MAC address 58:02:03:04:05:06 occurs in more than 266k passive seed last hop addresses in 76 /48s allocated to providers throughout Asia and Africa. Because our probing took place over a period of several weeks, we believe it is unlikely that a combination of provider prefix rotation and mobility contributed to a substantial number of devices containing this MAC address; its simple incremental pattern in bytes 2 through 6 further suggests it is likely a hard-coded MAC address assigned to every model of a certain device. Support forums indicate that some models of Huawei LTE router (dt-huawei, ; ru-huawei, ) use 58:02:03:04:05:06 as an arbitrary MAC address for their LTE WAN interface.

Figure 8. MAC Address Frequency by Source
Figure 9. IID Entropies by Data Source

4.6. Comparison to the IPv6 Hitlist Service

We compare our results to an open-source, frequently updated hitlist (Gasser:2018:CEU:3278532.3278564, ). In mid-October 2019, the hitlist provides approximately 3.2M addresses responsive to ICMPv6, and TCP and UDP probes on ports 80 and 443.

Both the structure and magnitude of the addresses we discover separate our work from (Gasser:2018:CEU:3278532.3278564, ), which is unsurprising given our focus on finding addresses at the network periphery. Compared with our results, the addresses in the hitlist are less likely to be EUI-64 addresses. Only 338,000 EUI-64 addresses appear in the hitlist, representing approximately 10% of the total responsive addresses. Figure 9 plots the normalized Shannon entropies of the IIDs of addresses in our datasets compared with addresses in the IPv6 hitlist service. We see that the IPv6 hitlist contains a far greater proportion of addresses with low-entropy IIDs than the last hop addresses edgy discovers. As periphery devices, particularly CPE in residential ISPs, are unlikely to be statically assigned a small constant IID and instead generate a high-entropy address via SLAAC, this reinforces edgy's discovery of an entirely different portion of the IPv6 Internet than prior work. Further emphasizing the complementary nature of edgy's probing, only 0.2% of the addresses we discover appear in this hitlist, indicating that edgy discovers different topology. Finally, while the last hops edgy discovers overwhelmingly (99.9%) reside in access networks (§4.4), CAIDA's AS-type classifier categorizes 1.8M of the hitlist's IPv6 addresses as residing in access/transit networks, 1.2M in content networks, and 48,000 in enterprise networks.
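As an illustration of the entropy comparison, the sketch below computes a normalized Shannon entropy for a single IID. It assumes the entropy is taken over the distribution of the 16 hexadecimal nybbles of the IID and normalized by the 4-bit maximum; the exact computation behind Figure 9 is not spelled out here, so this is one plausible reading rather than the method used.

```python
import ipaddress
import math
from collections import Counter


def normalized_iid_entropy(addr: str) -> float:
    """Normalized Shannon entropy of the IID's nybbles, in [0, 1].

    Assumption: entropy over the 16 hex digits of the low 64 bits,
    divided by log2(16) so that 1.0 means all nybbles are distinct.
    """
    iid = int(ipaddress.IPv6Address(addr)) & ((1 << 64) - 1)
    nybbles = f"{iid:016x}"
    counts = Counter(nybbles)
    total = len(nybbles)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(16)


# A small constant IID (typical of static assignment) versus a varied one.
for example in ("2001:db8::1", "2001:db8::5a02:3ff:fe04:506"):
    print(example, round(normalized_iid_entropy(example), 3))
```

Under this definition a statically assigned address such as ::1 scores close to zero, while SLAAC-generated IIDs tend to score much higher.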

4.7. Comparison with CAIDA IPv6 Topology Mapping

We again examine a day's worth of CAIDA's IPv6 Ark traceroute results from 105 different vantage points on Oct 1, 2019 (caida-topov6, ), in order to understand edgy's complementary value. Because edgy sends nearly two orders of magnitude more probes (544M vs 8.5M), these are not directly comparable; however, we note that edgy discovers 64.8M non-aliased, routable last hop addresses that CAIDA's probing does not. Indeed, CAIDA finds only 163,952 unique, non-aliased, routable last hop addresses. However, despite focusing on only target networks that are dense in last hops, edgy still discovers of the last hop addresses that CAIDA does. Edgy similarly finds 87.1M links to last hop addresses that CAIDA does not, but discovers 54,024 of the 365,822 edges that contain only routable addresses from CAIDA's probing. Edgy's discovery of 37M unique periphery last hops from 544M targets probed in the passive seed yields 0.068 unique last hops per target, while the Oct 1, 2019 Ark traceroutes discover only 0.019 unique last hops per target.
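The per-target yields quoted above follow from simple division. The sketch below reproduces them, assuming the Ark figure is derived from the 163,952 last hops and 8.5M probes reported in this section; the function name is illustrative.

```python
def last_hops_per_target(unique_last_hops: int, targets_probed: int) -> float:
    """Unique last hops discovered per probed target."""
    return unique_last_hops / targets_probed


# Counts as reported in the text (rounded figures).
edgy_yield = last_hops_per_target(37_000_000, 544_000_000)
ark_yield = last_hops_per_target(163_952, 8_500_000)

print(f"edgy: {edgy_yield:.3f} last hops per target")
print(f"Ark:  {ark_yield:.3f} last hops per target")
print(f"probe volume ratio: {544_000_000 / 8_500_000:.0f}x")
```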

Of note, the disparity between the number of last hop edges and last hops themselves is primarily due to addressing dynamics. While at first glance the larger number of edges might suggest multihoming, because we believe our discovered last hops reside primarily on residential CPE, we consider this scenario unlikely. Considering all last hop edges, the data reveals 17.1M unique second-to-last hop IPv6 addresses, with 59.3M last hops. This indicates that some second-to-last hops are responsible for a large number of last hops, and indeed, while 99.9% of the unique second-to-last hops have ten or fewer associated last hops, 14,593 second-to-last hops are part of the edge to more than 100 last hops. Thirty-four second-to-last hops appear in an edge to over 100,000 last hops, seventeen of which are in AS8881, which we note in (anon) exhibits a high degree of prefix rotation among its hosts.
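The fan-out analysis in this paragraph can be sketched as a grouping of edges by their second-to-last hop. The helper names and the (second-to-last hop, last hop) tuple layout below are assumptions made for illustration; they do not describe edgy's internal data structures.

```python
from collections import defaultdict


def last_hop_fanout(edges):
    """Map each second-to-last hop to its number of distinct last hops.

    `edges` is an iterable of (second_to_last_hop, last_hop) pairs.
    """
    fanout = defaultdict(set)
    for penultimate, last in edges:
        fanout[penultimate].add(last)
    return {hop: len(lasts) for hop, lasts in fanout.items()}


def summarize_fanout(fanout, small=10, large=100):
    """Return (share of hops with <= small last hops, count with > large)."""
    total = len(fanout)
    small_share = sum(1 for n in fanout.values() if n <= small) / total
    large_count = sum(1 for n in fanout.values() if n > large)
    return small_share, large_count


# Toy data: one router fronting 3 last hops, another fronting 150.
toy_edges = [("2001:db8::a", f"2001:db8:1::{i:x}") for i in range(1, 4)]
toy_edges += [("2001:db8::b", f"2001:db8:2::{i:x}") for i in range(1, 151)]
print(summarize_fanout(last_hop_fanout(toy_edges)))
```

A second-to-last hop with a very large fan-out is consistent with prefix rotation behind a single aggregation router, since each rotated prefix yields a new last hop address.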

4.8. Comparison with Seed Data Source

Edgy, by design, extends topology discovery methodologies and is complementary to existing topology mapping campaigns. However, because we believe edgy provides increased address discovery over existing mapping systems, we compare the results obtained with edgy to the trace seeds used as input to edgy.

The passive seed source consists of traces conducted in August 2018 from CAIDA's Archipelago to every /48 in the routed IPv6 Internet (caida-routed48, ). These traces to 711M unique targets produce 5.8M unique last edges and 5.4M unique last hops after removing non-routable addresses. By contrast, edgy discovers 59.5M unique final edges and 37.1M unique IPv6 last hops by probing to 545M targets when seeded with the passive data. Thus, edgy significantly expands the discovered topology of an input seed.

Likewise, edgy discovers significantly more last hop addresses and edges than the composite seed. The composite seed discovers 434,560 unique last hops and 656,849 unique final edges, while edgy, informed by this data, discovers 29.2M unique last hops and 32.0M final edges.
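To put the seed comparison in relative terms, the following sketch computes how many times larger edgy's results are than each seed, using only the counts reported in this section. The dictionary layout and function name are illustrative.

```python
# (seed count, edgy count) pairs as reported in the text.
SEED_VS_EDGY = {
    "passive": {
        "last hops": (5_400_000, 37_100_000),
        "final edges": (5_800_000, 59_500_000),
    },
    "composite": {
        "last hops": (434_560, 29_200_000),
        "final edges": (656_849, 32_000_000),
    },
}


def expansion_factor(seed_count: int, edgy_count: int) -> float:
    """How many times more items edgy finds than its input seed."""
    return edgy_count / seed_count


for seed, metrics in SEED_VS_EDGY.items():
    for name, (seed_count, edgy_count) in metrics.items():
        factor = expansion_factor(seed_count, edgy_count)
        print(f"{seed} seed, {name}: {factor:.1f}x")
```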

5. Conclusions and Future Work

We introduce edgy, an algorithm to discover previously unknown portions of the IPv6 Internet, namely, the IPv6 periphery. Edgy extends and augments existing IPv6 discovery mapping systems, and the last hop periphery addresses that it discovers are nearly entirely disjoint from previous topology mapping campaigns. Because of privacy concerns involved with EUI-64 addresses and the ephemeral nature of many addresses, we are choosing not to release the periphery addresses edgy discovers; however, similar results will be reproducible given suitable seed data.

Several topics are planned for future work. First, we observe some service providers that cycle their customers' periphery prefix with a frequency on the order of days. This leads to high levels of address discovery for these providers, but, based on examining IID reuse, overcounts the number of actual device interfaces present. We seek to i) discover which networks implement high-frequency prefix rotation, ii) quantify the rates at which new prefixes are issued, and iii) determine whether the prefix issuing mechanism is deterministic or predictable. Second, large numbers of EUI-64 IPv6 addresses are discovered more than a decade after the introduction of privacy extensions for SLAAC (narten2007privacy, ). Because edgy discovers periphery devices like CPE, quantifying device types present in networks may be possible by cross-referencing the models service providers issue to new customers, and through correlations with protocols that leak model information (acsac16furious, ). Third, we plan to improve edgy's efficiency by training it with historical data. For instance, periphery networks that exhibit frequent customer prefix cycling may need to be probed on a regular basis, while those that exhibit more stable last hop addresses may be re-probed relatively infrequently to detect additions and topological changes. Finally, because of the ephemeral nature of some of the addresses we discover, we intend to couple other measurements tightly with address discovery. For example, in order to further elucidate these addresses' value, we could send ICMP Echo Requests and perform banner grabs, where applicable, immediately after receiving new responses.
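As a rough illustration of the IID-reuse check mentioned in the first item, the sketch below groups discovered addresses by their low 64 bits and counts how many distinct /64 prefixes each IID appears under. The function name is hypothetical, and the interpretation is an assumption: an IID seen under many prefixes may indicate one interface behind a rotating prefix, but it may also reflect a hard-coded MAC shared by many devices.

```python
import ipaddress
from collections import defaultdict

IID_MASK = (1 << 64) - 1


def prefixes_per_iid(addresses):
    """Map each IID (low 64 bits) to the number of distinct /64s it appears in."""
    seen = defaultdict(set)
    for addr in addresses:
        value = int(ipaddress.IPv6Address(addr))
        seen[value & IID_MASK].add(value >> 64)
    return {iid: len(prefixes) for iid, prefixes in seen.items()}


# Toy data: the same EUI-64 IID under two prefixes, plus one unrelated address.
discovered = [
    "2001:db8:1::5a02:3ff:fe04:506",
    "2001:db8:2::5a02:3ff:fe04:506",
    "2001:db8:3::1",
]
reuse = prefixes_per_iid(discovered)
print(f"{len(discovered)} addresses, {len(reuse)} distinct IIDs")
```

Comparing the number of distinct addresses with the number of distinct IIDs gives a first estimate of how much prefix rotation inflates address counts.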

Acknowledgments

We thank Jeremy Martin, Thomas Krenc, and Ricky Mok for early feedback, John Heidemann for shepherding, Mike Monahan and Will van Gulik for measurement infrastructure, and the anonymous reviewers for insightful critique. This work was supported in part by NSF grant CNS-1855614. Views and conclusions are those of the authors and should not be interpreted as representing the official policies or position of the U.S. government or the NSF.

References

  • (1) Zwangstrennung (forced ip address change) (2018), https://de.wikipedia.org/wiki/Zwangstrennung
  • (2) Huawei lte cpe b315 (mts 8212ft) - discussion (2019), http://4pda.ru/forum/index.php?showtopic=700481&st=3580
  • (3) The CAIDA UCSD AS Classification Dataset (2019), http://www.caida.org/data/as-classification
  • (4) Speedport ii lte router status (2020), https://telekomhilft.telekom.de/riokc95758/attachments/riokc95758/552/327892/1/routerstatus.pdf
  • (5) Berger, A., Weaver, N., Beverly, R., Campbell, L.: Internet Nameserver IPv4 and IPv6 Address Relationships. In: Proceedings of ACM Internet Measurement Conference (IMC) (2013)
  • (6) Beverly, R.: Yarrp’ing the Internet: Randomized High-Speed Active Topology Discovery. In: Proceedings of ACM Internet Measurement Conference (IMC) (Nov 2016)
  • (7) Beverly, R., Durairajan, R., Plonka, D., Rohrer, J.P.: In the IP of the Beholder: Strategies for Active IPv6 Topology Discovery. In: Proceedings of ACM Internet Measurement Conference (IMC) (Nov 2018)
  • (8) CAIDA: The CAIDA UCSD IPv6 Topology Dataset (2018), http://www.caida.org/data/active/ipv6_allpref_topology_dataset.xml
  • (9) CAIDA: The CAIDA UCSD IPv6 Routed /48 Topology Dataset (2019), https://www.caida.org/data/active/ipv6_routed_48_topology_dataset.xml
  • (10) Chittimaneni, K., Chown, T., Howard, L., Kuarsingh, V., Pouffary, Y., Vyncke, E.: Enterprise IPv6 Deployment Guidelines. RFC 7381 (Informational) (Oct 2014), https://www.rfc-editor.org/rfc/rfc7381.txt
  • (11) Czyz, J., Luckie, M., Allman, M., Bailey, M.: Don’t Forget to Lock the Back Door! A Characterization of IPv6 Network Security Policy. In: Network and Distributed Systems Security (NDSS) (2016)
  • (12) Czyz, J., Allman, M., Zhang, J., Iekel-Johnson, S., Osterweil, E., Bailey, M.: Measuring IPv6 Adoption. SIGCOMM Comput. Commun. Rev. 44(4) (Aug 2014)
  • (13) Dhamdhere, A., Luckie, M., Huffaker, B., claffy, k., Elmokashfi, A., Aben, E.: Measuring the Deployment of IPv6: Topology, Routing and Performance. In: Proceedings of ACM Internet Measurement Conference (IMC) (2012)
  • (14) Foremski, P., Plonka, D., Berger, A.: Entropy/IP: Uncovering Structure in IPv6 Addresses. In: Proceedings of ACM Internet Measurement Conference (IMC) (2016)
  • (15) Gasser, O., Scheitle, Q., Foremski, P., Lone, Q., Korczyński, M., Strowes, S.D., Hendriks, L., Carle, G.: Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists. In: Proceedings of ACM Internet Measurement Conference (IMC) (2018)
  • (16) Gont, F., Chown, T.: Network Reconnaissance in IPv6 Networks. RFC 7707 (Informational) (Mar 2016), http://www.ietf.org/rfc/rfc7707.txt
  • (17) Hyun, Y., claffy, k.: Archipelago measurement infrastructure (2018), http://www.caida.org/projects/ark/
  • (18) IAB, IESG: Recommendations on IPv6 Address Allocations to Sites. RFC 3177 (Informational) (Sep 2001), http://www.ietf.org/rfc/rfc3177.txt
  • (19) Livadariu, I., Ferlin, S., Alay, Ö., Dreibholz, T., Dhamdhere, A., Elmokashfi, A.: Leveraging the IPv4/IPv6 identity duality by using multi-path transport. In: 2015 IEEE Conference on Computer Communications Workshops (2015)
  • (20) Luckie, M., Beverly, R.: The Impact of Router Outages on the AS-level Internet. In: Proceedings of ACM SIGCOMM (2017)
  • (21) Martin, J., Rye, E.C., Beverly, R.: Decomposition of MAC Address Structure for Granular Device Inference. In: Proceedings of the Annual Computer Security Applications Conference (ACSAC) (Dec 2016)
  • (22) Murdock, A., Li, F., Bramsen, P., Durumeric, Z., Paxson, V.: Target Generation for Internet-wide IPv6 Scanning. In: Proceedings of ACM Internet Measurement Conference (IMC) (2017)
  • (23) Narten, T., Draves, R., Krishnan, S.: Privacy Extensions for Stateless Address Autoconfiguration in IPv6. RFC 4941 (Sep 2007), http://www.ietf.org/rfc/rfc4941.txt
  • (24) Narten, T., Huston, G., Roberts, L.: IPv6 Address Assignment to End Sites. RFC 6177 (Best Current Practice) (Mar 2011), http://www.ietf.org/rfc/rfc6177.txt
  • (25) Plonka, D., Berger, A.: Temporal and Spatial Classification of Active IPv6 Addresses. In: Proceedings of ACM Internet Measurement Conference (IMC) (2015)
  • (26) Pujol, E., Richter, P., Feldmann, A.: Understanding the share of IPv6 traffic in a dual-stack ISP. In: Passive and Active Measurement (PAM) (2017)
  • (27) RIPE: Best Current Operational Practice for Operators: IPv6 prefix assignment for end-users - persistent vs non-persistent, and what size to choose (2017), https://www.ripe.net/publications/docs/ripe-690
  • (28) Rohrer, J.P., LaFever, B., Beverly, R.: Empirical Study of Router IPv6 Interface Address Distributions. IEEE Internet Computing (Aug 2016)
  • (29) Rye, E.C., Martin, J., Beverly, R.: EUI-64 Considered Harmful (2019)
  • (30) Srisuresh, P., Holdrege, M.: IP Network Address Translator (NAT) Terminology and Considerations. RFC 2663 (Informational) (Aug 1999), http://www.ietf.org/rfc/rfc2663.txt
  • (31) Team Cymru: IP to ASN mapping (2019), https://www.team-cymru.org/IP-ASN-mapping.html
  • (32) Zander, S., Wang, X.: Are We There Yet? IPv6 in Australia and China. ACM Trans. Internet Technol. 18(3) (Feb 2018)

Appendix A Algorithm Details

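[Pseudocode body not recoverable: two setup loops, the second containing a conditional, followed by a loop that calls Discover()]
Algorithm 1 Discover_Init()
[Pseudocode body not recoverable: an outer loop containing an inner loop that invokes yarrp(), followed by a test on two conditions joined by "or"]
Algorithm 2 Discover()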
Figure 10. Distinct Last Hops for Selected Threshold Values
Figure 11. Probe Efficiency for Selected Threshold Values
Threshold   Target Prefixes   Unique Last Hops
4           389               56,507
8           304               55,498
16          255               54,636
32          226               52,671
64          170               46,913
128         125               37,427
Table 2. Sample Round Two Probing Results for Selected Threshold Values

Appendix B Sensitivity Testing of the Threshold Parameter

In Algorithm 2, we use a threshold parameter to determine whether or not to continue to probe a target prefix in an attempt to discover more periphery topology. Following the first probing round, we group unique last hops according to the target /48 prefix that produced them, and sort these target prefixes by number of unique last hops. We wish to be judicious in the prefixes selected to be probed in the second round of probing, selecting only those prefixes that appear likely to return new periphery addresses, balancing the amount of time incurred by each additional prefix selected, and leveraging the best current practice to subnet on nybble boundaries (bcop-prefix, ). For the last reason, we initially choose a threshold of 16; the first round exhaustively probes all /56s inside of a /48, and obtaining 17 or more unique last hop addresses is indicative of subnetting below the /52 level, since a /48 contains only sixteen /52s. We seek to validate this choice empirically, however, and as such conduct sensitivity analysis on a sample of target prefixes from the passive seed data.
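The first-round target generation and the threshold test described above can be sketched as follows. This is not edgy's implementation: the function names are hypothetical, the random IID is placed in the first /64 of each /56 as a simplifying assumption, and the comparison is assumed to be a strict "more than the threshold" test, consistent with the "17 or more" reasoning.

```python
import ipaddress
import random


def round_one_targets(prefix48: str, rng: random.Random):
    """One target with a random IID in each of the 256 /56s of a /48."""
    net = ipaddress.IPv6Network(prefix48)
    if net.prefixlen != 48:
        raise ValueError("expected a /48 prefix")
    targets = []
    for subnet in net.subnets(new_prefix=56):
        iid = rng.getrandbits(64)  # random interface identifier
        # Assumption: use the first /64 inside each /56.
        targets.append(ipaddress.IPv6Address(int(subnet.network_address) | iid))
    return targets


def select_for_round_two(last_hops_by_prefix, threshold=16):
    """Keep prefixes whose distinct last hop count exceeds the threshold."""
    return [
        prefix
        for prefix, hops in last_hops_by_prefix.items()
        if len(set(hops)) > threshold
    ]


targets = round_one_targets("2001:db8:1234::/48", random.Random(0))
print(len(targets), targets[0], targets[-1])
```

With nybble-aligned subnetting, a /48 split into /52s can expose at most 16 distinct last hops, so exceeding 16 suggests customer allocations smaller than a /52.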

We randomly sample 1,000 prefixes from the passive seed data and use yarrp to trace to a random IID in each /56 of every target /48 prefix. All but one of the target prefixes obtain at least one unique last hop. From here, we choose threshold values of 4, 8, 16, 32, 64, and 128, select the target prefixes whose number of distinct last hops exceeds each threshold, and conduct the second round of probing using these selected prefixes. Table 2 summarizes the number of target prefixes selected for each choice of threshold, as well as the count of unique last hops obtained using each. Figure 10 displays the number of unique last hops for each choice of threshold graphically. In order to determine which choice is most efficient in unique last hop discovery, we plot the number of probe targets divided by the number of distinct last hops discovered at each threshold. Of the values chosen, 16 clearly lies at an inflection point in Figure 11, indicating that while each successive value is more efficient, efficiency increases at a decreasing rate. Because of the efficiency of this value, combined with its high absolute address discovery, we choose 16 as the threshold for passage to round 2 of the address discovery stage.
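A simplified version of this efficiency comparison can be computed directly from Table 2. The sketch below assumes every selected prefix receives the same number of round-two probe targets, so the number of selected prefixes stands in for the number of probe targets; the actual probe counts are not given in the table, and the metric name is illustrative.

```python
# Table 2: threshold -> (target prefixes selected, unique last hops found).
TABLE_2 = {
    4: (389, 56_507),
    8: (304, 55_498),
    16: (255, 54_636),
    32: (226, 52_671),
    64: (170, 46_913),
    128: (125, 37_427),
}


def prefixes_per_thousand_last_hops(prefixes: int, last_hops: int) -> float:
    """Proxy for probing cost: selected prefixes per 1,000 unique last hops."""
    return 1000 * prefixes / last_hops


cost = {
    threshold: prefixes_per_thousand_last_hops(prefixes, last_hops)
    for threshold, (prefixes, last_hops) in TABLE_2.items()
}
for threshold, value in cost.items():
    print(f"threshold {threshold:>3}: {value:.2f} prefixes per 1,000 last hops")
```

Lower values mean fewer prefixes probed per discovered last hop; this proxy falls with each larger threshold while the absolute number of last hops in Table 2 also falls, which is the trade-off the threshold choice balances.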