Pacer: Network Side-Channel Mitigation in the Cloud

08/30/2019 · Aastha Mehta, et al.

An important concern for many Cloud customers is data confidentiality. Of particular concern are potential data leaks via side channels, which arise when mutually untrusted parties contend on resources such as CPUs, caches, and networks. In this paper, we present a principled solution for mitigating side channels that arise from shared network links. Our solution, Pacer, shapes the outbound traffic of a Cloud tenant to make it independent of the tenant's secrets by design. At the same time, Pacer permits traffic variations based on public (non-secret) aspects of the tenants' computation, thus enabling efficient sharing of network resources. Implementing Pacer requires modest changes to the guest OS and the hosting hypervisor, and only minimal changes to guest applications. Experiments show that Pacer allows guests to protect their secrets with overhead close to the minimum possible considering the guest's conditional traffic distribution given public information. For instance, Pacer can hide a requested Wiktionary document within one of two size clusters at an average peak-throughput overhead of 6.8%.


1 Introduction

Clouds provide elastic computing, communication, and storage to customers large and small on a pay-per-use basis. An important requirement for many customers is the confidentiality of Cloud-hosted data and computation. Trusted execution environments (TEEs), such as those offered by Azure confidential computing [1], provide hardware-based isolation under a strong threat model. However, a TEE alone does not prevent leakage via side channels.

Side channels arise when mutually distrusting parties share hardware resources. Such sharing is in the very nature of public Clouds; for instance, CPUs, cores, caches and memory buses may be shared among otherwise isolated tenants co-located on the same server in an infrastructure-as-a-service (IaaS) Cloud. All of these shared resources have been exploited as side channels [69, 50, 47, 9, 49, 46, 70, 41, 32, 71, 5, 25, 59, 73, 7, 26, 56]. Even if tenants rent dedicated CPU sockets and use memory within their local NUMA domain only, they still share a network link, which can leak a tenant’s secrets [6]. This attack works even if adversary and victim are hosted on separate servers but share a bottleneck link (see §2).

In contrast to side channels via shared microarchitectural state and memory, side channels that arise via shared network links have not received much attention, particularly in the context of Cloud computing. In this work, we present a system called Pacer, which mitigates network side channels in the Cloud. Specifically, we mitigate leaks of a tenant VM's sensitive data to an adversary that can co-locate VMs with the tenant's VM on the same server, rack, or data center, and can perform fine-grained network delay measurements. Pacer does not protect tenants against side channels via shared micro-architectural state and the memory hierarchy, but these may be mitigated by renting dedicated servers or dedicated sockets on Cloud hosts. Moreover, Pacer can be easily combined with existing solutions that mitigate these other side channels [61, 11].

A natural approach to mitigating network side-channel attacks is to weaken the correlation between the victim’s traffic shape and its secrets by adding noise to the adversary’s observations [35]. However, relying on noise alone is risky, because we cannot safely assume that the noise is sufficient in level and entropy at all times to overcome a sophisticated adversary.

A principled approach is to shape the victim’s traffic to completely decorrelate the traffic shape from secrets. Shaping involves padding all outgoing messages to a secret-independent size and transmitting all packets at secret-independent times. Unfortunately, a naïve use of this approach requires that each tenant transmit traffic at the peak rate of its workload, because any workload-dependent variation could reveal secrets.

Without loss of generality, consider the tenant to be a single-tier server application serving client requests. Suppose the shape of the tenant's outbound network traffic depends on a combination of public (non-private) and private parameters of the tenant's workload (i.e., the request workload) and internal state. Our system, Pacer, allows the shape of the tenant's response traffic to reflect public but not private information, thus protecting secrets while reducing bandwidth and latency overheads compared to the naïve shaping approach. In short, Pacer shapes a tenant's traffic both securely and efficiently.

Security. Whenever the tenant produces network output, a transmit schedule (shape) is selected for the traffic based on public information. The tenant’s guest operating system (OS) prepares network packets as usual. The IaaS hypervisor shapes traffic as specified by the schedule, transmitting dummy packets when the guest fails to prepare payload packets in time. The resulting traffic shape is secret-independent by design.

Efficiency. A tenant partitions its workload based on public information. For each partition, a profiler samples the distribution of the guest’s unmodified payload traffic shapes and computes a schedule. All traffic within the partition is then shaped according to this schedule. Profiling affects performance but not security, because any traffic schedule chosen independent of secrets hides those secrets.

Usability. Pacer places modest requirements on the tenant. The tenant and its clients must run a guest OS that supports Pacer, e.g., by loading the Pacer kernel module for Linux and running the Pacer profiler. Also, the tenant must indicate to Pacer how it partitions its workload. This requires only minimal annotations in the tenant application to identify the traffic boundaries and workload partitions. The workload partitioning enables a tenant to choose an appropriate tradeoff between privacy and overhead for its application. From the Cloud provider’s perspective, the system requires modest changes to the IaaS hypervisor.

As described in §8, prior work that decorrelates application secrets from network side channels either does not present a working system, or does not consider all network side channels, or is inherently inefficient when workload traffic is bursty. To our knowledge, Pacer is the first working system that efficiently prevents all network side-channel leaks by design.

Contributions. This paper presents a novel traffic-shaping tunnel abstraction that prevents network side-channel leaks (§4); it describes a paravirtualized tunnel implementation for an IaaS Cloud that requires modest changes to a hypervisor and guests, and a gray-box profiler that generates transmit schedules automatically (§5 and §6, respectively); finally, it presents an experimental evaluation on two applications: a Wiktionary content server and a streaming video service, which show that strong mitigation of network side channels is possible with modest overhead (§7). We discuss related work in §8.

2 Attack feasibility

We first study the conditions under which a network side-channel attack is possible and demonstrate a simple attack.

Attack prerequisites. To carry out a network side-channel attack, an adversary must be able to observe a victim’s network traffic. An adversary with access to network elements like links, switches, or routers can observe the traffic directly. An adversary without direct access can still observe victim traffic indirectly if they can control attack traffic that shares bandwidth with the victim’s traffic.

Indirect observation is impossible if each network flow has exclusively reserved bandwidth, as in time-division multiple access (TDMA), which ensures non-interference among flows. However, this approach prevents statistical multiplexing and is very inefficient for bursty traffic. On the other hand, when bandwidth is shared, then regardless of the queuing discipline, available bandwidth and queuing delays observed by one flow are influenced by concurrent flows.

Measuring the victim's traffic shape. An adversary can exploit this fact by measuring either the available bandwidth or the queuing delay, either of which carries a (potentially noisy) signal about the victim traffic's shape. To demonstrate a simple attack, we generate periodic UDP traffic across a shared network link, measure the per-packet delays, and look for delay shifts that signal a victim flow at least 100KB in size. Our results show that even a very simple rule-based classifier can reliably detect the size of victim flows with a resolution of 300KB on a 1 Gbps link. The attack works when the shared bottleneck is at the server's access link or a downstream link. We have also confirmed that the attack works in principle on a public IBM Cloud, albeit with lower resolution (1 GB), presumably due to higher network capacity and noise induced by cross traffic. A more sophisticated, appropriately trained classifier could no doubt achieve much better resolution and robustness to noise.
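
To make the measurement concrete, the receiver side of such a delay probe can be sketched as follows; the probe port, nominal spacing, and threshold are illustrative values, not the parameters of our experiment, and a cooperating sender emitting fixed-size UDP datagrams at a fixed period is assumed.

```python
# Sketch of the receiver side of a delay-probing attack (illustrative only).
# Assumes a cooperating sender that emits fixed-size UDP datagrams at a fixed
# period over the shared bottleneck link; port, period, and threshold values
# are hypothetical.
import socket
import time

PROBE_PORT = 9999          # hypothetical probe port
PROBE_PERIOD_US = 100      # sender's nominal inter-packet gap (microseconds)
SHIFT_THRESHOLD_US = 50    # extra delay treated as evidence of a victim burst

def detect_victim_bursts(num_probes=100_000):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", PROBE_PORT))
    arrivals = []
    for _ in range(num_probes):
        sock.recv(2048)                      # fixed-size probe datagram
        arrivals.append(time.monotonic_ns())
    bursts = []
    for prev, cur in zip(arrivals, arrivals[1:]):
        gap_us = (cur - prev) / 1000
        # A gap well above the nominal probe period indicates queuing behind
        # concurrent (victim) traffic on the shared link.
        if gap_us > PROBE_PERIOD_US + SHIFT_THRESHOLD_US:
            bursts.append(gap_us)
    return bursts
```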

Recognizing videos. Next, we show how we can use coarse-grained measurements of traffic shape to recognize MPEG-DASH coded videos with good accuracy. With MPEG-DASH, videos are divided into variable-sized compressed segments that each cover 5 seconds of playback time. The sequence of segment sizes therefore characterizes a video’s length and content. In a randomly selected set of 956 1080p videos we downloaded from YouTube, and assuming we can identify victim flows at 300KB resolution, we can uniquely identify 681 (72%) of the videos, and place 120, 66, 8, 15, 12, 8, and 46 of the remaining videos in candidate sets of sizes 2, 3, 4, 5, 6, 8 and 46, respectively.

Our simple experiment confirms prior work [6, 57, 29, 63, 54, 66, 13, 12, 48] and shows that even a very simple attack can identify videos in a large collection with good accuracy. While an attack in a production environment faces additional challenges, such as achieving co-residency [53, 30, 31] and dealing with noise due to cross traffic, Cloud tenants that require strong confidentiality must consider network-based side channels a realistic threat.

3 Pacer overview

In this section, we state Pacer’s threat model and privacy goals, and discuss the key ideas to overcome its challenges.

3.1 Threat model

A tenant in a public IaaS Cloud, subsequently called the victim, executes in one or more guest VMs. The victim performs arbitrary computations but does not invoke other guest VMs or Cloud back-end services. The victim serves a set of trusted clients that connect to guest VMs from outside the Cloud via a secure virtual private network (VPN), or within the Cloud via a virtual Cloud network. (In principle, Pacer can support multi-tier guests and also guests that provide an open service to untrusted clients, but these extensions are beyond the scope of this paper and not implemented in our prototype.) Guests may require a second level of authentication to separate clients' privileges, but this is not relevant for Pacer's security. This assumption is reasonable for services with a restricted audience that require a high degree of confidentiality (e.g., a content server for employees of an enterprise or a service that operates on confidential data of individual clients).

The victim’s goal is to protect its secrets; these secrets can be reflected in parameters of client requests (i.e., secret inputs, such as the name of a requested file) or in the victim’s internal state or code (e.g., which request handlers are cache-hot because they were recently accessed). The victim rents an entire server socket for exclusive use and uses main memory within the NUMA domain associated with this socket222In principle, this assumption can be relaxed by combining Pacer with complementary work to mitigate side-channel leaks through shared memory buses, caches, and micro-architectural CPU state [61, 11].. The victim partitions its workload independent of secrets.

Our focus is on server-side security; protecting client privacy is a non-goal. Moreover, we assume that client request traffic reveals no secrets through its shape (its length, number of packets, or timing). (This assumption can be avoided by shaping client traffic; in principle, Pacer can support this by running a hypervisor on the client side.) This implies that the time of requests does not depend on any secrets or on the actual completion times of previous responses. A victim can design the content being served to meet this requirement, e.g., in the case of a web server, by in-lining embedded objects and using Javascript to decorrelate (in time) requests for embedded links and user requests that may depend on previously served content.

The adversary seeks to learn the victim’s secrets. It controls one or more guest VMs in the same public IaaS Cloud and has the ability to co-locate with the victim’s VMs in the same server, rack, or data center. The adversary controls network clients, which may communicate freely with the adversary’s VMs. The adversary cannot access the victim’s VPN or impersonate the victim’s clients.

The adversary has access to all services available to IaaS guests, including the ability to time the transmission and reception of its own network packets with high precision. The adversary also has the ability to intercept and inject network packets and to observe the guest’s traffic outside the Cloud.

3.2 Challenges and key ideas

A principled approach to avoiding network side channels is to shape the network traffic so that it cannot reveal secrets. If done naïvely, shaping can be very costly in terms of bandwidth or latency when the payload traffic is bursty. Pacer exploits the following key ideas to reconcile security and efficiency.

Per-guest dynamic shaping. Pacer shapes each guest’s network traffic dynamically according to its prevailing workload, thus enabling dynamic sharing of the available network capacity among different guests for efficiency. This requires that the presence and time of requests reveal no secrets, as assumed in our threat model.

Secret-independent shaping.

Instead of insisting on a uniform traffic shape for all of a guest’s network traffic, Pacer allows the shape to vary, as long as the variations don’t depend on secrets. For instance, if the type of content being requested from a server (e.g., document vs. video) is not a secret, then a different traffic shape can be used for the two. This additional degree of freedom helps minimize overhead for variable network traffic while preventing leaks of secrets.

Gray-box profiling. Dynamic traffic shaping requires an understanding of how a guest’s secrets affect its network traffic. This information can be obtained via program analysis, but that is difficult on arbitrary binaries running in a VM. Black-box profiling can be performed on arbitrary guests, but cannot reliably discover all dependencies and therefore is not secure. Pacer instead relies on gray-box profiling, which requires no knowledge of a guest’s internals beyond an explicit traffic indicator from the guest. This indicator partitions the guest’s possible network interactions independent of secrets and indicates the onset of a particular interaction. It is used by Pacer in two ways: (i) to profile the guest’s network interactions and generate a transmit schedule for each partition, and (ii) to instantiate a transmit schedule for a network interaction. As long as a guest computes the indicator independent of secrets, the choice of indicator affects performance but not security.

Paravirtualized support for traffic shaping. Pacer provides paravirtualized hypervisor support that enables guests to implement a traffic-shaping network tunnel, while adding only a modest amount of code to the hypervisor. A performance-isolated shaping component in the hypervisor initiates transmissions based on a schedule. If no payload is available at the time of a packet’s scheduled transmission, the shaping component transmits a dummy packet instead that, to the adversary, is indistinguishable from a payload packet.

4 Traffic-shaping tunnel

Pacer’s key abstraction is a traffic-shaping tunnel. The tunnel shapes application payload traffic to make it secret-independent, thus defending against adversaries who can observe, directly or indirectly, traffic inside the tunnel. In this section, we describe the tunnel and its security independent of a specific application setting, implementation, or placement of tunnel entry and exits. In §5, we evolve the design to work within the constraints of a realistic Cloud environment.

Requirements. We begin with the requirements for a traffic-shaping tunnel.

Secret-independent traffic shape: Transmissions must follow a schedule that does not depend on secrets; actual transmission times must not be delayed by potentially secret-dependent computations.

Unobservable payload traffic: The traffic shape must not reveal, directly or indirectly, an application's actual time and rate of payload generation and consumption. This implies that flow control must not affect the traffic shape; that padded content must elicit the same response (e.g., ACKs) from receivers as payload data; and that packet encryption must encompass the padding. This in turn requires that padding be added at or above the transport layer, while encryption be done below the transport layer.

Congestion control: The tunnel must react to network congestion. Congestion control is needed for network stability and fairness, and does not reveal secrets since it reacts to network conditions, which themselves depend only on shaped and third-party traffic.

Figure 1: Traffic-shaping tunnel (one endpoint)

Architecture. Figure 1 shows the tunnel's architecture. The tunnel protocol stack runs on both tunnel endpoints. (Only one of the two symmetric endpoints is shown in the figure.) The tunnel protocol stack consists of a shaping layer on top of a modified transport layer (e.g., TCP), on top of the encryption layer. These three layers rest on top of conventional IP and link layers. Each tunnel is associated with a flow identified by a 5-tuple of source and destination IP addresses and ports, and the transport protocol. (We describe the tunnel in terms of TCP; however, another stack like QUIC [36] could be used as well.)

The shaping layer initiates transmissions according to a schedule and pads packets to a uniform size. It interacts with applications via a set of shared, lock-free queues. The layer takes application data from a per-flow outbound queue and transmits it in the tunnel. It places incoming data from the tunnel into a per-flow inbound queue. Finally, it receives traffic indicators and per-flow crypto keys (to be used by the encryption layer) via a per-application command queue.

A separate, user-level gray-box profiler (ProfPace) analyzes timestamps and traffic indicators collected by the tunnel, and generates and updates transmit schedules in the schedule database. ProfPace is described in §6.

Assumptions. The execution of the tunnel network stack is assumed to be performance-isolated from the application and any other computation outside the stack. As a result, the processing delays in the stack cannot be influenced by computations outside the stack, which could depend on secrets.

The shaping, transport, and encryption layers operate on cleartext data. Therefore, we must assume that they execute in constant time with respect to both the amount and content of data they process. That is, these layers avoid data-dependent control flow and memory access patterns in the implementation of their data paths. As a result, the layers’ processing delays and memory footprints are independent of the amount and content of payload data.

Transmit schedules. A transmit schedule is a finite series of times at which packets within a flow are transmitted. A schedule is typically associated with a type of packet train, e.g., a file transfer or the response to a request for service. There is at most one active schedule on a flow at a time; successive schedules on the same flow are non-overlapping in time.

Outbound data processing. A timestamp is taken whenever data is queued by the application; these timestamps and the recorded traffic indicators are shared with the gray-box profiler. The shaping layer retrieves a chunk of available data from the flow's outbound queue whenever a transmission is due on the flow according to the active schedule (if any) and the congestion window is open (see transport layer below). The layer removes a number of bytes that is the minimum of (i) the available bytes in the queue, (ii) the receiver's flow control window (see transport layer below), and (iii) m, the network's maximal transfer unit (MTU) minus the size of all headers in the stack. If fewer than m bytes (possibly zero) were retrieved from the queue due to payload unavailability or flow control, the shaping layer pads the chunk to m bytes. It adds a header that indicates the number of padding bytes added.
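
As a concrete illustration, the chunk-selection rule can be sketched as follows; the byte-string representation and names are ours, not the tunnel's actual implementation.

```python
# Sketch of the shaping layer's per-transmission chunk selection (illustrative).
# m is the MTU minus the size of all headers in the tunnel stack.
PAD_BYTE = b"\x00"

def next_chunk(outbound_queue: bytearray, flow_ctrl_window: int, m: int) -> bytes:
    # Take the minimum of the available payload, the receiver's flow-control
    # window, and the maximum payload size m.
    take = min(len(outbound_queue), flow_ctrl_window, m)
    payload = bytes(outbound_queue[:take])
    del outbound_queue[:take]
    padding = m - take                       # possibly the whole chunk is padding
    # A small header records how many padding bytes were added so the receiver
    # can strip them; the header is encrypted lower in the stack.
    header = padding.to_bytes(2, "big")
    return header + payload + PAD_BYTE * padding
```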

Transport layer. The transport layer operates as normal, except for two tunnel-related modifications: 1) When the congestion window closes, the transport layer signals the shaping layer to suspend the flow’s transmit schedule until the congestion window reopens. Schedule suspension ensures network stability and TCP-friendliness, and does not leak information because it depends only on network conditions. 2) Flow control is modified to make it unobservable to the adversary. The transport layer signals to the shaping layer the size of the flow control window advertised by the receiver. This window controls how much payload data is included in packets generated by the shaping layer (see above). The transport layer transmits packets irrespective of the flow control window, sending dummy packets while the window is closed, which are discarded at the other end of the tunnel.

The transport layer passes outbound packets to the encryption layer, which adds a message authentication code (MAC) keyed with the flow’s key to a header and encrypts the packet with the flow’s key. Finally, encrypted packets are passed to the IP layer, where they are processed as normal down the remaining stack and transmitted by the NIC.

Inbound packet processing. Packets arriving from the tunnel are timestamped; the stamps are shared with the profiler. Packets pass through the layers in reverse order, causing TCP to potentially send ACKs. The encryption layer decrypts and discards packets with an incorrect MAC. The shaping layer strips padding and places the remaining payload bytes (if any) into the inbound queue shared with the application.

Schedule installation. A transmit schedule must be installed on a flow before data can be sent via the tunnel. A guest application does so indirectly by sending traffic indicators. The application provides the flow's 5-tuple f, a traffic id tid (which maps to a transmit schedule), and a type. The shaping layer looks up the schedule associated with tid in the schedule database and associates it with flow f.

There are two types of schedules: default and custom. A default schedule is installed when the flow is created. This schedule acts as a template, which is instantiated automatically by the shaping layer whenever an incoming packet arrives that indicates the start of a new network exchange (e.g., a GET request on a persistent HTTP connection), identified by the TCP PSH flag. The schedule starts at a time equal to the arrival time of the packet that causes the schedule’s instantiation.

A default schedule active on a flow can be extended by a custom schedule in response to an application traffic indicator. For instance, a default schedule that allows a TLS handshake might be extended with one that is appropriate for the response to the first incoming network request. The new schedule can extend the currently active schedule only if the new schedule’s prefix matches the prefix of the currently active schedule that has already been played out. Because the new schedule is anchored at the same time as the profile it extends, the time of a schedule extension is unobservable to the adversary. A traffic indicator that would require a custom profile that does not match the played-out prefix of the active schedule is ignored.
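
The prefix-matching rule can be pictured as in the following sketch, which assumes (for illustration) that schedules are represented as lists of transmit times relative to their anchor.

```python
# Sketch of custom-schedule extension (illustrative). Schedules are lists of
# transmit times relative to the schedule's anchor (e.g., request arrival).

def try_extend(active: list[float], played_out: int, custom: list[float]) -> list[float]:
    """Adopt `custom` only if the transmissions already played out under
    `active` form a prefix of `custom`; otherwise the indicator is ignored,
    so the time of the extension remains unobservable."""
    if len(custom) < played_out:
        return active
    if custom[:played_out] != active[:played_out]:
        return active                        # mismatch: keep the active schedule
    return custom                            # anchored at the same start time
```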

4.1 Tunnel security

Next, we justify the tunnel's security, summarized by property S0: The shape of traffic in the tunnel does not depend on secrets. S0 follows from properties S1–S7 below.
S1: Transmit schedules are chosen based on public information. By assumption about applications' choice of schedules.
S2: The traffic in the tunnel is independent of the payload traffic. Holds because (i) packets are padded and transmitted independently of the application's rate of payload generation and consumption; (ii) packets elicit a transport-layer response from the receiver independently of payload traffic; and (iii) packet contents, including headers that reveal padding, are encrypted.
S3: All packet transmissions follow a schedule. Holds because shaping initiates transmissions according to a schedule.
S4: Delays between scheduled and actual packet transmission times do not reflect secrets. Follows from the fact that the tunnel stack, from the shaping layer down, is performance-isolated from any secret-dependent computation and layers that operate on cleartext are constant-time.
S5: Transmit schedules are activated, paused, and re-activated at a secret-independent delay from any observable event that causally precedes the pausing or (re-)activation. Holds because (i) pausing, reactivation, and instantiation of default schedules is performed within the performance-isolated tunnel stack; and (ii), by assumption, custom schedule installations that take immediate effect are not causally preceded by an observable event.
S6: Transmit schedules are suspended and resumed only according to the network’s congestion state. Follows from the tunnel’s transport layer congestion control mechanism.
S7: Modifications of active transmit schedules do not reveal secrets. Holds because the time of schedule replacement is unobservable to the adversary (matching prefix).

5 Pacer design

In the previous section, we described a generic traffic-shaping tunnel, which ensures that the shape of traffic within the tunnel reveals no secrets by design. In this section, we describe the full design of Pacer, which is a practical implementation of a traffic-shaping tunnel in the context of a public IaaS Cloud.

We begin with a discussion of constraints on the design space in the context of an IaaS Cloud. First, the tunnel entry must be integrated with the IaaS server. In an IaaS Cloud, co-located tenants typically share the network link attached to the server and can therefore indirectly observe each other's traffic. Therefore, the tunnel entry must be in the IaaS server to ensure the attached link lies inside the tunnel.

Second, as we know from the previous section, the tunnel requires that the network stack be performance-isolated from secret-dependent computations and that layers dealing with cleartext be constant-time. In the context of an IaaS server, all guest computation must be assumed to be secret-dependent. These circumstances suggest that shaping should be implemented in the IaaS hypervisor, where it can be executed with dedicated resources and tightly controlled. Finally, shaping requires padding, which must be done at or above the transport layer to ensure it is unobservable.

A simple way to meet all requirements is to place the entire network stack into the hypervisor and performance-isolate it from guests. This approach, however, has significant limitations. First, ensuring performance isolation for an entire network stack is technically challenging even in the hypervisor. Second, the approach requires that guests and their network peers use a fixed network stack provided by the IaaS platform. Third, it adds significant complexity to the hypervisor.

Pacer architecture. Pacer addresses the tension outlined above using a paravirtualization approach. In Pacer, the hypervisor cooperates with the guest OS to implement the traffic-shaping tunnel. The responsibilities are divided in such a way that (i) the hypervisor can ensure tunnel security with only weak assumptions about a guest’s rate of progress; (ii) the performance-isolated hypervisor component is small; (iii) required changes to the guest OS are modest. Effectively, we extend the IaaS hypervisor to provide a small set of functions that allows guests to implement a traffic-shaping tunnel, while guests retain the flexibility to use custom stacks.

Figure 2: Pacer architecture

Figure 2 shows Pacer’s architecture. Unlike the strictly layered tunnel stack from §4, Pacer factors out a small set of functions that inherently require performance-isolation into the lowest layer, implemented in the IaaS hypervisor. The HyPace component plugs into Xen and provides these functions. The GPace component, a Linux kernel module, plugs into the guest OS and the OS of any network clients that interact with the guest. It implements the tunnel in cooperation with HyPace.

HyPace instantiates transmit schedules, encrypts and MACs packets, and initiates their transmissions, while masking potentially secret-dependent delays in its execution. It can generate padded (dummy) packets subject to congestion control independently from the guest network stack, thereby avoiding the need to performance-isolate the guest. GPace pads payload packets, and exposes each flow’s congestion window, sequence number, and crypto key to HyPace. The guest has direct access to a virtual NIC (vNIC) configured by the hypervisor, which it uses to receive but not to transmit packets. Pacer’s security properties remain equivalent to the generic tunnel’s (§5.3).

5.1 HyPace

Similar to the shaping layer in the generic tunnel, HyPace receives traffic indicators from applications (via GPace), instantiates template schedules in response to incoming packets (signaled by GPace), and initiates transmissions. To ensure tunnel security despite potentially secret-dependent delays in the guest, however, HyPace performs additional functions and there are differences, which we discuss next.

HyPace implements padding, encryption, and congestion control in cooperation with the guest. HyPace pauses a transmit schedule when a flow’s congestion window closes and resumes the schedule when it reopens. When a transmission is due on a flow and the congestion window is open, HyPace checks whether the guest has queued a payload packet. If not, it generates a dummy packet with proper padding, transport header, and encryption, using the next available TCP sequence number and the flow key shared with the guest. Finally, it initiates the transmission of the payload or dummy packet and reduces the congestion window accordingly.

Interface with guests. HyPace shares a memory region pairwise with each guest. This region contains a data structure for each active flow. The flow structure contains the following information: the connection 5-tuple associated with the flow; a sequence of transmit schedule objects; the current TCP sequence number and the right edge of the congestion window; the flow's encryption key; and a queue of packets prepared for transmission by the guest. Each transmit schedule object contains the traffic id (tid) and a starting timestamp. HyPace and the guest use lock-free synchronization on the data they share.
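
The shared per-flow structure can be rendered roughly as follows; field names and types are illustrative (the actual structure is a lock-free C struct in shared memory).

```python
# Illustrative rendering of the per-flow structure shared between HyPace and
# the guest (the real structure is a lock-free C struct in shared memory).
from dataclasses import dataclass, field

@dataclass
class ScheduleRef:
    tid: int                  # traffic id, used to look up the transmit schedule
    start_time_ns: int        # anchor timestamp at which the schedule starts

@dataclass
class SharedFlow:
    five_tuple: tuple         # (src_ip, src_port, dst_ip, dst_port, proto)
    schedules: list[ScheduleRef] = field(default_factory=list)
    tcp_seq: int = 0          # next TCP sequence number
    cwnd_right_edge: int = 0  # right edge of the congestion window
    flow_key: bytes = b""     # per-flow encryption/MAC key
    tx_queue: list[bytes] = field(default_factory=list)  # guest-prepared packets
```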

Packet transmission. HyPace transmits packets according to the active schedule in the packet’s flow. From a security standpoint, packets need not be transmitted at the exact scheduled times; however, any deviation between scheduled and actual time must not reveal secrets.

On general-purpose server hardware, it is challenging to initiate packet transmissions such that their timing cannot be influenced by concurrent, secret-dependent computations. Using hardware timers, events can be scheduled with cycle accuracy. However, the activation time and execution time of a software event handler are influenced by a myriad of factors. These may include (i) disabled interrupts at the time of the scheduled event; (ii) the CPU's microarchitectural, cache, and write buffer state at the time of the event; (iii) concurrent bus traffic; (iv) frequency and voltage scaling; and (v) non-maskable interrupts during handler execution. Many of these factors are influenced by the state of concurrent executions on the IaaS server and may therefore carry a timing signal about secrets in those executions.

Masking event handler execution time. HyPace masks hardware state-dependent delays to make sure they do not affect the actual time of transmissions. A general approach is as follows. First, we determine empirically the distribution of delays between the scheduled time of a transmission and the time when HyPace's event handler writes to the NIC's doorbell register, which initiates the transmission. We measure this distribution under diverse concurrent workloads to get a good estimate of its true maximum. We relax this estimate further to account for the possibility that we may not have observed the true maximum, and call the resulting delay W. Second, for a transmission scheduled at time t, we schedule a timer event at t − W. Third, when the event handler is ready to write to the NIC doorbell register, it spins in a tight loop reading the CPU's clock cycle register until t is reached and then performs the write. By spinning until t, HyPace masks the event handler's actual execution time, which could be affected by secrets.

Unfortunately, the measured distribution of event handler delays has a long tail. We observed that the median and maximum delay can differ by three orders of magnitude (tens of nanoseconds to tens of microseconds). This presents a problem: with the simple masking approach, a single core could initiate at most one transmission every W seconds, making it infeasible to achieve the line rate of even a 10Gbps link. Instead, we rely on batched transmissions.

Batched transmissions. The solution is based on two insights. First, instances in the tail of the event handler delay distribution tend to be independent. As a result, the maximal delay for transmitting a batch of B packets in a single event handler activation does not increase much with B. Therefore, we can amortize the overhead of masking handler delays over B packets. Second, actual transmission times can be delayed as long as the delay does not depend on secrets. Therefore, it is safe to batch transmissions.

We divide time into epochs of length ε, such that all packet transmissions from an IaaS server scheduled in the same epoch, across all guests and flows, are transmitted at the end of that epoch. An event handler is scheduled once per epoch; it prepares all packets scheduled in the epoch, spins until the batch transmission time, and then initiates the transmission with a single write to the NIC's doorbell register.
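
Putting the masking and batching together, the per-epoch handler can be sketched as follows; W denotes the relaxed handler-delay bound from above, the value shown for it is illustrative, and the doorbell write is represented by a placeholder function.

```python
# Sketch of HyPace's per-epoch transmit handler (illustrative). W_NS is the
# relaxed handler-delay bound W; ring_doorbell() stands in for the single
# write to the NIC doorbell register that initiates the batch.
import time

W_NS = 20_000   # illustrative value; the handler is started W before the epoch end

def epoch_handler(epoch_end_ns: int, due_flows, build_dummy, ring_doorbell):
    batch = []
    for flow in due_flows:
        # Use a guest-queued payload packet if one is ready; otherwise build a
        # padded, encrypted dummy packet with the next sequence number.
        pkt = flow.tx_queue.pop(0) if flow.tx_queue else build_dummy(flow)
        batch.append(pkt)
    # Mask the handler's (possibly interference-dependent) execution time by
    # spinning until the fixed batch transmission time.
    while time.monotonic_ns() < epoch_end_ns:
        pass
    ring_doorbell(batch)      # one doorbell write sends the whole batch
```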

Let us consider factors that could delay the actual packet transmission time once the spinning core issues the doorbell write. Reads were executed before the spin, so the state of caches plays no role. The write buffer should be empty after the spin. Interference from concurrent NIC DMA transfers reflects shaped traffic and is therefore secret-independent. Similarly, any delays in the NIC itself due to concurrent outbound or inbound traffic cannot depend on secrets. However, the doorbell write itself could be delayed by traffic on the memory bus, PCIe bus, or bus controller/switch.

Hardware interference and NIC support. A remaining source of delays are concurrent bus transactions caused by potentially secret-dependent computations. We tried to detect such delays empirically and have not been able to find clear evidence of them. Nonetheless, such delays cannot be ruled out on general-purpose hardware. A principled way to rule out such interference would require hardware support.

For instance, a scheduled packet transmission function provided by the NIC would be sufficient. Software would queue packets for transmission with a future transmission time t. At time t − d, the NIC DMAs the packets into onboard staging buffers in the NIC. Here, d would be chosen to be larger than the maximal possible delay due to bus contention. At time t, the NIC would initiate the transmission automatically. With such NIC support, HyPace would prepare packets for transmission as usual, but instead of spinning until t it would immediately queue the packets with transmission time t. Incidentally, NIC support for timed transmissions is also relevant for traffic management, and a similar "transmit on time stamp" feature is already available on modern smart NICs [2]. We plan to investigate NIC support in future work.

5.2 GPace

GPace is a Linux kernel module that implements a traffic-shaping tunnel jointly with HyPace. On the client side of a network connection, GPace extends the kernel to terminate the tunnel. GPace pads outgoing TCP segments to MTU size and removes the padding on the receive path. It modifies Linux's TCP implementation to share its per-flow congestion window and sequence number with HyPace, and to notify HyPace of retransmissions so that HyPace can extend the active schedule by one transmission. Unlike in the generic tunnel, where shaping occurs above the transport layer, this schedule extension is necessary to allow for retransmissions; it does not leak information because it depends only on network state.

Note that TCP’s flow control window is not advertised to HyPace, causing HyPace to send dummies if the receiver’s flow control window is closed, as required. GPace timestamps outbound data arriving from applications and inbound packets from the tunnel in the vNIC interrupt handler. All timestamps and recorded traffic indicators are given to the profiler (§6).

GPace allows applications to install session keys and provide traffic indicators on flows via IOCTL calls on network sockets. Recall that applications specify a flow, a traffic id (tid), and a type as arguments when indicating traffic. GPace passes this information into the per-flow queue shared with HyPace, which uses the tid as an index to look up the corresponding transmit schedule in the database.

Packet processing. With GPace, the guest OS generates TCP segments as usual, but pads them to the MTU size before passing them to the IP layer. Instead of queuing packets in the vNIC’s transmit queue, GPace queues them in per-flow transmit queues shared with HyPace. The guest OS processes incoming packets as usual by accepting interrupts and retrieving packets directly from its vNIC.

Schedule (re-)activation delays. Unlike the generic tunnel, Pacer processes inbound network packets in the guest, which is not performance-isolated. Therefore, care must be taken to ensure that the time of activation or re-activation of a transmit schedule in response to an inbound packet does not reveal the guest kernel’s execution time, which could depend on secrets. Schedule (re-)activation must occur at a defined, secret-independent delay from the event that causally precedes it, e.g., a packet arrival.

Let be HyPace’s epoch length and be the guest OS’s empirical maximal inbound packet processing time. There are four such events to consider: 1) The arrival of the first packet of a request. GPace instantiates a default schedule with a start time equal to the packet’s arrival time. To make sure the first transmission occurs in time, we require that the initial response time of any default schedule be larger than . 2) The arrival of an ACK that opens the congestion window. GPace ensures the ACK does not enable a transmission that is scheduled within of the ACK’s arrival. 3) The arrival of an ACK that causes a retransmission. GPace ensures the ACK does not enable a transmission that is scheduled less that from the ACK’s arrival. 4) A timeout that causes a retransmission. GPace ensures the timeout does not enable a transmission that is scheduled within of the timeout. Here, we use as a conservative upper bound on the delay of the timeout event handler.

These four rules make the guest’s actual processing time for incoming packets and timeouts unobservable to the adversary.

5.3 Pacer security

We justify Pacer’s overall security. Pacer’s threat model rules out side channels via shared CPU state, caches, and memory bandwidth, as well as shared Cloud back-end services. Therefore, the adversary is limited to (i) trying to connect to the victim as a client and observe the timing and content of responses, or (ii) measuring the shape of the victim’s traffic by observing packet delays on a shared network link.

Attack (i) is not possible because the adversary cannot elicit a response from the victim. Pacer relies on encryption and a MAC keyed with pre-shared keys and GPace silently ignores incoming packets that cannot be authenticated. Attack (ii) is unproductive, because the victim’s incoming traffic shape is secret-independent by assumption and its outgoing traffic is shaped to be secret-independent. Next, we justify that the victim’s outgoing traffic shape is indeed secret-independent by design. In other words, Pacer’s tunnel has property S0 of the generic tunnel from §4.

S1 and S3 hold trivially, because the relevant behavior of Pacer is equivalent to the generic tunnel's. S2 holds because Pacer, like the generic tunnel, pads packets above the transport layer, encrypts packets below the padding layer, and makes flow control unobservable. S4 holds because HyPace's batch transmission mechanism masks the execution time of its transmission event handler. S5 follows from GPace's rules on the pausing and (re-)activation of transmit schedules. S6 holds because HyPace cooperates with GPace to pause and resume schedules in response to the network's congestion state. Even though a schedule can be extended in Pacer, S7 still holds because schedule extension happens only in response to a packet loss, which is a public event.

6 Generating schedules

ProfPace profiles guests and generates transmit schedules. It analyzes the guest’s recorded network interactions and traffic indicators. For content-serving guest applications, we also analyze the guest’s content corpus to suggest a clustering that balances efficiency and privacy.

6.1 Gray-box profiling

ProfPace analyzes the arrival times of incoming packets, the times at which the guest OS queues packets for transmission, and the traffic indicators from guest applications. The guest's traffic indicators serve three purposes. First, they delimit segments of semantically related sequences of inbound and outbound packets within a network flow. Second, the tid indicated by the guest identifies segments of the same equivalence class, e.g., a TLS handshake, or a response to a request within a given workload partition. Recall that the application determines the tid based on public information; the tid therefore partitions the guest's network interactions by public information. Third, the indicators signal the onset of a network interaction and therefore provide an opportunity to install the appropriate schedule.

ProfPace bins the recorded network interaction segments by tid. The set of observed segments in a bin are considered samples of the associated equivalence class of network interactions. ProfPace characterizes each class by a set of random variables. Empirically, we have determined the following variables to be sufficient: the delay between the first incoming packet and the first response packet, d_init; the time between subsequent response packets, d_gap; and the number of response packets, n. For each class, the profiler samples the distribution of these random variables from the segments in the associated bin.

Finally, ProfPace generates a transmit schedule for tid based on the sampled distributions of the random variables. Specifically, it generates a schedule with the 100th percentile of the number of packets n, the 99th percentile of the initial delay d_init, and the 90th percentile of the spacing among subsequent packets d_gap. We have determined empirically that this works well.
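
Concretely, schedule generation from the sampled distributions can be sketched as follows; representing a schedule as a list of transmit times relative to the first incoming packet is an assumption of this sketch.

```python
# Sketch of ProfPace's schedule generation (illustrative). For each class,
# the samples record the initial response delay d_init, the inter-packet
# gaps d_gap, and the number of response packets n.
import numpy as np

def make_schedule(d_init_samples, d_gap_samples, n_samples):
    n = int(np.percentile(n_samples, 100))               # 100th %-ile of n
    d_init = float(np.percentile(d_init_samples, 99))    # 99th %-ile of d_init
    d_gap = float(np.percentile(d_gap_samples, 90))      # 90th %-ile of d_gap
    # Transmit times relative to the first incoming packet of the interaction.
    return [d_init + i * d_gap for i in range(n)]
```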

Recall that as long as applications choose values based on public information, transmit schedules are relevant only for performance not security. An inadequate schedule could increase delays and waste network bandwidth due to extra padding, but cannot leak secrets. For good performance, during profiling runs, the guests should sample the space of workloads with different values of the public and private information, as well as different guest load levels, so that the resulting profiles capture the space of network traffic shapes well. Next, we consider how a guest can partition its workload to trade off performance and privacy in the case of a content server.

6.2 Corpus analysis

Pacer reduces overhead by allowing the shape of network traffic to reveal information deemed public by the guest. Moreover, a guest may define public/private information in order to trade off performance and confidentiality. For instance, a guest that serves a corpus of objects with a skewed size distribution faces a tradeoff: hiding the workload perfectly requires padding every requested object to the largest object in the corpus; clustering the corpus such that each object is padded to the largest object in its cluster reduces overhead but reveals the cluster of a requested object. Reasoning about privacy with this clustering, particularly when the corpus' popularity distribution is not known, is challenging in general and beyond the scope of this paper. Nevertheless, we highlight the large efficiency gains possible when clustering content with skewed size distributions. We show results for clustering the English Wiktionary and Wikipedia corpora by size in §7.

We cluster videos according to the sequence and size of their 5-second segments using the following algorithm. Note that the dynamically compressed segments of a video differ in size. Initially, we over-approximate the shape of each video by its maximal segment size s and its number of segments l. For each distinct video length l and each distinct maximal segment size s in the entire dataset, we compute the set of videos that are dominated by (l, s). A video with l' segments and maximal segment size s' is dominated by (l, s) if l' ≤ l and s' ≤ s.

Let k be a desired minimum cluster size. Our algorithm works in rounds. In each round, we select every (l, s) dominating at least k videos, and we choose as cluster the set of videos V minimizing the average padding overhead per video, i.e., (1/|V|) ∑_{v∈V} ∑_{i=1}^{L} (S_i − s_{v,i}), where |V| is the cardinality of the set of videos, L is the maximal length across all videos in the set, S_i is the maximal size of the i-th segment across all videos in the set, and s_{v,i} is the size of video v's i-th segment (taken as zero beyond v's last segment). Once a cluster is formed, videos in it are not taken into account in later rounds. The algorithm terminates when all videos are clustered. If the last cluster has fewer than k videos, it is merged with the one formed before it. We show results of clustering YouTube videos in §7.1.
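
A compact sketch of this procedure is shown below; each video is represented simply as the list of its segment sizes, and the per-cluster cost follows the overhead formula above. The quadratic enumeration over (l, s) pairs is fine for a sketch.

```python
# Sketch of the greedy video clustering (illustrative). Each video is a list
# of its segment sizes in bytes; k is the desired minimum cluster size.

def avg_padding(videos):
    L = max(len(v) for v in videos)
    # S[i] = maximal size of the i-th segment across all videos in the set;
    # shorter videos contribute size 0 for missing segments.
    S = [max((v[i] if i < len(v) else 0) for v in videos) for i in range(L)]
    return sum(sum(S) - sum(v) for v in videos) / len(videos)

def cluster_videos(videos, k):
    remaining, clusters = list(videos), []
    while remaining:
        best = None
        for l in {len(v) for v in remaining}:
            for s in {max(v) for v in remaining}:
                dominated = [v for v in remaining if len(v) <= l and max(v) <= s]
                if len(dominated) >= k:
                    cost = avg_padding(dominated)
                    if best is None or cost < best[0]:
                        best = (cost, dominated)
        if best is None:                     # fewer than k videos remain
            break
        clusters.append(best[1])
        chosen = {id(v) for v in best[1]}
        remaining = [v for v in remaining if id(v) not in chosen]
    if remaining:                            # merge leftovers into the last cluster
        if clusters:
            clusters[-1].extend(remaining)
        else:
            clusters.append(remaining)
    return clusters
```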

7 Evaluation

In this section, we describe our implementation, and present results of an empirical evaluation of our Pacer prototype.

We implemented HyPace for Xen and GPace’s Linux kernel module in 8,100 and ~15K lines of C, respectively. We imported 4,458 lines of AESNI assembly code from OpenSSL to encrypt packets in HyPace. We implemented ProfPace in 1,800 lines of Python and 1,200 lines of C. At each site in an application’s code where a message is sent to the network, we add 15 LoC to send a traffic indicator via IOCTL to the guest kernel. We identified and modified these sites manually; automating the instrumentation is possible but remains future work. No other changes were required to guest applications.
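
The per-send-site annotation can be pictured roughly as follows; the ioctl request code and argument layout are hypothetical and merely stand in for the ~15 lines of C added at each site.

```python
# Rough illustration of a traffic-indicator annotation at a send site. The
# ioctl request code and argument encoding below are hypothetical; they stand
# in for the ~15 lines of C added at each send site in our applications.
import fcntl
import struct

PACER_SET_INDICATOR = 0x50414345   # hypothetical ioctl request code

def indicate_traffic(sock, traffic_id: int, indicator_type: int):
    # Pack (traffic id, type); the kernel module forwards this, together with
    # the socket's flow 5-tuple, to HyPace via the shared per-flow queue.
    arg = struct.pack("II", traffic_id, indicator_type)
    fcntl.ioctl(sock.fileno(), PACER_SET_INDICATOR, arg)
```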

All experiments were performed on Dell PowerEdge R730 server machines with Intel Xeon E5-2667, 3.2 GHz, 16 core CPU (two sockets, 8 cores per socket), 512 GB RAM, and a Broadcom BCM 57800 10Gbps Ethernet card. The NIC was configured to export virtual NICs (vNICs). We disabled hyperthreading, dynamic voltage and frequency scaling, and power management in the hosts. We used Xen hypervisor 4.10.0 on the hosts, and the ‘Null’ scheduler [4] for VM scheduling.

We assigned 40GB RAM and one of the CPU sockets to Xen; up to two cores are configured to execute the HyPace transmit event handler in parallel; flows are partitioned among the cores. The guest runs in a VM with 8 cores and 64 GB RAM, and has access to a vNIC. The VCPUs of the guest VM were pinned one-to-one to cores on the second socket of the host CPU. This is in line with our threat model, which assumes that guests rent dedicated CPU sockets.

The guest runs an Ubuntu 16.04 LTS kernel (version 4.9.5, x86-64), and Apache HTTP Server 2.4. Network clients run Ubuntu 16.04 LTS without a hypervisor. The wiki application is based on Mediawiki 1.27.1, which serves dynamic web pages. It internally stores the wiki dataset in a database hosted locally on MySQL 5.7.16. We used a modified wrk2 [3] client to issue HTTPS GET requests for various pages to the Mediawiki server. We also wrote a custom video streaming server in PHP, which works like a simple file server serving video segments in response to client requests. The videos were hosted on an ext4 file system on a VM disk.

7.1 Spatial padding overhead

Figure 3: Padding overhead vs. number of clusters and minimum cluster size (log-log scale) for (a) English Wiktionary, (b) English Wikipedia, and (c) the video dataset

First, we measure the spatial padding overhead when clustering content as described in §6.2. This overhead corresponds to the network bandwidth overhead of Pacer's traffic shaping. We clustered three different datasets: (i) a 2016 snapshot of the English Wiktionary corpus (5,027,344 documents, max 521.9kB, median 4.7kB), (ii) a 2008 snapshot of the English Wikipedia corpus (14,257,494 documents, max 14.3MB, median 83.5kB), and (iii) a set of videos downloaded from YouTube in March 2018 (1218 videos, max 468.7MB, median 6.2MB). Figure 3 shows the reduction in the average and maximum padding overhead with an increasing number of clusters and a decreasing minimum cluster size (i.e., the minimum number of objects in each cluster). Even coarse-grained clustering leads to a significant reduction in padding overhead.
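
For reference, the overhead metric plotted in Figure 3 can be computed as in the following sketch, assuming each object is padded to the largest object in its cluster and that the average is taken per object; the exact averaging used in the figure may differ.

```python
# Sketch of the spatial padding overhead metric (illustrative). Each cluster
# is a list of object sizes; every object is padded to the largest object in
# its cluster. Averaging per object is an assumption of this sketch.

def padding_overheads(clusters):
    rel = []                                 # per-object relative padding overhead
    for sizes in clusters:
        limit = max(sizes)
        rel.extend((limit - s) / s for s in sizes)
    return sum(rel) / len(rel), max(rel)     # (average, maximum) overhead

# Example: a coarse two-cluster split, in the spirit of the clustering in §7.4.
avg, worst = padding_overheads([[1_000, 4_700, 12_000], [20_000, 521_900]])
```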

7.2 Microbenchmarks

We empirically select a suitable epoch length ε, the maximum batch size B (the number of packets to be prepared by a HyPace handler in each epoch), and the guest processing bound δ. To this end, we ran multiple 12-hour experiments with varying network workloads. We requested 100KB documents from the Mediawiki web server using concurrent clients. As background workload, we ran large matrix multiplications on dom0. The workloads were configured to drive CPUs to near 100% utilization, and had a total working set of ~12GB.

To determine ε and B, we measured the cost of preparing batches of packets for transmission in HyPace. Over many observations in the presence of the background load described above, we first determined the number of packets that can be safely prepared within different epoch lengths by a single HyPace handler. Within epochs of length 30µs, 50µs, 100µs, and 120µs, the number of packets was 5, 14, 33, and 42, respectively, which allows HyPace to achieve 22%, 28%, 41%, and 42% of the NIC line rate with a single core. We configured ε to be 120µs for all HyPace handlers.

Based on these results, we use two parallel HyPace handlers running on separate cores. In this configuration, we repeated our measurements and chose the batch size B and epoch length for each handler accordingly. δ is independent of the number of HyPace threads, and its average and maximum values observed across all experiment configurations were 3.9ms and 15.8ms, respectively. We conservatively configured δ to 20ms.

7.3 Video service

Next, we measure the impact of Pacer’s traffic shaping on a video streaming service. We implemented a streaming client in Python that simulates a MPEG-DASH player: when a user requests a video, the client initially fetches six segments (covering 5 seconds of video each) in succession to fill up a local buffer. After reaching 50% of the initial buffer (rebuffering goal), the player starts consuming the segments from the buffer. The client fetches subsequent segments sequentially whenever space is available in the buffer. With Pacer, the player does not request a segment until the transfer of the previous segment has finished, including any padding; otherwise, the timing of the client’s request would reveal the actual size of the previous segment. We measure the impact of traffic shaping on: (1) the initial delay until the video starts playing; (2) the frequency and duration of any pauses (video skipping) experienced by the player; (3) the download latency for individual video segments. The client sequentially plays four videos, randomly chosen among 1218 videos, for up to 5 minutes each. The videos were clustered into 19 clusters with at least 61 elements each. We ran experiments for a client with high bandwidth (10Gbps) and with low bandwidth (10Mbps).

Transmit schedules generated by ProfPace seek to download each segment as fast as the available bandwidth allows, which is how the baseline system works. With these schedules, there is no noticeable impact on the user experience when using Pacer. Initial startup delays do not increase significantly, and there is no video skipping in any of the experiments. When serving 128 clients, the maximum CPU utilization increases from 3.73% to 6.26% with Pacer.

Pacer’s shaping also provides an opportunity to use domain knowledge to optimize schedules. We know that downloading the largest segment in our collection of 240p videos within its 5 second deadline requires 550kbps. Therefore, we can conservatively modify the inter-packet spacing in the schedules to 6ms, corresponding to a bandwidth of about 2Mbps. For clients with 10 Mbps bandwidth this optimization avoids losses and reduces the segment download latency significantly, but we omit the full results due to space constraints. This schedule optimization does not affect security; it only takes advantage of Pacer to reduce network contention, a known benefit of traffic shaping.

7.4 Mediawiki

Next, we measured Pacer’s impact on the throughput of Mediawiki serving the English Wiktionary corpus. Clients request different Wiktionary pages concurrently and synchronously for a period of 120s. Prior to the measurement, we run the workload for 10s to warm up the caches. We used a coarse-grained clustering of the Wiktionary corpus with one cluster of 5,010,856 files up to 12KB in size and another with the remaining 16,488 files, which yields an average padding overhead of 150%. We requested the largest file from each cluster several times with varying number of concurrent clients. Additionally, we ran a trace where clients request 1,000 random files from both clusters.

Figure 4 shows the throughput vs. average latency for the baseline and Pacer. The error bars show the standard deviations of the average latencies. We used the request trace workload (one curve each for the baseline and Pacer) and, for comparison, we also stressed the server with requests only to the largest file in the corpus (521.9KB), again for both the baseline and Pacer.

Unlike the baseline, Pacer’s latency remains constant until the maximal throughput, because latency is determined by the transmission schedule. Once the server is at capacity, it fails to serve additional requests and clients time out. Pacer’s latencies are higher than the baseline’s because the profiler generates conservative schedules based on all samples observed during profiling. Nevertheless, the latencies remain within hundreds of milliseconds, and could be optimized substantially using different schedules for different load conditions.

Pacer incurs a 6.8% and 30% overhead on peak throughput for the trace workload and the large file, respectively. With the large file, the baseline operates at over 40% of the line rate, and we believe that Pacer’s performance in this challenging experiment is limited by the accuracy of transmit schedules, which can be improved substantially.

Figure 4: Mediawiki throughput vs. latency

8 Related work

Attacks using network side channels. Network side channel attacks can be launched by observing the total number and sizes of packets [16, 58, 22, 15], their timing [14, 27], and more coarse-grained information, such as burst lengths, the frequency of bursts, burst volumes [23, 66, 64], and a combination of such features [28, 40, 55]. Attacks have been shown to discover what a user is typing over SSH [57], which websites a user is visiting [29, 63], what videos are being streamed [54], the contents of live conversations [66] and private keys [13, 12], even with end-to-end encryption and techniques like onion routing in place [48].

Within the context of Cloud computing, the setting of this paper, Ristenpart et al. [53] and Inci et al. [30, 31] show that targeted attacks can be carried out by first attaining co-residency with a desired victim and then exploiting side channels, including contention on I/O ports [53, Section 8.3]. Agarwal et al. [6] first demonstrated this type of attack; we studied a more general form of it in §2.

Mitigating network side channels. Dependence of packet size on secrets can be eliminated by padding all packets to a fixed length [29, 66]. This standard technique is also used by Pacer. Dependence of packet timing on secrets can be eliminated by sending packets at a fixed rate independent of the actual workload, inserting dummy packets when no actual packets exist [55, 57]. However, this either wastes bandwidth or incurs high latencies when the workload is bursty.

A more efficient strategy is predictive mitigation [8, 72], where a packet transmission schedule is built ahead of time, independent of any secrets. At each scheduled transmission, the enforcement mechanism transmits a packet if the application has provided one; otherwise nothing is transmitted and a 1-bit leak is incurred. At every leak, the schedule is adapted using a prediction algorithm (based only on public inputs) to reduce the chance of future leaks. Pacer improves on this idea to eliminate timing leaks completely (by sending a dummy packet when the application does not provide a packet on time), and shows that the idea works in a practical system. On the other hand, prior work considers adversaries who may compromise clients, whereas Pacer assumes that clients are trusted.
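To make the per-slot decision concrete, the sketch below illustrates schedule-driven transmission with dummy packets. It is a simplification under our own naming (schedule, queue, send, DUMMY), not Pacer’s hypervisor-level implementation; predictive mitigation would instead skip the transmission at an empty slot, leaking one bit and adapting the schedule.

```python
import time
from collections import deque

DUMMY = b"\x00" * 1500  # placeholder payload, padded like a real packet

def pace(schedule, queue, send):
    """Transmit at the pre-computed offsets in `schedule` (seconds from start).
    A real packet is sent if one is queued, a dummy otherwise, so the shape
    observed on the wire does not depend on whether data was actually ready."""
    start = time.monotonic()
    for offset in schedule:
        delay = start + offset - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        send(queue.popleft() if queue else DUMMY)

# Usage example with a toy schedule and a single queued packet:
pace([0.0, 0.006, 0.012], deque([b"real packet"]), print)
```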

Traffic morphing [67] and related work like Walkie-Talkie [65] shape traffic to make sensitive responses similar (ideally identical) to non-sensitive responses. Unlike Pacer, such techniques do not provide strong security—they only introduce some degree of uncertainty for the adversary.

BuFLO [23] proposes to shape traffic to evenly-spaced bursts of a fixed number of packets for a certain minimum amount of time after a request starts. Our pacing strategy is a significant generalization of BuFLO, where the shape may be different for every partition of requests. Cai et al. [14] propose a simple extension of BuFLO to make pacing sensitive to congestion signals, much like Pacer. BuFLO and its extension have not been integrated into any real system.

In constant-time cryptography [20, 18, 19], the victim’s code is modified, either manually or with a compiler transformation, to make the timing of outputs independent of secrets. This approach has been widely applied to secure small pieces of code such as cryptographic libraries, but has not yet been shown to scale to large software.

Mitigating network side channels in Clouds. Contention on NICs in a Cloud can be mitigated by time-division multiple access (TDMA) in a hypervisor [34] as this eliminates the adversary VM’s ability to observe a co-located victim’s traffic. However, this approach is inherently inefficient when the payload traffic is bursty. Statistical multiplexing, which caps the amount of data transmitted by a VM in an epoch, is fundamentally insecure as it allows side-channel attacks (§2).

Another defense is to restrict the adversary VM’s ability to observe time [62, 44, 42]. StopWatch [39] replaces a VM’s clock with virtual time based only on that VM’s execution. To mitigate network side channels, each VM is replicated three times, the replicas are co-located with different guests, and each interrupt is delivered at a virtual time that is the median of the three replicas’ times. This prevents a guest from consistently observing I/O interference with any co-located tenant. However, it also requires a threefold increase in employed Cloud resources. In contrast, Pacer mitigates network side-channel leaks with far less resource overhead. Deterland [68] also replaces VMs’ real time with virtual time, but it does not address leaks via network side channels, as it delivers I/O events to VMs without delay.

Bilal et al. [10] address leaks via the pattern of queries to different backend nodes in multi-tier stream-processing applications in a Cloud, but they do not consider leaks due to packet size and timing, which is what Pacer focuses on.

Metadata-privacy. Herd [38], Vuvuzela [60], and Karaoke [37] provide metadata privacy—they prevent information about who is communicating with whom from leaking via network side channels. This goal is fundamentally different from Pacer’s goal of preventing server-side secrets from leaking. However, these systems and Pacer share underlying techniques, such as the use of dummy packets to shape traffic.

Other work. Oblivious computing [21, 24, 43] prevents accessed memory addresses or accessed database keys from depending on secrets. Pacer addresses the orthogonal problem of making packet size and timing independent of secrets.

Some prior work [51, 45, 17] uses techniques similar to those used for network side-channel mitigation to performance-isolate co-located tenants. Richter et al. [52] propose to performance-isolate co-located tenants by modifying the NIC firmware. Pacer’s traffic shaping could similarly be implemented in the NIC, which would strongly isolate the pacing logic from the rest of the system in the face of micro-architectural side channels. Silo [33] implements traffic pacing in the hypervisor like Pacer; however, Silo’s goal is to improve remote access latency, not information security, and hence its pacing logic is very different.

9 Conclusions

We presented and evaluated Pacer, a traffic-shaping tunnel implementation for an IaaS Cloud. Pacer shapes a tenant’s network traffic to be independent of secrets, thereby thwarting network side-channel attacks by adversaries who can achieve co-location in the same server, rack, or datacenter, or can otherwise observe the victim’s traffic. Pacer reduces overhead by allowing the tenant’s traffic shape to reveal non-secret aspects of its workload. Experiments with video and document servers show that Pacer achieves a bandwidth cost that reflects the tenant’s conditional traffic distribution given public parameters, with generally modest runtime overhead. Pacer’s design can be generalized to cover client traffic, multi-tier systems, and general VPNs, but we leave the details to future work.

References

  • [1] Azure confidential computing. https://azure.microsoft.com/en-us/solutions/confidential-compute/.
  • [2] NapaTech SmartNIC, Feature Overview Data Sheet. https://www.napatech.com/support/resources/data-sheets/napatech-smartnic-feature-overview/.
  • [3] wrk2: A constant throughput, correct latency recording variant of wrk. https://github.com/giltene/wrk2.
  • [4] Xen Null scheduler. https://patchwork.kernel.org/patch/9669405/.
  • [5] Onur Acıiçmez, Çetin Kaya Koç, and Jean-Pierre Seifert. Predicting secret keys via branch prediction. In Cryptographers’ Track at the RSA Conference, 2007.
  • [6] Yatharth Agarwal, Vishnu Murale, Jason Hennessey, Kyle Hogan, and Mayank Varia. Moving in next door: Network flooding as a side channel in cloud environments. In Intl. Conf. on Cryptology and Network Security (CANS), 2016.
  • [7] Gorka Irazoqui Apecechea, Mehmet Sinan Inci, Thomas Eisenbarth, and Berk Sunar. Fine grain Cross-VM Attacks on Xen and VMware are possible! In IEEE Intl. Conf. on Big Data and Cloud Computing (BDCLOUD), 2014.
  • [8] Aslan Askarov, Danfeng Zhang, and Andrew C Myers. Predictive black-box mitigation of timing channels. In ACM Conf. on Computer and Communications Security (CCS), 2010.
  • [9] Daniel J Bernstein. Cache-timing attacks on AES, 2005.
  • [10] Muhammad Bilal, Hassan Alsibyani, and Marco Canini. Mitigating network side channel leakage for stream processing systems in trusted execution environments. In ACM Intl. Conf. on Distributed and Event-based Systems (DEBS), 2018.
  • [11] Benjamin A Braun, Suman Jana, and Dan Boneh. Robust and efficient elimination of cache and timing side channels. arXiv preprint arXiv:1506.00189, 2015.
  • [12] Billy Bob Brumley and Nicola Tuveri. Remote timing attacks are still practical. In European Symposium on Research in Computer Security (ESORICS), 2011.
  • [13] David Brumley and Dan Boneh. Remote timing attacks are practical. Computer Networks, 48(5), 2005.
  • [14] Xiang Cai, Xin Cheng Zhang, Brijesh Joshi, and Rob Johnson. Touching from a distance: Website fingerprinting attacks and defenses. In ACM Conf. on Computer and Communications Security (CCS), 2012.
  • [15] Shuo Chen, Rui Wang, XiaoFeng Wang, and Kehuan Zhang. Side-channel leaks in web applications: A reality today, a challenge tomorrow. In IEEE Symposium on Security and Privacy (SP), 2010.
  • [16] Heyning Cheng and Ron Avnur. Traffic analysis of SSL encrypted web browsing, 1998.
  • [17] Ron Chi-Lung Chiang, Sundaresan Rajasekaran, Nan Zhang, and H. Howie Huang. Swiper: Exploiting virtual machine vulnerability in third-party clouds with competition for I/O resources. IEEE Trans. on Parallel and Distributed Systems (TPDS), 26(6), 2015.
  • [18] Jeroen V. Cleemput, Bart Coppens, and Bjorn De Sutter. Compiler mitigations for time attacks on modern x86 processors. ACM Trans. on Architecture and Code Optimizations (TACO), 8(4), 2012.
  • [19] Jeroen V. Cleemput, Bjorn De Sutter, and Koen De Bosschere. Adaptive compiler strategies for mitigating timing side channel attacks. IEEE Trans. on Dependable and Secure Computing (TDSC), PP(99), July 2017.
  • [20] Bart Coppens, Ingrid Verbauwhede, Koen De Bosschere, and Bjorn De Sutter. Practical mitigations for timing-based side-channel attacks on modern x86 processors. In IEEE Symposium on Security and Privacy (SP), 2009.
  • [21] Natacha Crooks, Matthew Burke, Ethan Cecchetti, Sitar Harel, Rachit Agarwal, and Lorenzo Alvisi. Obladi: Oblivious serializable transactions in the cloud. In USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2018.
  • [22] George Danezis. Traffic analysis of the HTTP protocol over TLS, 2009.
  • [23] Kevin P Dyer, Scott E Coull, Thomas Ristenpart, and Thomas Shrimpton. Peek-a-boo, i still see you: Why efficient traffic analysis countermeasures fail. In IEEE Symposium on Security and Privacy (SP), 2012.
  • [24] Saba Eskandarian and Matei Zaharia. An oblivious general-purpose SQL database for the cloud. CoRR, abs/1710.00458, 2017.
  • [25] Dmitry Evtyushkin, Dmitry Ponomarev, and Nael Abu-Ghazaleh. Jump over ASLR: Attacking branch predictors to bypass ASLR. In IEEE/ACM Intl. Symposium on Microarchitecture (MICRO), 2016.
  • [26] Qian Ge, Yuval Yarom, David Cock, and Gernot Heiser. A survey of microarchitectural timing attacks and countermeasures on contemporary hardware. Journal of Cryptographic Engineering, 2016.
  • [27] Xun Gong, Nikita Borisov, Negar Kiyavash, and Nabil Schear. Website detection using remote traffic analysis. In Privacy Enhancing Technologies Symposium (PETS), 2012.
  • [28] Jamie Hayes and George Danezis. k-fingerprinting: A robust scalable website fingerprinting technique. In USENIX Security Symposium, 2016.
  • [29] Andrew Hintz. Fingerprinting websites using traffic analysis. In Conf. on Privacy Enhancing Technologies (PETS), 2002.
  • [30] Mehmet Sinan Inci, Berk Gülmezoglu, Gorka Irazoqui Apecechea, Thomas Eisenbarth, and Berk Sunar. Seriously, get off my cloud! Cross-VM RSA key recovery in a public cloud. IACR Cryptology ePrint Archive, 2015(1-15), 2015.
  • [31] Mehmet Sinan İnci, Gorka Irazoqui, Thomas Eisenbarth, and Berk Sunar. Efficient, adversarial neighbor discovery using logical channels on microsoft azure. In Annual Conf. on Computer Security Applications (ACSAC), 2016.
  • [32] Gorka Irazoqui, Thomas Eisenbarth, and Berk Sunar. S$A: A Shared Cache Attack That Works across Cores and Defies VM Sandboxing–and Its Application to AES. In IEEE Symposium on Security and Privacy (SP), 2015.
  • [33] Keon Jang, Justine Sherry, Hitesh Ballani, and Toby Moncaster. Silo: Predictable message latency in the cloud. In ACM Conf. on Special Interest Group on Data Communication (SIGCOMM), 2015.
  • [34] Sachin Kadloor, Negar Kiyavash, and Parv Venkitasubramaniam. Mitigating timing side channel in shared schedulers. IEEE/ACM Trans. on Networking (TON), 24(3), 2016.
  • [35] Paul Kocher. Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems. In Advances in Cryptology – CRYPTO, 1996.
  • [36] Adam Langley, Alistair Riddoch, Alyssa Wilk, Antonio Vicente, Charles Krasic, Dan Zhang, Fan Yang, Fedor Kouranov, Ian Swett, Janardhan Iyengar, et al. The QUIC transport protocol: Design and internet-scale deployment. In ACM Conf. on Special Interest Group on Data Communication (SIGCOMM), 2017.
  • [37] David Lazar, Yossi Gilad, and Nickolai Zeldovich. Karaoke: Distributed private messaging immune to passive traffic analysis. In USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2018.
  • [38] Stevens Le Blond, David Choffnes, William Caldwell, Peter Druschel, and Nicholas Merritt. Herd: A scalable, traffic analysis resistant anonymity network for VoIP systems. In ACM Conf. on Special Interest Group on Data Communication (SIGCOMM), 2015.
  • [39] Peng Li, Debin Gao, and Michael K Reiter. Stopwatch: a cloud architecture for timing channel mitigation. ACM Trans. on Information and System Security (TISSEC), 17(2), 2014.
  • [40] Shuai Li, Huajun Guo, and Nicholas Hopper. Measuring information leakage in website fingerprinting attacks and defenses. In ACM Conf. on Computer and Communications Security (CCS), 2018.
  • [41] Fangfei Liu, Yuval Yarom, Qian Ge, Gernot Heiser, and Ruby B Lee. Last-level cache side-channel attacks are practical. In IEEE Symposium on Security and Privacy (SP), 2015.
  • [42] Weijie Liu, Debin Gao, and Michael K Reiter. On-demand time blurring to support side-channel defense. In European Symposium on Research in Computer Security (ESORICS), 2017.
  • [43] Jacob R Lorch, Bryan Parno, James Mickens, Mariana Raykova, and Joshua Schiffman. Shroud: Ensuring private access to large-scale data in the data center. In USENIX Conference on File and Storage Technologies (FAST), 2013.
  • [44] Robert Martin, John Demme, and Simha Sethumadhavan. Timewarp: Rethinking timekeeping and performance monitoring mechanisms to mitigate side-channel attacks. In Intl. Symposium on Computer Architecture (ISCA), 2012.
  • [45] Diego Ongaro, Alan L Cox, and Scott Rixner. Scheduling I/O in virtual machine monitors. In ACM SIGPLAN/SIGOPS Intl. Conf. on Virtual Execution Environments (VEE), 2008.
  • [46] Dag Arne Osvik, Adi Shamir, and Eran Tromer. Cache Attacks and Countermeasures: The Case of AES. In The Cryptographers’ Track at the RSA Conf. on Topics in Cryptology (CT-RSA), 2006.
  • [47] Dan Page. Theoretical use of cache memory as a cryptanalytic side-channel. http://www.cs.bris.ac.uk/Publications/pub_info.jsp?id=1000625, 2002.
  • [48] Andriy Panchenko, Lukas Niessen, Andreas Zinnen, and Thomas Engel. Website fingerprinting in onion routing based anonymization networks. In ACM Workshop on Privacy in the Electronic Society (WPES), 2011.
  • [49] Colin Percival. Cache missing for fun and profit, 2005.
  • [50] Peter Pessl, Daniel Gruss, Clementine Maurice, Michael Schwarz, and Stefan Mangard. DRAMA: Exploiting DRAM Addressing for Cross-CPU Attacks. In USENIX Security Symposium, 2016.
  • [51] Xing Pu, Ling Liu, Yiduo Mei, Sankaran Sivathanu, Younggyun Koh, Calton Pu, and Yuanda Cao. Who Is Your Neighbor: Net I/O Performance Interference in Virtualized Clouds. IEEE Trans. on Services Computing, 6(3), 2013.
  • [52] Andre Richter, Christian Herber, Stefan Wallentowitz, Thomas Wild, and Andreas Herkersdorf. A Hardware/Software Approach for Mitigating Performance Interference Effects in Virtualized Environments Using SR-IOV. In IEEE Intl. Conf. on Cloud Computing (CLOUD), 2015.
  • [53] Thomas Ristenpart, Eran Tromer, Hovav Shacham, and Stefan Savage. Hey, You, Get off of My Cloud: Exploring Information Leakage in Third-party Compute Clouds. In ACM Conf. on Computer and Communications Security (CCS), 2009.
  • [54] Roei Schuster, Vitaly Shmatikov, and Eran Tromer. Beauty and the Burst: Remote Identification of Encrypted Video Streams. In USENIX Security Symposium, 2017.
  • [55] T Scott Saponas, Jonathan Lester, Carl Hartung, Sameer Agarwal, Tadayoshi Kohno, et al. Devices that tell on you: Privacy trends in consumer ubiquitous computing. In USENIX Security Symposium, 2007.
  • [56] Michael Schwarz, Martin Schwarzl, Moritz Lipp, and Daniel Gruss. Netspectre: Read arbitrary memory over network. CoRR, abs/1807.10535, 2018.
  • [57] Dawn Xiaodong Song, David Wagner, and Xuqing Tian. Timing analysis of keystrokes and timing attacks on SSH. In USENIX Security Symposium, 2001.
  • [58] Qixiang Sun, Daniel R. Simon, Yi-Min Wang, Wilf Russell, Venkata N. Padmanabhan, and Lili Qiu. Statistical identification of encrypted web browsing traffic. In IEEE Symposium on Security and Privacy (SP), 2002.
  • [59] Leif Uhsadel, Andy Georges, and Ingrid Verbauwhede. Exploiting hardware performance counters. In Workshop on Fault Diagnosis and Tolerance in Cryptography (FDTC), 2008.
  • [60] Jelle Van Den Hooff, David Lazar, Matei Zaharia, and Nickolai Zeldovich. Vuvuzela: Scalable private messaging resistant to traffic analysis. In Symposium on Operating Systems Principles (SOSP), 2015.
  • [61] Venkatanathan Varadarajan, Thomas Ristenpart, and Michael M Swift. Scheduler-based defenses against cross-vm side-channels. In USENIX Security Symposium, 2014.
  • [62] Bhanu C Vattikonda, Sambit Das, and Hovav Shacham. Eliminating fine grained timers in xen. In ACM workshop on Cloud Computing Security Workshop, 2011.
  • [63] Pepe Vila and Boris Köpf. Loophole: Timing attacks on shared event loops in chrome. In USENIX Security Symposium, 2017.
  • [64] Tao Wang, Xiang Cai, Rishab Nithyanand, Rob Johnson, and Ian Goldberg. Effective attacks and provable defenses for website fingerprinting. In USENIX Security Symposium, pages 143–157, 2014.
  • [65] Tao Wang and Ian Goldberg. Walkie-talkie: An efficient defense against passive website fingerprinting attacks. In USENIX Security Symposium, 2017.
  • [66] Charles V Wright, Lucas Ballard, Scott E Coull, Fabian Monrose, and Gerald M Masson. Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations. In IEEE Symposium on Security and Privacy (SP), 2008.
  • [67] Charles V. Wright, Scott E. Coull, and Fabian Monrose. Traffic morphing: An efficient defense against statistical traffic analysis. In Network and Distributed System Security Symposium (NDSS), 2009.
  • [68] Weiyi Wu and Bryan Ford. Deterministically deterring timing attacks in Deterland. arXiv preprint arXiv:1504.07070, 2015.
  • [69] Yuanzhong Xu, Weidong Cui, and Marcus Peinado. Controlled-channel attacks: Deterministic side channels for untrusted operating systems. In IEEE Symposium on Security and Privacy (SP), 2015.
  • [70] Yuval Yarom and Katrina Falkner. FLUSH+RELOAD: A High Resolution, Low Noise, L3 Cache Side-Channel Attack. In USENIX Security Symposium, 2014.
  • [71] Yuval Yarom, Daniel Genkin, and Nadia Heninger. CacheBleed: a timing attack on OpenSSL constant-time RSA. Journal of Cryptographic Engineering, 7(2), 2017.
  • [72] Danfeng Zhang, Aslan Askarov, and Andrew C Myers. Predictive Mitigation of Timing Channels in Interactive Systems. In ACM Conf. on Computer and Communications Security (CCS), 2011.
  • [73] Yinqian Zhang, Ari Juels, Michael K Reiter, and Thomas Ristenpart. Cross-VM side channels and their use to extract private keys. In ACM Conf. on Computer and Communications Security (CCS), 2012.