
Sonic: A Sampling-based Online Controller for Streaming Applications

Many applications in important problem domains such as machine learning and computer vision are streaming applications that take a sequence of inputs over time. It is challenging to find knob settings that optimize the run-time performance of such applications because the optimal knob settings are usually functions of inputs, computing platforms, time as well as user's requirements, which can be very diverse. Most prior work addresses this problem by offline profiling followed by training models for control. However, profiling-based approaches incur large overhead before execution, and it is difficult to redeploy them in other run-time configurations. In this paper, we propose Sonic, a sampling-based online controller for long-running streaming applications that does not require profiling ahead of time. Within each phase of a streaming application's execution, Sonic utilizes the beginning portion to sample the knob space strategically and aims to pick the optimal knob setting for the rest of the phase, given a user-specified constrained optimization problem. A hybrid approach of machine learning regressions and Bayesian optimization is used for better overall sampling choices. Sonic is implemented independent of application, device, input, performance objective and constraints. We evaluate Sonic on traditional parallel benchmarks as well as on deep learning inference benchmarks across multiple platforms. Our experiments show that when using Sonic to control knob settings, application run-time performance is only 5.3% lower than if optimal knob settings were used, demonstrating that Sonic is able to find near-optimal knob settings quickly under diverse run-time configurations without prior knowledge.


1. Introduction

Streaming applications are ubiquitous in important domains such as deep learning, computer vision and media processing. These applications are typically long-running, with inputs supplied incrementally over time, and the run-time behavior of the application for successive inputs is typically stationary. For instance, augmented reality (AR) applications take in a series of frames and enhance real-world scenes with computer-generated perceptual information. Optimizing a streaming application's run-time performance is important but challenging because performance depends on multiple factors, including inputs, computing devices, time, and the user's requirements.

Most streaming applications have a set of tunable parameters, usually referred to as knobs, that can be adjusted to optimize their performance; as long as knob values are within certain limits, the correctness of the application is not compromised. Many streaming applications are executed on parallel platforms, so their performance is also affected by characteristics of the device, such as the number of cores allocated and their clock frequency. Finding the optimal device knob setting is challenging because modern compute devices have increasingly non-trivial parallelism.

In addition, applications are usually optimized subject to various constraints such as computation accuracy, power/energy consumption, etc. For instance, energy consumption is an important factor in lowering the operating cost of data centers as well as lengthening battery life for resource-constrained devices. Given different combinations of performance objectives and constraints, the system needs to find different optimal knob settings accordingly.

Furthermore, an application's performance is also sensitive to its inputs. For instance, both image size and image content may affect the performance of image/video related tasks. Unlike problems such as n-body simulation (aarseth2003gravitational) and graph partitioning (bader2013graph), whose inputs are available at the beginning of the computation for inspection, a streaming application's inputs are usually not known before execution and are supplied over time.

The cross-product of applications, devices, inputs, performance objectives and constraints makes it very challenging to decipher the optimal application and device knob settings for any run-time configuration. Prior works usually address this problem through offline executions of a set of run-time configurations, or offline profiling, followed by constructing models to predict the application's run-time behavior (capri; capri_sw; poet; neuralvector; thunderx_beacon; rumba; green; caloree; hbm; gmm). Models are then used to perform offline control or reactive online control. For example, Capri builds proxy models for both the quality and the running time of a program through offline profiling; these models, together with input features, are then used to perform proactive control of approximation (capri). CALOREE performs online control using the estimated performance-energy Pareto frontier of an application by matching its behavior with models that have already been profiled on the same machine (caloree).

However, such profiling-based approaches have several drawbacks.

  • Overhead. Profiling incurs large overhead, including compilation, data collection (execution), model training and so on. On embedded platforms, profiling takes a long time due to limited computation resources, while collecting profiling data on larger servers can be very costly. Profiling even becomes inapplicable when a single execution of an application is expensive, e.g. chemical material design (griffiths2017constrained).

  • Scope. Due to the overhead of profiling, only a specific subset of the whole run-time configuration space can be profiled. For example, prior works usually profile on a small subset of inputs and assume it is representative (capri; rumba); Capri (capri) and Rumba (rumba) focus on optimizing performance under an accuracy constraint, while POET (poet) and CALOREE (caloree) only target minimizing energy given a performance requirement. Users may be interested in other reasonable constrained optimization problems, such as "least resource allocation given a performance requirement" on supercomputers.

  • Portability. Models built on profiling data are difficult to port to other unseen run-time configurations. For instance, the relationship between an application's performance and power on one device may not hold on a new device with a distinct hardware architecture.

While profiling-based approaches try to anticipate an application's run-time behavior through offline data, we propose to solve this control problem from a purely online perspective, in which the optimal knob setting is searched for during run time, exploiting the fact that a streaming application's run-time behavior for successive inputs is stationary.

Due to the complexity of the run-time configuration space, we formulate the control problem as a global optimization process that searches for the optimal knob setting by sampling, rather than as a traditional feedback control loop that adjusts knob settings by comparing quantities of interest with reference values. In this paper, we propose Sonic, a sampling-based online controller that does not require profiling ahead of time. Sonic consists of two key components: a sampler and a phase detector.

Sampler.

Within each phase of a streaming application's execution, the sampler utilizes the beginning portion of that phase to sample the knob space and collect the statistics of each sampled knob setting. After the sampling phase, the controller picks the best knob setting among the sampled ones based on the performance objective and constraint of the current run-time configuration. The picked knob setting is then used for the following execution. Since it is challenging to capture and optimize an application's run-time behavior with only a limited number of knob-setting samples, we incorporate combinations of machine learning regressors and sequential design strategies for global optimization, such as Bayesian optimization (bo; bo_tutorial; bo_review), for better sample choices.

Phase Detector.

A streaming application may have distinct phases during one run for various reasons, such as algorithmic phases, the launch of another program on the same device, input changes, etc. In Sonic, a phase detector is activated after the sampling period and monitors the difference between the application's current behavior and the recorded behavior of the picked knob setting during the sampling phase. If the difference is large, a new sampling period is activated. In this paper, we assume that each phase is long enough for the sampling-based online control strategy to be beneficial.

We evaluate Sonic on the PARSEC parallel benchmark suite (parsec) and a modified MLPerf inference benchmark suite (mlperf) across multiple platforms. The primary contributions of this paper are listed as follows:

  • To the best of our knowledge, Sonic is the first sampling-based online controller targeting general streaming applications without dependencies on offline profiling.

  • Sonic incorporates a hybrid control approach that consists of sequential design strategies and machine learning regressors to improve the choice of samples.

  • Sonic is implemented independent of applications, devices, inputs, objectives and constraints, so it can be easily ported to different run-time configurations. It is also orthogonal to other types of optimizations such as algorithmic improvements, new accelerators and so on.

  • Sonic is easy to use. The only extra code needed for the application and the device is an interface to report their performance at run time. No algorithmic changes are needed, such as annotating code or enabling re-computation (green; rumba).

  • By evaluating on traditional and deep learning benchmarks, our experiments show that when using Sonic, application performance is only 5.3% less than if optimal knob settings were used, showing that Sonic is able to find near-optimal knob settings under diverse run-time configurations.

Sonic can be very useful in situations where run-time configurations change frequently. For example, devices may be unknown and nondeterministic when using a cloud service, or the user may change inputs, performance objectives or constraints frequently.

2. Background and Motivation

Streaming applications take a sequence of inputs such as an audio/image sequence, a series of transformations, etc. These inputs are usually not available before the application starts and are supplied over time. In this paper, we focus on streaming applications that run on CPUs, due to CPUs' advantages in flexibility and availability over other devices such as GPUs, ASICs, etc.

2.1. Baseline Performance

In the default run-time setting, an application's knobs are set to default values chosen by the programmers during implementation. The race-to-idle strategy is applied by default on devices, so all the computing resources are allocated at their highest operating condition. We use the term DEFAULT to refer to this default application and device knob setting.

The most common performance metric of a streaming application is the rate of processing inputs, or FPS. We demonstrate how DEFAULT performs by inferencing various deep neural network models on a dual-socket desktop workstation with 64 cores. All the models are implemented within TensorFlow (tensorflow). For simplicity, all the application knobs are fixed and only the number of cores to which an application is deployed is controlled in this experiment.

Since DEFAULT may not yield the best performance, we use the term ORACLE to refer to the optimal knob setting given by exhaustive profiling. The comparison between DEFAULT and ORACLE, as well as the optimal number of cores to use, is shown in Table 1. While DEFAULT should yield the best performance in many cases, surprisingly, no model has the best performance when using all 64 cores in this experiment. There is a geometric mean performance loss of 40% when using DEFAULT, and different models have different ORACLE settings. The primary cause is that the communication overhead between cores grows with the number of cores being used; the ORACLE setting achieves a better balance between communication and the amount of work allocated to each core. It is also worth noting that using DEFAULT results in unnecessary resource occupation. Though DEFAULT is easy to use, this experiment shows that DEFAULT may underutilize modern computer systems, which have increasingly non-trivial parallelism.

Applications DEFAULT (FPS) ORACLE (FPS) Cores Speedup
ResNet8 1409.01 1769.18 4 1.25x
ResNet50 53.46 60.88 46 1.14x
MobileNet_V2 124.57 139.02 15 1.12x
Visual wake words 245.11 267.25 4 1.09x
Speech recognition 2.06 4.26 2 2.07x
Text classification 124.92 257.85 7 2.06x
Table 1. Model inference performance comparison between DEFAULT and ORACLE on a dual-socket 64-core desktop workstation with TensorFlow.

This experiment involves optimizing different applications on one device without imposing constraints. It becomes more difficult to find the optimal knob setting as more dimensions of complexity are added: devices, inputs, and user-specified performance objectives and constraints (Section 2.2 - Section 2.4).

2.2. Device Diversity

We demonstrate device diversity by running the same application on two popular embedded platforms: i) Odroid XU4, ii) Jetson TX2. Both devices are heterogeneous platforms with two types of cores. The application used is Vips, a parallel image transformer for large uncompressed images.

Figure 1 shows the normalized performance of Vips on the two platforms when using different core combinations. Vips's performance is neither linear nor convex, and it has very different patterns on the two boards. First, neither board yields the best performance when using DEFAULT. Second, the optimal knob setting is three big cores and four little cores on Odroid XU4, but two Denver cores and two Arm cores on Jetson TX2. Obtaining the optimal knob setting is non-trivial due to the complexity of the application's implementation (e.g. load balancing, data exchange & sharing) and the device's architecture (e.g. compute capacity of/between cores).

2.3. Input Sensitivity

The next layer of complexity comes from inputs. The most straightforward factor is input size: the video encoder X264 encodes 720p videos 4x faster than 1080p videos. Less noticeable factors involve input content. As shown in Figure 2, though both videos are 720p, encoding rendered content (14.3 FPS) is almost 2x faster than encoding photographic content (8.3 FPS) on an Odroid XU4 board. A potential reason is that rendered content is relatively 'easier' because it is generated with algorithms. Such input sensitivity leads to distinct optimal knob settings under a fixed performance requirement, e.g. a minimum FPS requirement.

(a) Odroid XU4
(b) Jetson TX2
Figure 1. Performance of Vips on Odroid XU4 and Jetson TX2 with different combinations of cores.
(a) Rendered content
(b) Photographic content
Figure 2. Example of content difference of X264 inputs.

2.4. Objectives and Constraints

(a) FPS Vs. Power
(b) FPS Vs. Energy
Figure 3. Example Pareto frontiers of X264 on Odroid XU4. Each point represents one knob setting.

User-specified run-time objectives and constraints also affect the optimal knob setting. An objective is defined as a metric to optimize; a constraint is represented by a metric and a set point. Meeting a constraint means the metric value is lower (or higher) than the set point, per the user's request. Note that constraints are optional and there can be more than one constraint.

In Figure 3, we display the Pareto frontiers of two constrained optimization problems of X264 running on Odroid XU4. Clearly, given different constraint metrics, the Pareto frontiers are significantly different; given one constraint metric such as power, different set points result in distinct knob settings on the Pareto frontier. It is also worth noting that there are plenty of "good enough" knob settings near the Pareto frontier that yield similar performance.

Most prior works (capri; caloree) focus on solving only one type of objective and constraint combination, such as "optimize performance given an accuracy bound" or "least energy under a performance cap". However, the objective and constraint can be any meaningful combination depending on the actual use case. Other common (constrained) optimization problems include "maximize FPS", "maximize FPS under a power/energy cap", "minimize resource usage given an FPS requirement", etc.

2.5. Summary

In addition to the factors mentioned above, there can be other factors with non-trivial run-time impact, such as extreme core temperature (triggering self-protection mechanisms) or running multiple streaming applications simultaneously. Those situations make the run-time behavior even more unpredictable.

In this section, we show that:

  • In optimization problems (without constraint), DEFAULT may deliver sub-optimal performance and cause unnecessary resource occupation.

  • The cross product of all the factors makes it difficult, if not impossible, to derive closed-form analytical expressions for the various metrics. Approaches that rely on offline profiling are not feasible for solving such a control problem (overhead, scope, portability).

The goal of this paper is to explore the idea of online control and develop a generic methodology to control streaming applications at run time in a principled fashion so that the objective is optimized while the constraint is also met.

3. Problem Formulation

In this section, we describe the formulation of the constrained optimization problem we aim to solve. We target general streaming applications but focus on long-running ones. Applications might have distinct phases with different optimal knob solutions, but the run-time behavior for successive inputs within one phase is relatively stationary. We assume that each phase is long enough (at least 1 minute) so that the knob setting provided by the sampling-based online controller can be beneficial.

To keep the notation simple, since a minimization/maximization problem can easily be converted to its opposite by multiplying the objective by -1, we assume for demonstration purposes that the objective metric is maximized and the constraint metric is kept under a set point. We also assume that only one constraint is used; this can easily be extended to no constraint or more than one constraint.

Informally, the target control problem can be formulated as follows: for each experiment, a user specifies a run-time configuration, including an application, its input(s), a device, an objective and a constraint. Within each phase of the application's execution, find the knob setting at run time, in the combined knob space of the application and the device, such that the objective is maximized and the constraint is met. This can be formulated as the following constrained optimization problem.

Problem Formulation 1.

Given a run-time configuration consisting of

  • a streaming application A with knob space K_A,

  • a device D with knob space K_D,

  • an objective metric O,

  • a constraint metric C with set point T,

  • an input I,

within each phase of the execution, find a knob setting s in K_A x K_D such that

  • O(s) is maximized, subject to C(s) ≤ T.

4. Sampling-based Online Controller

4.1. Choice of Online Control Strategy

4.1.1. Traditional Online Control

A traditional online control system can be represented as a supervised feedback loop in which the system monitors certain quantities of interest and compares them to their reference values. Based on the difference, the controller tunes knob settings in the direction that reduces the gap. Cruise control is a typical online control system: it adjusts acceleration by looking at the difference between the vehicle's current speed and a reference speed. Traditional control systems have already been used to adjust knob settings in computer systems (green; rumba; mimo; spectr; ma2011scalable; fu2011cache; slambooster2).

Though traditional methods have the advantage of formal reasoning (ctrl_handbook), they need a function for updating knob settings based on the quantity gap. Such functions typically rely on strong structural assumptions such as linearity, convexity or determinism. However, as discussed in Section 2.2, the relationship between knob settings and performance can be nonlinear or non-convex, with unknown functional form. Combined with the diversity of run-time configurations, this makes it difficult to come up with a general tuning function. Additionally, traditional online control requires a reference value to react to.

4.1.2. Sampling-based Method

Sampling is a popular alternative when analytic methods are not applicable. Instead of depending on gaps towards reference values, sampling is an unsupervised approach that picks knob settings based only on the performance patterns of the already sampled points. During one sampling phase, a sequence of knob settings is picked incrementally and strategically, with each knob setting decided based on the statistics of all previously sampled points. Unlike a traditional control loop, a sampling-based approach does not require knowing the behavior pattern of the whole knob space ahead of time in order to pick a knob setting, so it can adapt to any run-time configuration. Therefore, in this paper, sampling is chosen as our basic online control strategy.

4.2. Controller Overview

The control flow of Sonic is presented in Algorithm 1. The main idea of Sonic is to utilize the beginning portion of each phase for sampling, aiming to find the optimal knob setting for the remainder of that phase. An experiment starts with initializing the target device and application (lines 1 - 2). Both the application and the device report run-time statistics regarding the user-specified objective and constraint.

1: Initialize device D
2: Start application A on device D with input I
3:
4: new_phase ← True     ▷ Start with a new phase
5: while A has not finished do
6:     if new_phase then
7:         k* ← SamplingPhase(K_A x K_D, O, C, T)
8:         Set knob setting k* for A and D
9:         new_phase ← False
10:     end if
11:     Measure run-time statistics for one interval
12:     Compare measured statistics with the recorded statistics of k*
13:     if the difference is large then
14:         new_phase ← True
15:     end if
16: end while
Algorithm 1. Sampling-based online control loop for application A executing on device D, given input I, application knob space K_A, device knob space K_D, objective metric O, constraint metric C and set point T.

The main control loop consists of two components: 1) a sampler and 2) a phase detector. For each new phase, the sampler samples knob settings from the Cartesian product of the application and device knob spaces. Each sampled knob setting is evaluated against the user-specified objective and constraint metrics. At the end of one sampling phase, the knob setting that maximizes the objective and meets the constraint is chosen and set for the following execution (lines 7 - 8). The phase detector determines whether a new sampling phase is needed by checking the difference between the current run-time performance and the reference performance of the picked knob recorded during the sampling phase (lines 12 - 14). The sampler and the phase detector are described in Section 4.3 and Section 4.5, respectively.
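To make the control flow concrete, below is a minimal Python sketch of Algorithm 1. It is an illustration under stated assumptions: the app, device, sampler and detect_phase objects and their methods are hypothetical placeholders, not Sonic's actual API.

    # Minimal sketch of Algorithm 1. All object methods (finished, set_knobs,
    # measure, propose_sequence) are illustrative placeholders, not Sonic's API.
    def control_loop(app, device, sampler, detect_phase,
                     objective, constraint, setpoint, interval=3.0):
        new_phase = True
        best_knob, best_stats = None, None
        while not app.finished():
            if new_phase:
                # Sampling phase: evaluate n knob settings, keep the best feasible one.
                history = []
                for knob in sampler.propose_sequence(app.knob_space, device.knob_space):
                    app.set_knobs(knob); device.set_knobs(knob)
                    history.append((knob, app.measure(interval)))  # stats per metric
                # Assumes at least one sampled setting meets the constraint.
                feasible = [(k, s) for k, s in history if s[constraint] <= setpoint]
                best_knob, best_stats = max(feasible, key=lambda ks: ks[1][objective])
                app.set_knobs(best_knob); device.set_knobs(best_knob)
                new_phase = False
            stats = app.measure(interval)
            # Re-sample when behavior drifts from the recorded reference (Section 4.5).
            new_phase = detect_phase(stats, best_stats)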

4.3. Sampler

The flow of one sampling phase with n rounds is shown in Figure 4. To avoid disturbing the application's run-time behavior, the target application and the sampler are deployed separately onto a client (the target device) and a server. The client and the server communicate over a network connection.

A new sampling phase is activated when a new experiment begins or when the phase detector identifies a new phase. Upon start, the client sets up a connection with the server by sending the objective (O), the constraint (C) and the knob space (K) of the current run-time configuration to the server. The first round of the sampling phase starts by picking the first knob setting (k_1) on the server and sending it back to the client. Upon receiving k_1, the client sets k_1 for the application and the device, and measures its run-time statistics during one measurement interval (3 seconds in our experiments). After measuring k_1, its objective and constraint metric values o_1 and c_1 are sent back to the server. The streaming application keeps running with k_1 until it hears back from the server again. The server then calculates the second knob setting k_2 and repeats the above procedure. After the n-th knob setting is measured, the knob setting that maximizes the objective and meets the constraint is picked and set for the remaining execution, until a new execution phase is detected.
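The client side of this exchange can be pictured with a small sketch. The snippet below assumes a hypothetical JSON-over-TCP protocol with made-up message fields; the actual wire format of Sonic is not specified here.

    # Client side of one sampling phase, assuming a hypothetical JSON-over-TCP
    # protocol. 'measure' sets a knob setting and returns (o_i, c_i) after one
    # measurement interval.
    import json, socket

    def client_sampling_phase(server_addr, objective, constraint, knob_space, measure):
        with socket.create_connection(server_addr) as sock:
            f = sock.makefile("rw")
            # Set up connection: send objective, constraint and knob space.
            print(json.dumps({"O": objective, "C": constraint, "K": knob_space}),
                  file=f, flush=True)
            while True:
                msg = json.loads(f.readline())       # next knob k_i, or the final pick
                if "best_knob" in msg:
                    return msg["best_knob"]          # used for the remaining execution
                o_i, c_i = measure(msg["knob"])      # set k_i, measure one interval
                print(json.dumps({"o": o_i, "c": c_i}), file=f, flush=True)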

Figure 4. The sampling process. The application executes on the client (target device) while the sampler waits on the server. After the connection is set up ({O, C, K}), the server picks knob settings k_1, k_2, ..., k_n in turn; for each k_i, the client sets it, executes for one measurement interval, and reports (o_i, c_i) back to the server.

The strategy for sampling the knob space is the key to finding a competitive knob setting. More samples generally mean a better knob setting selection, but evaluating samples takes time and causes negative impact while sub-optimal samples are being evaluated. More samples also leave less time for the chosen knob setting to be beneficial, because applications have finite execution duration. In this paper, only a limited number of samples (n) is allowed to be drawn; n is chosen to be between 8 and 12, depending on the application's overall execution duration and knob space size.

In order to get a good grasp of the target knob space with a small number of samples, we use global optimization techniques to design the sampling process. The key is to utilize the knowledge given by all previous samples while balancing the dilemma of exploration and exploitation. Exploration generally means probing under-sampled portions of the knob space, for the purpose of getting out of local optima and finding promising regions that are yet to be sampled. Exploitation, on the other hand, searches a known promising region hoping to find a better sample than those already drawn from this region. By leveraging the exploration-exploitation trade-off, the sampler can quickly identify unpromising regions and focus on promising ones. Prior works usually attempt to build a performance model of the whole knob space, which is neither necessary nor cheap (hbm; caloree). Locating promising regions quickly also helps at least reach a near-optimal solution, because there can be plenty of knob settings near the Pareto frontier that yield similar performance (e.g. Figure 3).

Following the spirit of balancing exploration and exploitation, the whole sampling phase of Sonic is divided into two stages: 1) initialization; 2) searching.

4.3.1. Initialization

The initialization stage consists of the first few of the n total samples. In this stage, Sonic aims to learn the knob space at a coarse scale, so all sample choices follow the principle of exploration. Too few initial samples may cause exploitation in the wrong regions, while too many may lead to insufficient exploitation during the searching stage. Common sampling strategies for the initialization stage include random sampling and Latin hypercube sampling (LHS) (lhs). Taking a 2-D knob space as an example, LHS marks the row and column in which each sample was taken, and subsequent samples avoid those marked rows and columns. In contrast to naive random sampling, LHS ensures that the set of picked knob settings is representative of the real variability.
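As a concrete illustration, the sketch below draws LHS points over a discrete knob space using scipy's qmc module (assuming scipy >= 1.7); the knob bounds are hypothetical and not Sonic's actual configuration.

    # Latin hypercube initialization over a discrete knob space (scipy >= 1.7).
    import numpy as np
    from scipy.stats import qmc

    def lhs_initial_samples(n_init, lower, upper, seed=0):
        sampler = qmc.LatinHypercube(d=len(lower), seed=seed)
        unit = sampler.random(n=n_init)            # points in [0, 1)^d
        scaled = qmc.scale(unit, lower, upper)     # map onto the knob ranges
        return np.rint(scaled).astype(int)         # snap to the discrete grid

    # Hypothetical 4-knob space: (big cores, little cores, big freq step, little freq step).
    print(lhs_initial_samples(4, lower=[0, 0, 2, 2], upper=[4, 4, 20, 15]))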

4.3.2. Searching

The remaining samples all belong to the searching stage. The goal of this stage is to find the knob setting that maximizes the objective while meeting the constraint, through exploration and exploitation. The searching stage can be implemented in multiple ways, discussed in the following section. The arrangement of initialization and searching in a sampling phase is shown in Figure 5.

Figure 5. Sampling stages: the first samples of a sampling phase form the initialization stage, and the remaining samples form the searching stage.

4.4. Searching Strategies

4.4.1. Random sampling & LHS

Random sampling and LHS can also be used in the searching stage. However, neither of them utilizes the history of sampled knob settings, since their samples can be generated given only a random seed and the knob space. These two strategies are not helpful in finding the optimal knob setting because they involve only exploration and no exploitation.

4.4.2. Regressors

The second searching strategy is built on machine learning regressors (models). Popular types of regressors include linear regressors (linear_reg), ensemble regressors (random_forest; gradient_boost), Gaussian process regressors (gp_reg), etc. In a sampling phase, a regressor is initialized on the data of all the knob settings sampled in the initialization stage and is updated after each newly sampled knob setting gets evaluated.

Each next knob setting in the searching stage is decided by first predicting every unsampled knob setting's objective and constraint values with the most recently updated regressor, and then picking the one that maximizes the objective metric while staying under the constraint. After this sampled setting gets evaluated, the regressor is updated or rebuilt, depending on its type. Since the number of samples is small, rebuilding the regressor does not introduce noticeable overhead on a server. This process repeats for all remaining samples.

In contrast to random sampling and LHS, regressors do utilize history information. However, this strategy performs only exploitation because it takes the sample that maximizes the prediction at every round. When a new sample's performance aligns well with the regressor's prediction, the new sample provides no new information to update the regressor, causing the following sample to be very close to this one. This leads to difficulty in getting out of local optima. Extra exploration is needed for better knob choices in the searching stage. A sketch of one regressor-guided search step is shown below.
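This sketch shows one regressor-guided step under stated assumptions: scikit-learn random forests stand in for the regressor, and the array names are illustrative.

    # One regressor-guided search step: fit regressors on the sampled settings,
    # predict all unsampled settings, pick the feasible maximizer (pure exploitation).
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    def next_knob_by_regression(X_sampled, y_obj, y_con, X_unsampled, setpoint):
        obj_model = RandomForestRegressor(n_estimators=50).fit(X_sampled, y_obj)
        con_model = RandomForestRegressor(n_estimators=50).fit(X_sampled, y_con)
        pred_obj = obj_model.predict(X_unsampled)
        pred_con = con_model.predict(X_unsampled)
        feasible = pred_con <= setpoint
        if not feasible.any():                        # nothing predicted feasible:
            return X_unsampled[np.argmin(pred_con)]   # take the least violation
        return X_unsampled[np.argmax(np.where(feasible, pred_obj, -np.inf))]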

4.4.3. Bayesian optimization (with unknown constraint)

Another searching strategy that fits our control problem well is Bayesian optimization (bo; bo_tutorial; bo_review). Bayesian optimization is a sequential design strategy for derivative-free global optimization of black-box functions that are expensive or lengthy to evaluate (e.g. hyperparameter tuning for machine learning models (feurer2019auto; snoek2012practical; snoek2015scalable)).

Abstractly, Bayesian optimization aims to maximize a target function (i.e. the objective metric O) over a feasible set (i.e. the knob space K). It consists of two main components: a probabilistic surrogate model, most commonly a Gaussian process (GP) regression, that models the objective function, and an acquisition function (e.g. Expected Improvement (bo_tutorial)) for deciding where to sample next.

After the initialization phase, Bayesian optimization constructs a GP model based on the initial samples. Given a point x in K, the GP models the objective value at x as a normal distribution with mean μ(x) and variance σ²(x). The more samples taken in one region, the smaller the variance of the points in that region tends to be. When determining the next sample, the acquisition function takes both the mean and the variance into consideration, in order to trade off exploration (large variance) and exploitation (large mean). The GP model is incrementally updated as new samples are evaluated.

Constraints can be incorporated into Bayesian optimization by scaling the acquisition function at each point x with the probability that x meets the constraint (bo_unknown). For each constraint, a separate GP model is built. The probability that a point meets a certain constraint is estimated by evaluating the cumulative distribution function at the set point T on that GP model.

4.4.4. Hybrid approach

Though Bayesian optimization fits the control challenge very well, our experiments show that it often requires a considerable number of samples to converge to a good solution. For example, Bayesian optimization tends to focus too much on exploration when taking 12 samples in a knob space that consists of more than 1200 knob settings.

In order to encourage exploitation in Bayesian optimization, Sonic leverages a hybrid strategy that combines regressors with Bayesian optimization to enforce exploitation during the searching stage. We choose the GP regressor because Bayesian optimization's underlying model is also a GP.

Figure 6. Hybrid approach: a sampling phase consists of LHS for initialization, one GP-regressor sample, Bayesian optimization for most of the searching stage, and one final GP-regressor sample.

The design of the hybrid approach is shown in Figure 6. The sampler starts with Latin hypercube sampling (LHS) to perform the initial exploration of the knob space. At the beginning of the searching stage, a GP regressor is built on the initial samples and the next sample is generated using this GP regressor. This sample can potentially boost the sampling quality of Bayesian optimization because it provides an "okay" solution, so that unpromising regions are much easier to identify and less exploration is needed. For the last sample, Bayesian optimization may still be biased towards exploration based on its acquisition function, but exploration is not beneficial at the last step. In Sonic, we therefore enforce exploitation by building a GP regressor on all the previous samples to generate the final knob setting sample.
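Put together, the per-round strategy of a sampling phase can be written as a simple schedule; the function below is an illustrative rendering of Figure 6, with a hypothetical split between initialization and searching.

    # Hypothetical schedule for one sampling phase of n_total rounds:
    # LHS initialization, one GP pick, Bayesian optimization, and a final GP pick.
    def hybrid_schedule(n_total, n_init):
        plan = ["LHS"] * n_init                    # initialization: pure exploration
        plan += ["GP"]                             # one exploitation step to seed BO
        plan += ["BO"] * (n_total - n_init - 2)    # exploration/exploitation trade-off
        plan += ["GP"]                             # last round: pure exploitation
        return plan

    print(hybrid_schedule(12, 4))   # e.g. 12 samples per phase on Odroid XU4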

4.4.5. Discussion

The primary reason to choose Bayesian optimization over other global optimization techniques is that it constructs a surrogate model incrementally with a very limited number of samples. Instead of making strong structural assumptions such as linearity or convexity, Bayesian optimization assumes only that closer knob settings have more similar performance, by defining a covariance function (e.g. the RBF kernel111https://en.wikipedia.org/wiki/Radial_basis_function or the Matérn kernel222https://en.wikipedia.org/wiki/Matérn_covariance_function). Global optimization techniques such as simulated annealing (simulated_annealing) and evolutionary algorithms (evolutionary) assume no functional structure at all; however, they require many more samples to converge to a good solution, which is not feasible for the control problem in this paper.

In the domain of reinforcement learning, upper confidence bound (UCB) and Thompson sampling (thompson) are two popular heuristics to deal with the "exploration vs. exploitation" dilemma for problems like the multi-armed bandit333https://en.wikipedia.org/wiki/Multi-armed_bandit. However, those techniques usually require multiple visits to every knob setting in order to be effective, which far exceeds the number of samples allowed in this paper.

4.5. Phase Detection

After the sampling phase, the best knob setting among all the sampled ones, according to the user-specified objective and constraint, is set for the application and the device. In order to detect new phases, the client keeps reporting run-time statistics to the server. For each measurement interval, the phase detector compares the received run-time performance with the recorded performance of the chosen knob setting during the sampling phase. If the difference is larger than 10% and lasts for two consecutive intervals, a new sampling phase is activated to search for a new solution.
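A minimal sketch of this rule follows, using the 10% threshold and two-interval persistence from above; the class and its interface are illustrative, not Sonic's actual code.

    # Phase detector: flag a new phase when performance deviates from the
    # recorded reference by more than 10% for two consecutive intervals.
    class PhaseDetector:
        def __init__(self, reference, threshold=0.10, patience=2):
            self.reference = reference    # performance recorded during sampling
            self.threshold = threshold
            self.patience = patience
            self.streak = 0

        def update(self, current):
            deviation = abs(current - self.reference) / self.reference
            self.streak = self.streak + 1 if deviation > self.threshold else 0
            return self.streak >= self.patience   # True => start a new sampling phase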

4.6. Implementation Details

Standalone implementation

Sonic is designed and implemented independent of any application, device, input, objective and constraint. Other than reporting performance at run time, no change is needed in the target application or device.

Avoid duplicated samples

While duplicated samples are rare in continuous knob spaces, they are common in discrete knob spaces because knob values are rounded to the nearest discrete value. Duplicated samples waste precious sampling attempts. When duplication happens, a nearby knob setting with the highest acquisition value is selected instead.

Avoid system instability

In order to avoid the instability caused by turning cores on/off, we change an application's core allocation only by setting its thread affinity. Since changing thread affinity has delays and introduces extra noise into measurements, knob settings during the initialization stage are ordered so that the total distance between successive knob settings is minimized (i.e. a Gray-code-like ordering). Furthermore, we make the default knob setting the first sample point; in this way, it is possible to tackle minimization problems whose requirements are expressed as ratios of the default performance.
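For example, on Linux the core allocation can be changed without toggling cores by adjusting the process's CPU affinity mask; the core ID layout below is hypothetical.

    # Restrict a process to a chosen set of CPUs instead of turning cores on/off
    # (Linux-only). Hypothetical layout: CPUs 0-3 are LITTLE cores, 4-7 are big cores.
    import os

    def set_core_allocation(pid, n_little, n_big):
        cores = set(range(0, n_little)) | set(range(4, 4 + n_big))
        os.sched_setaffinity(pid, cores)   # pid 0 means the calling process

    set_core_allocation(0, n_little=4, n_big=3)   # 4 LITTLE + 3 big cores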

5. Experimental Evaluation

5.1. Experiment Setup

5.1.1. Benchmarks and Inputs.

In this work, Sonic is evaluated on two benchmark suites. The first is the PARSEC benchmark suite (parsec), an open-source parallel benchmark suite of emerging applications for evaluating parallel systems. Benchmarks in PARSEC cover popular application domains including financial analysis, computer vision, physical modeling, future media, etc. All the applications are implemented in C/C++, and inputs for each application are included in the benchmark suite. The following parallel applications are regarded as streaming applications:

  • Bodytrack. Bodytrack is a computer vision application that tracks a markerless human body, helping machines interact with the environment without additional aids. The inputs to Bodytrack are four sequences of frames fed from 4 cameras.

  • FaceSim. FaceSim is a computer graphics application that simulates the motions of a human face for visualization purposes. It is used in video games and other interactive animations that require visualization of realistic faces in real time. FaceSim takes a series of muscle activations as inputs and simulates them on a face model.

  • FluidAnimate. FluidAnimate is a computer animation application that simulates the underlying physics of fluid motion for real-time animation purposes with the smoothed particle hydrodynamics (SPH) algorithm (sph). It is a highly demanded feature that allows significantly more realistic animations in games and other media productions. The input to FluidAnimate is a list of particles.

  • StreamCluster. StreamCluster is a data mining application that computes an approximation for the optimal clustering of a stream of data points. This is a common problem in many fields such as network security and pattern recognition. StreamCluster takes a stream of multidimensional points as input.

  • Vips. Vips is an image processing application that applies a series of transformations to an uncompressed image. Vips primarily targets huge professional images that need to be handled quickly.

  • X264. X264 is a video encoder that encodes a sequence of images into the H.264/MPEG-4 AVC compression format. It is widely used nowadays because encoded videos are friendlier to network bandwidth and storage.

The second set of benchmarks is a modified version of the MLPerf inference benchmark suite (mlperf) that covers the deep learning aspect of streaming applications. In order to fit and execute on resource-constrained devices, some of the applications are replaced with simpler versions and smaller inputs. The selected applications include ResNet8 & ResNet50 (resnet) (image classification), MobileNet V2 (mobilenetv2) (object detection), visual wake words (MobileNet V1 (mobilenet)), speech recognition (convolutional neural network) and text classification (recurrent neural network). All the applications are implemented and inferenced using TensorFlow (tensorflow).

Because application knobs are often not exposed for run-time tuning (Section 5.6), application knobs are not controlled by default unless explicitly mentioned.

5.1.2. Device and knob space.

We evaluate Sonic extensively on multiple platforms with different architectures.

  • Odroid XU4 is a 32-bit embedded platform with heterogeneous big.LITTLE core architecture. Odroid XU4 is equipped with four Cortex-A15 cores and four Cortex-A7 cores with frequency ranging from 0.2 to 2 GHz and 0.2 to 1.5 GHz (0.1GHz steps) respectively. The power dissipation of the whole board is measured by an external SmartPower2 device. The total knob space consists of 6384 knob settings (4 knobs).

  • Jetson TX2 is a 64-bit embedded AI computing device that has four Cortex-A57 cores and two Denver2 cores, both operating from 0.345 to 2 GHz (0.15 GHz steps). The power dissipation of the different components of this board is measured through its internal I2C interface. Jetson's knob space has 1694 knob settings (4 knobs).

  • Intel Xeon Gold is a modern desktop system with two sockets and 32 physical cores (64 hardware threads). The power dissipation of this platform is measured by pyRAPL, a software toolkit that uses the "Running Average Power Limit" technology to measure the energy footprint of a host machine along the application's execution. Only one knob, with 64 settings, is tuned on this platform.

5.1.3. Metric.

The goal of an online controller is to find the best knob setting possible to optimize a user-specified objective while staying under constraints if specified. Due to the randomness and uncertainty involved in the sampling process and measurement, we evaluate the effectiveness of an online controller by calculating the expectation of the objective when the constraint is met across independent runs.

QoS (quality of service) is defined as the ratio of the expected objective between using an online controller and using the optimal knob setting. The QoS expressions for maximization and minimization problems are given in Equations 1 and 2 respectively.

QoS_max = E[O_controller] / O_ORACLE (1)
QoS_min = O_ORACLE / E[O_controller] (2)
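The metric is straightforward to compute; the helper below is an illustrative rendering of Equations 1 and 2, assuming the runs passed in already meet the constraint. The example data are made up.

    # QoS as the ratio of the expected objective (over constraint-meeting runs)
    # to the oracle's objective (Equations 1 and 2).
    import numpy as np

    def qos(objective_runs, oracle_objective, maximize=True):
        expected = np.mean(objective_runs)   # expectation over independent runs
        return expected / oracle_objective if maximize else oracle_objective / expected

    # Example with made-up run data against ResNet50's ORACLE FPS from Table 1.
    print(qos([58.1, 59.0, 57.6, 58.8], 60.88))   # ~0.95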

5.1.4. Experiment Design.

We compare Sonic with six other control settings. The implementations of the SGD regressor and the random forest regressor are from the scikit-learn package444https://scikit-learn.org/. Bayesian optimization's implementation comes from the Emukit package555https://github.com/EmuKit/emukit (emukit).

  • Race-to-Idle (DEFAULT)

  • Random sampling

  • Linear regressor: SGD

  • Ensemble regressor: random forest

  • Bayesian optimization

  • Sonic: Bayesian optimization + GP regressor

  • Optimal knob setting (ORACLE)

The length of one sampling phase is set to 12 rounds on Odroid XU4, 10 on Jetson TX2 and 8 on the desktop system, according to their knob space sizes. The impact of the number of samples is discussed in Section 5.8. During the sampling phase, because the taskset command needs time (0.5 - 1 s) to stabilize, we set the measurement interval to 3 seconds, in order to balance the length of one sampling interval against the measurement noise. For fair comparison, the length of the sampling phase is normalized to be 10% of the total execution time. QoS is averaged over 40 independent runs for each run-time configuration.

5.2. QoS on Odroid XU4

(a) QoS
(b) Power consumption
Figure 7. Applications' objective and constraint performance on Odroid XU4.

In this section, we show how Sonic is able to solve a commonly seen constrained optimization problem: “Best performance under a power cap of 7 W” on Odroid XU4 for all the applications with their provided inputs in the benchmark suite.

5.2.1. Optimal knob settings.

Given this constrained optimization problem, online control would be unnecessary if all applications' optimal knob settings were the same on one device. In Table 2, we display the optimal knob setting for each application. Of the 12 applications, almost every one has a unique optimal knob setting on Odroid XU4, showing that learning the device alone is not sufficient for solving such constrained optimization problems.

Optimal knob setting (ORACLE)
Platform Odroid XU4 Jetson TX2
Objective Perf. Power
Constraint Power ≤ 7 W Perf. > 60% of DEFAULT
Bodytrack (4, 0, 1500, 200) (4, 2, 1263, 1263)
FaceSim (4, 2, 1400, 1500) (4, 2, 961, 1263)
FluidAnimate (4, 4, 1400, 1500) (4, 2, 1263, 1263)
StreamCluster (4, 4, 1200, 1500) (4, 2, 961, 1152)
Vips (2, 4, 1600, 900) (2, 0, 1263, 345)
X264 (3, 4, 1100, 1500) (3, 2, 1263, 1416)
ResNet8 (4, 4, 1400, 1500) (3, 2, 1416, 1263)
ResNet50 (4, 4, 1800, 1500) (3, 2, 1263, 1263)
MobileNet V2 (4, 2, 1800, 1500) (3, 2, 1263, 1263)
Visual wake words (4, 2, 1900, 1200) (3, 0, 1722, 345)
Speech recognition (4, 2, 1300, 1500) (2, 2, 1263, 1263)
Text classification (3, 0, 1800, 200) (4, 0, 1722, 345)
Table 2. The optimal knob setting of each application on two platforms, given different objectives and constraints. The first two numbers in each tuple represent core counts, and the last two represent core frequencies (MHz).

5.2.2. Controller Comparison.

The QoS and power dissipation of the different controllers are shown in Figures 7(a) and 7(b). Under DEFAULT, all the applications have better performance than ORACLE's; however, the power dissipation of DEFAULT violates the 7-Watt constraint for every application. Note that the power dissipation of DEFAULT differs among applications, indicating different optimal knob settings given a fixed power constraint.

When applying sampling-based online controllers, all of them are able to provide knob settings that meet the 7-Watt constraint. The reason is that only knob settings that meet the constraint are considered at the end of each sampling phase. As expected, random sampling has the worst performance because it does not learn from previous samples. Regressor-based approaches perform much better than random sampling due to their aggressive exploitation. The SGD regressor does explore regions of interest effectively for many applications, but it is not accurate enough due to its underlying linear model, because the relationship between performance and knob settings may not be linear. The random forest regressor is an ensemble regressor that fits a number of decision trees on various subsets of the samples and uses voting to improve the predictive accuracy and control over-fitting. Since it makes no assumption of linearity or convexity, it performs better than the SGD regressor. The Bayesian optimization controller does trade off exploitation and exploration during the searching stage, but because the knob space is relatively large compared to the number of samples, it fails to provide enough exploitation to refine the solutions found by exploration.

Finally, by explicitly adding exploitation at the beginning and the end of the searching stage, Sonic is able to provide near-optimal results with only 12 samples in a knob space of 6384 settings. It outperforms regressor-based approaches because they lack exploration in the searching stage. Compared with ORACLE, using Sonic for streaming applications only incurs a 4.8% QoS loss, and Sonic achieves this without any prior knowledge of the run-time configuration. This 4.8% loss is accounted for by two parts: 1) knob settings evaluated during the sampling phase are suboptimal; 2) the knob setting given by the sampler may not be the global optimum.

5.2.3. Individual Experiments.

In Figure 8, we show the objective and constraint performance of each individual run for all the applications. The average objective and constraint performance of DEFAULT and ORACLE are represented by blue and green stars. Blue dots and red dots represent independent experiments with Sonic and random sampling respectively. Using Sonic significantly improves the objective while still staying under the constraint. In addition, Sonic helps reduce the performance variance of individual runs. Note that uncertainty is involved throughout each experiment due to randomness and noise: though Sonic generally boosts performance, a single run can still perform worse than other baseline sampling methods.

In this experiment, we also observe that Sonic is not able to find the exact global optimal knob setting every time, given the large knob space, the limited number of samples, and uncertainty. However, Sonic can still reach near-optimal performance because there exist plenty of "good enough" knob settings near the Pareto frontier that have similar performance (e.g. Figure 3). In other words, Sonic is an approximate run-time solution for the optimal knob setting of any run-time configuration.

Figure 8. Performance distributions of 40 independent experiments.

5.3. QoS on Jetson TX2

In order to evaluate Sonic's ability to solve various constrained optimization problems, we experiment with a different problem on Jetson TX2: "least energy consumption subject to a performance requirement of 60% of DEFAULT's performance".

Due to lack of space, we cannot include figures for this experiment. For this problem, DEFAULT can easily meet the performance requirement, but its energy QoS (a minimization problem, Equation 2) is only 0.73. The energy QoS for random sampling, the SGD regressor, the random forest regressor and Bayesian optimization is 0.81, 0.89, 0.91 and 0.86 respectively. Finally, Sonic's energy QoS is 0.94. The corresponding optimal knob settings for this problem are included in Table 2.

5.4. QoS on Intel Xeon Gold

In Section 2.1, we showed that DEFAULT, which automatically occupies all the cores, is sub-optimal in terms of performance for deep learning frameworks on a desktop platform. DEFAULT also results in unnecessary waste of compute resources.

After applying Sonic to those applications, a geometric mean speedup of 1.32x is achieved "for free", because no extra optimization or profiling is applied to those models; what changes is just the run-time core count selection. In addition, Sonic helps reduce resource utilization on this desktop system by 78% on average, providing opportunities for other applications to run on the same device. Compared to ORACLE, the average QoS loss is 5.5%.

Applications DEFAULT (FPS) Sonic (FPS) Speedup QoS
ResNet8 1409.01 1686.35 1.20x 0.95
ResNet50 53.46 58.64 1.10x 0.96
MobileNet_V2 124.57 134.46 1.08x 0.97
Visual wake words 245.11 255.26 1.04x 0.96
Speech recognition 2.06 3.73 1.81x 0.87
Text classification 124.92 246.95 1.98x 0.96
Average - - 1.32x 0.94
Table 3. Model inference FPS comparison between DEFAULT and Sonic on a 2-socket 64-core machine.

5.5. Phase Detection

Some streaming applications may have different phases within one run, caused by another streaming application entering the system, an input change, and so on. For experimental purposes, we concatenate the "Big Buck Bunny" and "Ducks take off" videos mentioned in Section 2.3; they have different performance patterns due to their content difference. The run-time configuration can be described as "given this input, run X264 on Odroid XU4 and find the knob setting that minimizes power dissipation while keeping FPS over 2".

The run-time FPS and power data in chronological order are shown in Figure 9. Sonic starts with a sampling phase, and one knob setting is picked after 10 samples are evaluated. However, at round 30, the input video changes to "Ducks take off", which contains photographic content and thus needs more computation to encode. The FPS suddenly drops below 2, and the phase detector activates a new sampling phase after two measurement intervals. In the new phase, a knob setting with much higher compute capacity and power is picked in order to meet the constraint. This shows that with Sonic, X264 is able to minimize its power dissipation while meeting the constraint despite the phase change that occurs during the execution.

(a) FPS
(b) Power dissipation
Figure 9. Executing X264 on Odroid XU4 with an input that contains phases, given a constrained optimization problem: least power subject to an FPS requirement of 2.

5.6. Application & Device Knob Space

Application knobs are typically programmed to be set before an application starts rather than changing adaptively. Though tuning application knobs has been shown to be beneficial in many prior works (green; rumba; slambooster; capri), additional code changes are often required if application knobs need to be optimized in Sonic.

In this section, we test the impact of tuning application and device knobs together for the text classification application. We observe that batch size has an interesting impact on this application's performance. Its default batch size is 128, and the optimal knob setting on the desktop system is 7 cores. However, if the batch size is set to 64 (256), the optimal performance increases by 11% (7%) while using only 3 (6) cores.

After incorporating batch size as an additional knob alongside the number of cores, the performance of text classification is further improved by 8% (246.95 to 265.2 FPS), showing that Sonic can tune both application and device knobs.

5.7. Reuse previous samples

It is common in practice to run the same run-time configuration multiple times (e.g. debugging). We therefore also evaluate Sonic's effectiveness when previous experiments of the same run-time configuration are available. In this case, the sampling history of previous experiments is utilized to construct a more accurate surrogate model for Bayesian optimization and the Gaussian process regressor. For the run-time configuration discussed in Section 5.2, the average QoS loss drops from 4.8% to 3.6% when one previous experiment is available. When more than three previous runs are available, the average loss drops below 3%.

5.8. Discussion

Controller Overhead.

Since the sample size is small in our experiments, model building or updating only takes about 0.2 s on the server. As the application's execution does not stall during model building or updating, this duration has no impact on the system's performance.

Number of Samples.

In this paper, we normalize the application's total execution time to be 10x the length of a sampling phase. In general, if the amount of work is fixed, more samples mean a better knob setting choice after a sampling phase, but there are two concerns: 1) increased overhead from sampling and evaluating more sub-optimal knobs; 2) less time for the picked knob to benefit the whole execution. In this paper, we try to keep the number of samples as small as possible, to be more practical while still yielding reasonable results.

Granularity of Sampling Interval.

Given a fixed sampling phase duration, the shorter the sampling interval, the more samples can be evaluated, at the cost of more noise in each sample. On the other hand, fewer samples can be taken when using a longer interval, leading to less chance of finding a good solution.

Non-streaming applications.

Sonic may not be useful for non-streaming applications. If the target application's duration is short, there is not enough time for the sampling strategy to be beneficial. If the application is long-running but contains multiple small phases, sampling can be misleading.

6. Related Work

Profiling-based approaches

Many systems have been proposed for solving constrained optimization problems such as trading off program accuracy and performance (green; gmm; neuralvector; jouleguard; poet; rumba; caloree; hbm; thunderx_beacon; mimo; spectr; capri). Most of them rely on profiling data to construct models for control. For example, Rumba (rumba) applies lightweight error checks during the application's execution to detect large approximation errors based on an offline error predictor, and then fixes these errors by re-computation. Green (green) periodically executes the exact version of the target program in parallel with a pre-defined approximate version and tunes knobs based on their difference. NeuralVector (neuralvector) trains embeddings for loop code snippets and tries to find the best parameters for OpenMP pragmas before execution. POET (poet) collects power and performance data exhaustively offline and uses them during online control with a PID-like controller. MIMO (mimo) trains multi-input multi-output models offline and uses them in a traditional control loop. Profiling-based approaches cannot feasibly adapt to the diverse run-time configuration space.

Online approaches

Online approaches only utilize information about the target application at run time to tune knob settings (siblingrivalry; fang2014performance; li2006dynamic; slambooster; slambooster2; flicker; ponomarev2001reducing; holistic). For example, SiblingRivalry (siblingrivalry) and Flicker (flicker) use evolutionary algorithms to find the optimal hardware allocation for applications, and the work in (fang2014performance) utilizes simulated annealing to tune knobs; however, evolutionary algorithms and simulated annealing take many more samples than Bayesian optimization to converge to a near-optimal result. SLAMbooster (slambooster) applies a PID controller to hot domains such as Simultaneous Localization and Mapping, but such an approach requires insight into the application's knob space ahead of time.

Sampling-based control

Sampling-based control is primarily used in certain domains such as robotic motion planning (karaman2011sampling; kingston2018sampling; liu2010sampling), where robots with many degrees of freedom try to find feasible motion sequences autonomously under constraints for operation in realistic tasks such as spacecraft logistics, health care, etc.

Bayesian optimization use cases

Bayesian optimization has become increasingly popular for solving sampling-constrained problems such as hyperparameter tuning for machine (deep) learning models (feurer2019auto; snoek2012practical; snoek2015scalable), neural architecture search (elsken2019neural; kandasamy2018neural), robotic control (antonova2017deep; calandra2016bayesian), chemical material design (griffiths2017constrained; li2017rapid), etc. Bayesian optimization has also drawn more attention for specific control problems in recent years, including finding the best controller for certain industrial processes (neumann2019data), tuning PID controllers for HVAC systems (fiducioso2019safe), and modeling aircraft maneuver control systems (kim2019black).

7. Conclusion

In this paper, we propose Sonic, an online controller that optimizes streaming applications entirely at run time. At the beginning of each phase of an experiment, Sonic samples and evaluates knob settings sequentially to determine a good one for the remaining execution, according to a user-defined constrained optimization problem. Sonic applies a hybrid approach of machine learning regressors and Bayesian optimization to improve the choice of samples. We demonstrate Sonic's effectiveness by evaluating multiple applications across multiple devices, showing that Sonic is able to find near-optimal knob settings at run time.
