I. Introduction
The usage of mobile platforms and associated applications is growing rapidly [21]. To meet the needs of emerging applications such as games and artificial intelligence, mobile systems-on-chip (SoCs) are growing in heterogeneity. Emerging heterogeneous mobile SoCs support cores of different types (e.g., Big and Little), and dynamic resource management (DRM) decisions correspond to selecting the number of active cores and the corresponding frequency level for each core type. Solving the DRM problem at runtime as a function of the active applications, to meet the desired tradeoffs among relevant objectives (e.g., performance and energy), poses two key challenges. First, the space of DRM decisions is exponential in the number of cores and their frequencies. For example, the Samsung Exynos 5422 SoC has four Big and four Little cores, and we need to select the best decision from 4,940 candidates at every decision epoch, which is typically between 50 and 100 ms [16]. Second, optimal DRM decisions change depending on the desired tradeoff among the target objectives. Since the required tradeoffs can change based on real-world scenarios, we need to create Pareto-frontier DRM policies that make optimal DRM decisions for different tradeoffs. The Pareto frontier allows the selection of a DRM policy that meets the desired tradeoff at runtime.

Prior work on DRM falls short of addressing the above two challenges in the following ways. First, solutions in commercial mobile SoCs are based on simple heuristics. For example, the interactive and ondemand governors increase or decrease the operating frequency by one level when core utilization crosses a static threshold. These heuristics provide only a single tradeoff between performance and energy, and they are less accurate than data-driven machine learning approaches [12]. Second, commonly used reinforcement learning (RL) approaches [2, 12] define a reward function for each objective and consider a linear combination of the reward functions with one scalar parameter per objective. RL methods learn the DRM policy via trial-and-error based on feedback from DRM decisions. To create Pareto-frontier DRM policies, we need to run RL methods repeatedly with different scalar parameters. Unfortunately, the accuracy of RL algorithms depends critically on the design of good reward functions, which is hard and even impossible for some objectives, e.g., performance per Watt (PPW). Since RL needs to try different actions at each state (i.e., exploration) to discover the best DRM policy, it may not be feasible for large state and DRM decision spaces, as in our case. We may also need to run RL with a large number of fine-grained scalar parameter configurations to uncover high-quality Pareto-frontier DRM policies. Third, recent work on supervised approaches to DRM policies is based on the imitation learning (IL) framework [12]. The key idea in IL is to create an Oracle policy for each targeted tradeoff and mimic its behavior using off-the-shelf supervised learning algorithms. Recent work has shown the effectiveness of IL for some specific design objectives with a minimal tradeoff space. Unfortunately, it is computationally hard to create high-quality Oracle policies for complex objectives, such as PPW, when approximating the optimal Pareto front.
This paper proposes a novel framework, referred to as Learning Pareto-frontier Resource Management Policies via Information-Theoretic Search (PaRMIS), to automatically create high-quality Pareto-frontier DRM policies for any given set of design objectives, as shown in Figure 1. PaRMIS specifies a DRM policy as a function, e.g., a multi-layer perceptron (MLP), with a fixed number of parameters over the system state. The key idea is to build statistical models over this parameter space by evaluating candidate DRM policies in terms of the given design objectives, and to use them to select, in each iteration, the candidate DRM policy that maximizes the information gain about the optimal Pareto front. We derive an efficient algorithm to compute entropy, a key computational step in the selection procedure. A key feature of our framework is that designers can plug-and-play any set of target objectives and uncover optimized Pareto-frontier DRM policies in a small number of iterations. Our experimental evaluation on a commercial heterogeneous SoC with 12 applications shows the efficacy and generality of PaRMIS over the state-of-the-art, including the interactive and ondemand governors and RL- and IL-based methods.
Contributions. The key contribution is the design, demonstration, and evaluation of the PaRMIS framework to create Pareto-frontier DRM policies for heterogeneous SoCs. To the best of our knowledge, this is the first general framework that directly optimizes for Pareto-frontier DRM policies. Specific contributions include:

Development of a novel information-theoretic framework referred to as PaRMIS to create resource management policies that trade off target design objectives such as performance and energy. PaRMIS iteratively selects a candidate policy for evaluation that maximizes the information gain about the optimal Pareto front.

Development of an efficient algorithm to compute entropy, a key step in the PaRMIS framework.

Comprehensive experiments on a commercial hardware platform using real-world applications to show the advantages of PaRMIS over state-of-the-art methods in terms of the quality of the Pareto front and the ability to optimize complex objectives.
II. Background and Problem Setup
We consider a heterogeneous mobile platform with $c$ different types of cores, where there are $n_i$ cores of type $i$. Suppose we can make resource management decisions at runtime to control the number of active cores and the frequencies for each core type. Let each resource management decision be represented by two vectors $a = (a_1, \ldots, a_c)$ and $f = (f_1, \ldots, f_c)$, where $a_i$ and $f_i$ represent the number of active cores and the frequency for type $i$, respectively. For example, the heterogeneous SoC employed in our experiments has two types of cores: Big ($n_{Big} = 4$) and Little ($n_{Little} = 4$). Hence, each resource management decision is a four-tuple $(a_{Big}, f_{Big}, a_{Little}, f_{Little})$. Suppose a policy maps the current system state, captured in terms of hardware counters (see Table I), to a candidate resource management decision tuple. The hardware counters are obtained for a set of repeatable decision epochs for each application. Epochs are clusters of macro-blocks obtained by profiling the basic blocks in an application, as detailed in [6, 12]. In this work, we consider policies represented as functions with parameters $\theta$ (e.g., an MLP) and consequently denote them as $\pi_\theta$.

Given $k$ design objectives (e.g., performance, energy) and a set of applications App, our goal is to create Pareto-frontier resource management policies that trade off the given objectives for the application set App. The evaluation of the quality of decisions from a candidate policy $\pi_\theta$ produces a vector of objective values $y = (y_1, \ldots, y_k)$. Without loss of generality, suppose all objectives are to be maximized. We say that a policy $\pi_\theta$ with objective values $y$ Pareto-dominates another policy $\pi_{\theta'}$ with objective values $y'$ if $y_i \geq y'_i$ for all $i$ and there exists some $j$ such that $y_j > y'_j$. The optimal solution of our problem is a set of policies $\Theta^*$ such that no other policy Pareto-dominates a policy in $\Theta^*$. The solution set $\Theta^*$ is called the optimal Pareto-frontier resource management policies, and the corresponding set of objective values $\mathcal{F}^*$ is called the optimal Pareto front. Once we have a set of Pareto-frontier DRM policies, we select an appropriate policy at runtime based on the desired tradeoff among the design objectives.
III. Related Work
Heterogeneous SoCs are widely used due to their integration of multiple types of cores (Big/Little), graphics processing units, and other accelerators to support millions of applications [11]. The heterogeneity of the processors necessitates DRM techniques that are able to choose the best configurations as a function of the application requirements [15, 9]. Most DRM techniques, including default governors such as ondemand [16], use core utilization to make their decisions. However, utilization alone does not provide sufficient information about the characteristics of the applications running on the system. To address this drawback, recent approaches have used performance counters to make DRM decisions [1, 17, 19, 20]. The performance counters give fine-grained information about the system state, thus allowing DRM policies to make more intelligent decisions. Machine learning approaches, such as decision trees [17], RL [2], and IL [10, 12, 20], have also been used to create DRM policies for mobile platforms. While these approaches are able to improve upon prior DRM methods, they still optimize for a single objective function, such as energy, execution time, or PPW. However, in real-world scenarios, we need DRM policies that can achieve the user's desired tradeoff among multiple objectives of interest. Therefore, there is a strong need to develop algorithms that create Pareto-frontier DRM policies so that the system can use the appropriate DRM policy at runtime based on the user's desired tradeoff.

TABLE I: Hardware counters used to capture the system state.

| Instructions Retired | Non-cache External Memory Request |
| CPU Cycles | Sum of Little Cluster Utilization |
| Branch Mispredictions | Per-Core Big Cluster Utilization |
| Level 2 Cache Misses | Total Chip Power Consumption |
| Data Memory Accesses | |
The DyPO approach proposed in [6] performs an exhaustive search to find Pareto-frontier points for the objectives of interest and then designs a logistic regression classifier at a coarse level over clusters of the Pareto points. Unfortunately, exhaustive search does not scale well with the size of the DRM decision space and the number of applications, and the coarse approximation is significantly suboptimal. Recent work [10, 12] used RL and IL to overcome the drawbacks of DyPO by creating DRM policies for a limited number of tradeoff scenarios by optimizing a linear combination of the desired objectives. However, RL and IL approaches suffer from three drawbacks. First, they cannot be extended to complex objectives (e.g., PPW) and/or different tradeoffs due to the difficulty of designing reward functions and Oracle policies to provide supervision. Second, they do not optimize for Pareto-frontier DRM policies directly and can require significant tuning of scalarization parameters and other hyperparameters. Third, linear scalarization is known to perform poorly due to its inability to explore non-convex regions of the Pareto front [4]. In strong contrast to these approaches, the proposed PaRMIS framework can be used to obtain (near-)optimal Pareto-frontier DRM policies for any given set of design objectives. Experiments on the Odroid-XU3 board [8] show that PaRMIS achieves Pareto fronts with 13% and 23% higher Pareto hypervolume compared to state-of-the-art RL and IL methods, respectively.

IV. PaRMIS Framework
Overview of PaRMIS. To find optimized Pareto-frontier policies, PaRMIS learns statistical models for the design objectives over the policy parameter space using training data in the form of candidate DRM policy evaluations, and iteratively selects the next DRM policy for evaluation. We perform the following algorithmic steps in each iteration: 1) Using the current statistical models, we select the parameters $\theta$ of the candidate DRM policy $\pi_\theta$ that maximize the information gain about the optimal Pareto front $\mathcal{F}^*$. 2) We evaluate the DRM policy $\pi_\theta$ by executing it on the target platform while running the applications to measure the tuple of objective evaluations $y = (y_1, \ldots, y_k)$. 3) We use the new training example in the form of the (input) policy parameters $\theta$ and the (output) objective evaluations $y$ to update the statistical models. At convergence, or after a maximum number of iterations, we compute the Pareto front from the aggregate set of objective evaluation vectors and output the DRM policies corresponding to the Pareto front as the resulting solution. Algorithm 1 provides the pseudo-code and Figure 1 shows an example illustration of PaRMIS for two design objectives.
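The outer loop above can be sketched as follows. This is a runnable toy illustration, not the paper's implementation: the information-gain selection of step 1 is replaced by a random proposal, hardware evaluation is replaced by an analytic two-objective function, and the names `evaluate_policy` and `select_candidate` are our own.

```python
import random

def evaluate_policy(theta):
    # Toy stand-in for executing a DRM policy on hardware: two conflicting
    # objectives of a scalar "parameter" (think performance vs. energy savings).
    return (theta, 1.0 - theta * theta)

def select_candidate(data):
    # Stand-in for step 1: PaRMIS maximizes the information gain about the
    # optimal Pareto front (Equation 9); here we simply propose at random.
    return random.uniform(0.0, 1.0)

def pareto_front(evals):
    dom = lambda y, z: all(a >= b for a, b in zip(y, z)) and y != z
    return [y for y in evals if not any(dom(q, y) for q in evals)]

def parmis_loop(iterations=50, seed=0):
    random.seed(seed)
    data = []                            # aggregate training data D_t
    for _ in range(iterations):
        theta = select_candidate(data)   # 1) select candidate policy
        y = evaluate_policy(theta)       # 2) evaluate on target platform
        data.append((theta, y))          # 3) update statistical models/data
    return pareto_front([y for _, y in data])
```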
IV-A. Learning Statistical Models from Training Data
Training data. We collect training data by iteratively evaluating a sequence of DRM policies. Each training example has the following form: a) the input variables are the parameters $\theta$ of the DRM policy $\pi_\theta$; and b) the output variables are the objective evaluation vectors $y = (y_1, \ldots, y_k)$ obtained by executing the DRM policy $\pi_\theta$ when running the applications App on the target heterogeneous SoC Arch. Therefore, the aggregate training data after $t$ iterations consists of $t$ such input-output pairs.
Statistical models. We want to learn statistical models from the training data that capture our uncertainty about the Pareto front and guide the selection of the candidate DRM policy for evaluation, so as to quickly uncover the Pareto-frontier DRM policies. We employ Gaussian processes (GPs) [23] as our choice of statistical model due to their superior uncertainty quantification ability via the Bayesian interpretation [23]. A GP over an input space $\mathcal{X}$ is a random process from $\mathcal{X}$ to $\mathbb{R}$. It is characterized by a mean function $\mu: \mathcal{X} \rightarrow \mathbb{R}$ and a covariance or kernel function $\kappa: \mathcal{X} \times \mathcal{X} \rightarrow \mathbb{R}$. The posterior mean and standard deviation of a GP provide the prediction and uncertainty, respectively. Intuitively, uncertainty will be low for DRM policy parameters that are close to the ones in our training data and will increase as the distance grows. We model the $k$ objective functions using $k$ independent GP models with zero mean and i.i.d. observation noise, and we update all statistical models from the aggregate training data after every iteration.

IV-B. Selecting the DRM Policy for Evaluation via Information Gain
The effectiveness of the PaRMIS framework critically depends on the reasoning mechanism used to select the candidate DRM policy for evaluation in each iteration. Ideally, we want an algorithmic approach that can use the uncertainty of the learned statistical models to uncover high-quality Pareto-frontier DRM policies in a small number of iterations. Therefore, we propose a novel information-theoretic approach that selects the next candidate DRM policy (for ease of notation, we only use the parameters $\theta$ in the discussion below) that maximizes the information gain about the optimal Pareto front $\mathcal{F}^*$. This is equivalent to the expected reduction in entropy over the optimal Pareto front $\mathcal{F}^*$. Our utility function, which measures the information gain between the next candidate input $\theta$ for evaluation and the Pareto front $\mathcal{F}^*$, is given as:
$\alpha_t(\theta) = I(\{\theta, y\}, \mathcal{F}^* \mid D_t)$   (1)

$= H(\mathcal{F}^* \mid D_t) - \mathbb{E}_{y}\left[H(\mathcal{F}^* \mid D_t \cup \{\theta, y\})\right]$   (2)

$= H(y \mid D_t, \theta) - \mathbb{E}_{\mathcal{F}^*}\left[H(y \mid D_t, \theta, \mathcal{F}^*)\right]$   (3)
Information gain is defined as the expected reduction in entropy of the posterior distribution over the optimal Pareto front $\mathcal{F}^*$, as given in Equation 2, where $D_t$ denotes the aggregate training data after $t$ iterations; Equation 3 follows from the symmetric property of information gain.
The first term on the r.h.s. of Equation 3, i.e., the entropy of a factorizable $k$-dimensional Gaussian distribution $P(y \mid D_t, \theta)$, can be computed in closed form as shown below:

$H(y \mid D_t, \theta) = \frac{k(1 + \ln(2\pi))}{2} + \sum_{j=1}^{k} \ln \sigma_j(\theta)$   (4)
where $\sigma_j^2(\theta)$ is the predictive variance of the $j$th GP model at input $\theta$. Intuitively, this says that the entropy is distributed over the $k$ GP models via the sum of their log standard deviations. The second term on the r.h.s. of Equation 3 is an expectation over the Pareto front $\mathcal{F}^*$. We can approximately compute this term via Monte-Carlo sampling as:

$\mathbb{E}_{\mathcal{F}^*}\left[H(y \mid D_t, \theta, \mathcal{F}^*)\right] \approx \frac{1}{S} \sum_{s=1}^{S} H(y \mid D_t, \theta, \mathcal{F}^*_s)$   (5)
where $S$ is the number of samples and $\mathcal{F}^*_s$ denotes a sample Pareto front. The main advantages of our utility function are its computational efficiency and accuracy. There are two key algorithmic steps in computing Equation 5, which we describe below:
1) Computing Pareto front samples $\mathcal{F}^*_s$. To compute a Pareto front sample, we first sample functions from the posterior GP models via random Fourier features [18]. Subsequently, we solve a multi-objective optimization problem over the sampled functions to capture the interactions between the different objectives. We employ the popular NSGA-II algorithm [5] to solve this multi-objective optimization problem, noting that any other algorithm can be used to similar effect.
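The sampled functions in step 1 are drawn from the GP posteriors learned in Section IV-A. As a concrete reference point, here is a minimal NumPy sketch (ours, not the paper's code) of the posterior mean $\mu_j(\theta)$ and standard deviation $\sigma_j(\theta)$ for a single objective over a 1-D parameter space; the kernel choice and length-scale are illustrative assumptions.

```python
import numpy as np

def rbf(a, b, ls=0.5):
    # Squared-exponential kernel between two 1-D parameter arrays.
    d = a.reshape(-1, 1) - b.reshape(1, -1)
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-4, ls=0.5):
    """Posterior mean and std of a zero-mean GP with i.i.d. observation noise."""
    K = rbf(x_train, x_train, ls) + noise * np.eye(len(x_train))
    Ks = rbf(x_test, x_train, ls)
    alpha = np.linalg.solve(K, y_train)
    mu = Ks @ alpha
    cov = rbf(x_test, x_test, ls) - Ks @ np.linalg.solve(K, Ks.T)
    return mu, np.sqrt(np.clip(np.diag(cov), 0.0, None))

# One independent GP per objective; here a single toy objective:
thetas = np.array([0.0, 1.0])        # previously evaluated policy parameters
ys = np.array([0.0, 1.0])            # corresponding objective values
mu, sd = gp_posterior(thetas, ys, np.array([0.0, 5.0]))
# sd is small near the training points and grows with distance, as described.
```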
2) Computing entropy with respect to a Pareto front sample $\mathcal{F}^*_s$. Let $\mathcal{F}^*_s = \{v^1, \ldots, v^m\}$ be the sample Pareto front, where $m$ is the size of the Pareto front and each $v^i = (v^i_1, \ldots, v^i_k)$ is a $k$-vector evaluated at the sampled functions. The following inequality holds for each component $y_j$ of the $k$-vector $y$ in the entropy term $H(y \mid D_t, \theta, \mathcal{F}^*_s)$:

$y_j \leq f^s_j = \max\{v^1_j, \ldots, v^m_j\}$   (6)
The inequality essentially says that the $j$th component of $y$ (i.e., $y_j$) is upper-bounded by the maximum of the $j$th components of all $m$ vectors in the sample Pareto front $\mathcal{F}^*_s$. This inequality can be proven by contradiction. Suppose there exists some component $y_j$ of $y$ such that $y_j > f^s_j$. Then, by definition, $y$ is a non-dominated point because no point in $\mathcal{F}^*_s$ dominates it in the $j$th dimension. This implies $y \in \mathcal{F}^*_s$, which is a contradiction. Hence, our hypothesis that $y_j > f^s_j$ is incorrect and inequality 6 holds.
By combining inequality 6 and the fact that each function is modeled as a GP, we can model each component $y_j$ as a truncated Gaussian distribution, since the distribution of $y_j$ needs to satisfy $y_j \leq f^s_j$. Furthermore, a common property of the entropy measure allows us to decompose the entropy of a set of independent variables into a sum over the entropies of the individual variables [3]:

$H(y \mid D_t, \theta, \mathcal{F}^*_s) \approx \sum_{j=1}^{k} H(y_j \mid D_t, \theta, f^s_j)$   (7)
Equation 7 and the fact that the entropy of a truncated Gaussian distribution [14] can be computed in closed form give the following expression for the entropy term $H(y_j \mid D_t, \theta, f^s_j)$:

$H(y_j \mid D_t, \theta, f^s_j) \approx \frac{1 + \ln(2\pi)}{2} + \ln \sigma_j(\theta) + \ln \Phi(\gamma^s_j(\theta)) - \frac{\gamma^s_j(\theta)\,\phi(\gamma^s_j(\theta))}{2\Phi(\gamma^s_j(\theta))}$   (8)

where $\gamma^s_j(\theta) = \frac{f^s_j - \mu_j(\theta)}{\sigma_j(\theta)}$, $\mu_j(\theta)$ is the posterior mean of the $j$th GP at $\theta$, and $\phi$ and $\Phi$ are the p.d.f. and c.d.f. of a standard normal distribution, respectively. By combining Equations 4 and 8 with Equation 3, we get the final form of our utility function as shown below:

$\alpha_t(\theta) \approx \frac{1}{S} \sum_{s=1}^{S} \sum_{j=1}^{k} \left[ \frac{\gamma^s_j(\theta)\,\phi(\gamma^s_j(\theta))}{2\Phi(\gamma^s_j(\theta))} - \ln \Phi(\gamma^s_j(\theta)) \right]$   (9)
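Given the GP posterior means and standard deviations, the utility in Equation 9 reduces to a few lines of arithmetic. The sketch below (ours, not the paper's code) evaluates it for one candidate $\theta$, where `max_vals[s][j]` stores the per-objective maximum $f^s_j$ of the $s$-th sampled Pareto front:

```python
import math

def phi(x):
    # p.d.f. of the standard normal distribution
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    # c.d.f. of the standard normal distribution
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def acquisition(mu, sigma, max_vals):
    """Utility per Eq. 9: average over S sampled fronts, sum over k objectives.
    mu[j], sigma[j]: GP posterior mean/std of objective j at theta."""
    total = 0.0
    for front_max in max_vals:                    # S sampled Pareto fronts
        for j, f_star in enumerate(front_max):    # k objectives
            gamma = (f_star - mu[j]) / sigma[j]
            c = Phi(gamma)
            total += gamma * phi(gamma) / (2.0 * c) - math.log(c)
    return total / len(max_vals)
```

Note that the utility shrinks toward zero as the posterior becomes certain that $y_j$ lies far below $f^s_j$ (large $\gamma$), so candidates that are informative about the front score higher.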
V. Experiments and Results
V-A. Experimental Setup
Heterogeneous mobile SoC platform. We employ the Odroid-XU3 board [8] running Ubuntu 15.04 for our experiments. The Exynos 5422 SoC integrates four A15 (Big) cores, four A7 (Little) cores, a Mali T628 graphics processing unit (GPU), and other system components. The Odroid board also provides current sensors to measure the power consumption of the Big CPU cluster, the Little CPU cluster, the main memory, and the GPU. We use the on-board current sensors to obtain the energy consumption and PPW metrics used to evaluate the different DRM policies considered in this paper.
Benchmarks. We employ 12 benchmarks from MiBench [7] and CortexSuite [22] suites using the “large” input datasets for each suite. These benchmarks represent a wide range of realworld scenarios encountered by heterogeneous SoCs.
Design objectives. We consider three objectives, namely, execution time, energy, and PPW to test the generality and effectiveness of different DRM algorithms.
Decision space for DRM policies. For the Odroid-XU3 platform, the decision space is defined by the number of active Big/Little cores and their respective frequencies. There are 4 × 5 = 20 combinations of active cores, given that one Little core has to be ON at all times to run the operating system. Similarly, the Big and Little core clusters support frequencies from 200 MHz to 2 GHz (19 levels) and from 200 MHz to 1.4 GHz (13 levels) in 100 MHz steps, respectively. Consequently, there are 4 × 5 × 13 × 19 = 4940 candidate DRM decisions at each system state. The DRM policy must choose one of these 4940 configurations at each state depending on the desired tradeoff among the target objectives.
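As a sanity check on the quoted figure, the count can be reproduced in a few lines, assuming the 4,940 decisions decompose as (Big-core options) × (Little-core options) × (Big frequency levels) × (Little frequency levels):

```python
# Counting candidate DRM decisions on the Odroid-XU3 (our decomposition).
big_core_opts = 5                            # 0-4 Big cores active
little_core_opts = 4                         # 1-4 Little cores (one stays ON)
big_freqs = len(range(200, 2001, 100))       # 200 MHz .. 2.0 GHz -> 19 levels
little_freqs = len(range(200, 1401, 100))    # 200 MHz .. 1.4 GHz -> 13 levels

n_decisions = big_core_opts * little_core_opts * big_freqs * little_freqs
print(n_decisions)  # 4940
```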
Decision interval. An application goes through multiple phases during its execution. As a result, using the same configuration for the entire application is not optimal. To this end, the policies proposed in this paper use the repeatable decision epochs described in [6] for making decisions. Each decision epoch consists of a cluster of macro-blocks that capture the varying characteristics of the application. The policies use the hardware counters (Table I) observed in each epoch to decide the configuration for the following epoch.
Policy representation.
For all learning-based approaches, namely PaRMIS, RL, and IL, we use one function to make the DRM decision for each of the four control knobs at each decision epoch. In our implementation, we use the following MLP configuration to represent each of the four functions, noting that any other function can be used to similar effect: two hidden layers with the ReLU activation and an output layer with the softmax activation. The number of output-layer neurons is equal to the number of possible actions for the control knob (e.g., 4 for the number of cores). We also note that the proposed approach is not tied to any specific policy representation, and other function approximators can be used to implement the DRM policies.
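A minimal sketch of one such per-knob policy function follows. The layer widths, input size, and weight initialization here are our illustrative assumptions, not the paper's exact configuration:

```python
import math, random

def relu(v):
    return [max(0.0, x) for x in v]

def linear(W, b, v):
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def policy_forward(state, params):
    # Two ReLU hidden layers, softmax output over the knob's actions.
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = relu(linear(W1, b1, state))
    h2 = relu(linear(W2, b2, h1))
    return softmax(linear(W3, b3, h2))

def init_layer(n_out, n_in, rng):
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return (W, [0.0] * n_out)

rng = random.Random(0)
n_features, hidden, n_actions = 8, 16, 4   # sizes are illustrative assumptions
params = [init_layer(hidden, n_features, rng),
          init_layer(hidden, hidden, rng),
          init_layer(n_actions, hidden, rng)]
probs = policy_forward([0.1] * n_features, params)  # action probabilities
```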
Runtime policy selection. The choice of the DRM policy from the Pareto front depends on the user preference in terms of desired tradeoffs between target objectives, such as power and performance. For example, if the battery level is low, the user can specify that energy consumption has the highest priority. In this work, we present our results under the assumption that an interface to provide user preference about the importance of objectives exists.
V-B. PaRMIS and Baseline DRM Algorithms
PaRMIS. There are no critical hyperparameters in PaRMIS. We computed the utility function in Equation 9 using $S = 1$ Pareto front samples. We ran PaRMIS for a maximum of 500 iterations and observed that it converges in at most 300 iterations.
Reinforcement learning (RL). Prior work using RL has typically focused on optimizing a single objective function by defining an appropriate reward function [2, 10, 12]. Single-objective RL algorithms can be extended to multiple objectives via linear scalarization of all the objectives, $R = \sum_{i=1}^{k} \lambda_i R_i$, where $R$ is the combined reward function, and $R_i$ and $\lambda_i$ stand for the reward function and scalarization parameter for the $i$th objective, respectively. We employ the reward functions for energy and execution time from recent work [10]. However, it is hard to design a reward function for the PPW objective. We run the RL algorithm employed in recent studies [10] with different scalarization parameters to create the Pareto-frontier DRM policies.
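The scalarization step itself is straightforward; the sketch below (ours) shows the combined reward and the weight-sweep idea used to approximate a Pareto front with single-objective RL:

```python
def scalarized_reward(rewards, lambdas):
    """Combined reward R = sum_i lambda_i * R_i over the target objectives."""
    assert len(rewards) == len(lambdas)
    return sum(l * r for l, r in zip(lambdas, rewards))

# Sweeping the weights, e.g. (1.0, 0.0), (0.75, 0.25), ..., (0.0, 1.0),
# trains one policy per setting; their objective values together form an
# approximate Pareto front (with the known limitations discussed above).
```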
Imitation learning (IL). IL methods create an Oracle policy to optimize a given objective and then learn a policy to mimic its behavior [20, 12]. We employ the IL approach and Oracle policies for the energy and PPW objectives from recent work [12], which showed good results for optimizing energy with small or no performance penalty (i.e., very specific tradeoffs). As noted before, Oracle policies may not be optimal for some objectives, such as PPW, and for different tradeoffs. Similar to RL, we run IL by creating Oracle policies that optimize a linear combination of the target objectives and obtain Pareto-frontier DRM policies by varying the scalarization parameters.
Default governors. We also compare with the default governors in the system, i.e., ondemand, interactive, performance, and powersave. These governors provide a single point on the Pareto front since they are optimized for a single objective, such as power or performance. Nonetheless, it is crucial to compare with these baselines as they are implemented on millions of commercial platforms.
V-C. Quality of Application-Specific Pareto Fronts
In this section, we compare the different DRM algorithms on each application separately for two objectives: execution time and energy. To this end, we compute Pareto-frontier DRM policies using the PaRMIS, RL, and IL approaches by running them on a single application and measure the quality of the resulting Pareto front. These results provide the best-case scenario for each application and help us analyze how global DRM policies learned over all applications compare to application-specific DRM policies.
PHV metric. We employ the Pareto hypervolume (PHV) metric, which is commonly used to measure the quality of a given Pareto front [24]. PHV is defined as the volume between a reference point and the given Pareto front. We report the PHV metric normalized w.r.t. the PHV of the PaRMIS approach (higher is better).
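For two objectives, PHV reduces to the area between the front and the reference point. A small sketch (ours, not the paper's code) for two minimized objectives such as execution time and energy:

```python
def hypervolume_2d(front, ref):
    """Area dominated by `front` (a list of (time, energy) points, both
    minimized) relative to a reference point `ref` that is worse in both."""
    pts = sorted(front)                  # ascending execution time
    hv, prev_energy = 0.0, ref[1]
    for t, e in pts:
        if e < prev_energy:              # non-dominated point extends the area
            hv += (ref[0] - t) * (prev_energy - e)
            prev_energy = e
    return hv
```

As the text notes, the same reference point must be used for every method being compared, otherwise the PHV values are not comparable.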
Convergence of PaRMIS. Recall that PaRMIS is an iterative approach, so we want to know the number of iterations required to converge to the final Pareto front. Figure 2 shows the PHV of the Pareto front vs. the number of iterations for the Blowfish and Spectral benchmarks, noting that the other applications show similar or better convergence behavior. We can see that the PHV improvement is significant in the initial iterations, and PaRMIS converges in at most 300 iterations.
Energy consumption vs. execution time Pareto front. Figure 4 shows the overall Pareto front for two representative benchmarks (Qsort and PCA), noting that we obtained similar results for all applications. Each marker in the figure corresponds to one policy from the Pareto-frontier DRM policy set obtained by PaRMIS (dark red), RL (black), and IL (blue), respectively. We make the following observations. 1) The Pareto front obtained by PaRMIS dominates those from both RL and IL. More specifically, PaRMIS creates DRM policies that improve both objectives when compared to RL and IL. Furthermore, PaRMIS creates policies that cover a wider range of tradeoffs between energy and execution time. For example, the lowest execution time obtained by PaRMIS for the Qsort application is 1.2 s, while the lowest values for RL and IL are 1.6 s and 1.9 s, respectively. 2) Figure 4 also shows the tradeoffs obtained by the DRM policies of the four default governors. We can clearly see that the Pareto front obtained by PaRMIS dominates all of them significantly. The difference is especially visible for the performance governor, which is optimized for minimizing the execution time. Even in this case, PaRMIS is able to provide a DRM policy that has both lower execution time and lower energy than the performance governor. In summary, these results show that PaRMIS creates DRM policies that provide significant improvements over both the default governors and state-of-the-art machine-learning-based DRM approaches.
PHV comparison. The data in Figure 4 offers an intuitive visualization of the Pareto fronts obtained by each DRM approach. However, it does not allow a quantitative comparison of Pareto-front quality; the PHV metric does. For computing PHV, the reference point is chosen such that it has a higher execution time and energy than all points in the Pareto front. To allow comparison between the different Pareto fronts, the same reference point is used for all DRM approaches. Figure 4 shows the comparison of the normalized PHV metric for all 12 applications. The normalized PHV of both RL and IL is significantly lower than 1, i.e., their Pareto fronts are of significantly lower quality. For example, PaRMIS has 10% and 25% higher PHV than RL and IL for the PCA application. On average, PaRMIS achieves 13% and 23% higher PHV than RL and IL, respectively. This shows that the quality of the Pareto front obtained by PaRMIS is consistently better than both RL and IL. Prior work [12, 10] has shown that IL is better than RL for specific tradeoffs between energy and execution time. However, IL performs worse than RL over the entire Pareto front because the Oracle policy is not optimal for different tradeoffs. RL and IL also suffer from the drawbacks of linear scalarization due to its inability to explore non-convex regions of the Pareto front [4]. These results highlight a key advantage of PaRMIS: it requires no effort from designers to obtain an optimized Pareto front.
V-D. Global vs. Application-Specific Pareto-frontier DRM Policies
Application-specific policies do not scale as the number of applications available to the user grows. Moreover, not all applications are known at design time. Therefore, DRM algorithms must learn global Pareto-frontier DRM policies that are applicable to all applications. To this end, we apply PaRMIS to design global Pareto-frontier policies using training data from all 12 applications.
Figure 5 shows the normalized PHV for all the applications. The PHV is normalized with respect to the PHV of the application-specific Pareto front. As expected, the normalized PHV of the global Pareto-frontier policies is within 2% of the application-specific policies. For FFT, Qsort, and StringSearch, the PHV of the global Pareto-frontier policies is even higher than that of the application-specific Pareto-frontier policies. On average, the PHV of the global and application-specific Pareto-frontier policies is equal. In summary, the global Pareto-frontier policies achieve comparable or better quality than application-specific policies while generalizing to all applications.
V-E. Evaluation with Complex Objectives
One of the main advantages of the PaRMIS approach is that it can easily be applied to any set of complex objectives desired by the designers. Recall that this is not possible with RL and IL, as it is hard to design a good reward function or an optimal Oracle policy, respectively, for complex objectives such as PPW. To demonstrate this advantage, we use PaRMIS to optimize PPW and execution time for each application. However, we cannot use RL and IL to optimize PPW and execution time, as PPW is a complex, non-linear objective: there is no known reward function or optimal Oracle policy for the PPW objective [13]. Due to these limitations, we reuse the Pareto-frontier DRM policies for energy and execution time from RL and IL, and compute the Pareto front and PHV for the PPW and execution time objectives. Figure 7 shows a comparison of the Pareto fronts obtained by PaRMIS, RL, and IL for the Basicmath and Dijkstra applications. The Pareto front achieved by PaRMIS dominates those from RL and IL both in terms of the range of policies and the quality of individual Pareto points. PaRMIS also dominates the default governors available on the platform. A similar behavior is seen for the normalized PHV metric, as shown in Figure 7. PaRMIS has a higher PHV for all the applications, with an average improvement of 16% and 21% over RL and IL, respectively. These results show that PaRMIS can easily be extended to new and complex objectives.
TABLE II: Implementation overhead of the DRM policies.

| Metric | Per Policy | Total | % Overhead |
| Exe. time | 200 µs | 800 µs | 0.8 (every 100 ms) |
| Memory | 1 KB | 27 KB | 0.001 |
V-F. Implementation Overhead
The DRM policies of each approach are implemented as user-space governors in software to characterize the overhead. Furthermore, all learning-based approaches, including PaRMIS, RL, and IL, use the same MLP function with different sets of parameters to represent each DRM policy in the user-space governor. Hence, the storage cost and decision-making time for each policy are the same for all three methods. In particular, contrary to an existing implementation that employs a lookup table for RL [10], we use the same function approximator to implement both RL and IL. Hence, there is no computational or storage difference between IL, RL, and PaRMIS. Table II provides a summary of all overheads. On average, one decision of a DRM policy to choose the runtime configuration takes about 800 µs (200 µs per knob), which amounts to about 0.8% overhead when DRM decisions are made every 100 ms. The memory required to store a single DRM policy is 1 KB for all three methods (PaRMIS, RL, and IL). When we employ global Pareto-frontier policies, PaRMIS creates 27 policies that form the Pareto front, resulting in 27 KB of storage overhead (0.001% of the 2 GB RAM available on the SoC platform). At runtime, we choose one DRM policy from this set of 27 policies as per the desired tradeoff. In summary, the overhead in terms of storage and DRM decision-making time is negligible.
VI. Conclusions and Future Work
Dynamic resource management (DRM) of mobile SoCs is a challenging problem due to the rise of heterogeneity, large state and decision spaces, and the complexity of application workloads. This paper presented a novel information-theoretic learning framework, referred to as PaRMIS, to create Pareto-frontier DRM policies. PaRMIS produces high-quality DRM policies and is easy to configure and apply to trade off any set of complex design objectives. Experiments on a commercial heterogeneous SoC platform show that PaRMIS achieves Pareto fronts with 13% and 23% higher Pareto hypervolume (PHV) than state-of-the-art RL and IL methods, respectively. Immediate future work includes studying PaRMIS for large-scale many-core systems.
Acknowledgements. This work was supported in part by NSF grants CNS-1955353, OAC-1910213, and IIS-1845922, in part by ARO grants W911NF-17-1-0485 and W911NF-19-1-0162, and in part by the Semiconductor Research Corporation's AI Hardware program.
References
 [1] A. Aalsaud et al., “Power-Aware Performance Adaptation of Concurrent Applications in Heterogeneous Many-Core Systems,” in ISLPED, 2016.
 [2] Z. Chen et al., “Distributed Reinforcement Learning for Power-Limited Many-Core System Performance Optimization,” in DATE, 2015.
 [3] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2012.
 [4] I. Das et al., “A Closer Look at Drawbacks of Minimizing Weighted Sums of Objectives for Pareto Set Generation in Multicriteria Optimization Problems,” Structural optimization, vol. 14, no. 1, pp. 63–69, 1997.

 [5] K. Deb et al., “A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II,” IEEE TEC, vol. 6, no. 2, pp. 182–197, 2002.
 [6] U. Gupta et al., “DyPO: Dynamic Pareto-Optimal Configuration Selection for Heterogeneous MpSoCs,” ACM TECS, 2017.
 [7] M. R. Guthaus et al., “MiBench: A Free, Commercially Representative Embedded Benchmark Suite,” in Proc. WWC4, 2001, pp. 3–14.
 [8] Hardkernel. (2014) Odroid-XU3. https://www.hardkernel.com/shop/odroidxu3/ Accessed 11/20/2020.
 [9] D. Kadjo et al., “Towards Platform Level Power Management In Mobile Systems,” in Int. Syst.onChip Conf. (SOCC), 2014, pp. 146–151.
 [10] R. Kim et al., “Imitation Learning For Dynamic VFI Control In LargeScale Manycore Systems,” IEEE TVLSI, vol. 25, no. 9, 2017.
 [11] R. Kumar et al., “Heterogeneous Chip Multiprocessors,” Computer, vol. 38, no. 11, pp. 32–38, 2005.
 [12] S. K. Mandal et al., “Dynamic Resource Management of Heterogeneous Mobile Platforms via Imitation Learning,” IEEE TVLSI, 2019.
 [13] S. K. Mandal et al., “An EnergyAware Online Learning Framework for Resource Management in Heterogeneous Platforms,” ACM TODAES, vol. 25, no. 3, pp. 1–26, 2020.
 [14] J. V. Michalowicz, J. M. Nichols, and F. Bucholtz, Handbook of differential entropy. Chapman and Hall/CRC, 2013.
 [15] T. S. Muthukaruppan et al., “Hierarchical Power Management For Asymmetric MultiCore In Dark Silicon Era,” in DAC, 2013.
 [16] V. Pallipadi and A. Starikovskiy, “The Ondemand Governor,” in Proc. Linux Symp., vol. 2, 2006, pp. 215–230.
 [17] J.G. Park et al., “MLGov: A Machine Learning Enhanced Integrated CPUGPU DVFS Governor For Mobile Gaming,” in Proc. of ESTIMedia, 2017, pp. 12–21.
 [18] A. Rahimi and B. Recht, “Random Features for Largescale Kernel Machines,” in NeurIPS, 2008, pp. 1177–1184.
 [19] B. K. Reddy et al., “Intercluster Threadtocore Mapping and DVFS on Heterogeneous Multicores,” IEEE TVLSI, vol. 4, no. 3, 2018.
 [20] A. Sartor et al., “HiLITE: Hierarchical and Lightweight Imitation Learning for Power Management of Embedded SoCs,” IEEE CAL, vol. 19, no. 1, pp. 63–67, 2020.
 [21] Statista, “Mobile App Usage  Statistics & Facts,” https://www.statista.com/topics/1002/mobileappusage/ Accessed 24 Nov. 2018.
 [22] S. Thomas et al., “CortexSuite: A Synthetic Brain Benchmark Suite.” in IISWC, 2014, pp. 76–79.
 [23] C. K. Williams and C. E. Rasmussen, Gaussian processes for machine learning. MIT Press, 2006, vol. 2, no. 3.
 [24] E. Zitzler, Evolutionary Algorithms for Multiobjective Optimization: Methods and Applications, 1999, vol. 63.