Multi-objective Optimisation of Digital Circuits based on Cell Mapping in an Industrial EDA Flow

05/21/2021
by   Linan Cao, et al.

Modern electronic design automation (EDA) tools can handle the complexity of state-of-the-art electronic systems by decomposing them into smaller blocks or cells, introducing different levels of abstraction and staged design flows. However, because each design step is optimised independently, overhead and inefficiency can accumulate in the resulting overall design. Performing design-specific optimisation from a more global viewpoint requires more time due to the larger search space, but has the potential to provide solutions with improved performance. In this work, a fully-automated, multi-objective (MO) EDA flow is introduced to address this issue. It specifically tunes drive strength mapping, preceding physical implementation, through multi-objective population-based search algorithms. Designs are evaluated with respect to their power, performance and area (PPA). The proposed approach is capable of expanding the design space, offering a set of Pareto-optimised trade-off solutions for different case-specific utilisation. We have applied the proposed MOEDA framework to ISCAS-85 benchmark circuits using a commercial 65nm standard cell library. The experimental results demonstrate how the MOEDA flow enhances the solutions initially generated by the standard digital flow, and how a significant improvement in PPA metrics is achieved simultaneously.


I Introduction

The process of building a digital integrated circuit using blocks or cells from a foundry is a common and mature approach in modern digital VLSI design. A comprehensive industry-standard digital flow is available to tape out digital chips. Technology down-scaling enables high-density integrated circuits, and the EDA tools therefore need to handle a large quantity of cells during the flow. The challenge of design optimisation is to find the best possible trade-off solutions with regard to PPA, using appropriate library cells, while keeping turnaround time low [1].

Standard cell libraries typically contain a large number of functions and each function has multiple cells differing in drive strength. This enables different combinations of logic functions or drive strengths depending on the design specifications and the required loads in circuit paths. The possible design space is thus huge and complex because a circuit might be composed of millions of gates. Different combinations of gates thus can directly determine the PPA of a circuit.

Such complexity brings more constraints and limits to a single design that must meet multiple design requirements simultaneously. This raises the difficulty of design optimisation, possibly beyond what experienced engineers can handle manually. Automatic, efficient design space exploration (DSE) approaches promise to balance multiple design objectives. Researchers from both academia and industry have investigated the design space in the digital flow, or up to system level, and applied optimisation in the flow. A number of techniques have been adopted, such as heuristic techniques [2], machine learning [3, 4, 5] and design-parameter tuning [6, 7, 8].

Population-based metaheuristic optimisation algorithms like multi-objective evolutionary algorithms (MOEAs) are widely used techniques that can efficiently perform design space exploration and ultimately find a set of Pareto-optimised solutions. Many publications exist on applying evolutionary algorithms (EAs) or genetic algorithms (GAs) to VLSI design optimisation from the system level down to the physical level, which also includes optimisation and design space exploration on individual design levels, such as standard cell library depletion [9], macro-cell placement and optimisation [10], gate-sizing-based soft error optimisation [11], netlist partitioning [12], circuit equivalence checking [13] and system-on-chip (SoC) design space optimisation [14] [15]. An automated multi-objective optimisation flow crossing different design levels from a global point of view is needed to recover performance which may otherwise be lost in generic overheads. However, limited research focuses on investigating multi-objective evolutionary optimisation techniques fully integrated into state-of-the-art commercial digital EDA flows. This paper proposes a population-based evolutionary search approach that balances the optimisation among multiple objectives by refining the drive strengths of logic gates, and applies it to the standard digital flow to enhance the design solution in the loop.

The main contributions of this work are summarised as follows: 1) A multi-objective (MO) EDA optimisation framework, fully-compatible with an industrial digital flow from logic synthesis to physical implementation. 2) Global tuning of standard cell drive strength mapping using parameterised gate-level circuit netlists. 3) Improved coverage of the feasible design space providing a set of Pareto-optimised solutions. 4) Enhanced trade-off design solutions with improved PPA.

The paper is structured as follows: Section II gives an overview of related work. Section III introduces the proposed MOEDA design flow. Experiment setup is described in Section IV. Section V presents results using a reduced cell library for testing and a commercial full cell library. Section VI presents the analysis of standard-flow-generated design space and the multi-objective optimisation results. Section VII provides conclusions and future work.

II Related Work

II-A Design Flow Modifications

The modern digital IC design flow is a mature EDA process including various steps from register-transfer level (RTL) design and logic synthesis to physical layout implementation. As each step introduces its own level of abstraction (e.g. from cells to functions, from functions to blocks), any margin or error introduced will accumulate and propagate. Hence, achieving a good solution in each step is crucial for the success of subsequent design steps and the quality of the overall solution. In addition to the abstraction error, margins may be introduced in each step to speed up evaluation at the cost of optimal performance. Furthermore, standard cell libraries from a foundry do not allow transistor resizing or cell layout modifications when they are used in the digital flow. These limits may prevent EDA tools from making full use of the capability of a process technology.

In previous work [16], we introduced a customised multi-objective auto-design flow for instrumenting parametric physical layout. This optimisation flow exploited a method to adjust cell drive strengths using a scripted parametric layout template and aimed to achieve improved delay, energy and area.

D. Chinnery stated in the 2000s that there is a gap between full-custom design and the standard digital flow in terms of speed and power [17] [18]. Digital ICs implemented using the standard design flow may significantly reduce design cycle time, but they miss optimal trade-off solutions that full-custom design can achieve. Nevertheless, designers in industry still rely on a synthesis-centred methodology to keep design turnaround short under today's time-to-market pressure.

Complementing the standard digital flow with extra custom design and optimisation techniques can achieve better results [19]. W. Dally proposed selectively applying a number of custom design techniques in the digital flow, including custom floor-planning and placing and routing critical signals, to achieve the most compact layout structure [20]. To accelerate custom design, [21] introduced an ASIC design methodology with on-demand library generation in the digital flow, producing cells with tailored drive strengths from a set of symbolic layouts.

II-B Design Space Exploration using Standard Digital Flow

Optimisation using steps of a commercial EDA design flow in the loop can be viewed as black-box design space exploration. While many of the algorithms used in commercial EDA flows are proprietary and not accessible by end users, logic synthesis and physical design tools provide a range of parameters and optimisation options for designers to choose from such as logic decomposition approach, area constraints, synthesis effort level, place and route with timing or power optimisation, etc. These parameters can be tuned with an optimisation or machine learning approach to fully utilise the optimisation potential that the tools are capable of.

A. Kahng showed in [8] that there is unpredictable “noisiness” in tool-generated solutions, causing variability in the resulting PPA metrics. Probability theory was applied in a fully-automated digital flow, with the aim of determining the optimal utilisation (parameter settings) of EDA tools to “de-noise” the design results. In [7], an automated method is proposed that explores the search space by tuning parameters at the synthesis step for multi-objective optimisation in a rank-based iterative process.

Running through the whole design flow consumes more computing resources. In [6], an automated selection mechanism is used that searches the design space in parallel while pruning non-competitive solutions at an early stage, rather than propagating them through the entire design flow. In [4], machine learning approaches were employed to bridge the synthesis solution space to the physical solution space, with the goal of enabling Pareto-driven exploration for high-speed and power-efficient adder designs.

II-C Discrete Gate Sizing for PPA Optimisation

Gate sizing is a crucial step for achieving timing closure and power minimisation of integrated circuits (IC). It refers to determining transistor widths inside of logic gates to make designs meet constraints. Modern digital EDA flows synthesize designs using a range of pre-designed cells. The optimisation problem thus is shifted to focusing on cell selection, in respect to drive strengths and assignment, from discretised gate libraries.

A typical objective in gate sizing is to minimise power consumption while meeting the timing constraints [22]. Lagrangian Relaxation (LR) is a commonly used technique for gate sizing optimisation, which moves the timing constraints into the objective function, weighted by Lagrange multipliers, so that constraint violations penalise the objective. The problem is thus reduced to finding suitable values for the multipliers.

With regard to the optimisation objectives in related research, [23] derived an LR formulation for finding trade-offs between leakage power and circuit timing. [24] expanded the primal objective function (power minimisation) by adding the area objective using an extra weight factor. More recently, [25] considered additional realistic constraints, such as maximum load and maximum slew constraints of gates, for simultaneous gate sizing and clock skew scheduling. However, Lagrangian relaxation is typically formulated for continuous problems and might not naturally handle discrete gate sizing [26].

Alternative multi-objective gate sizing frameworks, such as geometric programming [27] [28] and simulated annealing [29], have been investigated using weighted sum objective functions, which are a common scalarizing method in multi-objective problems, similar to LR.

In [26], J. Hu proposed a different way to scalarise the objectives of leakage power and slacks into a sensitivity-guided function for (non-dominated) solution ranking, and a heuristic-based stochastic search method was applied. The proposed method included two stages: global timing recovery, which seeks violation-free solutions by up-sizing gates and lowering threshold voltage, followed by power reduction with feasible timing, which reduces leakage power on those gates that were oversized during the first stage.

However, few works carry gate sizing through an industrial physical design flow and libraries to investigate how beneficial these methods can be in practice. In [30], it is stated that significant changes in cell sizes, after applying gate-sizing optimisation, require re-placement and re-routing to obtain new wire load parasitics. Updating designs at the post-routing stage can make evaluation more accurate and robust under stringent timing constraints.

In earlier works, typical heuristic techniques like genetic algorithms were applied to gate sizing problems. The methods for multi-objective optimisation in [31] and [32] are both still based on scalarized cost functions. More recently, gate-sizing-based soft error optimisation using MOEAs was proposed in [11], but its objectives are soft error rate, critical path delay and area.

II-D Summary

VLSI design is multi-objective in nature, often with a need to compromise between several conflicting design goals. A range of methods has been developed, including design flow revamping with custom-design techniques, intelligent approaches for design space exploration and dedicated design steps in EDA flows. In addition, scalarizing methods (e.g., the popular weighted sum function) have been proposed to reduce the complexity of multi-objective problems because of their high search efficiency [33]. However, the device physics of ICs imply non-convexity and non-linearity [34], so the weighted sum method is not sufficient to find Pareto-optimal solutions that lie on non-convex parts of the front [33].
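The limitation of the weighted sum can be illustrated with a minimal sketch. The three two-objective design points below are hypothetical (they are not taken from the paper) and both objectives are minimised. The middle point is Pareto-optimal but sits on a non-convex part of the front, so no weight setting in the sweep ever selects it.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def weighted_sum_winner(points: List[Point], w: float) -> Point:
    """Return the point minimising w*f1 + (1-w)*f2."""
    return min(points, key=lambda p: w * p[0] + (1.0 - w) * p[1])

# Hypothetical designs, e.g. (delay, power) in arbitrary units.
designs: List[Point] = [(0.0, 1.0), (0.6, 0.6), (1.0, 0.0)]

# All three points are mutually non-dominated.
pareto = [p for p in designs if not any(dominates(q, p) for q in designs)]

# Sweep the weight from 0 to 1 and collect every point the weighted sum can pick.
winners = {weighted_sum_winner(designs, k / 100.0) for k in range(101)}
```

In this toy case `pareto` contains all three designs, while `winners` only ever contains the two extreme points.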

The ever-growing complexity of IC designs introduces additional design constraints like delay, power dissipation, die area, or even wire length minimisation, minimum number of vias, etc. MOEAs, which cope with multiple design parameters and objectives independently of one another, are particularly useful when designers are faced with a large, complex design space. The inherent parallelism of EAs allows a large number of diverse search tasks to run concurrently in a single iteration. The MO feature makes it possible to perform the design search efficiently and yield a set of design trade-off solutions.

III MOEDA Optimisation Framework

III-A Preliminaries: Evolutionary Algorithms

Evolutionary algorithms are a class of population-based metaheuristic optimisation algorithms using mechanisms inspired by biological evolution, such as reproduction, genetics and natural selection. An initial population, which consists of a number of individuals (candidate solutions), is allowed to age over a number of evolutionary cycles (generations). The number of individuals is called the population size. The initial population can be either initialised randomly or seeded with a set of specific configurations. During each generation, individuals can be altered by applying mutation or crossover (i.e., recombination with each other) to their chromosomes. All individuals are evaluated using a fitness score at the end of each generation. Only the fittest individuals survive the selection process for the subsequent generation. Termination of the evolution process is triggered when specific criteria are met, e.g., sufficient quality of solution or maximum number of generations.

Applying EAs to solve optimisation problems needs three main preparatory steps:

(1) Definition of representation. This is the data structure that the EA manipulates. It represents individuals as a set of genes, the chromosome, comprising all variables and parameters necessary to describe it.

(2) Implementation of genetic operators. Mutation and crossover are commonly applied during the evolution process. Mutation modifies genes of individuals, and crossover combines subsets of genes of multiple individuals to produce new ones.

(3) Definition of a fitness function. This is used to calculate a fitness score for each individual based on its performance in regard of design objectives. The fitness scores are used during the ranking and selection process to determine which individuals survive to form the population for the next generation.

In this work, NSGA-II [35], one of the most popular multi-objective EAs, has been adapted as the search tool. Its fast non-dominated sorting approach and diversity preservation strategy ensure convergence while achieving a uniform spread of Pareto-optimal solutions.

Non-dominated sorting. If one individual performs better than another in at least one objective while not performing worse in any other objective, then the former is said to dominate the latter [36]. In non-dominated sorting, each individual has two attributes: the first is its domination count, the number of solutions that dominate it; the second is the set of solutions that it dominates. The individuals are grouped into multiple fronts based on their domination count. The non-dominated individuals, whose domination count is zero, form the first front. The individuals that are dominated only by members of the first front form the second front, and this continues with the third and following fronts until all individuals are assigned.
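As a minimal sketch of the bookkeeping described above, the function below follows the fast non-dominated sorting procedure of the original NSGA-II formulation [35], assuming all objectives are minimised. The objective vectors in the example are hypothetical.

```python
from typing import List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_sort(objs: List[Sequence[float]]) -> List[List[int]]:
    """Group individuals (by index) into fronts; front 0 is the non-dominated set."""
    n = len(objs)
    dominated_set: List[List[int]] = [[] for _ in range(n)]  # solutions p dominates
    count = [0] * n                                           # solutions dominating p
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            if dominates(objs[p], objs[q]):
                dominated_set[p].append(q)
            elif dominates(objs[q], objs[p]):
                count[p] += 1
    fronts = [[p for p in range(n) if count[p] == 0]]
    while fronts[-1]:
        nxt: List[int] = []
        for p in fronts[-1]:
            for q in dominated_set[p]:
                count[q] -= 1          # p has been assigned, so discount it
                if count[q] == 0:
                    nxt.append(q)
        fronts.append(nxt)
    return fronts[:-1]                 # drop the trailing empty front

# Hypothetical (delay, power) pairs.
objs = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
fronts = non_dominated_sort(objs)
```

Here the first three points are mutually non-dominated and form the first front, `(3, 4)` is dominated only by `(2, 3)` and forms the second, and `(5, 5)` ends up in the third.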

Diversity Preservation. The crowding distance sorting algorithm estimates the solution density in the vicinity of each individual based on the Euclidean distance to its nearest neighbours [37]. It has two main steps: the first is to calculate the distance of each individual to the others and assign the value to each individual; the second is to re-sort the individuals in descending order of their distance values. As a result, if two individuals belong to the same non-dominated front, the one that resides in the less crowded region is preferred.
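The two steps can be sketched as follows. Note the assumption: this sketch uses the crowding distance of the original NSGA-II formulation [35], which sums the normalised gap between each individual's two neighbours per objective; the adapted distance measure used in this work may differ in detail. The front in the example is hypothetical.

```python
from typing import List, Sequence

def crowding_distance(front: List[Sequence[float]]) -> List[float]:
    """Return one crowding distance per individual of a single front."""
    n = len(front)
    if n == 0:
        return []
    dist = [0.0] * n
    for m in range(len(front[0])):                 # one pass per objective
        order = sorted(range(n), key=lambda i: front[i][m])
        lo, hi = front[order[0]][m], front[order[-1]][m]
        # Boundary individuals are always kept.
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue
        for k in range(1, n - 1):
            gap = front[order[k + 1]][m] - front[order[k - 1]][m]
            dist[order[k]] += gap / (hi - lo)
    return dist

# Hypothetical front of (delay, power) pairs.
front = [(0.0, 4.0), (1.0, 3.0), (1.5, 2.5), (4.0, 0.0)]
d = crowding_distance(front)
# Step two: re-sort in descending order of distance (least crowded first).
ranked = sorted(range(len(front)), key=lambda i: d[i], reverse=True)
```

In this example the individual at index 1 has the smallest distance, so it is the first to be dropped if the front has to be truncated.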

Fig. 1: MOEDA Digital Flow. The flowchart on the left side is the standard digital flow and on the right side the MO extension is shown.

III-B Multi-objective (MO) EDA Digital Flow

The MOEDA digital flow, illustrated in Figure 1, is a fully-automated multi-objective design framework compatible with an industrial digital flow. The industrial flow is tapped between the logic synthesis and the physical implementation stage, where the MO evolutionary optimisation loop is inserted. The novelty here lies in the additional level of abstraction that can automatically fine-tune drive strength mapping during the process of the flow. The proposed flow involves:

(1) Parametric netlist. Synthesised netlists are composed of technology-specific logic gates and their connectivity. The MOEA representation encodes the drive strengths of gates into a set of genes, in this case a list of strings (i.e., instance names), defining each gate function and its drive strength. This information is used to produce a parametric netlist from the synthesis results.

(2) MOEA seeding. In this work, initial populations are seeded from a set of solutions obtained from the synthesis tool. This is achieved by converting the output netlists from the standard tool to parametric netlists, allowing the MOEA to modify them. Modifications are based on a library containing all drive strength options of each functional gate that are available from a standard cell library.

(3) Genetic operations. Only the mutation operator is used in this work. The mutation operation modifies the drive strength of components based on a given mutation probability. This results in a new netlist, which is then ready for physical implementation and evaluation. With the pressure to promote beneficial mutations and discard the others, the evolutionary loop keeps producing increasingly optimised solutions.

(4) Evaluation. This step calculates the fitness scores of each individual. MOEA-optimised netlists are propagated to place and route in the physical implementation step, producing layout instances for accurate evaluation metrics. In this work, three objectives are used (i.e., worst-case delay, total power consumption and total area of all logic gates), and fitness scores are evaluated at the post-route stage by the EDA tool. Fitness scores are then fed back to the MOEA for ranking and selection.

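The optimisation goal in this work is to simultaneously minimise delay, power and area, and so the fitness function is:

$$\min_{\mathbf{x}} \; F(\mathbf{x}) = \big( f_{delay}(\mathbf{x}),\; f_{power}(\mathbf{x}),\; f_{area}(\mathbf{x}) \big) \quad \text{s.t.} \quad x_i \in \mathcal{L}, \; i = 1, \dots, n \tag{1}$$

where the EA representation vector $\mathbf{x} = (x_1, \dots, x_n)$ (chromosome) is the input variable of the fitness functions, whose entries are the drive strengths of gates taken from a gate library $\mathcal{L}$.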

Figure 2 shows an example population consisting of layout individuals. Each individual has a chromosome vector comprising a set of genes, and each gene represents the drive strength of a logic gate. When mutation is triggered, it first identifies the gate function, then performs an online look-up to retrieve all drive strength options of this gate function from the gate library, and finally selects one of them to replace the previous one. This encoding of EA representations is generic and copes easily with different cell libraries.

Fig. 2: A chromosome example of an individual in a population and how each gene is mutated using a logic gate library. “D” represents the drive strength.
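The gene encoding and look-up-based mutation can be sketched as below. This is an illustrative sketch only: the cell naming scheme (function name followed by `D` and a strength, e.g. `INVD4`), the `LIBRARY` dictionary and the function names are assumptions made for the example, and the sketch excludes the current drive strength from the candidates, which the text does not specify.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical gate library: gate function -> available drive strengths.
LIBRARY: Dict[str, List[str]] = {
    "INV": ["D0", "D1", "D2", "D3", "D4", "D6", "D8", "D12", "D16", "D20", "D24"],
    "ND2": ["D0"],
}

def split_cell(cell: str) -> Tuple[str, str]:
    """Split an assumed cell name such as 'ND2D0' into ('ND2', 'D0')."""
    func, _, strength = cell.rpartition("D")
    return func, "D" + strength

def mutate(chromosome: List[str], library: Dict[str, List[str]],
           p_mut: float, rng: random.Random) -> List[str]:
    """Return a mutated copy; each gene changes with probability p_mut."""
    child = []
    for cell in chromosome:
        func, drive = split_cell(cell)
        # Look up every other drive strength offered for this gate function.
        options = [d for d in library[func] if d != drive]
        if options and rng.random() < p_mut:
            drive = rng.choice(options)
        child.append(func + drive)
    return child

rng = random.Random(0)
parent = ["INVD0", "ND2D0", "INVD4"]
child = mutate(parent, LIBRARY, 1.0, rng)       # force a mutation attempt on every gene
unchanged = mutate(parent, LIBRARY, 0.0, rng)   # probability 0 leaves the netlist as-is
```

Because the gate function is kept and only the drive strength suffix changes, the mutated netlist stays functionally identical to its parent.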

Due to the parallel nature of the population-based MOEA, a number of circuit evaluations can run simultaneously. The adapted NSGA-II algorithm is explained in Algorithm 1.

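Procedure: NSGA-II($N$, $G$, $f(\mathbf{x})$). $N$ individuals evolved over $G$ generations to solve $f(\mathbf{x})$.

Initialize parent population $P_1$ of size $N$
Offspring population $Q_1 \leftarrow$ Mutation($P_1$)
for $t = 1$ to $G$ do
    for each population $R_t = P_t \cup Q_t$ of size $2N$ do
        Fitness evaluation
        $F \leftarrow$ Non-Dominated-Sorting($R_t$)
        $P_{t+1} \leftarrow$ Ø
        $i \leftarrow 1$
        while $|P_{t+1}| + |F_i| \le N$ do
            Crowding-Distance-Assignment($F_i$)
            $P_{t+1} \leftarrow P_{t+1} \cup F_i$
            $i \leftarrow i + 1$
        end while
        Descend-Sort($F_i$)
        Less crowded individuals from the first to the $(N - |P_{t+1}|)$th of $F_i$ fill $P_{t+1}$
        $Q_{t+1} \leftarrow$ Mutation($P_{t+1}$)
    end for
end for
Algorithm 1 Adapted NSGA-II for MOEDA [35]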

The optimisation process continuously produces different circuit layout instances by adjusting the netlists and keeping improved solutions generation by generation. This ultimately yields a set of widely spread Pareto-optimised trade-offs with regard to the objectives.

IV Experimental Setup

We implement the proposed algorithm in C++ and conduct the proposed MOEDA design flow experiments on a 2.2GHz Xeon E5-2650 CPU. The ISCAS-85 benchmark circuits [38] are implemented and optimised using the Cadence® digital flow suite. Benchmark circuits in the form of RTL designs are synthesised into gate-level netlists using Genus™ (v17.11) [39]. These netlists are then optimised using the proposed flow in tandem with the physical implementation tool Innovus™ (v17.11) [40] to generate the layouts from the optimised netlists.

All experiments use a TSMC 65nm low-power core cell library (TCBN65LP) with standard threshold voltage, containing about 400 combinational cells.

IV-A Tool Environment Setup

The MOEDA flow is applied to further enhance designs which are already well-optimised by the Cadence® tools. In order to take advantage of the Genus™ synthesis tool as much as possible, it is necessary to push it to the limit of what it can achieve with the user options available. Hence, the synthesis compile effort is set to high and ultra optimisation is enabled. In addition, each benchmark is repeatedly synthesised, tightening its timing constraint step by step until it fails timing. The last working solution before timing failure is the best in speed, delay or slack that the tool can achieve. This solution is then chosen as a seed for initialising the MOEA.

In the timing constraint setup, we create an ideal general clock for all inputs and outputs, which means all paths are clocked with two ideal flip-flops at the beginning and the end of each path. The path arrival time should be less than the clock period to meet the timing constraint. The benchmarks used are all combinational circuits, so no uncertainties or transition delays were applied to the ideal clock.

To tighten the timing constraint, the output delay constraint ($T_{out}$) is gradually increased, as shown in Figure 3. With $T_{clk}$ denoting the clock period, the required time ($T_{req}$) is:

$$T_{req} = T_{clk} - T_{out} \tag{2}$$
Fig. 3: Timing Constraints
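The constraint-tightening procedure can be sketched as a simple loop. This is a sketch under stated assumptions: the function names are hypothetical, times are kept in integer picoseconds to avoid rounding issues, and `meets_timing` is a stand-in for a full synthesis run that reports whether the required time was met.

```python
from typing import Callable, Optional

def required_time_ps(clock_period_ps: int, output_delay_ps: int) -> int:
    """Required time is the clock period minus the output delay constraint."""
    return clock_period_ps - output_delay_ps

def tighten(clock_period_ps: int, step_ps: int,
            meets_timing: Callable[[int], bool]) -> Optional[int]:
    """Increase the output delay step by step until timing fails.

    Returns the last required time that still met timing, or None if
    even the unconstrained clock period fails.
    """
    output_delay = 0
    last_ok: Optional[int] = None
    while output_delay < clock_period_ps:
        t_req = required_time_ps(clock_period_ps, output_delay)
        if not meets_timing(t_req):
            break
        last_ok = t_req
        output_delay += step_ps
    return last_ok

# Stand-in for synthesis: pretend the tool can close timing down to 1459 ps.
seed_required_time = tighten(4000, 50, lambda t_req: t_req >= 1459)
```

With a 4000 ps clock and 50 ps steps, the stand-in above stops at a required time of 1500 ps; in the real flow each call to `meets_timing` would correspond to one synthesis run, and the achievable delay would itself depend on the constraint.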

The settings of both synthesis step and physical implementation step are summarised in Table I.

| Synthesis Setup | Place & Route Setup |
|---|---|
| syn_generic_effort = high | aspect ratio = 1.0 |
| iopt_ultra_optimisation = true | core utilisation = 0.7 |
| | timing-driven placement = true |
| | timing-driven routing = true |
| | SI-driven routing = true |
TABLE I: Tool Settings in Digital Flow

The output load capacitance (set_load) is also specified in part of the following experiments. In the physical design flow, every die is shaped with an aspect ratio of 1.0, and core utilisation is 70%. Timing-driven placement and routing and signal integrity (SI) driven routing are enabled for better performance.

IV-B Objective Evaluation in Tools

Evaluation takes place after place-and-route with Innovus™ as follows:

Delay: the worst-case delay, which is the value of the required time minus the worst negative slack (WNS) amongst all path delays. Static timing analysis is performed at the post-route stage.

Power: the result of the average power analysis in Innovus™. This is an approach that estimates the switching activity of the circuit without running a costly detailed simulation. It includes three parts: switching power, consumed in the charging and discharging of interconnect and load capacitance; internal power, consumed in charging and discharging of interconnect and device capacitance internal to cells; and leakage power, consumed by devices when not switching. Both internal and leakage power are calculated based on power tables from the Liberty (.lib) file, which contains the specifications and characterisations of the standard cells. Switching power is calculated based on the dynamic power relation $P_{sw} \propto \alpha\, C_L V^2 f$, where $C_L$ is the output capacitive loading, $V$ is the voltage, $f$ is frequency, and $\alpha$ is the average switching activity (the value used in this work is the default from Innovus™).

Area: calculated by adding the areas of every gate used. This is directly reported by Innovus™.

All evaluations above are performed in a single mode under typical corner conditions (PVT: TT, 1.2 V, 25 °C).

IV-C Multi-threaded Running and Runtime

Given the available computing resources and licences, all experiments in this work run 24 MOEDA threads in parallel.

The multi-objective approach requires a larger number of evaluations, which increases the runtime of the algorithm. The majority of the runtime is spent on completing place and route in this case. However, due to the embarrassingly parallel nature of the population-based approach, this can be overcome using high-performance computing (HPC) resources. In addition, the MOEDA algorithm delivers a set of trade-off solutions spanning the feasible design space in one go, rather than a single solution.
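Since the individuals of one generation are independent, their evaluations can be dispatched to a pool of workers. The sketch below is written in Python for brevity (the implementation in this work is in C++); `evaluate_population` and `fake_evaluate` are hypothetical names, and `fake_evaluate` merely stands in for a place-and-route run that returns (delay, power, area).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Objectives = Tuple[float, float, float]

def evaluate_population(population: Sequence[Sequence[int]],
                        evaluate: Callable[[Sequence[int]], Objectives],
                        workers: int = 24) -> List[Objectives]:
    """Evaluate all individuals concurrently; results keep the population order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(evaluate, population))

def fake_evaluate(chromosome: Sequence[int]) -> Objectives:
    """Toy stand-in for an external place-and-route and analysis run."""
    return (float(len(chromosome)), float(sum(chromosome)), float(max(chromosome)))

results = evaluate_population([[1, 2], [3]], fake_evaluate)
```

Threads are sufficient in such a setup when each evaluation mostly waits on an external tool process; the default of 24 workers mirrors the number of parallel MOEDA threads stated above.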

, , , , set_load=0
| Test Case | Required timing | #Syn Gates | #Genes | Objective | Syn-Opt. Solution | MOEDA Best (delay) | MOEDA Best (power) | MOEDA Best (area) | MOEDA Trade-off |
|---|---|---|---|---|---|---|---|---|---|
| C17 | 0.10 | 10 | 3 | Delay | 0.092 | 0.084 (8.7%) | 0.090 | 0.090 | 0.090 (2.2%) |
| | | | | Power | 1.324 | 0.900 | 0.856 (35.3%) | 0.856 | 0.856 (35.3%) |
| | | | | Area | 18.72 | 14.04 | 13.68 | 13.68 (26.9%) | 13.68 (26.9%) |
| C432 | 1.50 | 316 | 138 | Delay | 1.459 | 1.388 (4.9%) | 1.444 | 1.453 | 1.417 (3.6%) |
| | | | | Power | 37.81 | 37.61 | 35.40 (6.4%) | 35.55 | 35.80 (4.7%) |
| | | | | Area | 401.76 | 401.40 | 386.64 | 385.20 (4.1%) | 387.00 (2.7%) |
| C499 | 1.20 | 650 | 214 | Delay | 1.167 | 1.104 (5.4%) | 1.167 | 1.162 | 1.143 (2.1%) |
| | | | | Power | 112.2 | 111.4 | 105.2 (6.2%) | 105.9 | 105.9 (5.6%) |
| | | | | Area | 944.64 | 942.84 | 883.08 | 880.92 (6.7%) | 884.52 (6.4%) |
| C880 | 1.10 | 674 | 243 | Delay | 1.019 | 0.980 (3.8%) | 0.993 | 1.014 | 0.993 (2.6%) |
| | | | | Power | 89.44 | 87.89 | 86.49 (3.3%) | 87.32 | 86.49 (3.3%) |
| | | | | Area | 875.16 | 874.08 | 873.00 | 871.92 (0.4%) | 873.00 (0.2%) |
| C1355 | 1.30 | 669 | 224 | Delay | 1.201 | 1.159 (3.5%) | 1.194 | 1.194 | 1.194 (0.6%) |
| | | | | Power | 109.5 | 0.1093 | 105.1 (4.0%) | 105.1 | 105.1 (4.0%) |
| | | | | Area | 929.88 | 929.52 | 893.88 | 893.88 (3.9%) | 893.88 (3.9%) |
| C1908 | 1.30 | 366 | 168 | Delay | 1.281 | 1.191 (7.0%) | 1.248 | 1.273 | 1.231 (2.9%) |
| | | | | Power | 86.43 | 84.10 | 81.19 (6.1%) | 83.13 | 81.77 (5.4%) |
| | | | | Area | 749.52 | 729.00 | 711.36 | 709.56 (5.3%) | 713.52 (4.8%) |
| C2670 | 1.00 | 948 | 314 | Delay | 0.988 | 0.938 (5.1%) | 0.967 | 0.969 | 0.943 (4.6%) |
| | | | | Power | 136.0 | 136.0 | 133.0 (2.2%) | 135.2 | 134.3 (1.3%) |
| | | | | Area | 1248.12 | 1246.68 | 1247.4 | 1239.84 (0.7%) | 1245.6 (0.2%) |
| C3540 | 2.00 | 1311 | 478 | Delay | 1.890 | 1.809 (4.3%) | 1.809 | 1.809 | 1.809 (4.3%) |
| | | | | Power | 275.5 | 268.0 | 268.0 (2.7%) | 268.0 | 268.0 (2.7%) |
| | | | | Area | 1706.04 | 1701 | 1701 | 1701 (0.3%) | 1701 (0.3%) |
| C5315 | 1.40 | 2075 | 632 | Delay | 1.359 | 1.319 (2.9%) | 1.354 | 1.354 | 1.319 (2.9%) |
| | | | | Power | 318.7 | 314.0 | 311.4 (2.3%) | 314.3 | 314.0 (1.5%) |
| | | | | Area | 2723.4 | 2719.44 | 2718.72 | 2709.72 (0.5%) | 2719.44 (0.15%) |
| C6288 | 4.50 | 4221 | 1403 | Delay | 4.478 | 4.296 (4.2%) | 4.369 | 4.39 | 4.296 (4.2%) |
| | | | | Power | 1946 | 1915 | 1911 (1.8%) | 1925 | 1915 (1.6%) |
| | | | | Area | 5270.4 | 5270.04 | 5270.04 | 5268.6 (0.3%) | 5270.04 (0.0%) |
| C7552 | 1.85 | 2403 | 753 | Delay | 1.700 | 1.652 (2.8%) | 1.684 | 1.691 | 1.652 (2.8%) |
| | | | | Power | 438.0 | 435.1 | 434.5 (0.8%) | 435.7 | 435.1 (0.7%) |
| | | | | Area | 3090.24 | 3087.36 | 3087.36 | 3086.64 (0.1%) | 3087.36 (0.09%) |
Units: [] [] []
TABLE II: MOEDA design flow using the reduced library
, , , set_load=0
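| Test Case | Required timing | #Syn Gates | #Genes | Objective | Syn-Opt. Solution | MOEDA Best (delay) | MOEDA Best (power) | MOEDA Best (area) |
|---|---|---|---|---|---|---|---|---|
| C1908 | 0.60 | 299 | 299 | Delay | 0.580 | 0.569 (1.9%) | 0.580 | 0.580 |
| | | | | Power | 222.9 | 221.9 | 211.0 (5.3%) | 211.0 |
| | | | | Area | 1452.96 | 1451.88 | 1388.16 | 1388.16 (4.5%) |
| | 0.76 | 178 | 178 | Delay | 0.697 | 0.687 (1.4%) | 0.688 | 0.696 |
| | | | | Power | 111.1 | 107.9 | 107.5 (3.2%) | 0.1078 |
| | | | | Area | 698.04 | 682.92 | 682.2 | 678.96 (2.7%) |
| | 1.50 | 105 | 105 | Delay | 1.263 | 1.234 (2.3%) | 1.249 | 1.251 |
| | | | | Power | 42.1 | 39.69 | 39.32 (6.6%) | 39.51 |
| | | | | Area | 344.52 | 344.52 | 343.08 | 342.72 (0.5%) |
| C5315 | 0.74 | 750 | 750 | Delay | 0.723 | 0.706 (2.4%) | 0.715 | 0.72 |
| | | | | Power | 472.9 | 470.5 | 458.9 (3.0%) | 461.2 |
| | | | | Area | 2762.64 | 2755.44 | 2729.16 | 2724.48 (1.4%) |
| | 0.88 | 516 | 516 | Delay | 0.824 | 0.805 (2.3%) | 0.819 | 0.823 |
| | | | | Power | 310.9 | 309.0 | 304.6 (2.0%) | 305.7 |
| | | | | Area | 1873.44 | 1869.48 | 1859.76 | 1852.56 (1.1%) |
| | 1.50 | 400 | 400 | Delay | 1.305 | 1.241 (4.9%) | 1.289 | 1.302 |
| | | | | Power | 225.2 | 222.3 | 217.4 (3.5%) | 220 |
| | | | | Area | 1346.76 | 1343.52 | 1343.16 | 1336.68 (0.8%) |
| C6288 | 2.34 | 2178 | 2178 | Delay | 2.225 | 2.204 (0.9%) | 2.206 | 2.204 |
| | | | | Power | 5509 | 5495 | 5481 (0.5%) | 5495 |
| | | | | Area | 9382.32 | 9364.68 | 9377.28 | 9364.68 (0.2%) |
| | 2.90 | 1555 | 1555 | Delay | 2.726 | 2.673 (1.9%) | 2.708 | 2.708 |
| | | | | Power | 3829 | 3785 | 3732 (2.5%) | 3732 |
| | | | | Area | 6363.00 | 6331.32 | 6278.76 | 6278.76 (1.3%) |
| | 4.00 | 1140 | 1140 | Delay | 3.591 | 3.528 (1.8%) | 3.59 | 3.585 |
| | | | | Power | 2824 | 2821 | 2754 (2.5%) | 2777 |
| | | | | Area | 4194.00 | 4191.84 | 4183.92 | 4137.48 (1.3%) |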
Units: [] [] []
TABLE III: MOEDA design flow using the full commercial library

V Multi-objective Optimisation Experiments

V-A Initial Experiments with a Reduced Library

The first step is to quickly provide evidence that the proposed optimisation method is capable of enhancing designs in the context of a standard digital design flow. To achieve that, circuits are initially implemented using a reduced set of standard cells including only two functions, a two-input NAND (ND2) and inverters (INV), which are taken from the TSMC library. The aim is to initially reduce the complexity of the problem and to make analysis of results simpler, before moving to the full real-world TSMC cell library. In terms of drive strengths, only the NAND gate with the smallest drive strength D0 is included, but inverters feature 11 different drive strengths: D0, D1, D2, D3, D4, D6, D8, D12, D16, D20 and D24. The NAND gate is a universal gate capable of realising any required logic function, and the various inverters can meet the different drive strengths required in the different paths of a circuit.

At first, optimisation is performed exclusively on drive strengths of inverters. Synthesising with only one minimum nand gate is biasing the GenusTM tool towards using a large number of different inverters. This, in turn, creates a larger and richer search space for optimisation with the MOEDA.

In this experiment, all ISCAS-85 benchmark circuits have been implemented and optimised with our proposed MOEDA digital flow using a population size of 200 individuals. Designs are optimised over 400 generations for the largest circuits C5315, C6288 and C7552. For all others, the number of generations is 200. In addition, the output load constraints have not been applied in this case.

The netlist parameterisation is performed on the inverters of the synthesised netlist, converting their drive strength settings into genes for optimisation. The algorithm is seeded with the best tool-generated solution in terms of delay, which provides a starting point from which PPA can be improved over the standard flow from the beginning of the search.
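As an illustration of this parameterisation, the following minimal sketch encodes the inverter drive strengths of a toy netlist as integer genes and writes a genome back into the netlist. The dictionary-based netlist, the function names and the mutation operator are assumptions made for the example; they are not the data structures or operators of the MOEDA implementation.

```python
import random

# Inverter drive strengths available in the reduced library (taken from the text).
INV_DRIVES = ["D0", "D1", "D2", "D3", "D4", "D6", "D8", "D12", "D16", "D20", "D24"]

def encode(netlist):
    """Return (inverter instance names, genome); each gene indexes INV_DRIVES."""
    names = [name for name, (func, _) in netlist.items() if func == "INV"]
    genome = [INV_DRIVES.index(netlist[name][1]) for name in names]
    return names, genome

def decode(netlist, names, genome):
    """Return a copy of the netlist with inverter drive strengths set from the genome."""
    remapped = dict(netlist)
    for name, gene in zip(names, genome):
        remapped[name] = ("INV", INV_DRIVES[gene])
    return remapped

def mutate(genome, rate, rng):
    """Resample each gene with probability `rate` (hypothetical mutation operator)."""
    return [rng.randrange(len(INV_DRIVES)) if rng.random() < rate else g
            for g in genome]

# Toy netlist (hypothetical): instance name -> (function, drive strength).
seed_netlist = {"u1": ("ND2", "D0"), "u2": ("INV", "D4"), "u3": ("INV", "D0")}
names, seed_genome = encode(seed_netlist)
variant = decode(seed_netlist, names, mutate(seed_genome, 0.5, random.Random(0)))
```

Only the inverters become genes here; the NAND instance is left untouched, which mirrors the restriction of this first experiment.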

Table II presents each test case with its required timing, the total number of synthesised gates (# Syn Gates), and the total number of genes (# Genes), which are the inverters in this experiment. The Syn-Opt. column is the best solution (using Genus) of each circuit in terms of delay. The general clock is set to 250MHz so that the tool delivers working solutions for most of the benchmark circuits, as timing constraints are easily met when they are first synthesised. The timing limit of each circuit is found by gradually tightening the timing constraints in 0.05 increments. However, the general clock of the C6288 benchmark has to be lowered to 200MHz (5 ns clock period), because it fails the 4 ns clock period even without any further output delay constraints applied.
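The search for the timing limit can be sketched as a simple loop. In the sketch below, `meets_timing` is a hypothetical callback standing in for a complete synthesis and implementation run, and tightening is assumed to mean lowering the constraint value by a fixed step; neither detail is specified in the text.

```python
def clock_period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz

def find_timing_limit(start, step, meets_timing):
    """Tighten the constraint from `start` in decrements of `step`.

    Returns the tightest constraint that still passes, or None if even
    the starting constraint fails.
    """
    last_ok = None
    k = 0
    while True:
        constraint = round(start - k * step, 6)  # rounding avoids float drift
        if constraint <= 0 or not meets_timing(constraint):
            return last_ok
        last_ok = constraint
        k += 1

# Toy stand-in for the tool flow: timing passes down to 0.74 (illustrative value).
limit = find_timing_limit(1.0, 0.05, lambda c: c >= 0.74)
```

With a 0.05 step the loop stops at the last grid point that still passes, so the reported limit is accurate only to within one step.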

Under the MOEDA solution in Table II there are four columns, where the first three report the solutions with the best delay, power and area, respectively. Each of these solutions is the best improvement in one of the objectives that can be achieved strictly without worsening any of the others. Column four takes all objectives into account simultaneously and shows the optimised trade-off solution, which is defined here as the individual from the final generation that is positioned at the shortest Euclidean distance to the origin. The trade-off solutions demonstrate the capability of achieving improvements in all objectives simultaneously.
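The selection of the trade-off solution can be sketched as follows. Because delay, power and area have very different magnitudes, this sketch divides each objective by the corresponding value of the reference solution before measuring the distance to the origin; that normalisation is an assumption, as the text does not state how the objectives are scaled. All numbers are illustrative, not results.

```python
import math

def no_worse(candidate, reference):
    """True if the candidate is not worse than the reference in any objective."""
    return all(c <= r for c, r in zip(candidate, reference))

def trade_off(solutions, reference):
    """Solution closest to the origin after scaling by the reference (assumed scaling)."""
    def distance(solution):
        return math.sqrt(sum((v / r) ** 2 for v, r in zip(solution, reference)))
    return min(solutions, key=distance)

# (delay, power, area) tuples; all objectives are minimised. Illustrative values.
reference = (1.0, 100.0, 1000.0)
final_generation = [
    (0.95, 99.0, 990.0),
    (0.99, 92.0, 985.0),
    (0.97, 95.0, 960.0),
    (0.90, 104.0, 1000.0),  # better delay, but worse power than the reference
]
candidates = [s for s in final_generation if no_worse(s, reference)]
best = trade_off(candidates, reference)
```

The `no_worse` filter corresponds to the requirement that an improvement in one objective must not worsen any other.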

The results show that each circuit can be improved, to varying degrees, by the MOEDA flow. The complexity (in terms of gate count and function) of the benchmarks shown in Table II increases from top to bottom of the table. It can be observed that the PPA of the smaller circuits can be improved to a larger degree. This may be a result of the smaller design space allowing the optimisation to achieve results approaching those of an exhaustive search. For this reason, we focus on the larger benchmark circuits in subsequent experiments.

These initial experiments suggest that the MOEDA flow is a promising and viable approach to tackling multi-objective problems in a standard digital flow. However, with the constraint of using only two types of logic functions, i.e., NAND and inverter, it cannot make full use of all features of the EDA tool’s own optimisation algorithms or of the process technology.

V-B Experiments with a Full Commercial Library

We scale up the proposed MOEDA flow to optimise designs using the full TSMC library. Instead of only adjusting the drive strength of inverters, the designs are synthesised using the full library and the MOEDA handles drive strength optimisation for all types of cells.

Three large benchmark circuits with different functions and structures are optimised for the same three objectives using the MOEDA flow. They are a 16-bit error detector/corrector (C1908), a 9-bit ALU (C5315) and a 16x16 multiplier (C6288).

In the previous experiment using the reduced library, it was possible to efficiently explore the feasible design space starting from a single seed pushed to the timing limit of what the standard tools can achieve. This is no longer sufficient here, where the use of the full library causes a dramatic increase in the complexity of both the design space and the behaviour of the greedy optimisation algorithms built into the standard flow. For this reason, three different seeds are used, obtained by running synthesis and implementation under three different timing constraints for each benchmark. The first is the tightest constraint that can just be met, resulting in a solution with the best delay. In the second case, the timing constraint is relaxed so that it can be easily met, allowing the standard flow room to optimise for power and area. The third timing constraint is chosen midway between the first and second. The three different solutions obtained from the standard flow under these constraints are used as seeds for three independent runs of the MOEDA flow.

All circuits are optimised over 200 generations with a population size of 200 individuals, and output load constraints have not been applied in this set of experiments. The number of synthesised gates and the number of genes shown in Table III are the same because all gates are encoded into chromosomes, so that the MOEDA flow optimises the drive strength of all gates. The number of synthesised gates in each circuit is much lower than in the original benchmarks, where C1908 has 880 gates, C5315 has 2307 gates and C6288 has 2406 gates. The reason is that the TSMC library has a large range of complex logic cells such as AOI (AND-OR-Inverter), IINR (NOR with 2 Inverted Inputs), full adders, etc., each of which is equivalent to several basic logic gates like XOR, NAND, OR, etc. In contrast, the original benchmarks use basic generic gates. The synthesis tool therefore automatically merges simple gates into complex ones to reduce the total transistor count and physical area, which in turn reduces the number of gates.

This may compact the design space and reduce the search complexity, but it also increases the difficulty of further PPA optimisation. In the real-world TSMC library, complex logic cells have fewer drive strength options (normally no more than 5) due to their physical design complexity, and the tools use a large number of complex cells, as evidenced by the significant decrease in gate count. This may prevent the optimisation from achieving large improvements.

Despite this complexity, the optimised results are still promising compared to the Syn-Opt. solutions that the tools achieve under the three different timing constraints. The MOEDA solutions demonstrate significant improvements in all three cases while strictly not sacrificing the metrics of other objectives.
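The percentages given in parentheses in Table III appear to be relative improvements over the first value of each row. The sketch below reproduces three of them under that assumption, which also takes the first value in a row to be the Syn-Opt. reference; the function name is chosen for this example.

```python
def improvement_pct(reference, optimised):
    """Relative improvement of a minimised metric over its reference, in percent."""
    return 100.0 * (reference - optimised) / reference

# (first value in the row, best MOEDA value, percentage printed in Table III)
rows = [
    (698.04, 678.96, 2.7),
    (1.263, 1.234, 2.3),
    (42.1, 39.32, 6.6),
]
computed = [round(improvement_pct(ref, best), 1) for ref, best, _ in rows]
```

For these three rows the computed values match the printed percentages after rounding to one decimal place.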

In Figure 4, the final generation of each circuit with three independent seeding runs is shown, plotting “Delay vs. Power” (left column), “Delay vs. Area” (right column) and the corresponding Syn-Opt. reference solutions. The three clusters correspond to the three seed timing constraints listed in Table III. In all cases, the MOEDA produces a wide range of useful trade-off solutions, with reduced power consumption or area, within the boundaries of the given seed topology.

Fig. 4: MOEDA flow optimisation results using the full commercial standard cell library for C1908, C5315 and C6288. Panels (a) to (f): “Delay vs. Power” (left column) and “Delay vs. Area” (right column).

A number of solutions are improved in all objectives. However, because the Syn-Opt. solutions are already well optimised for delay, the MOEDA did not have room for significant delay improvement, especially in the clusters seeded under tight timing requirements. For these timing settings, the synthesis tool spends most of its effort on timing closure and less on power and area. For the relatively relaxed timing setting, in contrast, the tools do not trade off much on timing but spend more effort on power and area. The MOEDA flow can further balance these three objectives where the tools have not.

Furthermore, when comparing the plots of the test cases with each other, it can be seen that the clusters are relatively flat in the case of benchmark C6288, the 16x16 multiplier. This highly structured circuit is difficult to optimise, which might be due to its large number of adders causing its critical paths to be close together. For more irregular circuit structures, such as the ALU and the error detector/corrector, the MOEDA can trade off solutions between objectives more effectively.

V-C Discussion

Two sets of experimental results are shown in this section, one using a reduced commercial cell library and the other using the full version of this library. In both cases, it is demonstrated that the proposed MOEDA flow can optimise designs in the loop for better PPA through drive strength remapping. Significant improvements over what can be achieved with the standard flow are achieved with regard to all objectives.

The largest optimisation case (C6288.a) takes 25 hours to run. Although the proposed optimisation method comes at the cost of longer computing time, this investment is worthwhile when considering the savings in power consumption and area that could not otherwise be achieved, particularly for feasible circuit solutions that are produced in large numbers. An advantage of NSGA-II is that it reliably converges to similar performance when run multiple times with the same evaluation budget. Hence, one MOEDA run is sufficient, which to some extent mitigates the runtime cost.

The second set of experiments using the full library confirms that the circuit structure and topology produced by the synthesis tool play a major role in the overall design results (PPA) that can be achieved in subsequent steps of the design flow. In addition, optimising circuits for a given timing constraint from a single circuit topology (single seed) is not sufficient to cover the larger design space that arises when circuit structures change.

Therefore, the next section will investigate how running synthesis multiple times can be harnessed to expand, access and explore the design space with respect to different circuit topologies.

VI Multi-objective Design Space Exploration

VI-A Optimisation using Multiple Seed Designs

Instead of starting with seeding the initial population using a single synthesis-optimised solution, this section investigates how the proposed algorithm can explore the design space using a set of multiple different seeds. The seeds are a range of different solutions generated using the standard digital flow under a number of different timing constraints.

The methodology used to obtain the set of seeds from the standard design tools is the same as in Section V-B. However, in this case, a more fine-grained range of timing constraints is applied in 100 increments from minimum (a constraint that the tool can easily meet) to maximum (solutions start to fail timing), in order to investigate how the standard tools deal with the different constraints and what design space coverage they can achieve. The general clock is set to 250MHz. Each benchmark has been synthesised once for each timing constraint setting to generate the 100 solutions for seeding. Table IV summarises the timing constraint settings of each test case, including the range of the number of synthesised gates from minimum to maximum. Different output load scenarios, including loading with drive strengths D1 and D8, are applied to all test cases under the same set of timing constraints. The output load values (D1 and D8) are specified as the input pin capacitance of inverters with drive strength D1 and D8 from the TSMC cell library.

Test Case (Increment Factor)    set_load    #Syn Gates
C1908 ()                        D1          105 - 445
C1908 ()                        D8          105 - 468
C5315 ()                        D1          396 - 1323
C5315 ()                        D8          401 - 1287
C6288 ()                        D1          1105 - 3208
C6288 ()                        D8          1123 - 3222
TABLE IV: Timing Constraints Setup
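The seed generation sweep can be sketched as a list of synthesis jobs, one per timing constraint and output load scenario. The even spacing between the relaxed and tight ends, the numeric bounds and the function name are assumptions made for illustration; the actual increment factors are those of Table IV.

```python
def constraint_sweep(relaxed, tight, steps=100):
    """Evenly spaced timing constraints from the relaxed end to the tight end."""
    step = (relaxed - tight) / (steps - 1)
    return [relaxed - i * step for i in range(steps)]

# Illustrative bounds only; they are not the settings used in the experiments.
sweep = constraint_sweep(relaxed=2.0, tight=1.0)

# One synthesis job per (output load, timing constraint) pair.
jobs = [(load, constraint) for load in ("D1", "D8") for constraint in sweep]
```

Each job would be handed to the standard flow once, yielding 100 seed designs per output load scenario.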

Figure 5 illustrates the standard tool’s design space for each benchmark circuit under the D1 and D8 output load scenarios in the first and third columns, and the respective optimised design space from the MOEDA flow in the second and fourth columns. In the first and third columns, all cross markers represent tool-generated solutions, and their face colours correspond to the colour bar relating to circuit area, ranging from large (red) to small (blue). Solutions additionally marked with squares have failed to meet timing constraints. The red line highlights the Syn-Opt. “elite” solution front, which is calculated using the non-dominated sorting approach in three dimensions with regard to the three objectives (delay, power and area). All solutions in the first domination rank are connected with a line to highlight the “Syn-Frontier” more clearly. The Syn-Frontiers shown in the figures are projections from the 3D objective space onto the 2D plots.
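The first domination rank used for the Syn-Frontier can be computed with a direct pairwise comparison, as in the following minimal sketch. The quadratic-time filter and the toy (delay, power, area) values are chosen for clarity; this is not the sorting routine used in the flow.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def first_front(points):
    """Points that no other point dominates (first domination rank)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Toy (delay, power, area) solutions; all three objectives are minimised.
solutions = [
    (1.0, 50.0, 900.0),
    (1.2, 40.0, 950.0),
    (1.1, 55.0, 950.0),   # dominated by the first solution
    (1.5, 35.0, 800.0),
    (1.5, 45.0, 990.0),   # dominated by the second solution
]
frontier = first_front(solutions)

# Projection of the 3D front onto the delay-power plane, ordered by delay.
projected = sorted((delay, power) for delay, power, _area in frontier)
```

Because the rank is computed in three dimensions, a projected point may look dominated in a 2D plot while still being on the front thanks to its area.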

Looking at the design space of the standard flow, it can be observed that the 16-bit error detector/corrector (C1908) and the 9-bit ALU (C5315) can be synthesised and optimised well by the tool as the set of solutions forms a smooth Pareto frontier. However, the 16x16 multiplier (C6288), which is a highly structured circuit, yields a less regular frontier with more clustered solutions. This indicates that the tools struggle to effectively trade-off multiple objectives when optimising a complex design with a relatively fixed circuit topology.

VI-B Squeeze Design Space for PPA

The design space comprising the 100 seed solutions, with different circuit topologies, is the baseline on which the MOEDA performs optimisation. All 100 seed solutions are loaded into the initial population of the MOEDA flow and optimised generation by generation. All test cases are optimised over 100 generations using a population size of 500, i.e., the initial population comprises five copies of each seed circuit. The plots in the second and fourth columns of Figure 5 show the improved solution space, plotting “Delay vs. Power”. The red line shows the Syn-Frontiers from the baseline design space. All seed solutions that have survived until the last generation, although with modified drive strengths, are marked with a red cross. The solutions shown as blue crosses are those produced by the MOEDA flow, comprising all individuals of the final generation.
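The construction of the initial population from multiple seeds can be sketched as follows. Tagging each individual with the index of its seed is a hypothetical bookkeeping choice that makes it possible to trace which topology an individual descends from; the toy genomes are placeholders for real drive strength encodings.

```python
from collections import Counter

def seed_population(seed_genomes, pop_size):
    """Fill the population with an equal number of copies of every seed.

    Each individual is a (seed_id, genome) pair; genome lengths may differ
    between seeds because the seeds have different circuit topologies.
    """
    copies, remainder = divmod(pop_size, len(seed_genomes))
    if remainder:
        raise ValueError("population size must be a multiple of the seed count")
    return [(seed_id, list(genome))
            for seed_id, genome in enumerate(seed_genomes)
            for _ in range(copies)]

# 100 toy seed genomes of varying length (placeholders, not real designs).
seeds = [[0] * (10 + i) for i in range(100)]
population = seed_population(seeds, pop_size=500)
copies_per_seed = Counter(seed_id for seed_id, _ in population)
```

Each copy holds its own genome list, so later mutation of one individual does not alter its siblings.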

Fig. 5: Design space optimisation results under the drive strength D1 and D8 output load scenarios for C1908 16-bit error detector/corrector, C5315 9-bit ALU and C6288 16x16 multiplier. Panel columns from left to right: Standard Flow, MOEDA Flow, Standard Flow, MOEDA Flow. The set_load information is labelled in the title at the top of each plot.

The results confirm that the MOEDA flow can push the baseline frontier further to extend the design space of all test cases in all three objectives, across different circuit topologies. In the case of circuit C1908, the optimised solutions form a smooth Pareto frontier, whereas there are some gaps in the optimised design space of C5315 and particularly of C6288. The runtime of the largest case (C6288.D8) is 26 hours.

The gaps are artefacts from the baseline design space due to limitations of the tool’s optimiser and properties of the circuit. Although the proposed MOEDA flow could not fully bridge these large gaps, the optimised design space covers the baseline design space and extends beyond it more uniformly. This enables better design-specific choices, as a richer set of solutions is available.

Only about one quarter of the initial seeds survive until the final generation. Most of the surviving seeds are positioned on the Syn-Frontier, while the others have been discarded in the evolution process. This indicates that there is “noisiness” inside the standard flow tools and that not all tool-generated solutions are well optimised, so some good trade-off solutions might be lost. Recovering them normally requires iterations with modifications applied in the design flow, achieved through custom design effort by designers. The MOEDA can iterate designs automatically throughout the whole flow for better trade-offs in PPA.

The MOEDA flow needs more computing resources due to the continuous generation of design layouts, which is required for accurate, real-world evaluation. It would be easy to speed up the flow by evaluating designs at an earlier design stage, but the aim of this work is to comprehensively demonstrate that the proposed MOEDA flow has generic optimisation capability in an industrial post-fabrication design environment.

In addition, work is in progress to investigate critical-path-driven optimisation to accelerate the MOEDA flow runtime for extremely large designs (e.g., millions of gates).
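Seed survival can be measured by counting how many distinct seeds still have a descendant in the final generation. The sketch assumes that every individual is stored as a hypothetical (seed_id, genome) pair; the toy final generation is constructed so that 25 of 100 seeds survive and is not experimental data.

```python
def surviving_seed_fraction(final_generation, n_seeds):
    """Fraction of seed designs with at least one descendant in the final generation."""
    survivors = {seed_id for seed_id, _genome in final_generation}
    return len(survivors) / n_seeds

# Toy final generation: 500 individuals descending from 25 of the 100 seeds.
final_generation = [(i % 25, [0, 1, 2]) for i in range(500)]
fraction = surviving_seed_fraction(final_generation, n_seeds=100)
```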

VI-C Discussion

The MOEDA flow achieves significant improvements on PPA over the standard design tool’s solutions across the entire design space with different circuit topologies. However, although the proposed method is capable of exploiting design opportunities to refine technology mapping by adjusting drive strengths at the gate-level, circuit topology optimisation is currently not yet included. This current limitation is likely the reason that design space gaps cannot be fully closed, which would provide the best trade-off design choices. This is particularly visible in the results for C6288, due to its fixed topology. From these results it can be concluded that including topology modification in our approach could enable further design optimisation opportunities.

VII Conclusion and Future Work

This paper proposes a fully-automated multi-objective electronic design automation flow (MOEDA) extension to enhance the current industry-standard synthesis and physical implementation flow. The MOEDA flow is fully compatible with commercial design tools and specifically optimises drive strength of gates during technology mapping in such a way that the subsequent physical implementation stage can achieve designs with better PPA. The proposed method has been successfully applied to the optimisation of ISCAS-85 benchmark suite using the TSMC 65nm low power standard cell library.

Experimental results show that the proposed MOEDA flow achieves significant improvements in PPA over the standard design tool’s solutions. It can be concluded that optimising technology mapping to refine the drive strength selection of cells is beneficial to improving the PPA of circuits. This has been shown not only for a single solution, but across the entire design space with various circuit topologies.

From a designer’s point of view, the multi-objective optimisation approach has the added benefit of producing a set of best trade-off solutions that are distributed as uniformly as possible. This provides designers with choice and allows them to select designs with the most appropriate objective trade-off for different applications.

Based on observations made from the results, we will investigate circuit topology adjustment and application-specific cell library drive strength composition. Providing mechanisms for circuit topology modification should make it possible to achieve more uniform best trade-off solution fronts in the case of complex circuits with rigid structure. A tool that can determine what cells would be most beneficial to have available in a cell library for a specific application could streamline the design and reduce cost of cell libraries.

References

  • [1] A. B. Kahng, J. Lienig, I. L. Markov, and J. Hu, VLSI physical design: from graph partitioning to timing closure.   Springer Science & Business Media, 2011.
  • [2] D. S. Rao and F. J. Kurdahi, “Hierarchical design space exploration for a class of digital systems,” IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 1, no. 3, pp. 282–295, 1993.
  • [3] S. Roy, Y. Ma, J. Miao, and B. Yu, “A learning bridge from architectural synthesis to physical design for exploring power efficient high-performance adders,” in 2017 IEEE/ACM Int. Symp. on Low Power Electronics and Design (ISLPED).   IEEE, 2017, pp. 1–6.
  • [4] Y. Ma, S. Roy, J. Miao, J. Chen, and B. Yu, “Cross-layer optimization for high speed adders: A pareto driven machine learning approach,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 38, no. 12, pp. 2298–2311, 2019.
  • [5] J. Kwon, M. M. Ziegler, and L. P. Carloni, “A learning-based recommender system for autotuning design flows of industrial high-performance processors,” in Proc. 56th Design Automation Conf. (DAC).   IEEE, 2019, pp. 1–6.
  • [6] M. Anwar, S. Saha, M. M. Ziegler, and L. Reddy, “Early scenario pruning for efficient design space exploration in physical synthesis,” in 2016 29th Int. Conf. on VLSI Design and 2016 15th Int. Conf. on Embedded Systems (VLSID).   IEEE, 2016, pp. 116–121.
  • [7] M. M. Ziegler, H.-Y. Liu, G. Gristede, B. Owens, R. Nigaglioni, and L. P. Carloni, “A synthesis-parameter tuning system for autonomous design-space exploration,” in 2016 Design, Automation & Test in Europe, (DATE).   IEEE, 2016, pp. 1148–1151.
  • [8] A. B. Kahng, S. Kumar, and T. Shah, “A no-human-in-the-loop methodology toward optimal utilization of eda tools and flows,” Proc. DAC, WIP Track, 2018.
  • [9] A. Ricci, I. De Munari, and P. Ciampolini, “An evolutionary approach for standard-cell library reduction,” in Proc. 17th ACM Great Lakes Symp. on VLSI.   ACM, 2007, pp. 305–310.
  • [10] H. A. Rahim, R. B. Ahmad, W. N. S. F. W. Ariffin, M. I. Ahmad et al., “The performance study of two genetic algorithm approaches for vlsi macro-cell layout area optimization,” in 2008 2nd Asia Int. Conf. on Modelling & Simulation (AMS).   IEEE, 2008, pp. 207–212.
  • [11] W. Sheng, L. Xiao, and Z. Mao, “Soft error optimization of standard cell circuits based on gate sizing and multi-objective genetic algorithm,” in Proc. 46th Design Automation Conf. (DAC), 2009, pp. 502–507.
  • [12] S. M. Sait, A. H. El-Maleh, and R. H. Al-Abaji, “Evolutionary algorithms for vlsi multi-objective netlist partitioning,” Engineering Applications of Artificial Intelligence, vol. 19, no. 3, pp. 257–268, 2006.
  • [13] Z. Vasicek and L. Sekanina, “A global postsynthesis optimization method for combinational circuits,” in 2011 Design, Automation & Test in Europe, (DATE).   IEEE, 2011, pp. 1–4.
  • [14] M. Palesi and T. Givargis, “Multi-objective design space exploration using genetic algorithms,” in Proc. 10th Int. Symp. on Hardware/software Codesign, 2002, pp. 67–72.
  • [15] G. Ascia, V. Catania, and M. Palesi, “A framework for design space exploration of parameterized vlsi systems,” in Proc. ASP-DAC/VLSI Design 2002. 7th Asia and South Pacific Design Automation Conf. and 15th Int. Conf. on VLSI Design.   IEEE, 2002, pp. 245–250.
  • [16] L. Cao, S. J. Bale, and M. A. Trefzer, “Instrumenting parametric physical layout for multi-objective optimisation,” in 2018 IEEE Symp. Series on Computational Intelligence (SSCI).   IEEE, 2018, pp. 1339–1345.
  • [17] D. G. Chinnery and K. Keutzer, “Closing the power gap between asic and custom: an asic perspective,” in Proc. 42nd Design Automation Conf. (DAC), 2005, pp. 275–280.
  • [18] ——, “Closing the gap between asic and custom: an asic perspective,” in Proc. 37th Design Automation Conf. (DAC), 2000, pp. 637–642.
  • [19] ——, “High performance and low power design techniques for asic and custom in nanometer technologies,” in Proc. 2013 ACM Int. Symp. on Physical Design, 2013, pp. 25–32.
  • [20] W. J. Dally and A. Chang, “The role of custom design in asic chips,” in Proc. 37th Design Automation Conf. (DAC), 2000, pp. 643–647.
  • [21] H. Onodera, M. Hashimoto, and T. Hashimoto, “Asic design methodology with on-demand library generation,” in 2001 Symp. on VLSI Circuits. Digest of Technical Papers.   IEEE, 2001, pp. 57–60.
  • [22] L. Lavagno, I. L. Markov, G. Martin, and L. K. Scheffer, Electronic Design Automation for IC Implementation, Circuit Design, and Process Technology.   CRC Press, 2016.
  • [23] G. Flach, T. Reimann, G. Posser, M. Johann, and R. Reis, “Effective method for simultaneous gate sizing and Vth assignment using lagrangian relaxation,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 33, no. 4, pp. 546–557, 2014.
  • [24] T. J. Reimann, C. C. Sze, and R. Reis, “Cell selection for high-performance designs in an industrial design flow,” in Proc. 2016 ACM Int. Symp. on Physical Design, 2016, pp. 65–72.
  • [25] A. Sharma, D. Chinnery, and C. Chu, “Lagrangian relaxation based gate sizing with clock skew scheduling-a fast and effective approach,” in Proc. 2019 Int. Symp. on Physical Design, 2019, pp. 129–137.
  • [26] J. Hu, A. B. Kahng, S. Kang, M.-C. Kim, and I. L. Markov, “Sensitivity-guided metaheuristics for accurate discrete gate sizing,” in Proceedings of the International Conference on Computer-Aided Design, 2012, pp. 233–239.
  • [27] A. Farshidi, L. Rakai, L. Behjat, and D. Westwick, “A self-tuning multi-objective optimization framework for geometric programming with gate sizing applications,” in Proc. 23rd ACM Great Lakes Symp. on VLSI, 2013, pp. 305–310.
  • [28] ——, “Optimal gate sizing using a self-tuning multi-objective framework,” Integration, vol. 47, no. 3, pp. 347–355, 2014.
  • [29] T. Reimann, G. Posser, G. Flach, M. Johann, and R. Reis, “Simultaneous gate sizing and vt assignment using fanin/fanout ratio and simulated annealing,” in 2013 IEEE Int. Symp. on Circuits and Systems (ISCAS).   IEEE, 2013, pp. 2549–2552.
  • [30] A. K. Yella, G. Srivatsa, and C. Sechen, “Are standalone gate size and Vt optimization tools useful?” in 2017 IEEE 30th Canadian Conf. on Electrical and Computer Engineering (CCECE).   IEEE, 2017, pp. 1–6.
  • [31] X.-D. Wang and T. Chen, “Performance and area optimization of vlsi systems using genetic algorithms,” VLSI Design, vol. 3, no. 1, pp. 43–51, 1995.
  • [32] S. Benkhider, F. Boumghar, and A. Baba-ali, “A parallel genetic approach to the gate sizing problem of vlsi integrated circuits,” in Proc. of the 12th Int. Conf. on Microelectronics.   IEEE, 2000, pp. 169–173.
  • [33] R. Wang, Z. Zhou, H. Ishibuchi, T. Liao, and T. Zhang, “Localized weighted sum method for many-objective optimization,” IEEE Trans. on Evolutionary Computation, vol. 22, no. 1, pp. 3–18, 2016.
  • [34] A. B. Kahng, S. Kang, H. Lee, I. L. Markov, and P. Thapar, “High-performance gate sizing with a signoff timer,” in 2013 IEEE/ACM Int. Conf. on Computer-Aided Design (ICCAD).   IEEE, 2013, pp. 450–457.
  • [35] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, “A fast and elitist multiobjective genetic algorithm: Nsga-ii,” IEEE Trans. on Evolutionary Computation, vol. 6, no. 2, pp. 182–197, 2002.
  • [36] H. R. Maier, S. Razavi, Z. Kapelan, L. S. Matott, J. Kasprzyk, and B. A. Tolson, “Introductory overview: Optimization using evolutionary algorithms and other metaheuristics,” Environmental Modelling & Software, vol. 114, pp. 195–213, 2019.
  • [37] C. C. Coello, “Evolutionary multi-objective optimization: a historical view of the field,” IEEE Computational Intelligence Magazine, vol. 1, no. 1, pp. 28–36, 2006.
  • [38] F. Brglez and H. Fujiwara, “A Neutral Netlist of 10 Combinational Benchmark Circuits and a Target Translator in Fortran,” in Proc. IEEE Int. Symp. Circuits and Systems (ISCAS 85).   IEEE Press, Piscataway, N.J., 1985, pp. 677–692.
  • [39] “Genus synthesis solution,” https://www.cadence.com, accessed: 2020.
  • [40] “Innovus implementation system,” https://www.cadence.com, accessed: 2020.