I Introduction
Low-power circuits and systems have become increasingly popular due to their wide range of applications, from implantable devices to space electronics and mobile devices [1, 2, 3, 4, 5, 6]. One of the most widely used and effective techniques to reduce power consumption is to scale down the supply voltage [7, 6]. However, conventional designs may fail to operate correctly at low supply voltages, so new design paradigms are needed. For SRAM, which consumes a large portion of the energy budget, several designs at the circuit and higher levels have been proposed [8, 9, 10, 11, 12, 13]. In addition, to alleviate short-channel effects (SCE), emerging FinFET and gate-all-around (GAA) device structures and analysis models have been proposed for low-voltage operation [14, 15].
For applications in which leakage power reduction is the main priority, minimizing the Data Retention Voltage (DRV) is an effective technique to reduce the total leakage power of the standard SRAM. In [16], the authors optimized the DRV by choosing suitable transistor widths in the SRAM cell. Another method to decrease the total leakage power is to use transistors with different threshold voltages (Vt). For example, using a combination of transistors with different threshold voltages and oxide thicknesses (Tox) is shown to reduce leakage power consumption by up to 40% [17]. In such techniques, the only optimization goal is the reduction of leakage power, whereas for many applications other important metrics such as delay, active power, and area should also be considered.
In this paper, we present a hybrid SRAM cell assignment method that optimizes the product of power, area, and delay (PAD) by assigning multi-sized and dual-Vt SRAM cells in the SRAM array. We discuss and compare six different assignments, which combine different sizings (normally sized, upsized I [version 1], and upsized II [version 2]) with high and low threshold voltages across the array. For the normally sized cases, we follow an approach similar to the one discussed in [16] to optimize the 6T SRAM cell for the lowest possible DRV subject to a certain noise margin. As an improvement over [16], we calculate an optimal point for the DRV by changing both the width and the length of the transistors in the 6T SRAM cell. We follow a similar design strategy to find the minimum supply voltages for read and write operations. Since these optimization methods produce different SRAM cell sizing solutions, we compare them using realistic cache design scenarios to select the best design for the SRAM array based on the PAD cost function.
II Hybrid SRAM Cell Assignment
SRAM cells that are placed farther from the wordline drivers have larger delays, because the wordline signal takes longer to reach them [17]. This is depicted in Fig. 1. For an SRAM design with a large number of cells in a row, the contribution of the wordline wire delay to the overall delay is significant [4, 18]. For example, for an SRAM with 256 cells per row, operating at a 500mV supply voltage in a 32nm technology node [19], the wordline delay is about 35% of the total cell read delay (the sum of the wordline delay and the intrinsic delay of an SRAM cell). Since the delay of the worst-case cell determines the overall delay of the SRAM, we need to reduce this worst-case delay. In the following, we present our multi-sized and multi-Vt SRAM cell assignment technique to find the optimal point for the PAD product of the SRAM array.
II-A Multi-Sized Cell Assignment
Upsizing helps to reduce the read/write delays of an SRAM cell. However, because upsizing incurs area and power overheads, we use a cost function based on the PAD product to account for them. The main task is to find the number of cells that should be upsized to achieve the optimal PAD value for the SRAM array.
Consider a row of cells in the SRAM array. Our design strategy is to upsize the transistors of the last group of cells in the row so that the read delay of the last cell is less than or equal to the read delay of the last normally sized cell, i.e., the cell located just before the upsized group. The layout of the upsized cells is designed such that their height remains the same as that of the normally sized cells; to achieve this, only the layout width of the upsized cells is increased. Eqs. 1-2 express the delays of these two cells. These equations include the delay of the wordline wire, which is modeled as a distributed RC circuit. We formulate the delay difference between the last cell in the row and the last normally sized cell in Eq. 3. In our design, we try to minimize the absolute value of this difference, keeping in mind that it cannot be positive, because the last cell should not become the critical cell with the highest delay. The parameters in Eqs. 1-3 are the intrinsic delay of the normally sized SRAM cell, the intrinsic delay of the upsized version of the cell, a technology-dependent constant, and the width and height of the layout of a regular-sized 6T SRAM cell.
(1) 
(2) 
(3) 
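As a rough illustration of the delay comparison described above, the following Python sketch models the wordline as a uniform distributed RC line and uses the standard Elmore expression for the delay at a point along it. This is not a reconstruction of Eqs. 1-3: the function names, the parameter names (n_total, n_up, w_cell, width_scale, d_norm, d_up, r, c), and the simplification that cell gate loading is folded into a uniform per-length capacitance are all assumptions made for the example.

```python
def elmore_delay_at(x, length, r, c):
    """Elmore delay at distance x from the driver on a uniform distributed
    RC line of total length `length`, with per-unit-length resistance r and
    capacitance c: r * c * (length * x - x**2 / 2)."""
    return r * c * (length * x - 0.5 * x * x)


def delay_difference(n_total, n_up, w_cell, width_scale, d_norm, d_up, r, c):
    """Delay of the last cell minus delay of the last normally sized cell.

    Hypothetical parameters: the row has n_total cells, the last n_up of
    which are upsized; w_cell is the normal cell width; width_scale is the
    layout-width multiplier of an upsized cell; d_norm and d_up are the
    intrinsic delays of the normal and upsized cells.
    """
    x_boundary = (n_total - n_up) * w_cell            # end of normal region
    x_last = x_boundary + n_up * w_cell * width_scale  # end of the row
    d_boundary = d_norm + elmore_delay_at(x_boundary, x_last, r, c)
    d_last = d_up + elmore_delay_at(x_last, x_last, r, c)
    return d_last - d_boundary
```

A non-positive return value means the last cell is not the critical one, which is the condition the design strategy asks for.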
The goal of our design is to find the best number of cells to upsize and their upsizing factor. To take the three important factors of SRAMs into account, we use the product of power, delay, and area as our optimization cost function (Eq. 4), which is expanded as a function of the SRAM array parameters in the equations that follow it. Fig. 2 illustrates an example PAD function, plotted against the number of upsized cells for different values of the upsizing factor. Note that the portions of the curves corresponding to negative values of the number of upsized cells are also shown, to depict the overall trend of PAD.
(4) 
(5) 
(6) 
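The search for the best number of upsized cells can be sketched as a simple sweep. The Python example below is a toy model under stated assumptions, not the expanded cost function of Eqs. 4-6: lengths are measured in normal cell widths, power in units of one normal cell, the wire delay uses the Elmore expression for a uniform distributed RC line, and the names (pad_for_row, best_upsized_count, width_scale, power_scale, rc) and all numeric values are hypothetical.

```python
def wire_delay(x, length, rc):
    """Elmore delay at distance x on a uniform RC line of the given length."""
    return rc * (length * x - 0.5 * x * x)


def pad_for_row(n_total, n_up, width_scale, power_scale, d_norm, d_up, rc):
    """Return (PAD, feasible) when the last n_up cells of a row are upsized.

    feasible is True when the last cell is not slower than the last
    normally sized cell, i.e. the delay difference is not positive.
    """
    n_norm = n_total - n_up
    length = n_norm + n_up * width_scale          # row width (height is fixed)
    d_boundary = d_norm + wire_delay(n_norm, length, rc)
    d_last = (d_up if n_up else d_norm) + wire_delay(length, length, rc)
    delay = max(d_boundary, d_last)               # worst-case cell sets delay
    power = n_norm + n_up * power_scale
    area = length
    return power * area * delay, d_last <= d_boundary


def best_upsized_count(n_total, **model):
    """Sweep n_up and keep the feasible choice with the smallest PAD."""
    best_pad, best_n = pad_for_row(n_total, 0, **model)[0], 0
    for n_up in range(1, n_total):
        pad, feasible = pad_for_row(n_total, n_up, **model)
        if feasible and pad < best_pad:
            best_pad, best_n = pad, n_up
    return best_pad, best_n


if __name__ == "__main__":
    model = dict(width_scale=1.1, power_scale=1.1,
                 d_norm=10.0, d_up=2.0, rc=0.06)
    print(best_upsized_count(16, **model))
```

In this toy setting the sweep is exhaustive because the row is short; the feasibility check is what rules out candidates where the last cell would become the critical one.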
For fixed values of the model parameters, the PAD cost function is as shown in Fig. 3. As seen in this figure, if we are allowed to use three sizing versions for the SRAM cells (including the normally sized one), the improvement in the cost function is much larger. For this case, the improvement is 14% over the conventional one-sized cell assignment, whereas with only two sizing versions the improvement is 7%.
II-B Multi-Vt Cell Assignment
In this subsection, we extend the procedure of the previous subsection to multi-Vt cell assignment using predictive bulk-CMOS 32nm low-power (LP) and 32nm high-performance (HP) technologies [19]. It is well known that each additional threshold voltage requires one more mask layer in the fabrication process, which increases the cost and reduces the yield [20]. Therefore, it is common to limit multi-Vt cell libraries to dual-Vt, i.e., high-Vt and low-Vt transistors. Dual-Vt assignment is a well-known optimization technique; e.g., a dual-Vt, dual-Tox solution was proposed in [17] to reduce the overall leakage power consumption. In this paper, however, we incorporate dual-Vt assignment into our multi-sizing algorithm, discussed in the previous subsection, to minimize the PAD product. To keep the fabrication process simple, we assume that all the transistors in an SRAM cell have the same threshold voltage, i.e., either low or high.
II-C Multi-Sized Dual-Vt Cell Assignment
Considering both multi-sized and dual-Vt assignments, we develop a hybrid cell assignment for the SRAM array. The following six cell assignments are considered. Fig. 4 shows four of these cases.
1. All cells are high-Vt and normally sized.
2. All cells are high-Vt; the first cells in the row are normally sized, and the rest are upsized I [version 1].
3. All cells are high-Vt; the first cells in the row are normally sized, the next cells are upsized I [version 1], and the rest are upsized II [version 2].
4. All cells are normally sized; the first cells in the row are high-Vt, and the rest are low-Vt.
5. The first cells in the row are high-Vt and normally sized. The rest of the cells are low-Vt and upsized I [version 1].
6. The first cells in the row are high-Vt and normally sized, the next cells are low-Vt and upsized I [version 1], and the rest are low-Vt and upsized II [version 2].
Each configuration is represented by a triplet (p, q, r), where the first entry, p, corresponds to the first group of cells in the SRAM row, the second entry, q, corresponds to the next group of cells, and the third entry, r, corresponds to the last group of cells. Each entry is zero, one, two, or three if the corresponding cells are not used, are normally sized, are upsized I, or are upsized II, respectively (the two upsized versions use different upsizing amounts). The subscript L or H indicates low or high Vt, respectively. For example, (1_H, 0, 0) corresponds to the original configuration, where all cells are normally sized with high Vt, and (1_H, 2_H, 3_L) corresponds to a configuration in which the first cells have nominal sizing and high Vt, the next cells are upsized I with high Vt, and the last cells are upsized II with low Vt. A configuration of (0, 0, 0) clearly does not exist.
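The triplet notation maps naturally onto a small data structure. The Python sketch below is only an illustration: encoding each entry as a string such as "1H" or "3L", with "0" for an unused group, and the describe helper itself are assumptions made for this example, and the cell counts in the usage lines are taken from the first results table.

```python
SIZING = {"1": "normally sized", "2": "upsized I", "3": "upsized II"}
VT = {"H": "high Vt", "L": "low Vt"}


def describe(config, counts):
    """Spell out a triplet such as ("1H", "2H", "3L") with its cell counts.

    config holds one entry per group of cells, in row order; "0" marks an
    unused group. counts holds the number of cells in each group.
    """
    if all(entry == "0" for entry in config):
        raise ValueError("configuration (0, 0, 0) does not exist")
    parts = []
    for entry, count in zip(config, counts):
        if entry == "0":
            continue  # group not used in this configuration
        size, vt = entry[0], entry[1]
        parts.append(f"{count} cells {SIZING[size]}, {VT[vt]}")
    return "; ".join(parts)


if __name__ == "__main__":
    print(describe(("1H", "0", "0"), (256, 0, 0)))
    print(describe(("1H", "2H", "3L"), (68, 70, 118)))
```

Keeping the sizing digit and the Vt letter in one token mirrors the subscripted notation, so a configuration and its per-group cell counts can be passed around as two plain tuples.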
Algorithm 1 shows the pseudocode of our hybrid cell assignment approach. After some parameters are initialized in line 1, the intrinsic delay of a normally sized, high-Vt SRAM cell is extracted in lines 2-3. The best cell assignment for a row in the SRAM array is then found in lines 4-10. More specifically, the loop in lines 6-7 finds the best threshold voltage together with the best sizing for the cells under consideration. At the end of this loop, the number of rightmost cells that should be upsized and their threshold voltage are known. In the next iteration, the loop is run on a sub-range of the row, and the row length is updated accordingly. The procedure repeats until the final cell assignment for the entire row is found. This cell assignment is then used for the other rows as well.
Tables I and II show the set of configurations along with their improvements in the overall PAD cost function compared with the conventional cell assignment. As seen in Table I, for a row of 256 cells, allowing three sizing versions yields a much larger improvement: with 68 regular-sized cells, 70 upsized I cells with high Vt, and 118 upsized II cells with low Vt, the improvement is 34% over the conventional one-size, high-Vt assignment. When only two sizing versions are used, the improvement is 16%. When a 10% variation in the threshold voltage and cell sizes is considered (to model process variation), the 34% improvement of the (1_H, 2_H, 3_L) cell assignment decreases to 10%.
Table I
Cell Assignment  Cell Counts  Cost Reduction (%)

(1_H, 0, 0)  (256, 0, 0)  -
(1_H, 2_H, 0)  (121, 135, 0)  7
(1_H, 2_H, 3_H)  (70, 74, 112)  14
(1_H, 1_L, 0)  (124, 132, 0)  4
(1_H, 2_L, 0)  (119, 137, 0)  16
(1_H, 2_H, 3_L)  (68, 70, 118)  34
Table II
Cell Assignment  Cell Counts  Cost Reduction (%)

(1_H, 0, 0)  (256, 0, 0)  -
(1_H, 2_H, 0)  (228, 28, 0)  6
(1_H, 2_H, 3_H)  (189, 65, 2)  12
(1_H, 1_L, 0)  (150, 106, 0)  27
(1_H, 2_L, 0)  (145, 111, 0)  28
(1_H, 2_H, 3_L)  (13, 147, 96)  40
When the SRAM cells are driven from both sides of the array, the above optimization procedure can be applied with the row length halved to find the best cell assignment for the first half of the array; the other half is the mirror image of the first. Note that in this case, the cells in the middle of the SRAM array end up with larger sizes and/or lower threshold voltages.
Our hybrid SRAM cell assignment algorithm is applicable to various devices and technologies, including standard bulk CMOS, FinFET, and FD-SOI. However, the optimal cell assignment depends on the device type and technology node. More precisely, the number of upsized cells or of cells with higher threshold voltage may be different for FinFET and FD-SOI devices, and different in a 32nm technology compared to a 14nm technology. This also means that the PAD improvement varies from one technology node or device type to another.
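The mirroring step for a row driven from both sides can be sketched in a few lines of Python. The per-cell labels ("1H", "2L") and the mirror_row helper are hypothetical names used only for this example; the half-row assignment is assumed to be listed from the driver outward.

```python
def mirror_row(half_row):
    """Build a full row from the assignment of its first half.

    half_row lists the per-cell assignment starting at one wordline driver
    and ending at the middle of the row. The second half is its mirror
    image, so it starts at the middle and ends at the opposite driver.
    """
    half = list(half_row)
    return half + half[::-1]


if __name__ == "__main__":
    # Toy half row: two normal high-Vt cells near the driver, then one
    # upsized low-Vt cell at the middle of the row.
    print(mirror_row(["1H", "1H", "2L"]))
```

The cells farthest from either driver sit next to each other in the middle of the full row, which is why the larger and/or lower-Vt cells cluster there.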
II-D Reliable SRAM Cell Design
In [16], the authors formulated the DRV of a 6T SRAM cell based on the transistor sizes and some technology parameters for a 0.13 µm industrial technology. Using this formula, the DRV value for a predictive 32nm bulk-CMOS technology (PTM) [19] is calculated as 11mV, which is smaller than the thermal voltage (26mV). Using 26mV as the starting voltage and considering the variation in the threshold voltage, the final DRV after adding a 100mV guard band to account for larger memories is 194mV. Fig. 5 shows the Hold Static Noise Margin (HSNM) for a joint sweep of the width and length values of the NMOS and PMOS transistors. The best design has an SNM of 59mV, which we set as the minimum required SNM for designing the other SRAM cells.
By following similar design methodologies to minimize the supply voltages for read and write operations, new designs (transistor sizes) are obtained. In Section III, we provide the results for these design methodologies.
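As a quick check on the numbers above, and reading the guard band as a simple additive term (as the phrase "after adding" suggests), the DRV before the guard band follows directly. The symbols in this LaTeX sketch are hypothetical labels introduced only for the calculation:

```latex
\mathrm{DRV}_{\mathrm{final}} = \mathrm{DRV}_{\mathrm{var}} + V_{\mathrm{guard}}
\quad\Rightarrow\quad
\mathrm{DRV}_{\mathrm{var}} = 194\,\mathrm{mV} - 100\,\mathrm{mV} = 94\,\mathrm{mV}
```

Under this reading, accounting for threshold-voltage variation raises the 26mV starting point to 94mV before the guard band is applied.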
III Simulation Results
Table III
Design Method  DRV-based  DRV-based  Read-based  Read-based  Write-based  Write-based

Technology  32nm  90nm  32nm  90nm  32nm  90nm
W M1/M2  4.5  2.0  3.5  3.0  3.5  3.5
W M3/M4  2.5  2.5  4.0  2.5  2.0  2.0
W M5/M6  2.5  1.5  3.0  1.5  5.0  3.0
L M1/M2  2.0  3.0  1.5  3.0  1.5  1.5
L M3/M4  3.0  3.0  3.5  2.0  4.0  2.0
L M5/M6  1.0  2.0  3.0  4.0  1.0  2.0
We designed and optimized the 6T SRAM cell using three different approaches (DRV-based, read-based, and write-based sizing). Table III shows the final transistor sizes for each of these methods. We also considered the conventional cell sizing method. To compare these four cell sizing methods in a realistic setup, we designed four 32kb SRAMs (each with a single block), each of which uses a base cell from one of the four cell sizing methods. We applied our best hybrid cell assignment technique to all four memories and used the PAD product cost function to compare the designs. Fig. 6(a) shows the PAD for several benchmarks with small idle times (hot caches), and Fig. 6(b) shows the results for benchmarks with large idle times (cold caches). All benchmarks are from SPEC CPU2000 [21]. We used CACTI to extract these results, and HSPICE 2016 and MATLAB 2016 for the SRAM characterizations. As seen, the write-based method with hybrid cell assignment works better for hot caches and shows about 41% improvement in the cost function over the conventional sizing for the applu benchmark. For cold caches, the DRV-based method is better, showing a 32% improvement over the conventional sizing for the sixtrack benchmark. Thus, our recommendation is to use the write-based method for hot caches such as the L1 instruction cache and the DRV-based method for cold caches such as the L2 cache, and to apply our hybrid cell assignment to all of these caches.
IV Conclusions
In this paper, we proposed a hybrid cell assignment for SRAM, which is based on using multi-sized, dual-Vt cells in the SRAM array. In addition, a DRV-based optimization method for cell sizing was presented. In this method, the width and length of the transistors in the 6T SRAM cell are optimized to achieve the smallest DRV subject to a minimum noise margin. Applying the same method to the read and write operations yields two further designs. Simulation results for SPEC CPU2000 benchmarks confirmed a significant reduction in the PAD cost function for both hot caches such as L1 and cold caches such as L2.
References
 [1] B. Ebrahimi, M. Rostami, A. Afzali-Kusha, and M. Pedram, “Statistical design optimization of FinFET SRAM using back-gate voltage,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 10, pp. 1911–1916, Oct 2011.
 [2] M. Darwich, A. Abdelgawadf, and M. Bayoumi, “A survey on the power and robustness of FinFET SRAM,” in 2016 IEEE 59th International Midwest Symposium on Circuits and Systems (MWSCAS), Oct 2016, pp. 1–4.
 [3] M. Imani, M. Jafari, B. Ebrahimi, and T. S. Rosing, “Ultra-low power FinFET based SRAM cell employing sharing current concept,” Microelectronics Reliability, Available online, vol. 10, 2015.
 [4] A. Shafaei and M. Pedram, “Energy-efficient cache memories using a dual-Vt 4T SRAM cell with read-assist techniques,” in 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 2016, pp. 457–462.
 [5] S. Ahmad, M. K. Gupta, N. Alam, and M. Hasan, “Low leakage single bitline 9T (SB9T) static random access memory,” Microelectronics Journal, vol. 62, pp. 1–11, 2017. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0026269216300945
 [6] S. Gupta, A. Raychowdhury, and K. Roy, “Digital computation in subthreshold region for ultralow-power operation: A device–circuit–architecture codesign perspective,” Proceedings of the IEEE, vol. 98, no. 2, pp. 160–190, 2010.
 [7] H. Jiao and V. Kursun, “Power gated SRAM circuits with data retention capability and high immunity to noise: A comparison for reliability in low leakage sleep mode,” in SoC Design Conference (ISOCC), 2010 International, Nov 2010, pp. 5–8.
 [8] G. Pasandi and S. M. Fakhraie, “A 256-kb 9T near-threshold SRAM with 1k cells per bitline and enhanced write and read operations,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 23, no. 11, pp. 2438–2446, Nov 2015.
 [9] K. Takeda, Y. Hagihara, Y. Aimoto, M. Nomura, Y. Nakazawa, T. Ishii, and H. Kobatake, “A read-static-noise-margin-free SRAM cell for low-VDD and high-speed applications,” IEEE Journal of Solid-State Circuits, vol. 41, no. 1, pp. 113–121, 2006.
 [10] Y.-W. Chiu, Y.-H. Hu, M.-H. Tu, J.-K. Zhao, Y.-H. Chu, S.-J. Jou, and C.-T. Chuang, “40 nm bit-interleaving 12T subthreshold SRAM with data-aware write-assist,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 61, no. 9, pp. 2578–2585, Sept 2014.
 [11] N. Gong, S. Jiang, A. Challapalli, S. Fernandes, and R. Sridhar, “Ultra-low voltage split-data-aware embedded SRAM for mobile video applications,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 59, no. 12, pp. 883–887, Dec 2012.
 [12] I. J. Chang, D. Mohapatra, and K. Roy, “A priority-based 6T/8T hybrid SRAM architecture for aggressive voltage scaling in video applications,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 21, no. 2, pp. 101–112, 2011.
 [13] G. Pasandi and S. M. Fakhraie, “An 8T low-voltage and low-leakage half-selection disturb-free SRAM using bulk-CMOS and FinFETs,” IEEE Transactions on Electron Devices, vol. 61, no. 7, pp. 2357–2363, July 2014.
 [14] T. Cui, J. Li, A. Shafaei, S. Nazarian, and M. Pedram, “An efficient timing analysis model for 6T FinFET SRAM using current-based method,” in ISQED, 2016, pp. 263–268.
 [15] L. Wang, A. Shafaei, S. Chen, Y. Wang, S. Nazarian, and M. Pedram, “10nm gate-length junctionless gate-all-around (JL-GAA) FETs based 8T SRAM design under process variation using a cross-layer simulation,” in SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S), 2015 IEEE. IEEE, 2015, pp. 1–2.
 [16] H. Qin, Y. Cao, D. Markovic, A. Vladimirescu, and J. Rabaey, “SRAM leakage suppression by minimizing standby supply voltage,” in Quality Electronic Design, 2004. Proceedings. 5th International Symposium on. IEEE, 2004, pp. 55–60.
 [17] B. Amelifard, F. Fallah, and M. Pedram, “Leakage minimization of SRAM cells in a dual-Vt and dual-Tox technology,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 16, no. 7, pp. 851–860, July 2008.
 [18] A. Shafaei, H. Afzali-Kusha, and M. Pedram, “Minimizing the energy-delay product of SRAM arrays using a device-circuit-architecture co-optimization framework,” in Proceedings of the 53rd Annual Design Automation Conference. ACM, 2016, p. 107.
 [19] Arizona State University. (2013) Predictive technology model (PTM). [Online]. Available: http://ptm.asu.edu/
 [20] S. Mukhopadhyay, H. Mahmoodi, and K. Roy, “Modeling of failure probability and statistical design of SRAM array for yield enhancement in nanoscaled CMOS,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, no. 12, pp. 1859–1880, 2005.
 [21] J. F. Cantin and M. D. Hill. (2003) SPEC CPU2000 benchmarks. [Online]. Available: http://research.cs.wisc.edu/multifacet/misc/spec2000cachedata/