Neural Network-Inspired Analog-to-Digital Conversion to Achieve Super-Resolution with Low-Precision RRAM Devices

11/28/2019, by Weidong Cao et al.

Recent works propose neural network- (NN-) inspired analog-to-digital converters (NNADCs) and demonstrate their great potential in many emerging applications. These NNADCs often rely on resistive random-access memory (RRAM) devices to realize the NN operations and require high-precision RRAM cells (6–12-bit) to achieve a moderate quantization resolution (4–8-bit). Such an optimistic assumption of RRAM resolution, however, is not supported by fabrication data of RRAM arrays in large-scale production processes. In this paper, we propose an NN-inspired super-resolution ADC based on low-precision RRAM devices by taking advantage of a co-design methodology that combines a pipelined hardware architecture with a custom NN training framework. Results obtained from SPICE simulations demonstrate that our method leads to a robust design of a 14-bit super-resolution ADC using 3-bit RRAM devices with improved power and speed performance and competitive figures-of-merit (FoMs). In addition to linear uniform quantization, the proposed ADC can also support configurable high-resolution nonlinear quantization with high conversion speed and low conversion energy, enabling future intelligent analog-to-information interfaces for near-sensor analytics and processing.


I Introduction

Many emerging applications have posed new challenges for the design of conventional analog-to-digital (AD) converters (ADCs) [1, 2, 3, 4]. For example, multi-sensor systems desire programmable nonlinear AD quantization to maximize the extraction of useful features from the raw analog signal, instead of directly performing uniform quantization by conventional ADCs [3, 4]. This can alleviate the computational burden and reduce the power consumption of backend digital processing, which is the dominant bottleneck in intelligent multi-sensor systems. However, such flexible and configurable quantization schemes are not readily supported by conventional ADCs with dedicated circuitry that has fixed conversion references and thresholds.

To overcome this inherent limitation of conventional ADCs, several recent works [5, 6, 7] have introduced neural network-inspired ADCs (NNADCs) as a novel approach to designing intelligent and flexible AD interfaces. For instance, a learnable 8-bit NNADC [7] is presented to approximate multiple quantization schemes, where the NN weight parameters are trained off-line and can be configured by programming the same hardware substrate. Another example is a 4-bit neuromorphic ADC [6] proposed for general-purpose data conversion using on-line training, leveraging the input amplitude statistics and application sensitivity. These NNADCs are often built on resistive random-access memory (RRAM) crossbar arrays to realize the basic NN operations, and can be trained to approximate the specific quantization/conversion functions required by different systems. However, a major challenge for designing such NNADCs is the limited conductance/resistance resolution of RRAM devices. Although these NNADCs optimistically assume that each RRAM cell can be precisely programmed with 6–12-bit resolution, measured data from realistic fabrication processes suggest the actual RRAM resolution tends to be much lower (2–4-bit) [8, 9]. Therefore, there exists a gap between the reality and the assumption of RRAM precision, and a design methodology to build super-resolution NNADCs from low-precision RRAM devices is still lacking.

In this paper, we bridge this gap by introducing an NN-inspired design methodology that constructs super-resolution ADCs with low-precision RRAM devices. Taking advantage of a co-design methodology that combines a pipelined hardware architecture with a deep learning-based custom training framework, our method is able to achieve an NN-inspired ADC whose resolution far exceeds the precision of the underlying RRAM devices. The key idea of a pipelined architecture is that many consecutive low-resolution (1–3-bit) quantization stages can be cascaded in a chain structure to obtain higher resolution. Since each stage now only needs to resolve 1–3 bits, we can accurately train and instantiate it with low-precision RRAM devices to approximate the ideal quantization and residue functions. The key innovations and contributions of this paper are as follows:

  • We propose a co-design methodology leveraging a pipelined hardware architecture and a custom training framework to achieve super-resolution analog-to-digital conversion that far exceeds the limited precision of the underlying RRAM devices.

  • We systematically evaluate the impacts of NN size and RRAM precision on the accuracy of the NN-inspired sub-ADCs and residue blocks, and perform design space exploration to search for the optimal pipeline stage configuration with a balanced trade-off between speed, area, and power consumption.

  • SPICE simulation results demonstrate that our proposed method is able to generate a robust design of a 14-bit super-resolution NNADC using 3-bit RRAM devices. Comparisons with both state-of-the-art ADCs and other NNADC designs reveal improved performance and competitive figures-of-merit (FoMs).

  • Our proposed ADC can also support configurable nonlinear quantization with high resolution, high conversion speed, and low conversion energy.

II Preliminaries

II-A RRAM Device, Crossbar Array, and NN

II-A1 RRAM device

An RRAM device is a passive two-terminal element with variable resistance, and it possesses many special advantages over Flash devices, such as small cell size ($4F^2$, with $F$ the minimum feature size), excellent scalability ($<10$ nm), faster read/write time ($<10$ ns), and better endurance ($\sim 10^{12}$ cycles) [2, 10].

II-A2 RRAM crossbar array

RRAM devices can be organized into various ultra-dense crossbar array architectures. Fig. 1(a) shows a passive crossbar array composed of two sub-arrays to realize bipolar weights without the use of power-hungry operational amplifiers (op-amps) [7]. The relationship between the input voltage "vector" ($V_{i}^{\mathrm{in}}$) and the output voltage "vector" ($V_{j}^{\mathrm{out}}$) can be expressed as $V_{j}^{\mathrm{out}} = \sum_{i} W_{ij} V_{i}^{\mathrm{in}}$. Here, $i$ ($1 \le i \le N$) and $j$ ($1 \le j \le M$) are the indices of input ports and output ports of the crossbar array. The weight $W_{ij}$ can be represented by the subtraction of two conductances in the upper ($G_{ij}^{+}$) sub-array and lower ($G_{ij}^{-}$) sub-array as

$W_{ij} = \dfrac{G_{ij}^{+} - G_{ij}^{-}}{\sum_{i}\left(G_{ij}^{+} + G_{ij}^{-}\right)}$   (1)

Therefore, the RRAM crossbar array is capable of performing analog vector-matrix multiplication (VMM), and the parameters of the matrix depend on the RRAM resistance states.
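To make the differential-weight mapping concrete, the following is a minimal NumPy sketch (our own illustration, not the authors' code) of Eq. (1): a signed weight matrix is split across the two conductance sub-arrays, and the crossbar output is the column-normalized differential VMM. The conductance range `g_min`/`g_max` is an assumed placeholder.

```python
import numpy as np

def weights_to_conductances(W, g_min=1e-6, g_max=1e-4):
    # Positive weights map to the upper (G+) sub-array, negative to the lower (G-).
    G_pos = np.where(W > 0, W, 0.0) * (g_max - g_min) + g_min
    G_neg = np.where(W < 0, -W, 0.0) * (g_max - g_min) + g_min
    return G_pos, G_neg

def crossbar_vmm(V_in, G_pos, G_neg):
    # Eq. (1): W_ij = (G+_ij - G-_ij) / sum_i (G+_ij + G-_ij),
    # then V_out_j = sum_i W_ij * V_in_i.
    W_eff = (G_pos - G_neg) / (G_pos + G_neg).sum(axis=0)
    return W_eff.T @ V_in

# Example: a 4-input, 2-output crossbar with weights in [-1, 1]
W = np.array([[0.4, -0.1], [-0.3, 0.2], [0.2, 0.5], [-0.1, -0.2]])
print(crossbar_vmm(np.array([0.5, 0.8, 0.1, 0.3]), *weights_to_conductances(W)))
```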

II-A3 Artificial NN

With the RRAM crossbar array, an NN such as the one shown in Fig. 1(b) can be implemented on this hardware substrate. Generally, the NN processes data by executing the following operations layer-wise [17]:

$\mathbf{x}^{l+1} = f\left(\mathbf{W}^{l}\,\mathbf{x}^{l}\right)$   (2)

Here, $\mathbf{W}^{l}$ is the weight matrix connecting layer $l$ and layer $l+1$, and $f$ is a nonlinear activation function (NAF). These basic NN operations, i.e., VMM and NAF, can be mapped to the RRAM crossbar array and CMOS inverters shown in Fig. 1(a), where the voltage transfer characteristic (VTC) of the inverter is used as the NAF [7].

II-B NN-Inspired ADCs

AD conversion can be viewed as a special case of classification problems that maps a continuous analog signal to a multi-bit digital code. An NN can be trained to learn this input-output relationship, and a hardware implementation of this NN can be instantiated in the analog and mixed-signal domain. This is the basic idea behind NNADCs, which implement the learned NN on a hardware substrate to approximate the desired quantization function for data conversion:

$\left(b_{N-1}\,b_{N-2}\cdots b_{0}\right) = f_{\mathrm{ADC}}(x), \quad x \in \left[V_{\min},\, V_{\max}\right]$   (3)

where $N$ is the resolution, $x$ is the input analog signal, $(b_{N-1}\,b_{N-2}\cdots b_{0})$ is the output digital code, and $V_{\min}$ and $V_{\max}$ are the minimum and maximum values of the scalar input signal $x$. Since RRAM crossbar arrays provide a promising hardware substrate to build NNs, recent work has demonstrated several NNADCs based on RRAM devices [5, 6, 7]. Although the NN architectures adopted by these NNADCs vary, they all rely on a training process to learn the appropriate NN weights to approximate flexible quantization schemes that can be configured by programming the weights stored as RRAM conductance/resistance. However, existing NNADCs [5, 6, 7] often exhibit modest conversion resolution (4–8-bit) and invariably rely on optimistic assumptions of RRAM precision (6–12-bit), which are not well substantiated by measurement data from realistic RRAM fabrication processes [8, 9]. This resolution limitation severely constrains the application of NNADCs in emerging multi-sensor systems that require high-resolution (>10-bit) AD interfaces for feature extraction and near-sensor processing [1, 3, 4].
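As a concrete rendering of Eq. (3), here is a short Python sketch (our illustration, not the paper's code) of the ideal uniform quantization function that serves as the training ground truth; the input range defaults are assumptions.

```python
def ideal_adc(x, n_bits, v_min=0.0, v_max=1.0):
    # Uniformly quantize a scalar x in [v_min, v_max] to an N-bit code, MSB first.
    levels = 2 ** n_bits
    code = min(int((x - v_min) / (v_max - v_min) * levels), levels - 1)
    return [(code >> k) & 1 for k in range(n_bits - 1, -1, -1)]

print(ideal_adc(0.7, 4))  # -> [1, 0, 1, 1]
```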

Fig. 1: (a) Hardware substrate to perform basic NN operations, where the passive crossbar array with two sub-arrays executes VMM and the VTC of CMOS inverter acts as NAF. (b) An example of NN.

II-C Pipelined ADCs

Pipelined architecture is a well-established ADC topology that achieves a high sampling rate and high resolution with low-resolution quantization stages [11]. Fig. 2(a) illustrates a typical pipelined ADC with $M$ stages, whose resolution RESO is achieved by concatenating the $B_i$ bits of each stage-$i$ with a digital combiner: $\mathrm{RESO} = \sum_{i=1}^{M} B_i$. Note that $B_i$ is usually small and not necessarily identical in all stages. As Fig. 2(a) illustrates, an arbitrary stage-$i$ contains two sub-blocks: a sub-ADC and a residue block. The sub-ADC resolves a $B_i$-bit binary code from the input residue $V_{\mathrm{res}}^{i}$, while the residue block amplifies the difference between the input residue and the analog output of a sub-DAC by $2^{B_i}$ to generate the output residue $V_{\mathrm{res}}^{i+1}$ for the next stage. This process can be expressed as a simple function:

$V_{\mathrm{res}}^{i+1} = 2^{B_i}\left(V_{\mathrm{res}}^{i} - V_{\mathrm{DAC}}^{i}\right)$   (4)

Here, $V_{\mathrm{DAC}}^{i}$ is the analog output of the sub-DAC, which depends on the digital code resolved by the sub-ADC. For example, assuming $B_i = 1$ and a full-scale range of $[0, 1]$ V, then $V_{\mathrm{DAC}}^{i} \in \{0, 0.5\}$ V and $V_{\mathrm{res}}^{i+1} = 2(V_{\mathrm{res}}^{i} - V_{\mathrm{DAC}}^{i})$; Fig. 2(b) shows the corresponding residue function. To understand the basic working principle of pipelined ADCs, we use a 4-bit pipelined ADC with four 1-bit stages in Fig. 2(c) as an example. Assuming the initial analog input is, say, 0.7 V, the first stage outputs "1" as its digital code and 0.4 V as its analog residue according to Eq. (4); the following stage then processes this residue in the same way as the initial analog input. Finally, we obtain the 4-bit output "1011", which is the quantization of 0.7 V. This example also shows that a higher resolution (4-bit) can indeed be constructed from low-precision (1-bit) stages in a pipelined ADC.
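The stage recursion of Eq. (4) is easy to simulate; the sketch below (our own, assuming the $[0, 1]$ V full scale above) reproduces the 1-bit chain just described.

```python
def stage_1bit(v_res):
    # One ideal 1-bit stage: sub-ADC decision plus 2x residue amplification.
    bit = 1 if v_res >= 0.5 else 0
    v_dac = 0.5 * bit                  # sub-DAC output for B_i = 1
    return bit, 2.0 * (v_res - v_dac)  # Eq. (4): V_res' = 2^B (V_res - V_DAC)

def pipeline(v_in, n_stages=4):
    bits = []
    for _ in range(n_stages):
        b, v_in = stage_1bit(v_in)
        bits.append(b)
    return bits

print(pipeline(0.7))  # -> [1, 0, 1, 1], i.e., code 1011 for a 0.7 V input
```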

Fig. 2: (a) General architecture of a pipelined ADC. (b) An example of the residue function when $B_i = 1$. (c) A quantization example of a 4-bit pipelined ADC with four 1-bit stages.

III Co-Design Methodology

Fig. 3: Proposed co-design framework for the super-resolution NNADC. (a) Pipelined architecture of the proposed NNADC. (b) Off-line training model of each stage-$i$. The proposed training framework takes ground-truth datasets as inputs during off-line training to find the optimal weights and derive the RRAM resistances that minimize the cost function and best approximate the ideal quantization and residue functions.

III-A Hardware Substrate

III-A1 Pipelined architecture

The observation from traditional pipelined ADCs motivates us to extend this architecture to the NNADC to enhance its resolution beyond the limit of RRAM precision. The overall hardware architecture of the proposed high-resolution NNADC is presented in Fig. 3(a), where a pipelined architecture composed of $M$ cascaded conversion stages is adopted in the design. This pipelined architecture brings two direct benefits. First, each stage in the proposed NNADC now only needs to resolve $B_i$-bit quantization, which is well within the precision limit of current RRAM fabrication processes [8, 9] and can be easily achieved with the automated design methodology introduced in previous work [7]. Second, although many cascaded stages are needed, there exist only three distinct low-resolution configurations to choose from for each stage, namely $B_i \in \{1, 2, 3\}$. This allows us to simplify the design process by focusing on optimizing the sub-block design of each stage with different resolutions. The full pipelined system can then be assembled by iterating through different combinations of the sub-blocks with different resolutions.

III-A2 Low-resolution NNADC stage

For stage-$i$ in the proposed NNADC, we use a five-layer NN to implement the sub-ADC and the residue block. The five-layer NN can be decomposed into two three-layer sub-blocks, each of which maps to the corresponding sub-ADC and residue block in Fig. 2(a). The cornerstone of this mapping methodology is the universal approximation theorem: a feed-forward three-layer NN with a single hidden layer can approximate arbitrarily complex functions [13]. We use the RRAM crossbar array and CMOS inverters illustrated in Fig. 1(a) as the hardware substrate to design the sub-blocks of each stage. As Fig. 3(b) shows, for the sub-ADC, the input analog signal represents the single "place-holder" neuron in the MLP's input layer. Therefore, the weight matrix dimensions are $N_h \times 1$ between the hidden and the input layer and $N_o \times N_h$ between the output and the hidden layer, assuming there are $N_h$ and $N_o$ neurons in the hidden and output layers. Here, we use a redundant "smooth" encoding method to replace the standard $B_i$-bit binary encoding with $N_s$ bits ($N_s > B_i$), following previous work [7], as it improves the training accuracy and reduces the hidden layer size of the sub-ADC. For example, we train a 2-bit sub-ADC with 3-bit smooth codes as its output in Fig. 4(b). For the residue block, there are $N_s + 1$ input neurons (one analog input and the $N_s$-bit smooth digital code from the preceding sub-ADC block) and only one analog output neuron; therefore, the weight matrix dimensions are $M_h \times (N_s + 1)$ between the hidden and the input layer and $1 \times M_h$ between the output and the hidden layer, assuming there are $M_h$ hidden neurons. Sample-and-hold (S/H) circuits [18] are used in the output layer to drive the next stage. A shape-level sketch of one stage is given below. Since the op-amps of Fig. 2(a) are eliminated in the NN-inspired design of the residue circuit, considerable power saving is obtained in each stage.
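For clarity, the following NumPy sketch lays out the shapes of the two three-layer sub-networks of one stage; the hidden sizes, the smooth-code width, and the sigmoid stand-in for the inverter VTC are all illustrative assumptions rather than the paper's exact design.

```python
import numpy as np

def naf(v):
    # Inverter-VTC stand-in: a steep decreasing sigmoid (assumed shape, not measured).
    return 1.0 / (1.0 + np.exp(-10.0 * (0.5 - v)))

class Stage:
    def __init__(self, n_h=8, m_h=8, n_s=3, seed=0):
        rng = np.random.default_rng(seed)
        self.W1_adc = rng.uniform(-1, 1, (n_h, 1))        # hidden x input (N_h x 1)
        self.W2_adc = rng.uniform(-1, 1, (n_s, n_h))      # output x hidden (N_s x N_h)
        self.W1_res = rng.uniform(-1, 1, (m_h, n_s + 1))  # hidden x (N_s + 1)
        self.W2_res = rng.uniform(-1, 1, (1, m_h))        # 1 x M_h

    def forward(self, x):
        # Sub-ADC: analog input -> N_s-bit smooth code.
        d = naf(self.W2_adc @ naf(self.W1_adc @ np.array([x])))
        # Residue block: analog input + smooth code -> scalar residue.
        z = np.concatenate(([x], d))
        v_res = self.W2_res @ naf(self.W1_res @ z)
        return d, float(v_res[0])

d, v = Stage().forward(0.7)
print(d.round(2), v)
```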

III-B Training Framework

III-B1 Training overview

We propose a training framework that accurately captures the circuit-level behavior of the hardware substrate in its mathematical model and is able to learn robust NNs and their associated hardware design parameters (i.e., RRAM conductances) to approximate the sub-ADC and residue block of each stage. The training framework incorporates two important features. First, we employ collaborative training for the two sub-blocks in each stage. The sub-ADC is initially trained to approximate the ideal quantization function with high fidelity; then its digital outputs and original analog input are directly fed to the residue block for residue training. This collaborative training flow can effectively minimize the discrepancy, caused by circuit artifacts, between the actual and ideal conversion at each stage. Second, non-idealities of the devices, such as process, voltage, and temperature (PVT) variations of the CMOS devices and the limited precision of the RRAM devices, can be incorporated into training to make the proposed NNADC robust to these defects [14]. This is another advantage of the proposed NNADC over traditional ADC designs, where even with delicate calibration techniques, the non-idealities cannot be fully mitigated [11].

III-B2 Training steps

The detailed training flow is shown in Fig. 3(b) and consists of four steps. We focus on describing the training steps for the residue block, as we adopt a sub-ADC training method similar to the one elaborated in previous work [7, 14].

Step ①: establish the learning objective. For the residue circuit, the output is an analog value; therefore, the hardware substrate can be modeled as a three-layer NN with a "place-holder" output neuron:

$\hat{y} = g\left(f_{v}\left(h\left(x, \mathbf{d};\, \Theta_{1}\right)\right);\, \Theta_{2}\right)$   (5)

Here, $\mathbf{d} = (d_1, \ldots, d_{N_s})$ indicates the digital outputs of the sub-ADC ("1" means $V_{\mathrm{DD}}$ and "0" means GND), and $x$ is the scalar residue input of stage-$i$; $h(\cdot)$ denotes the outputs of the first crossbar layer, which are modeled as a linear function of $x$ and $\mathbf{d}$, with learnable parameters $\Theta_1$ corresponding to the RRAM crossbar array conductances. Each of these voltages is passed through an inverter (shown in Fig. 1(a)), whose input-output relationship is modeled by the nonlinear function $f_v$, to yield the hidden activation vector. The linear function $g(\cdot)$ models the second crossbar layer, which produces the output residue for the next stage, with learnable parameters $\Theta_2$. The learning objective is to find optimal values of the parameters $\Theta = \{\Theta_1, \Theta_2\}$ such that for all values of $x$ in the input range, the circuit yields a residue $\hat{y}$ that is equal or close to the desired "ground truth" $y$ in Eq. (4). To this end, we define a cost function that measures the discrepancy between the predicted $\hat{y}$ and the true $y$ based on the mean-square loss:

$\mathcal{L}(\Theta) = \frac{1}{K}\sum_{k=1}^{K}\left(\hat{y}_k - y_k\right)^{2}$   (6)
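A minimal TensorFlow rendering of Eqs. (5)-(6) might look as follows; the layer sizes and the sigmoid used in place of the measured inverter VTC are our assumptions.

```python
import tensorflow as tf

n_in, n_h = 4, 8   # 1 analog input + 3 smooth-code bits; 8 hidden neurons (assumed)
theta1 = tf.Variable(tf.random.uniform([n_h, n_in], -1.0, 1.0))  # first crossbar layer
theta2 = tf.Variable(tf.random.uniform([1, n_h], -1.0, 1.0))     # second crossbar layer

def f_v(h):
    # Stand-in for the inverter VTC: a steep decreasing sigmoid.
    return tf.sigmoid(10.0 * (0.5 - h))

def residue_model(xd):                      # Eq. (5); xd: [batch, n_in]
    h = xd @ tf.transpose(theta1)           # h(x, d; theta1)
    return f_v(h) @ tf.transpose(theta2)    # g(f_v(h); theta2) -> [batch, 1]

def mse_loss(xd, y):                        # Eq. (6): mean-square cost
    return tf.reduce_mean(tf.square(residue_model(xd) - y))
```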

Step ②: model hardware constraints. Hardware constraints come from three aspects: PVT variations of the CMOS neurons, the limited precision of the RRAM devices, and the passive crossbar array. To reflect these constraints, we first group all VTCs obtained by Monte Carlo simulations into a set $\mathcal{F}_{\mathrm{VTC}}$, using the technology specification in Section IV-A. Meanwhile, we constrain the precision of the weights during training to match the RRAM precision under study. Finally, we let the sum of the absolute values of all elements in each column $j$ of the weight matrix be 1:

$\sum_{i}\left|W_{ij}\right| = 1, \quad \forall j$   (7)

to reflect the weight constraints in Eq. (1).

Step ③: hardware-oriented training. We initialize the parameters $\Theta$ randomly and update them iteratively based on gradients computed on mini-batches of $(x, y)$ pairs randomly sampled from the input range. To incorporate the hardware constraints of Step ② into training, we let each neuron in Eq. (5) randomly pick a VTC from the group during training:

$f_{v} \sim \mathrm{Uniform}\left(\mathcal{F}_{\mathrm{VTC}}\right)$   (8)

We then periodically clip all weight values to $[-1, 1]$ and renormalize each column to satisfy Eq. (7).
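Putting Steps ①-③ together, a sketch of the hardware-oriented training loop might look like the following (our rendering, not the authors' code): the ground truth is the ideal 1-bit residue of Eq. (4), the digital inputs are omitted for brevity, and the VTC group of Eq. (8) is emulated by sampling a random inverter gain each step.

```python
import random
import tensorflow as tf

theta1 = tf.Variable(tf.random.uniform([8, 1], -1.0, 1.0))
theta2 = tf.Variable(tf.random.uniform([1, 8], -1.0, 1.0))
opt = tf.keras.optimizers.Adam(1e-3)

def residue_gt(x):                          # Eq. (4) for an ideal 1-bit stage
    return 2.0 * (x - 0.5 * tf.cast(x >= 0.5, tf.float32))

def project(w):                             # Eq. (7): clip, then unit-L1 columns
    w.assign(tf.clip_by_value(w, -1.0, 1.0))
    w.assign(w / tf.reduce_sum(tf.abs(w), axis=0, keepdims=True))

for step in range(2048):
    x = tf.random.uniform([4096, 1])              # minibatch from the input range
    gain = random.choice([8.0, 10.0, 12.0])       # random VTC per step (Eq. (8))
    with tf.GradientTape() as tape:
        h = tf.sigmoid(gain * (0.5 - x @ tf.transpose(theta1)))
        loss = tf.reduce_mean(tf.square(h @ tf.transpose(theta2) - residue_gt(x)))
    opt.apply_gradients(zip(tape.gradient(loss, [theta1, theta2]),
                            [theta1, theta2]))
    if step % 256 == 0:                           # periodic projection step
        project(theta1)
        project(theta2)
```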

Step ④: instantiate conductance values. We adopt the same instantiation method as previous work [7], which provably finds a set of equivalent conductances from the trained weights and biases to map onto the RRAM devices of the hardware substrate. After this, we perturb each resistance $R$ by

$\tilde{R} = R\left(1 + \delta\right), \quad \delta \sim \mathcal{N}\left(0, \sigma^{2}\right)$   (9)

to evaluate the robustness of the NN model to the stochastic variation of the RRAM resistance [2].
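A sketch of how Step ④ can be emulated in simulation follows; the weight-to-level quantization and the variation level `sigma` are illustrative assumptions (the paper derives conductances with the method of [7]).

```python
import numpy as np

def quantize_weight(w, bits=3):
    # Snap a weight in [-1, 1] to the nearest level a b-bit RRAM cell can store.
    levels = 2 ** bits
    return np.round((w + 1.0) / 2.0 * (levels - 1)) / (levels - 1) * 2.0 - 1.0

def perturb_resistance(R, sigma=0.1, seed=None):
    # Eq. (9): multiplicative stochastic variation applied to each resistance.
    rng = np.random.default_rng(seed)
    return R * (1.0 + rng.normal(0.0, sigma, size=np.shape(R)))
```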

Fig. 4: Illustrations of the trained sub-ADC and residue functions for pipeline stages with different resolutions. (a) 1-bit stage ($B_i = 1$). (b) 2-bit stage ($B_i = 2$).

III-C Examples of Trained Sub-ADC and Residue

Fig. 4 illustrates SPICE simulations of different stages trained with the proposed framework. The sub-ADC and the residue in Fig. 4(a) are trained by setting $B_i = 1$, while those in Fig. 4(b) are trained by setting $B_i = 2$. In both figures, we use 3-bit RRAM devices and apply the resistance perturbation of Eq. (9) for evaluation. The comparison between the trained functions and the ideal functions shows that each stage with low-precision RRAM can accurately approximate the ideal stage functions with the aid of the proposed training framework.

IV Experimental Results

IV-A Experimental Methodology

IV-A1 Training configuration

We set $B_i = 1, 2, 3$ to obtain the three distinct resolution configurations of each pipeline stage in our experiments. For each stage, we train different NN models, and each NN model is trained via stochastic gradient descent with the Adam optimizer [15] using TensorFlow. The weight precision during training is swept from 1 to 7 bits. The batch size is 4096, and the projection step is performed every 256 iterations. We train each sub-ADC model and residue model for a fixed total number of iterations, decaying the learning rate across the iterations.

IV-A2 Technology model

We use an HfO$_x$-based RRAM device model to simulate the crossbar array [16]. We set the resistance stochastic variation to a moderate level based on the evaluations of prior work [17]. The transistor model is based on a standard 130 nm CMOS technology. The inverters, output comparators, and transistor switches in the RRAM crossbars are simulated with the 130 nm model using Cadence Spectre. The VTC group is obtained by running 100 Monte Carlo simulations. The simulation results presented in the following sections are all based on SPICE simulation.

IV-A3 Metric of training accuracy

The trained accuracy of the sub-ADC and of the proposed NNADC is represented by the effective number of bits (ENOB), a metric that evaluates the effective resolution of an ADC. We report ENOB based on its standard definition, $\mathrm{ENOB} = (\mathrm{SNDR} - 1.76)/6.02$, where the signal-to-noise-and-distortion ratio (SNDR) is measured from the output spectrum of the sub-ADC or the proposed NNADC. The training accuracy of the residue circuit is represented by the mean-square error (MSE) between the predicted residue function and the ideal residue function. We report the MSE based on 2048 uniform sampling points over the full input range of $x$.
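For reference, here is a compact sketch of how ENOB can be extracted from a simulated output record (a standard FFT-based SNDR measurement; the windowing and bin bookkeeping are our simplifications, not the authors' test bench).

```python
import numpy as np

def enob_from_record(y, signal_bin):
    # Power spectrum of the windowed record; sum a few bins around the signal
    # to account for window leakage (the Hann window spreads the tone).
    spec = np.abs(np.fft.rfft(y * np.hanning(len(y)))) ** 2
    signal = spec[signal_bin - 1:signal_bin + 2].sum()
    noise_and_dist = spec[1:].sum() - signal          # exclude the DC bin
    sndr = 10.0 * np.log10(signal / noise_and_dist)
    return (sndr - 1.76) / 6.02                       # standard ENOB definition
```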

Fig. 5: Sub-block training performance using different NN models and RRAM precisions at a fixed stochastic variation. (a) The trend between ENOB and RRAM precision of the sub-ADC under different NN models, where $B_i$ is set to 1, 2, and 3, respectively. (b) The trend between MSE and RRAM precision of the residue circuit under different NN models, where $B_i$ is set to 1, 2, and 3, respectively.

IV-B Sub-block Evaluations

IV-B1 Resolution and robustness

To find a robust design for each stage, we study the relationship between the trained accuracy and the RRAM precision of each sub-block with different NN sizes at a fixed stochastic variation. For these experiments, we first incorporate both CMOS PVT variations and the limited precision of the RRAM devices into training, then instantiate several batches of 100-run Monte Carlo simulations with the resistance variation of Eq. (9), and finally compute the median accuracy of each model.

We plot the trends in Fig. 5. Generally, $n$-bit RRAM precision is enough to train an NN model to accurately approximate an $n$-bit sub-ADC, which confirms the conclusion of previous work [7]. In particular, larger NN models with more hidden neurons can even accurately approximate an $n$-bit sub-ADC with $(n-1)$-bit RRAM precision. Similar conclusions can be drawn from the trained performance of the residue circuits. As Fig. 5(b) shows, $n$-bit RRAM precision is enough to train an NN model to accurately approximate the residue circuit of an $n$-bit stage. Moreover, a larger NN with more hidden-layer neurons can accurately approximate the residue circuit of an $n$-bit stage with $(n-1)$-bit RRAM precision.

IV-B2 Sub-block design trade-off

Each stage-$i$ has a design trade-off among power consumption $P_i$, sampling rate $f_i$, and area $A_i$. A complete design space exploration may involve searching over different NN sizes for each sub-block in stage-$i$, RRAM precisions, and stochastic variations. Here, we use the three pairs of sub-blocks highlighted by the solid boxes in Fig. 5 as an example to illustrate the design trade-off, since each of them shows sufficient accuracy and robustness with no more than 4-bit RRAM precision. For these experiments, we combine each pair of sub-blocks to form three distinct stages with resolutions $B_i = 1, 2, 3$, respectively. We then fix the RRAM precision at 3-bit for all building blocks except the residue of the 3-bit stage, which uses 4-bit RRAM devices. We finally study the relationship between the power $P_i$, speed $f_i$, and area $A_i$ of each distinct stage-$i$ by simulating the minimum power consumption/area of each distinct stage that works well at different sampling rates.

The trends are plotted in Fig. 6, which shows clear trade-offs between speed and power consumption, as well as between speed and area, for each distinct stage. This is because, in order to make each sub-block work well at a faster speed, we need to increase the driving strength of the neurons by sizing up the inverters, which increases the power consumption and area of each stage.

IV-B3 Design optimization

Based on the exploration of different sub-block configurations, an optimal design of the proposed ADC for a given resolution can be derived by solving the following optimization problem:

$\min\ \mathrm{FoM} = \dfrac{\sum_i P_i}{2^{\mathrm{ENOB}} \cdot f_s}, \quad \text{then} \quad \min\ \sum_i A_i, \quad \text{s.t.} \ \sum_i B_i = \mathrm{RESO}$   (10)

Here, the first objective (FoM) is a standard figure-of-merit that describes the energy consumption of one conversion step of an ADC, and the second objective is the area of the proposed ADC. We set the FoM as the primary objective, since energy efficiency is usually the most important consideration for most applications. In this way, as shown in Fig. 7, we can obtain an optimal design for a maximum 14-bit pipelined NNADC with 12.5 bits of ENOB and a FoM of 11.6 fJ/conversion-step working at 1 GS/s. This showcases the advantage of the proposed co-design framework, which incorporates many circuit-level non-idealities in the training process, allowing us to realize a robust design cascading up to eleven stages, a level often unattainable with traditional pipelined ADCs.
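The search implied by Eq. (10) can be carried out exhaustively over the three characterized stages. The sketch below illustrates the procedure; the per-stage power/area numbers and the error-accumulation model are placeholders, not the paper's characterized values.

```python
from itertools import combinations_with_replacement

STAGES = {1: {"power": 1.0, "area": 1.0},   # hypothetical P_i, A_i per stage type
          2: {"power": 2.5, "area": 2.2},
          3: {"power": 6.0, "area": 4.8}}

def search(target_bits, fs):
    best = None
    for n in range(1, target_bits + 1):
        for combo in combinations_with_replacement(STAGES, n):
            if sum(combo) != target_bits:
                continue
            power = sum(STAGES[b]["power"] for b in combo)
            area = sum(STAGES[b]["area"] for b in combo)
            enob = target_bits - 0.15 * len(combo)   # crude error-accumulation model
            fom = power / (2 ** enob * fs)           # Eq. (10) primary objective
            if best is None or (fom, area) < (best[0], best[1]):
                best = (fom, area, combo)
    return best

print(search(14, 1e9))
```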

Fig. 6: Design trade-offs of the three distinct stages with resolutions of 1, 2, and 3 bits, respectively. (a) Power vs. speed. (b) Area vs. speed.

IV-C Full Pipelined NNADC Evaluation

We choose the three distinct stages of Section IV-B to evaluate the quantization ability of the proposed full pipelined NNADC. We find that although the co-design framework can help us train a low-resolution stage to approximate the ideal quantization and residue functions with high fidelity, the minor discrepancy between the trained stage and the ideal stage propagates and accumulates along the pipeline and eventually results in erroneous quantization. Our simulations of various combinations of pipeline stages show that a maximum 14-bit pipelined NNADC working at 1 GS/s can be achieved by cascading nine 1-bit stages, one 2-bit stage, and one 3-bit sub-ADC, all with 3-bit RRAM precision. Note that the last stage of the 14-bit pipelined NNADC does not need to generate a residue. The reconstructed signal of this 14-bit ADC is shown in Fig. 7(a), where the ENOB is 12.5 bits at a 1 GHz sampling frequency. We also report the SNDR trend versus input signal frequency in Fig. 7(b). The SNDR only begins to degrade as the input frequency approaches the Nyquist limit, verifying that the sampling frequency (twice the maximum input signal frequency) of the proposed 14-bit NNADC indeed supports 1 GS/s operation.

Finally, we train a nonlinear ADC based on the same methodology by using a logarithmic encoding of the input signal, i.e., replacing the uniform target code in Eq. (3) with a logarithmically compressed one when training each 1-bit stage. We find that a 10-bit logarithmic ADC with 9.1-bit ENOB working at 1 GS/s can be achieved by cascading ten such 1-bit stages; the reconstructed signal is illustrated in Fig. 8.
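A sketch of the logarithmic ground-truth encoding follows; the base, input floor `v_min`, and range are our assumptions, as the paper does not spell out its exact log mapping here.

```python
import numpy as np

def log_encode(x, v_min=1e-3, v_max=1.0):
    # Compress x in [v_min, v_max] to [0, 1] logarithmically.
    return np.log(x / v_min) / np.log(v_max / v_min)

def ideal_log_adc(x, n_bits=10, v_min=1e-3, v_max=1.0):
    # Logarithmic counterpart of the uniform target in Eq. (3).
    code = min(int(log_encode(np.clip(x, v_min, v_max), v_min, v_max) * 2 ** n_bits),
               2 ** n_bits - 1)
    return [(code >> k) & 1 for k in range(n_bits - 1, -1, -1)]

print(ideal_log_adc(0.5))
```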

Fig. 7: (a) Reconstruction of a 14-bit pipelined NNADC with 3-bit RRAM whose pipelined chain consists of eleven stages: nine 1-bit stages, one 2-bit stage and one 3-bit sub-ADC. (b) SNDR trend.
Fig. 8: A 10-bit logarithmic NNADC with ten 1-bit stages.

IV-D Performance Comparisons

IV-D1 Comparison with existing NNADCs

We first design an optimal 8-bit NNADC by cascading eight of the 1-bit stages from Section IV-B and compare it with previous NNADCs [6, 7]. The comparative data are summarized in the left columns of Table I. Compared with them, the proposed 8-bit NNADC achieves the same resolution and higher energy efficiency with ultra-low-precision 3-bit RRAM devices. Both NNADC1 and NNADC2 adopt a typical NN architecture (Hopfield or MLP) to directly train an 8-bit ADC without architectural optimization; therefore, they need high-precision RRAM to achieve the targeted ADC resolution. NNADC1 uses a large three-layer MLP as its circuit model, where parasitic aggregation on the large crossbar array degrades the conversion speed. In addition, NNADC1 uses more hidden neurons, which consume more energy. Since each stage in the proposed 8-bit NNADC resolves only 1 bit and has a very small size, it achieves a faster conversion speed with higher energy efficiency, and high resolution with low-precision RRAM devices. Note that the $f_s$ reported for NNADC2 is based on sampling a low-frequency (44 kHz) signal at a high frequency (1.66 GHz). It is therefore considered outside the scope of a Nyquist ADC and cannot be compared directly with our work on the same basis.

| ADC type | NNADC | NNADC | NNADC | Nonlinear ADC | Nonlinear ADC | Nonlinear ADC | Uniform ADC | Uniform ADC |
|---|---|---|---|---|---|---|---|---|
| Work | NNADC1 [7]* | NNADC2 [6]* | This work* | JSSC'09 [11]** | ISSCC'18 [3]** | This work* | JSSC'15 [12]** | This work* |
| Technology (nm) | 130 | 180 | 130 | 180 | 90 | 130 | 65 | 130 |
| Supply (V) | 1.2 | 1.2 | 1.5 | 1.62 | 1.2 | 1.5 | 1.2 | 1.5 |
| Area (mm²) | 0.2 | 0.005–0.01 | 0.02 | 0.56 | 1.54 | 0.03 | 0.594 | 0.1 |
| Power (mW) | 30 | 0.1–0.65 | 25 | 2.54 | 0.0063 | 31.3 | 49.7 | 67.5 |
| $f_s$ (S/s) | 0.3G | 1.66G–0.74G | 1G | 22M | 33K | 1G | 0.25G | 1G |
| Resolution (bits) | 8 | 4–8 | 8 | 8 | 10 | 10 | 12 | 14 |
| ENOB (bits) | 7.96 | 3.7 (NA) | 8 | 5.68 | 9.5 | 9.1 | 10.6 | 12.5 |
| FoM (fJ/conv-step) | 401 | 8.25–7.5 | 97.7 | 2380 | 263 | 57 | 108.5 | 11.6 |
| RRAM precision (bits) | 9 | 6–12 | 3 | NA | NA | 3 | NA | 3 |
| Reconfigurable? | Yes | Yes | Yes | No | Yes | Yes | No | Yes |

* Results are based on simulation.

** Results are measured on chip.

TABLE I: Performance comparison with different types of ADCs.

IV-D2 Comparison with traditional nonlinear ADCs

We then compare the trained 10-bit logarithmic ADC with state-of-the-art traditional nonlinear ADCs [11, 3]. The comparative data are summarized in the middle columns of Table I. As the table shows, the proposed 10-bit logarithmic ADC has competitive advantages in area, sampling rate, and energy efficiency. JSSC'09 [11] uses a pipelined architecture to implement an 8-bit logarithmic ADC. Due to device mismatch, its ENOB degrades noticeably from the targeted resolution. ISSCC'18 [3] requires a 10-bit capacitive DAC to achieve a configurable 10-bit nonlinear quantization resolution; it therefore achieves a high ENOB but only works at 33 kS/s with significant area overhead. Since we adopt the proposed training framework to directly train on a log-encoded signal using small NN models while incorporating device non-idealities, we achieve a logarithmic ADC with small area, high sampling rate, and high ENOB.

IV-D3 Comparison with traditional uniform ADC

Finally, we compare the trained 14-bit uniform ADC with a state-of-the-art traditional uniform ADC. The comparative data are summarized in the right columns of Table I. They show that the proposed 14-bit NNADC has competitive advantages in sampling rate, ENOB, and energy efficiency. JSSC'15 [12] uses power-hungry op-amps and dedicated calibration techniques, resulting in power overhead and degraded conversion speed. The proposed 14-bit NNADC uses low-resolution stages with very small NN sizes, enabling a faster conversion speed with higher energy efficiency. The slight ENOB degradation of the proposed ADC is caused by the propagation of the discrepancy between the trained and ideal stages along the pipeline. Also note that the performance of the proposed NNADCs and of previous NNADCs is based on simulations, while the performance of the traditional nonlinear and uniform ADCs is based on measurements.

V Conclusion

In this paper, we present a co-design methodology that combines a pipelined hardware architecture with a custom NN training framework to achieve a high-resolution NN-inspired ADC with low-precision RRAM devices. A systematic design exploration is performed over the design space of the sub-ADCs and residue blocks to achieve a balanced trade-off between the speed, area, and power consumption of each distinct low-resolution stage. Using SPICE simulations, we evaluate our design on various ADC metrics and perform a comprehensive comparison of our work with different types of state-of-the-art ADCs. The comparison results demonstrate the compelling advantages of the proposed NN-inspired ADC with a pipelined architecture: high energy efficiency, high ENOB, and fast conversion speed. This work opens a new avenue toward future intelligent analog-to-information interfaces for near-sensor analytics using an NN-inspired design methodology.

Acknowledgement

This work was partially supported by the National Science Foundation (CNS-1657562).

References

  • [1] R. LiKamWa et al., “RedEye: Analog ConvNet Image Sensor Architecture for Continuous Mobile Vision,” IEEE ISCA, 2016, pp. 255-266.
  • [2] B. Li et al., “RRAM-Based Analog Approximate Computing,” IEEE TCAD, vol. 34, no. 12, pp. 1905-1917, 2015.
  • [3] J. Pena-Ramos et al., “A Fully Configurable Non-Linear Mixed-Signal Interface for Multi-Sensor Analytics,” IEEE JSSC, vol. 53, no. 11, pp. 3140-3149, Nov. 2018.
  • [4] M. Buckler et al., "Reconfiguring the Imaging Pipeline for Computer Vision," IEEE ICCV, 2017, pp. 975-984.
  • [5] L. Gao et al., “Digital-to-analog and analog-to-digital conversion with metal oxide memristors for ultra-low power computing,” IEEE/ACM NanoArch, 2013, pp. 19-22.
  • [6] L. Danial et al., “Breaking Through the Speed-Power-Accuracy Tradeoff in ADCs Using a Memristive Neuromorphic Architecture,” IEEE TETCI, vol. 2, no. 5, pp. 396-409, Oct. 2018.
  • [7] W. Cao et al., “NeuADC: Neural Network-Inspired RRAM-Based Synthesizable Analog-to-Digital Conversion with Reconfigurable Quantization Support,” DATE, 2019, pp. 1456-1461.
  • [8] T. F. Wu et al., "14.3 A 43pJ/Cycle Non-Volatile Microcontroller with 4.7μs Shutdown/Wake-up Integrating 2.3-bit/Cell Resistive RAM and Resilience Techniques," IEEE ISSCC, 2019, pp. 226-228.
  • [9] Y. Cai et al., "Training low bitwidth convolutional neural network on RRAM," ASP-DAC, 2018, pp. 117-122.
  • [10] H.-S. P. Wong et al., "Metal-Oxide RRAM," Proceedings of the IEEE, vol. 100, no. 6, pp. 1951-1970, June 2012.
  • [11] J. Lee et al., “A 2.5mW 80 dB DR 36dB SNDR 22 MS/s Logarithmic Pipeline ADC,” IEEE JSSC, vol. 44, no. 10, pp. 2755-2765, Oct. 2009.
  • [12] H. H. Boo et al., “A 12b 250 MS/s Pipelined ADC With Virtual Ground Reference Buffers,” IEEE JSSC, vol. 50, no. 12, pp. 2912-2921, 2015.
  • [13] K. Hornik, "Approximation capabilities of multilayer feedforward networks," Neural Networks, vol. 4, no. 2, pp. 251-257, 1991.
  • [14] W. Cao et al., “NeuADC: Neural Network-Inspired Synthesizable Analog-to-Digital Conversion,” IEEE TCAD, 2019, Early Access.
  • [15] D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," arXiv preprint arXiv:1412.6980, 2014.
  • [16] P. Chen and S. Yu, “Compact Modeling of RRAM Devices and Its Applications in 1T1R and 1S1R Array Design,” IEEE TED, vol. 62, no. 12, pp. 4022-4028, Dec. 2015.
  • [17] B. Li et al., "MErging the Interface: Power, area and accuracy co-optimization for RRAM crossbar-based mixed-signal computing system," ACM/IEEE DAC, 2015, pp. 1-6.
  • [18] W. Cao et al., "A 40Gb/s 39mW 3-tap adaptive closed-loop decision feedback equalizer in 65nm CMOS," IEEE MWSCAS, 2015, pp. 1-4.