Hardware uniform random number generators exist in both research and commercial computer architectures, with generation rates of up to 6.4 Gb/s [IntelHardware]
. Uniform random numbers are widely used in applications such as cryptography, where the objective is to generate bit vectors (e.g., 256-bit vectors) that are uniformly distributed over some range and are therefore difficult to guess. In contrast, this article focuses onnon-uniform random number generators.
I-a Applications of non-uniform random variates
Many important applications in science and engineering depend not on uniform random samples, but instead require non-uniform random variates (random samples chosen from a non-uniform probability distribution), from a wide range of distributions. Examples of these applications of non-uniform random variates range from Monte Carlo simulations [metropolis1949monte], to quantitative finance [chen2017monte] to particle filter localization for driverless cars [thrun2010toward]
. Non-uniform random variates are also important in Bayesian machine learning applications[lambert2018student], which involve the computation of a marginal probability which goes into the denominator of the expression of Bayes’s rule. Computing these marginal probabilities in turn requires evaluating an integral of a probability distribution. Because the distributions in question are typically high-dimensional and have no known analytic equational form, their integration often requires the use of Monte Carlo integration methods where one samples repeatedly from the corresponding distribution.
Because generating samples from distributions whose inverse cumulative distribution function (CDF) does not have a closed form requires the use of time-consuming rejection sampling[devroye2008non], generating random samples from non-uniform distributions is typically an order of magnitude slower and less energy-efficient than generating uniformly distributed random samples [thomas2009comparison]. One promising direction for efficiently generating non-uniform random variates is to sample from a physical process whose evolution in time [zhang2018architecting] or noise characteristics [meech2020efficient] follow some known and (ideally) controllable probability distribution.
This article presents the first demonstration of generating non-uniform random variates by exploiting properties of GFETs previously considered to be undesirable: their ambipolar transfer characteristics and their lack of a band-gap. We provide a tutorial overview of the properties of GFETs (Section II) and introduce a circuit topology for using a chain of GFETs together with a uniform random variate generator to generate dynamically-controllable non-uniform distributions (Section III). We present the methodology we used for fabricating an array of GFETs and empirically characterizing their transfer characteristics (Section IV) and we use those empirically-measured GFET transfer characteristics to demonstrate the proposed method in a simulated combined circuit topology (Section V). We then use the generated non-uniform random variates in an end-to-end system example, where we evaluate their benefit to speeding up Monte Carlo integration, as well as their benefit to reducing the error in the Monte Carlo integration process (Section VI). We propose this system as a component/unit in a more general computing system.
Ii Properties of Graphene Field-Effect Transistors
GFETs have a channel made of single- or multi-layer graphene, rather than a semiconducting material such as silicon or germanium [Schwierz2010]. Unlike traditional semiconducting materials, graphene is a semi-metal: it lacks a band gap and its conduction and valence bands don’t overlap. Instead, the conduction and valence bands meet at a single point, known as the Dirac point (Figure 1).
Electrons at the Dirac point are effectively massless and so have unusually high electron mobilities. As a result, the phonon-limited carrier mobility (the highest possible mobility limited by interactions between carriers and vibrations of the channel’s crystal lattice) of graphene on SiO is predicted to be as high as 200,000 [Chen2008]. Although high electron mobilities result in more efficent flow of charge, the lack of a band gap means GFETs have low on- to off-current ratios and can never completely turn off, making them unsuitable for digital logic applications [keyes1985makes].
The poor on- to off-current ratios of GFETs in digital logic applications does not preclude their use in other areas of computing. Because it is possible to tune the Fermi level in graphene (which typically lies at the Dirac point) by biasing the channel, it is possible to control device characteristics, e.g., using multi-gate structures, in a manner not equally possible in typical metal-oxide-semiconductor field-effect transistors (MOSFETs). GFETs also have unique transfer characteristics: as the gate voltage is swept, the drain current exhibits a v-shaped characteristic curve, with the conductance increasing until it reaches a minimum value before increasing again.
Iii A GFET Non-Uniform Random Variate Generator
If the signal at the gate of a GFET is a uniform random voltage distribution, then the distribution of the drain current will be modified by the GFET’s transfer characteristics. The exact shape of the transfer characteristics varies with the source-drain voltage . Thus, for a uniform random voltage distribution at the gate, varying for a GFET changes the distribution of drain currents. If these drain currents are converted to a voltage and passed through additional GFETs, it is possible to combine the transfer characteristics and biasing of multiple GFETs to achieve a range of drain current (and hence voltage) distributions.
Figure 2 shows a possible circuit to implement generation of a controllable non-uniform voltage distribution using GFET properties. Each GFET in Figure 2 has a bias voltage, , that controls its transfer characteristics. The first GFET has a uniformly-distributed random voltage across its gate and a corresponding distribution of drain currents, with the values of the drain currents for each input voltage determined by the transfer characteristics of the GFET at its bias voltage . The circuit in Figure 2 converts the drain current of the first GFET into a voltage input to the gate of the second GFET, using a resistor, . In practice, using a transimpedance amplifier (TIA) to perform this current-to-voltage conversion would result in less Johnson-Nyquist noise in the generated voltages, though the presence of such noise may not be detrimental given our goal of generating random variates. The analyses that follow in Section V therefore use a resistor for converting the drain currents to voltages to control the second stage in the circuit. The second GFET in Figure 2, operating at a bias voltage of , further shapes the distribution of the output signal. By selectively connecting multiple GFETs in the manner of Figure 2 (and possibly using multi-gate GFETs), this method in principle permits generation of a final output with a range of selectable distributions, controlled by the combination of , , and .
Iii-a Integration with an Existing Computer Architecture
For integration into a larger system, we propose the use of a programmable analog switching matrix, e.g. a MAX11300 [MAX11300Datasheet], such as that illustrated in (Figure 3). This is used as an interface between an external microcontroller or processor and the GFET die and allows for dynamic reconfiguration of the GFET circuit. We present a hardware prototype of this analog swtiching matrix in Section IV.
The GFET random number generator could be integrated into a package with an existing CPU and ADC using bond wires to connect the two separate dies. Figure 4 shows a block diagram of the arrangement. The maximum frequency, , and energy cost, , for a signal traveling across the bond wires are b/s and J/b respectively using the values in Table I. The maximum frequency at which a bit on the bond wire could change state is calculated using the capacitance between the bond wires, , and the resistance across a bond wire, . Hughes et al. [hughes2012hughes] show that is given by:
The energy cost of changing the value of a bit on the bond wire is calculated using the logic high voltage . is then given by [hughes2012hughes]:
The capacitance used in Equations 1 and 2 is calculated using as the permittivity of free space, as the length of the bond wire, , as the radius of the bond wire, and as the distance between them. Grigsby [grigsby2016electric] shows that is given by:
The resistance used in Equation 1 is calculated using as the resistivity of a bond wire and as the cross-sectional area of a bond wire. Grigsby [grigsby2016electric] shows that is given by:
These constraints are negligible, the overall speed is limited by the ADC sample rate and the overall power consumption is determined by the ADC and GFET random number generator.
Iii-B Wavelet Decomposition and Reconstruction of Distributions
In signal processing, the Fourier transform decomposes a time-varying signal into its constituent frequency components and the Fourier series allows for the construction of an arbitrary signal from a sum of sines and cosines. This is a special case of wavelet analysis, which allows for any function to be described by a set of orthonormal basis functions.
In wavelet analysis, an analysing wavelet is used with a scaling function to generate a set of basis functions. These basis functions are simply scaled and shifted versions of the analysing wavelet [Graps1995] and the inner product of the scaling and wavelet functions, which are neccessarily orthogonal, gives a matrix of wavelet coefficients.
The discrete wavelet transform (DWT) uses known scaling and wavelet functions to generate a known set of basis functions. When applied to a discrete signal, those basis functions give an approximation of the signal and the signal is characterised by it’s wavelet coefficients [Daubechies1992]. The inverse transform is simply the linear combination (i.e. the sum) of these basis functions, and thus allows for reconstruction of the original signal with a desired level of accuracy dependent on the number of coefficients used.
In the DWT, the set of coefficients can be considered as a transformation matrix or filter which is applied to a data vector. The coefficients are ordered using two patterns: the first acts as a smoothing filter and the second brings out details in the data. A pyramidal algorithm is used to apply the matrix, with coefficients arranged such that odd rows containg coefficients acting as a smoothing filter and even rows contain those acting as those which bring out the data’s detail. Each pair of rows can be thought of as a level of analysis; as the number of levels increases, the total number of inner products is divided by two[Graps1995]. Thus, each step smooths the data and so information is lost.
We demonstrate wavelet decomposition and reconstruction of a distribution in the following example. We show the distributions reconstructed from inverse DWTs with different numbers of coefficients, corresponding to a given level of accuracy for the lognormal distribution in Figure 5. We used a second-order Coiflet wavelet for both the DWT and the inverse-DWT. We chose this wavelet arbitrarily as a proof-of-concept for the proposed method, but it appeared to give reasonable results. We calculated the Kullback-Leibler (KL) divergence [kullback1951], a measure of the closeness of two distributions, by calculating the peak positions of the generated distribution and the reconstructed distributions and comparing them. Figure 5(b) uses the most coefficients and was thus the most accuracte reconstruction, with a calculated KL divergence of 0, an identical reconstruction. Figure 5(c) used less than half the coefficients of (b) and had a KL divergence of 1.19. Figure 5(d), which used less than a quarter of the coefficients of (b) was actually closer than (c), with a KL divergence of 0.45.
Several approaches to implementing wavelet transforms have been developed: Stephane Mallat proposed a Fast Wavelet Transform algorithm [Mallat1989] and DWT algorithms have been implemented using FPGAs [Bahoura2012]. The output of such an approach would form part of the digital input in Figure 3. Figure 6 is a block diagram of a proposed complete system. Section IV and Figure 7 (c) present an early prototype of this system.
We have shown that a distribution can be reconstructed from a set of wavelet coefficients determined by a wavelet transform. If we consider these coefficients to be bias voltages for GFETs, which have a tuneable characteristic transfer function with a certain distribution, then the characteristic of a GFET can be considered as a mother wavelet, with the bias voltages being the scaling parameters. Summing the output of each GFET (or combination of GFETs), with each representing a basis function, in principle, allows for the reconstruction of any arbitrary function, with the accuracy dependent on the number of GFETs used.
Iv GFET Fabrication and Electrical Characterization
GFETs in the results presented here consist of an Si substrate onto which we patterned a gold back gate. We grow a 60 nm alumina (AlO) layer by atomic layer deposition (ALD), onto which we transfer a monolayer of graphene using a wet transfer process. This graphene monolayer was grown by chemical vapor deposition (CVD) and purchased from Graphenea. After patterning the graphene, we deposit the gold source and drain contacts onto the graphene to create a GFET channel that lies exactly above the back gate. The channel is insulated by another layer of alumina, onto which we pattern gold top gates aligned with the GFET channels. We electrically passivate the whole device by a final ALD deposition of alumina. Finally, to facilitate electrical access to the GFETs, we etch away the alumina on top of the contacts that are electrically linked to the source, drain, back, and top gates of the GFETs.
We fabricate four identical GFETs on each silicon die (Figure 7(a)) and we electrically connect the GFETs via wire-bonding the die from its gold contacts (Figure 7(b)) to a custom printed circuit board (PCB) (Figure 7(c)). The PCB comprises an array of DACs, ADCs, and analog switches, all of which allow for dynamic and in-situ (re)configuration of a given circuit. Because graphene is sensitive to atmospheric effects111In principle, atmospheric effects can lead to doping of the channel. These effects should however not occur even in the absence of the sealed glass protective cover, as we fabricated the devices in a cleanroom environment and encased the graphene in alumina as described above. We however cannot rule out inadvertent doping as a result of the fabrication process., and also to protect the wire-bonding, we place a glass protective cover over each die once bonded to the PCB, sealed with hot glue. The hardware prototype in Figure 7(c) implements the GFETs required to realise the circuit in Figure 2, as well as the analog switching matrix described in Figure 3 and Figure 3.
We performed electrical characterization of the GFETs using two Keithley 2450 source-measure units (SMUs) synchronized using TSP-link. Figure 8(a) shows the transfer characteristic characterization results of a fabricated GFETs, with data obtained by conducting a linear sweep of the (top) gate-source () voltage between 10.0 V and 10.0 V in both forward and reverse directions and measuring the resultant drain current (). Each curve shows the transfer characteristic for a constant source-drain bias voltage (), which we updated for each measurement.
The Dirac points in Figure 8(a) lie to the left of 0.0 V, which suggests an n-type doping of the graphene channel. The deepening of the valley with increasing bias voltage is a commonly-observed characteristic in GFETs [Kedzierski2008], as is the hysteresis in the transfer characteristics, which the measurements of Figure 8 (a) show for all applied bias voltages. This hysteresis is a result of multiple phenomena: charge trapping between the graphene channel and the insulating layers are a major cause [Lemme2010], however, additional factors include capacitive gating causing a negative shift and charge transfer causing a positive shift [Wang2010] in the conductance with respect to gate voltage.
We also measured the versus characteristics of the GFETs while varying the gate voltage between 0.2 V and 1.0 V, for source-drain voltages over the range 0.0 V to 10.0 V. Traditional MOSFETs exhibit saturation of their versus characteristics, with the characteristics separated into two main operating regions: linear and nonlinear. In GFETs however, this saturation does not appear, due to a combination of graphene’s lack of a bandgap and Klein tunneling [Meric2008]. Figure 8(b) shows the measured characteristic curves for the GFET and Figure 8(a) shows the transfer characteristics.
The characterization data in Figure 8(b) indicate that the devices switch from a relatively linear region of conductance to a nonlinear region at a bias voltage of around 3.5 V. This is in line with previous results which show that GFETs, in comparison to MOSFETs, often have a second linear region; there is a point of inflection [Schwierz2010] which appears to be the case in the plots here, at a V of approximately 3.5 V.
As we show in Section V, the nonlinearity of the GFET transfer characteristics, combined with the tunability of the characteristic shape by controlling allows us to use one or more GFETs to trans form uniform distrivutions of into non-uniform distributions of .
V Simulations of GFET Circuits
, using a custom-built simulation model of the circuit, implemented in Mathematica. We use an interpolating function to model the measured device characteristics of each GFET and stimulate the gate of the first GFET in the circuit using a uniform random distribution between8 and 8, effectively a voltage in the range 8 V to 8 V. Figure 9 shows the time series of the uniform random voltages and their corresponding histogram distribution. We pass the output of the model of the first GFET, which is its drain current, through a modeled resistor which converts the drain current into a voltage. We then feed this voltage to the model of the second GFET of Figure 2, which we again model by encapsulating our experimental measurement data in another interpolating function. We apply the drain current of the second GFET to another resistor to obtain the output voltage.
We used the characteristics of the GFET in Figure 8 for simulations here. For the first GFET, we used the characteristic for a 0.8 V bias and for the second GFET, we used the characteristic for a 1 V bias. We use a resistance of 2.2k for the first resistor and of 1k for the output resistance. Figure 10 (a) shows the distribution of currents from the first GFET, and Figure 10 (b) shows the distribution for the second GFET. Figure 11 (a) shows final output distribution (i.e., the output of the first GFET passed through the second GFET).
We investigated the possibility of combining the GFET-based distribution with additional subsequent software transformation by running a second simulation of the circuit in Figure 2 using the experimentally-measured GFET data, with the characteristic for a 1 V biased GFET as the first GFET and the characteristic for a 0.8 V biased GFET as the second GFET. We chose a resistance of 1.2k for the first resistor and 1k
for the output resistance. The initial output distribution was skewed to the right, and so the complementary cumulative distribution function,, given by:
where F(x) is the output distribution of the circuit. To compare the similarity of the generated distribution to a genuine lognormal, we calculated the mean (
) and standard deviation () of the logarithm of the simulation output and used these as parameters to generate a lognormal distribution over the same input space. Figure 11 (b) shows histograms of the simulation output and the reference distribution.
Vi End-to-End Example: Monte Carlo Integration
In Monte Carlo simulations it is common to need to integrate various un-normalized non-uniform density functions to convert them to valid probability density functions[chen2017monte]. Monte Carlo integration is a convenient way of doing this. The normalization could require the integration of a lognormal distribution where is an unknown normalizing constant, is the mean and is the standard deviation:
Let be the error of a Monte Carlo integration and be the time taken by the integration. Let be the number of random samples used in the integration and be the distribution that we sample from. Let be the area of each rectangle used in the integration and and be the corresponding rectangle base and height. Algorithm 1 shows the integration scheme that we used.
We repeated the integration with as: 1) a hardware generated lognormal distribution and 2) a lognormal distribution generated with the C++ standard library’s utility for generating lognormal variates, with the same and as . We also performed the integration with as a uniform distribution generated with the C++ standard library’s random number generator, with various ranges. We assume that samples from the hardware lognormal generator can be generated in the time required for one memory access. We ran all simulations on an 2.8 GHz Intel Core i7 CPU using OpenMP parallelization to utilize all eight processor threads.
Figure 12(a) shows that it is on average faster to use a C++ lognormal random number generator than a C++ uniform random number generator. Running the program assuming that the lognormal samples are generated by the hardware random number generator in the same amount of time required for a memory access is up to faster and always at least faster than using the C++ lognormal random number generator. Figure 12(b) shows that the error reduction for the Monte Carlo integration using the [0, 3] C++ uniform random number generator plateaus at around samples but for the hardware lognormal the error continues to decrease. The lognormal and [0, 3] C++ uniform lines intersect at between and samples and the intersection point shifts to the right as we increase the range of the C++ uniform random number generator.
samples using the C++ uniform and lognormal random number generators. The error bars show a 90 % confidence interval on the mean of 1000 samples for each point.
Vi-B Insights from Monte Carlo Integration
(b) shows that increasing the range of the C++ uniform random number generator decreases the minimum error that the integration plateaus at. Unfortunately increasing the range of the C++ uniform random number generator also increases the number of samples required for the estimate of the area to approach the true value. The proportion of the uniform probability density function overlapping the lognormal probability density function decreases as the range of the uniform distribution is increased. For a given function we cannot know beforehand which range of uniform random numbers will produce a sufficiently small bound on the error of integration. We can avoid this problem by sampling from the exact lognormal density that we want to integrate. When performing the Monte Carlo integration of any non-uniform distribution we should sample from the probability density function of that distribution with the same parameters to minimize the error and number of samples required to get a reasonable estimation. This is not possible when sampling from a bounded uniform distribution.
Vii Related Research
A hardware random number generator, integrated in a CPU capable of producing samples from arbitrary distributions does not currently exist. Table II shows the state-of-the-art of hardware non-uniform random number generators. The prior work on non-uniform random number generation in Table II is fundamentally different to prior work on uniform random number generation. The publications in Table II characterize the non-uniform distribution of the physical process used to obtain the random samples. The prior work on uniform random number generators does not produce or refer to a non-uniform distribution of random numbers [wallace2016toward, lee2019true, perach2019asynchronous, srinivasan20102]. No comparison can be made between the GFET and uniform random number generators. The uniform random number generation efforts excluded from Table II produce single bit samples where the result is either 0 or 1. The non-uniform random number generation efforts included in Table II produce multiple (usually 6 or greater) bit samples with a given non-uniform distribution. Prior work that is capable of producing samples from arbitrary non-uniform distributions exists [nguyen2018programmable]. Their method is not well suited for integration with current CPU architectures as it requires large optical components, it is therefore unclear how it could be miniaturized and integrated into a CPU [nguyen2018programmable]. In contrast Section III describes how the GFET random number generator would interface with an existing CPU.
Currently no architecture exists with a Gb/s generation rate for arbitrary non-uniform distributions. As the method we present is analog, it should be possible to use existing technology to drive and sample from it. This will allow us to produce samples from arbitrary distributions at Gb/s sample rates. Popular statistical tests such as Dieharder are designed for samples from a [0, 1] uniform distribution and are therefore not compatible with the non-uniformly distributed random numbers produced in this work [Dieharder].
|Photon Detection||1.77 Gb/s||Exponential||2017||[marangon2017source]|
|Photo Diode||17.4 Gb/s||Husumi||2018||[avesani2018source]|
|Photon Detection||66.0 Mb/s||Arbitrary||2018||[nguyen2018programmable]|
|Photon Detection||200 Mb/s||Normal||2018||[raffaelli2018homodyne]|
|Photon Detection||320 Mb/s||Exponential||2018||[tomasi2018model]|
|Electronic Noise||6.40 Gb/s||Normal||2019||[hu2019gaussian]|
|Photon Detection||6.80 Mb/s||Exponential||2019||[park2019practical]|
|Photon Detection||63.5 Mb/s||Exponential||2019||[lin2019true]|
|Photon Detection||8.25 Mb/s||Normal||2019||[guo2019parallel]|
|Photon Detection||1.00 Mb/s||Exponential||2019||[xu2019spad]|
|Electronic Noise||13.8 kb/s||Normal||2020||[meech2020efficient]|
Viii Summary and Insights
This article demonstrates a novel circuit-level approach to generating samples from non-uniform probability distributions, exploiting the transfer characteristics and ambipolarity of graphene field-effect transistors (GFETs).
We describe the fabrication of arrays of GFETs on a silicon substrate and wire bonding of the fabricated devices to a custom PCB. We experimentally characterize the GFET transfer and output characteristics at a range of GFET bias voltage configurations. Using the obtained characterization data, we simulate possible circuit designs for non-uniform random number generators comprising circuits requiring just two transistors and a resistor (or transimpedance amplifier). The results demonstrate that a circuit comprising a chain of two GFETs transforms a uniformly-distributed random input voltage into a non-uniformly distributed output. In the first demonstration, biasing the first GFET at 0.8 V, outputting current through a 2.2k resistor and inputting the resultant voltage to the gate of the second GFET, biased at 1 V, produces an output voltage distribution through a 1k output resistor resembling a Burr-type XII distribution. Varying the GFET bias voltages and the resistances permits the circuit to generate other dynamically-chosen distributions. We generated an approximation of a lognormal distribution by biasing the first GFET in the simulation to 1 V and the second to 0.8 V, setting the first resistor to 1.2k and keeping a 1k output resistance, and then taking the complementary cumulative distribution function.
We evaluate the end-to-end use of the GFET-circuit-generated distributions in an application performing Monte Carlo integration. The results show that, using the GFET-circuit-generated non-uniform distributions instead of uniform random samples for sampling locations in the lognormal distribution improves the speed of Monte Carlo integration by a factor of up to 2 . This speedup is based on the assumption that the analog-to-digital converters that will be necessary to read outputs from GFET-based random number generation circuit can produce samples in the same amount of time that it takes to perform memory accesses.
This research is supported by an Alan Turing Institute award TU/B/000096 under EPSRC grant EP/N510129/1, by EPSRC grant EP/R022534/1, and by EPSRC grant EP/V004654/1. N.J. Tye acknowledges funding from EPSRC grant EP/L016087/1. J.T. Meech acknowledges funding from EPSRC grant EP/L015889/1.