I Introduction
As technology dimensions in commercialized integrated circuits sharply decrease to a few nanometers, the sensitivity of electronic circuits increases drastically [nuclei]. Hence, embedded microprocessors are becoming more vulnerable to soft errors, and designing dependable systems is a challenging task for chip designers. These systems have to operate reliably even in the presence of faults in order to sustain the present growth rate of device count and clock frequency despite continuously growing reliability issues. Moreover, the sensitivity of chips is further intensified by voltage scaling [scaling]. Undesirable and accidental faults become more frequent in new-generation computing systems running heavy-duty numerical computations. A single-event upset (SEU) or multi-bit upset (MBU) can have a catastrophic outcome in mission-critical applications such as space and missile-navigation systems. Hence, reliable arithmetic that is resilient to errors is a primary requirement for mission-critical computing systems.
Posit is a new data type that stores more information-per-bit than its IEEE 754 compliant counterparts [john1]. For example, a 32-bit posit number can offer a similar dynamic range and better accuracy at the same time than a 32-bit IEEE 754 compliant number. In general, an \(m\)-bit posit has a higher dynamic range and better numerical accuracy than an \(n\)-bit IEEE 754 compliant number with \(m \le n\). It is shown in the literature that \(n\)-bit IEEE 754 compliant numbers can be replaced by \(m\)-bit posit numbers with \(m \le n\), since the posit number system exhibits a trade-off between accuracy and dynamic range [Farhad9] [expand1]. These trade-offs allow the selection of a posit format suitable for a given computing system without compromising accuracy or performance [clarinet1] [saxena1]. Further details of the number system and its formats are discussed in Section II-A. The reliability aspects of posit arithmetic are yet to be explored by the research community.
To the best of our knowledge, this is the first comparative study of the inherent fault tolerance of posit arithmetic vis-à-vis its IEEE counterpart. We carry out an extensive investigation of reliability through an exhaustive fault injection scheme. The major contributions of the paper are as follows:

We propose a theoretical analysis and an exhaustive reliability exploration of posit arithmetic vis-à-vis its IEEE 754 compliant counterpart.

We conduct an exhaustive reliability exploration as well as machine-learning (ML) benchmarks under fault injection.

We show promising results for posit arithmetic that may encourage its utilization in safety-critical applications as well as in approximate computing.
For reproducibility, we make our framework open source [git_fault].
The rest of the paper is organized as follows: Section II presents background on posit arithmetic and soft errors, followed by related work in Section III. Section IV describes the analysis and the proposed methodology for error resilience using posit arithmetic. Section V discusses the experimental setup and results. We summarize our work in Section VI.
II Background and Related Works
II-A IEEE 754 Compliant and Posit Number Systems
The IEEE 754-2008 compliant floating-point format represents binary numbers with three parts: a sign, an exponent and a fraction (see Fig. 1). The sign is the most significant bit and indicates whether the number is positive or negative. In the single-precision format, the following 8 bits represent the exponent of the binary number, giving an unbiased exponent range of \(-126\) to \(127\) for normalized numbers. The remaining 23 bits represent the fractional part. The value of a normalized floating-point number is:
\(v = (-1)^{s} \times 1.f \times 2^{e-127}\)    (1)
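As an illustration of the field layout behind Equation (1), the three fields can be extracted from the raw 32-bit pattern as in the sketch below (decode_float32 is our own illustrative helper, not part of the paper's toolflow):

```c
#include <stdint.h>
#include <string.h>

/* Extract the sign, biased exponent and fraction fields of an
 * IEEE 754 single-precision number from its raw 32-bit pattern. */
static void decode_float32(float f, uint32_t *sign, uint32_t *exponent,
                           uint32_t *fraction) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* bit-level view without aliasing issues */
    *sign     = bits >> 31;           /* 1 bit  */
    *exponent = (bits >> 23) & 0xFFu; /* 8 bits, biased by 127 */
    *fraction = bits & 0x7FFFFFu;     /* 23 bits */
}
```

For example, 1.5f decodes to sign 0, biased exponent 127 and fraction 0x400000, i.e. \(1.5 = +1.1_2 \times 2^{0}\).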
Posit arithmetic was proposed in 2017 as a drop-in replacement for IEEE 754 compliant arithmetic [john1]. The posit number format has several advantages over IEEE 754 compliant arithmetic, such as higher accuracy, higher dynamic range, simpler hardware implementation of arithmetic operations, and lower area and energy footprints [Gustafson2020]. Besides, it is shown in the literature that \(m\)-bit posit adders/multipliers can safely replace \(n\)-bit IEEE 754 compliant adders/multipliers with \(m \le n\) [Farhad9]. Hence, the posit representation offers more information-per-bit than its IEEE 754 counterpart. Furthermore, the posit representation has no redundant representations, and overflow/underflow is nonexistent in posit arithmetic. Subnormal numbers are handled like normal numbers, unlike in the IEEE 754 representation, and there are only two exception cases: zero and not-a-real (NaR). For all other cases, the value of a posit is given by
\(v = (-1)^{s} \times useed^{k} \times 2^{e} \times 1.f\), with \(useed = 2^{2^{es}}\)    (2)
The regime indicates a scale factor of \(useed^{k}\), where \(useed = 2^{2^{es}}\) and \(es\) is the exponent size. The numerical value of \(k\) is determined by the run length of 0 or 1 bits in the string of regime bits. The run-length encoding of the regime automatically allows more fraction bits for the more common values, whose magnitudes are closer to 1, and thus provides tapered accuracy in a bit-efficient way. Further details about the posit number format and posit arithmetic can be found in [john1]. The posit and IEEE 754-2008 compliant number formats are depicted in Fig. 1.
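The decoding process described by Equation (2) can be sketched as follows. This is an illustrative decoder of our own (posit32_to_double is not part of the paper's toolflow; real experiments should use a tested library such as SoftPosit):

```c
#include <math.h>
#include <stdint.h>

/* Decode a posit(32, es) bit pattern into a double, per Equation (2). */
static double posit32_to_double(uint32_t bits, int es) {
    if (bits == 0u) return 0.0;                   /* exception case: zero */
    if (bits == 0x80000000u) return NAN;          /* exception case: NaR  */
    int sign = (int)(bits >> 31);
    if (sign) bits = (uint32_t)(-(int32_t)bits);  /* two's complement     */

    /* Regime: run length of identical bits following the sign bit. */
    int r0 = (int)((bits >> 30) & 1u);
    int run = 0, i = 30;
    while (i >= 0 && (int)((bits >> i) & 1u) == r0) { run++; i--; }
    int k = r0 ? run - 1 : -run;                  /* regime value k       */
    i--;                                          /* skip terminating bit */

    /* Exponent: next es bits; missing low bits are treated as zero. */
    int e = 0;
    for (int j = 0; j < es; j++) {
        e <<= 1;
        if (i >= 0) { e |= (int)((bits >> i) & 1u); i--; }
    }

    /* Fraction: remaining bits with an implicit leading 1. */
    double frac = 1.0;
    for (double w = 0.5; i >= 0; i--, w *= 0.5)
        if ((bits >> i) & 1u) frac += w;

    double v = ldexp(frac, k * (1 << es) + e);    /* useed^k * 2^e * 1.f  */
    return sign ? -v : v;
}
```

With \(es = 2\), the pattern 0x40000000 decodes to 1.0 (regime "10", so \(k = 0\), \(e = 0\)), and 0x48000000 decodes to 2.0 (\(k = 0\), \(e = 1\)).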
In our experiments, we use IEEE 754 compliant 32-bit (single-precision) floating-point numbers and 32-bit posit numbers with \(es = 2\), which is the commonly used configuration.
II-B Soft Errors
The sharp technology scaling in new-generation integrated circuits accentuates the sensitivity of electronic circuits; as a result, embedded systems are becoming remarkably sensitive to soft errors. These errors result from a voltage transient event induced by alpha particles from packaging material or neutron particles from cosmic rays [soft_errors]. The event is created by the collection of charge at a p-n junction after a track of electron-hole pairs is generated. In older technologies, this issue was considered only for the limited range of applications in which circuits operate under aggressive environmental conditions, such as aerospace. Nevertheless, shrinking transistor sizes and reduced supply voltages in new hardware platforms bring soft errors to ground-level mainstream applications [mainstream] [mainstream2].
III Related Work
Since soft errors became a challenging threat to reliability, numerous published works propose error-resilient memories. Architecture-level error resilience techniques such as single error correction, double error detection (SECDED) have been proposed and widely used for memory protection [ref5]. The main drawbacks of SECDED are its area overhead and the supplementary latency, which leads to performance loss. The fault-tolerant architecture presented in [rsp_ihsen] combines parity and single redundancy to enhance memory reliability. The weakness of these techniques is their area, power and delay overheads, due to the additional memory cells and supporting circuits required for error detection and correction.
Circuit-level techniques have been proposed to overcome architecture-level overheads. These techniques enhance error resilience at the circuit level, either by slowing down the response of the circuit to transient events or by increasing its critical charge. Methods such as [ref6] harden the cell using a pass transistor controlled by a refreshing signal. Hardened memory cells were proposed in [ref7], [ref16] and [AS8], adding redundant transistors to the 6T SRAM cell to increase its critical charge. A Schmitt-trigger-based technique [ref14] proposes a hardened 13T memory cell; however, it slows down the memory due to the temporal hysteresis characteristics of the Schmitt trigger. In the context of emerging approximate computing applications, recent works such as [smail_prop] propose a trade-off between reliability and computing precision. To assess the reliability level at an early stage, fault injection can be performed in simulation. None of these techniques takes into account the actual data representation stored within the protected memories, especially for numerical values.
A number of researchers have approached the reliability issue in numerical algorithms. The vast majority of them treat an algorithm as a black box and track the behavior of the application when running with injected soft errors. In [fp_err], a study on soft error propagation in floating-point programs is presented. In [REF_19], the behavior of various Krylov methods is analysed; the authors track the variance in iteration count based on the data structure that experiences the bit flip. The authors of [REF_21] analyzed the impact of bit flips in a sparse matrix-vector multiplication (SpMV). Exemplifying the concept of black-box analysis of bit flips, [REF_26] presents BIFIT for characterizing applications based on their vulnerability to bit flips. While these techniques study the reliability of applications based on floating-point formats, none of them studies the inherent sensitivity of the floating-point representations themselves. This paper proposes a comparative study of the inherent sensitivity to errors of the IEEE 754 compliant floating-point and posit representations.
IV Proposed Methodology
We cover analyses for SEUs and MBUs, considering different aspects. Since float and posit have different data formats, as shown in Fig. 1, a single or multiple bit-flip event in a 32-bit word produces a different new number in each format. In our analyses, we consider bit flips in the fraction and exponent parts of both formats, as well as in the regime bits of posit numbers. For our theoretical analyses, we compare an IEEE 754-2008 compliant number and a posit number of the same width, and we assume that the SEU or MBU flips the same bit position(s) in both formats.
IV-A Fraction bits
An IEEE 754-2008 compliant single-precision number has 23 fraction bits, whereas the number of fraction bits in a 32-bit posit varies: the bits not consumed by the regime and exponent are appended to the left of the fraction. In the following, we compare the fraction parts of an IEEE 754 compliant number and a posit compliant number. A representative diagram of the bit-flip phenomenon in the fraction part is shown in Fig. 2.
The largest error that a bit flip in the fraction part of an IEEE 754-2008 compliant number can cause is a flip of the most significant fraction bit (bit position 22). A flip from 0 to 1, or vice versa, adds or subtracts \(2^{-1}\) from the value of the fraction. To have a similar impact in a posit number, the flip has to occur at the most significant bit of the posit fraction. In general, a bit flip at a given position in the fraction of an IEEE 754-2008 compliant number has an impact similar to a bit flip at the correspondingly shifted position in the fraction of a posit number, where the shift depends on the configuration of the posit number. For example, a 32-bit posit number can have between 2 and 31 regime bits, since the regime is encoded as the run length of 0s or 1s in the most significant bits after the sign bit (refer to Equation (2)). In practical scenarios the run length is not expected to be very large, since a long regime corresponds to a very high dynamic range; the \(es = 2\) configuration results in a dynamic range similar to that of the IEEE 754 compliant single-precision number. Since, in the most realistic scenarios, the regime and exponent of a posit together occupy fewer bits than the 8 exponent bits of the IEEE 754 format, the posit fraction is longer, and a bit flip at the same position of the word results in a smaller error in the posit number.
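The worst-case fraction error discussed above can be checked directly on the IEEE 754 side with a small bit-flip helper (flip_bit is our own illustrative function, not part of the paper's toolflow):

```c
#include <stdint.h>
#include <string.h>

/* Flip one bit of an IEEE 754 single-precision number.
 * Bit 22 is the fraction MSB; bits 23-30 are the exponent. */
static float flip_bit(float f, int pos) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    bits ^= 1u << pos;               /* inject the upset */
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Flipping bit 22 of 1.0f yields 1.5f: the fraction gains a weight of \(2^{-1}\), which is the largest error a fraction-bit flip can introduce at that scale.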
In the case of a double bit flip, assuming that the second flip also occurs at the same location in the IEEE 754 compliant number and the posit number, the error due to the second bit flip is again higher in the IEEE 754 compliant number. The higher error is due to the higher weight associated with the bit position in the IEEE 754 compliant number compared to the posit number.
IV-B Exponent and regime bits
A single bit flip in the exponent injects a higher error into the IEEE 754 compliant number than into the posit number: the phenomenon explained in Section IV-A applies to the exponent bits as well, since more weight is associated with a given position in the exponent part of the IEEE 754 compliant number than in the exponent part of the posit number. Conversely, a bit flip in the regime part of a posit incurs a higher error than a flip in the corresponding high-order exponent bits of the float, due to the higher weight associated with the regime. Similarly, a second bit flip in the exponent incurs a lower error in the posit than in the IEEE 754 compliant number, while a second bit flip in the regime of a posit number results in a higher error. In the subsequent section, we present the toolflow used to validate our claims.
IV-C Toolflow
To assess the impact of errors on both the posit and IEEE 754-2008 compliant representations, we proceed with an exhaustive fault injection exploration process. We modified the public posit implementation [softposit] to support our fault injection mechanism, and built the exhaustive exploration platform shown in Fig. 3. The idea is to focus on the actual arithmetic representation of the data rather than on a coarse-grain probabilistic study or a very fine-grain circuit simulation.
Fig. 3 illustrates the methodology followed to assess the inherent reliability of the two tested representations. Since we consider reliability from a hardware perspective, we stick to the actual bit-level data representation. In this paper, we focus on the IEEE 754 single-precision floating-point and posit(32, 2) (width 32, exponent size 2) representations. Hence, from a raw 32-bit word, we generate the corresponding floating-point and posit numbers. For a fair comparison, the errors are injected at exactly the same respective bit positions of the two tested representations; for double bit upsets as well, we choose the same bit-flip locations in both representations. The inherent reliability of the two representations is assessed by quantifying the mean relative error distance (MRED) between a corrupted value and the golden (non-corrupted) value, as shown in Equation (3).
\(\mathrm{MRED} = \frac{1}{32}\sum_{b=0}^{31}\frac{|v_{g}-v_{c}(b)|}{|v_{g}|}\)    (3)
where \(v_{g}\) and \(v_{c}(b)\) are the golden value and the corrupted value obtained when a fault is injected in bit \(b\), respectively. The MRED gives insight into the mean impact of bit flips injected exhaustively into all 32 bits of a word.
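On the IEEE 754 side, the MRED of Equation (3) can be sketched as below. The function name mred_float32 is ours; non-finite corrupted values, which the experiments count separately as NaN cases, are excluded from the mean in this sketch:

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Mean relative error distance (Equation (3)) of one single-precision
 * value under exhaustive single-bit flips over all 32 bit positions. */
static double mred_float32(float golden) {
    double sum = 0.0;
    int n = 0;
    for (int b = 0; b < 32; b++) {
        uint32_t bits;
        float corrupted;
        memcpy(&bits, &golden, sizeof bits);
        bits ^= 1u << b;                     /* inject the SEU at bit b */
        memcpy(&corrupted, &bits, sizeof corrupted);
        if (!isfinite(corrupted)) continue;  /* NaN/Inf tracked separately */
        sum += fabs((double)golden - (double)corrupted) / fabs((double)golden);
        n++;
    }
    return sum / n;
}
```

The same loop, with the SoftPosit decoder substituted for the float view, yields the posit curve of the comparison.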
V Experimental Setup and Results
The experiments are divided into two categories:

The first is a comparative, application-agnostic exploration of the fault injection impact on the reliability of the posit and IEEE 754 compliant representations.

The second is a comparative reliability study on a set of ML systems on two different applications tested under fault injection.
This section details the experimental setup and discusses the results.
V-A Exhaustive comparative reliability exploration
This set of experiments follows the methodology presented in Section IV. The toolflow is implemented in C, using the SoftPosit platform [softposit] for posit and a purpose-built bitwise fault injection platform for IEEE 754 compliant numbers. The experiments are run on a 3 GHz Intel Core i7 processor running OS X 10.9.5.
V-A1 Single Event Upset
The above-described experimental setup explores the impact of bit flips on a given numerical data representation in an exhaustive manner. The results shown in Fig. 4 compare, on a logarithmic scale, the inherent resiliency to bit flips of the posit and IEEE 754 compliant floating-point representations. The comparison is based on the MRED between the golden value (without fault injection) and the corrupted one, for both posit and IEEE 754 floats. In Fig. 4, the IEEE 754 compliant floating-point graph lies above the posit graph in most cases, indicating that the posit representation is globally more error resilient than the IEEE 754 compliant representation. In fact, in more than 95% of the explored cases, a bit flip makes an IEEE 754 compliant number deviate from the golden value more than the corresponding posit number. Moreover, we registered only 31 not-a-real (NaR) cases with posit, which represents \(0.7\times10^{-6}\)% of the fault injections. On the other hand, for IEEE compliant floating point, more than 4% of the fault injections resulted in not-a-number (NaN). These cases correspond to non-representable data in the IEEE 754 compliant floating-point graph of Fig. 4.
V-A2 Double Event Upset
Starting from the 40 nm technology node, more than 35% of bit upsets are MBUs. Therefore, it is important to consider this phenomenon in reliability assessment processes. In this section, we track the impact of double bit upsets on the data representation for both the posit and IEEE 754 compliant floating-point representations.
Following the same fault injection exploration mechanism as shown in Fig. 3, we evaluate the impact of two bit flips on the MRED of posit and IEEE compliant floating point. We inject two bit flips at every iteration: the first is injected exhaustively bitwise, and the second location is randomly selected, following a normal distribution, among the remaining bits. Fig. 4 shows the MRED caused by two-bit upsets in both representations. Injecting two bit flips globally results in higher error magnitudes. However, the results still confirm the higher inherent error resilience of posit observed in the single-bit-upset experiments. The error resilience of posit is due to two main reasons: the variable size of the scale factor (regime bits) and the larger number of bits in the fractional part in the vast majority of cases. The larger number of bits in the fractional part is due to the variable-sized regime. An SEU or MBU in the fractional part results in a lower error than an SEU or MBU in the regime or exponent bits of a posit. Since the IEEE 754 compliant representation has more exponent bits than the posit has regime and exponent bits, the resulting error is higher in IEEE 754 compliant floating-point numbers than in posits. The better error resilience of the posit data type makes it a strong candidate for mission-critical next-generation systems. Moreover, the absence of redundant representations such as NaNs is a supplementary factor that enhances posit robustness to errors.
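The double-upset injection step can be sketched as follows (inject_dbu is our own helper; for simplicity a uniform rand() draw stands in for the normal-distribution selection used in the experiments):

```c
#include <stdint.h>
#include <stdlib.h>

/* Inject a double-bit upset into a 32-bit word: the first flip position
 * is swept exhaustively by the caller, the second is drawn at random
 * among the remaining 31 bits. */
static uint32_t inject_dbu(uint32_t word, int first) {
    int second;
    do {
        second = rand() % 32;        /* redraw until distinct from first */
    } while (second == first);
    return word ^ (1u << first) ^ (1u << second);
}
```

Sweeping first over 0..31 for every input word reproduces the exhaustive-plus-random scheme described above.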
V-B Machine-learning applications
Recent attacks on ML applications are based on deliberate fault-injection techniques [neuro]. In this subsection, we show the results of fault injection experiments applied to a set of ML applications. We evaluate two recognition systems. The first is a biometric authentication system using electrocardiogram (ECG) signals, based on the LATIS ECG database [latis]. The second is a human action recognition (HAR) system using kinematic accelerometer signals, trained with the Berkeley MHAD dataset [mhad]. For the feature extraction phase, two types of features were chosen:

Temporal features such as the mean, the standard deviation, the quadratic mean, and the covariance.

Time-frequency characteristics resulting from the wavelet transform.
We used the sliding-window method to extract the characteristics of each window; these characteristics are then concatenated into a descriptor vector.
For the classification phase, we evaluate a set of the most widely used classifiers in the literature: support vector machines (SVM) with linear, Gaussian and cubic kernels, decision trees, discriminant analysis classifiers, naive Bayes classifiers, KNN classifiers (K = 1), the AdaBoost classifier, random forests, and neural network classifiers. Figures 5, 6, 7 and 8 show the recognition rates of the different techniques and settings with and without fault injection. These figures show the impact of single fault injection in the inputs of the different classifiers for both the IEEE floating-point and posit data representations. In all these cases, across varying features and classifiers, the fault injection impact is significantly lower on the posit implementation, which confirms the findings of Section V-A: the overall accuracy drop under fault injection is consistently smaller with posit than with IEEE compliant floating point.
VI Conclusion
This paper investigates the reliability of two prominent data representations, namely IEEE 754 compliant single precision and posit(32, 2). First, we presented a brief theoretical analysis of both number formats under single and double bit flips. An exhaustive fault injection platform was implemented, and the exploration led to a promising conclusion for posit arithmetic that corroborated our theoretical analysis. To further illustrate this finding, we benchmarked several ML techniques under fault injection. The experiments demonstrate the higher inherent robustness of posit compared to the classical IEEE 754 representation. These findings are useful for the design of safety-critical systems. They can also be exploited to limit imprecision in approximate computing designs. Future work will tackle the implementation of a full posit-based processor architecture.