
Issues with rounding in the GCC implementation of the ISO 18037:2008 standard fixed-point arithmetic

by Mantas Mikaitis, et al.
The University of Manchester

We describe several issues caused by the lack of a round-to-nearest mode in the GCC compiler's implementation of fixed-point arithmetic data types and operations. We demonstrate that round-to-nearest is not performed in the conversion of constants, in conversion from one numerical type to a less precise type, or on the results of multiplications. Furthermore, we show that mixed-precision operations in fixed-point arithmetic lose precision in their arguments even before the arithmetic operation is carried out. The ISO 18037:2008 standard was created to standardize C language extensions, including fixed-point arithmetic, for embedded systems. Embedded systems are usually based on ARM processors, of which approximately 100 billion have been manufactured to date. The numerical issues we discuss in this paper can therefore be rather dangerous and are important to address, given the wide range of applications that these embedded systems run.
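The difference between dropping low-order bits and rounding to nearest can be sketched in plain integer C. This is only an illustration, not GCC's implementation: it emulates a fixed-point value with 15 fractional bits in an `int32_t` (the 15-bit width is an assumption made for the example), it handles non-negative values only, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: a fixed-point format with 15 fractional
 * bits, stored in a plain int32_t. Non-negative values only, so that
 * right shifts are fully defined by the C standard. */
#define FRAC_BITS 15

/* Narrow a value by discarding its `drop` lowest bits (truncation). */
int32_t narrow_truncate(int32_t x, unsigned drop)
{
    assert(x >= 0 && drop >= 1 && drop < 32);
    return (int32_t)((int64_t)x >> drop);
}

/* Narrow a value, rounding to nearest (ties round up):
 * add half of the discarded range before shifting. */
int32_t narrow_nearest(int32_t x, unsigned drop)
{
    assert(x >= 0 && drop >= 1 && drop < 32);
    int64_t half = (int64_t)1 << (drop - 1);
    return (int32_t)(((int64_t)x + half) >> drop);
}

/* Multiply two values with FRAC_BITS fractional bits, truncating. */
int32_t mul_truncate(int32_t a, int32_t b)
{
    assert(a >= 0 && b >= 0);
    int64_t product = (int64_t)a * (int64_t)b; /* 2*FRAC_BITS fractional bits */
    return (int32_t)(product >> FRAC_BITS);
}

/* Multiply two values with FRAC_BITS fractional bits, rounding to nearest. */
int32_t mul_nearest(int32_t a, int32_t b)
{
    assert(a >= 0 && b >= 0);
    int64_t product = (int64_t)a * (int64_t)b;
    int64_t half = (int64_t)1 << (FRAC_BITS - 1);
    return (int32_t)((product + half) >> FRAC_BITS);
}
```

For example, 0.1 stored with 31 fractional bits is the integer 214748365. Narrowing it by 16 bits gives 3276 with truncation and 3277 with round-to-nearest, where the exact scaled value is 3276.8. Likewise, multiplying the raw values 7 and 8192 (that is, 7 units in the last place times 0.25) gives 1 with truncation and 2 with round-to-nearest, where the exact result is 1.75 units.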




Stochastic Rounding: Algorithms and Hardware Accelerator

Algorithms and a hardware accelerator for performing stochastic rounding...

Formal verification of a controller implementation in fixed-point arithmetic

For the implementations of controllers on digital processors, certain li...

Stochastic rounding and reduced-precision fixed-point arithmetic for solving neural ODEs

Although double-precision floating-point arithmetic currently dominates ...

lrsarith: a small fixed/hybrid arithmetic C library

We describe lrsarith which is a small fixed precision and hybrid arithme...

Deploying Customized Data Representation and Approximate Computing in Machine Learning Applications

Major advancements in building general-purpose and customized hardware h...

A systematic approach to computing and indexing the fixed points of an iterated exponential

This paper describes a systematic method of numerically computing and in...