A domain-specific language (DSL) named MotePy is presented. The DSL offers a high-level syntax with low overheads for ML/data processing in time-constrained or memory-constrained systems. The DSL-to-C compiler has a novel static memory allocator that tracks object lifetimes and reuses static memory, a scheme we call the compiler-managed heap.
When writing code for machine learning or data processing applications, we want the convenience of the high-level syntax of a language like Python. Typically, this convenience comes with hidden overheads such as interpreted execution (e.g., in CPython) or dynamic memory allocation and garbage collection. In many situations these overheads do not pose a concern for the application developer.
However, in certain situations where we have tight time constraints or memory constraints, we cannot compromise on the need for very low overheads. Examples include machine learning pipelines in an OS kernel or in small embedded devices.
The proposed DSL, MotePy, offers a high-level Python-like syntax for specifying modular data-processing pipelines and performing vector/matrix operations.
Machine learning and data processing applications typically have a pipeline-like execution structure. The main module of a MotePy application is a pipeline specification of the form given in the example below (Figure 1). The pipeline specification is a Python-like list of operations. In the example, the first operation is the acquire function defined in the datasource module, and the second operation is the predict function defined in the model module. MotePy's runtime executes the pipeline in a loop, starting from the first operation and proceeding in sequence to the last. Once the last operation is complete, the pipeline is re-executed from the beginning.
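The pipeline-in-a-loop semantics described above can be sketched in plain Python. This is an illustrative model only, not MotePy's actual runtime; the `acquire` and `predict` stand-ins mirror the functions named in the Figure 1 example.

```python
# Illustrative sketch of MotePy's pipeline execution model: the runtime
# runs the listed operations in order, feeding each stage's result to the
# next, and restarts from the beginning after the last stage completes.

def acquire():            # stand-in for datasource.acquire
    return [1, 2, 3]

def predict(data):        # stand-in for model.predict
    return sum(data)      # trivial "model" for illustration

pipeline = [acquire, predict]

def run_pipeline(iterations):
    """Run the whole pipeline `iterations` times, collecting each
    final-stage result (the real runtime loops indefinitely)."""
    results = []
    for _ in range(iterations):
        value = None
        for op in pipeline:
            value = op() if value is None else op(value)
        results.append(value)
    return results
```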
Though the syntax is close to Python, the language differs from Python in its implementation in significant ways:
MotePy is statically typed. Its type-annotation syntax is based on Python's optional type annotations, but MotePy uses its own notation, similar to Java's, for array typing. The reason for the separate array-typing notation is that type annotations are not common in Python, whereas Java's array notation is familiar to most programmers.
MotePy supports only static memory allocation. Only primitive types are allocated on the stack; the rest are allocated in the static data area. However, unlike languages such as C or C++, the MotePy compiler tracks the lifetimes of array objects and reuses their memory. We call this the compiler-managed heap.
Every MotePy module has an init function and a special flow function. The flow function is the operation invoked as part of the pipeline execution and is marked with the @flow decorator. Apart from these two functions, a module may define any number of helper functions.
In Figure 2, the data array is randomly initialized and then passed on to the next stage in the pipeline by invoking a special next function. The next function lets us specify the point in the execution at which the next stage of the pipeline must run. If a flow function returns without invoking next, the pipeline is re-executed from the beginning. For instance, if data acquisition is not complete, the acquisition stage may return without calling next, and the MotePy runtime will automatically re-execute the pipeline.
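A minimal Python sketch of this next() control flow may make the semantics concrete: a stage explicitly hands control to the following stage, and a stage that returns without calling next() causes the pipeline to restart from the beginning. The Pipeline class and stage names here are illustrative, not MotePy's actual runtime.

```python
class Pipeline:
    def __init__(self, stages):
        self.stages = stages
        self.ready = False   # e.g., a "data acquisition complete" flag
        self.trace = []      # records which stages actually ran

    def next(self, index, data):
        # invoke the following stage in the pipeline, if any
        if index + 1 < len(self.stages):
            self.stages[index + 1](self, index + 1, data)

    def run_once(self):
        # one trip through the pipeline, starting at the first stage
        self.stages[0](self, 0, None)

def acquire_stage(p, index, data):
    p.trace.append("acquire")
    if p.ready:                   # acquisition complete?
        p.next(index, [1, 2, 3])  # hand off to the next stage
    # otherwise: return without next(), so the runtime re-runs the
    # pipeline from the beginning on the following iteration

def predict_stage(p, index, data):
    p.trace.append("predict")
```

With `ready` unset, `run_once` executes only the acquisition stage; once acquisition completes, the hand-off via next() reaches the prediction stage.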
Figure 3 shows a neural-network-like processing stage. The network has two layers of weights that are randomly initialized in the example; in an actual system, the weights would be loaded from a file. The code shows the high-level syntax for vector/matrix operations, which the MotePy compiler translates into low-level loop operations.
The code also shows an invocation of the C function printf. MotePy allows us to invoke C/C++ functions from MotePy code without any special foreign-function interface. The MotePy array/matrix layout is the same as that of C/C++, enabling us to pass arrays/matrices back and forth to C/C++ code without any transformation overhead.
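The shared layout referred to above is C's row-major, contiguous storage. As an analogy (not MotePy's actual FFI mechanism), the sketch below uses Python's ctypes to build a matrix in exactly that layout, the form in which it could be handed to C code as a flat buffer:

```python
import ctypes

rows, cols = 2, 3

# A 2x3 matrix stored row-major in one contiguous block of C ints,
# i.e., the same memory layout a C array int m[2][3] would have.
Matrix = ctypes.c_int * (rows * cols)
m = Matrix(1, 2, 3,
           4, 5, 6)

def at(mat, r, c):
    # row-major indexing: element (r, c) lives at offset r*cols + c
    return mat[r * cols + c]
```

Because the layout matches, no marshalling or copying is needed at the language boundary; a pointer to the buffer is enough.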
The MotePy compiler translates the high-level DSL code to low-level C/C++ code, which is then compiled with a platform C/C++ compiler such as GCC to produce the executable.
Figure 4 shows a two-stage MotePy pipeline. Stage 1 initializes vec1, does some computations on it and then invokes next for Stage 2 processing. Stage 2 initializes vec2, does some computations on it, then initializes vec3 and does some computations on it.
Compiling and running the program gives the following output:
Address of vec1: 94473766309984
Address of vec2: 94473766309984
Address of vec3: 94473766309984
The output shows that the addresses of vec1, vec2, and vec3 are the same. (The generated executable takes an optional parameter specifying the number of pipeline iterations; here a value of ‘1’ was passed. If none is passed, the pipeline executes in an infinite loop.) The reason is that the compiler allocates static memory for the vectors and reuses it based on the variables’ lifetimes. Note that, owing to Address Space Layout Randomization, the printed addresses differ across program executions.
In this example, the lifetime of vec1 spans lines 8–11 of stage1.py. The lifetime of vec2 spans lines 8–11 of stage2.py, and that of vec3 spans lines 12–15 of stage2.py. Since the lifetimes are non-overlapping, the compiler reuses the same static memory area for all three vectors.
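The reuse decision boils down to an interval-overlap test: two arrays can share a static memory slot exactly when their lifetimes do not overlap. The sketch below uses illustrative program-order intervals (not the paper's literal line numbers, since those span two files):

```python
def lifetimes_overlap(a, b):
    """a and b are inclusive (start, end) lifetime intervals in
    program order; they overlap iff neither ends before the other
    starts."""
    return a[0] <= b[1] and b[0] <= a[1]

# Illustrative program-order lifetimes for the three vectors:
vec1 = (1, 4)    # stage 1 region
vec2 = (5, 8)    # stage 2, first region
vec3 = (9, 12)   # stage 2, second region
```

All three pairs are disjoint, so one memory slot suffices, which is why the three printed addresses coincide.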
Figure 5 shows a modified version of Stage 1 of the pipeline. Here, vec1 is initialized in the init function, and the process function computes a new value of vec1 based on its current value. Compiling and running the pipeline with this modified Stage 1 gives the following result:
Address of vec1: 94035385520224
Address of vec2: 94035385520464
Address of vec3: 94035385520464
The output shows that the compiler allocates a separate, non-overlapping memory area for vec1 and uses the same memory area for vec2 and vec3. The compiler’s allocator now knows that vec1’s lifetime overlaps with those of vec2 and vec3, and hence allocates separate memory for vec1.
MotePy’s static allocator allocates in a two-dimensional space, with address as one axis and variable lifetime as the other. In comparison, the static allocators of languages like C/C++ simply allocate along the address axis.
The objective of the MotePy allocator is to minimize the allocated space subject to the constraint that no two blocks overlap in this two-dimensional space. The current implementation uses a greedy approach, which may yield sub-optimal results; more work on the allocator is planned.
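Since the paper does not give the allocator's algorithm, the following is a minimal first-fit greedy sketch of the idea: place each block at the lowest address offset that does not conflict with any already-placed block whose lifetime overlaps. It reproduces the behavior of the two examples above, but it is a simplification, not MotePy's actual implementation.

```python
def overlaps(a, b):
    # half-open (start, end) lifetime intervals
    return a[0] < b[1] and b[0] < a[1]

def allocate(blocks):
    """blocks: list of (name, size, (start, end)) with half-open
    lifetimes, in program order. Returns ({name: offset}, arena_size)."""
    placed = []    # (offset, size, lifetime) of blocks already placed
    offsets = {}
    for name, size, life in blocks:
        # only blocks whose lifetimes overlap constrain the placement
        conflicts = [p for p in placed if overlaps(p[2], life)]
        offset = 0
        for off, sz, _ in sorted(conflicts):
            if offset + size <= off:
                break                       # fits in the gap before it
            offset = max(offset, off + sz)  # otherwise skip past it
        placed.append((offset, size, life))
        offsets[name] = offset
    arena = max((o + s for o, s, _ in placed), default=0)
    return offsets, arena
```

With disjoint lifetimes all three vectors land at offset 0 (the Figure 4 behavior); if vec1 stays live throughout, it gets its own offset while vec2 and vec3 still share one (the Figure 5 behavior). A first-fit greedy pass like this can be sub-optimal because an early placement may fragment the address axis for later, larger blocks.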
I would like to thank Microsoft Research, India for the grant support. I am especially thankful to Dr. Sriram Rajamani, Satish Sangameswaran, and Dr. Harsha Vardhan Simhadri for the support during the course of this work. I thank Nandagopal, Sandesh Ghanta, and Surya Chaitanya who worked with me as student interns during the initial development of this work.