Userfault Objects: Transparent Programmable Memory

06/24/2021 ∙ by Konrad Siek, et al. ∙ Czech Technical University in Prague 0

The Userfault Object (UFO) framework explores avenues of cooperating with the operating system to use memory in non-traditional ways. We implement a framework that employs the Linux kernel's userfault mechanism to fill the contents of runtime objects on demand. When an object's memory is accessed the framework executes a user-defined function that generates a slice of the object. The back-end can generate data from thin air, calculate it from a formula, or retrieve it from persistent storage, the network, or other sources (with or without post-processing). UFOs follow the memory layout of standard runtime objects, so they can be introspected and written to safely. The framework manages the loading and unloading of object segments to ensure that memory is reclaimed as needed and data is never lost. This allows the UFO framework to implement larger-than-memory data structures that never materialize into memory in full. Implementing objects as UFOs also impacts performance, since overhead of populating memory is amortized by loading entire pages of data at a time. The host runtime can also rely on direct memory accesses into userfault object obviating the need for a special dispatch mechanism. We provide a proof-of-concept implementation of the UFO framework for the R language.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1. Introduction

Most objects are straightforward, but some object harbor secrets. While most objects are collections of assorted fields bundled with methods that operate on them, occasionally an object is a transparent façade providing an abstraction over an underlying complex system. Sometimes the façade is pierced and bad things happen. An example follows.

This is a definition of an object in the R language representing a sequence of elements 1–10.

    simple <- as.integer(c(1,2,3,4,5,6,7,8,9,10))

Internally, this object is a simple vector with a header and a body consisting of all of its member values. The values can be accessed directly. For example, the

simple[i] operator retrieves the the ith indexed element by accessing the memory at an offset from the end of the header. However, the same sequence can be expressed as using the following simpler syntax.

    magic <- 1:10

The magic vector outwardly appears to be the same as the simple vector. However, internally the vector only contains two values—the beginning and end of the range—and the values of the sequence are calculated on demand. Then, magic[i] is redefined to run the function calculating the value, instead of accessing an offset.

This alternative representation of vectors in R (ALTREP) (Becker, 2020) allows implementing custom, even user-defined, back-ends to vectors while providing a compatible API to ordinary R vectors. The advantage of this is the flexibility of semantics and internal representation that allows implementing file-backed persistent vectors, larger than memory vectors, and fast sequences with low memory overheads. The disadvantage is that since the internal layout of ALTREP vectors is so different from the layout of R vectors, the entire R runtime needed retooling to handle them.

The abstraction ALTREP vectors present to the user can be pierced by introspection. While many languages provide mechanisms for observing the internals of objects, in R this is perhaps easier than most. R itself and many R packages are written in C, so R provides a C API that allows packages to interface with the runtime internals and runtime objects. This exposes the layout of objects to external programmers, who are known to circumvent prescribed API functions in favor of direct memory accesses into vectors. ALTREP vectors defend themselves against this by materializing if the pointer to the body of a vector is accessed via an API function. On the other hand, the problem persists in general, as a sufficiently stubborn programmer may reach into a vector via pointer arithmetic without reference to the API at all, inadvertently dispelling the ALTREP abstraction and introducing segmentation faults or subtle memory bugs.

There are a number of frameworks and runtime mechanisms providing similar façades without runtime support too. In R there are numerous libraries providing transparent larger-than-memory vectors (matter (Bemis and Vitek, 2017), ff (Kane et al., 2013), bigstatr (Privé et al., 2018), disk frame (ZJ and Poon, 2018), etc.) or abstractions over SQL databases (dbplyr (Wickham and Ruiz, 2018)), in addition to ALTREP. Abstracting frameworks are also found in other languages, e.g. Remote Objects (Oracle, 1997) in Java and Dask data frames (Rocklin, 2015) in Python. These can be introspected into by the application of nefarious means.

We attempt to create completely transparent abstractions by exploring a different approach We introduce a framework for Userfault Objects (UFOs).111https://github.com/PRL-PRG/UFOs  UFOs expose an area of virtual memory to the program in some host language. This area is populated with the representation of the object using the layout and contents that the host language is expecting, but this is done lazily. Specifically, when an access to the memory inside the object occurs, the UFO framework communicates with the operating system (i.e. with the Linux Kernel via userfaultfd) to materialize and populate a section of memory. The population procedure is performed by a custom (user-defined) function which provides a specific slice of the object. The population function can provide contents of the object by calculating it or retrieving it from persistent storage (e.g. by parsing a CSV file or running SQL queries), a remote site, or other external sources. The ability to process data on the fly as it is being read, as well as to have no backing persistent storage at all distinguishes UFOs from memory mapped files.

2. UFO core framework

Figure 1. UFO core framework architecture.

Our proof-of-concept implementation consists of two layers: a language agnostic core framework and a language specific API. This section describes the former. UFO core interacts directly with the operating system and manages the creation and destruction of individual UFOs. It also handles reading and updating them. The framework discharges its responsibilities via two cooperating subsystems: the event API and the page fault loop, each running in a separate operating system thread. The event API is exposed as a façade through which UFOs can be created or freed. The UFO API calls these functions directly. The page fault loop is responsible for managing UFOs as they are accessed. This involves loading and unloading UFOs fragments in and out of memory, in response to the needs of the user application. It provides mechanisms for populating areas of memory, a garbage collector for UFO fragments, and a system for persistently caching modified fragments. The user does not interact with the page fault loop directly. Instead, the page fault loop is registered as a handler for page faults with the Linux kernel for a range of virtual memory addresses. The subsystems of the page fault loop are always reactions to operations performed on memory guarded by the UFO core framework.

2.1. Objects

These userfault objects are user-facing, logical structures representing complete larger-than-memory objects of a host language. Logically, each UFO owns a range of consecutive addresses whose contents are defined by a single, specific, user-defined population function.

While UFO core is agnostic with respect to the layout of host language objects, we apply a simplifying assumption toward their internal representation to facilitate the definition of population functions for fragments of objects. We assume that UFOs represent arrays, each containing a header followed by a body consisting of some number of indexed, uniformly-sized elements. We show the logical layout of a UFO in Fig. 2

. The boundary between the header and the elements is immutable and falls at the boundary between the first and second segment of the UFO. The front of the UFO is padded to accommodate the boundary. The rear is padded to align the UFO with page size. The header is initially empty and its contents are not generated by UFO core. The contents of elements in the body are generated by the population function.

The population function is executed during the lifecycle of the UFO to provide contents of elements as they are accessed. The definitions of population functions are external to UFO core. The function generates the contents for a range of elements for a specific UFO, where the first and last index of the generated elements are specified via function parameters. UFO core may demand that the function populate any contiguous region within a UFO. A generated region may overlap other regions, and regions may be populated in any order as well as re-populated repeatedly. To accomodate this behavior, populate functions must be deterministic and idempotent. Since the state of the host runtime is unknown at the time of any specific memory access, population function must be careful about interacting with the host runtime. Currently population function may not attempt to access other UFOs, since it would lead to nested userfault events. We show an example population function in Fig. 3.

2.2. Segments

Figure 2. UFO layout.
  typedef struct { int from; int to; int by; } ufo_seq_data_t;
  int populate_sequence(uint64_t start_ix, uint64_t end_ix,
                        ufUserData ufo_ud, char* target) {
    ufo_seq_data_t* data = (ufo_seq_data_t*) ufo_ud;
    for (size_t i = 0; i < end_ix - start_ix; i++) {
      ((int *) target)[i] =
          data->from + data->by * (i + start_ix);
    }
    return 0;
  }
  
Figure 3. Population function: from-to-by sequence.

Internally, UFOs are split into segments, each segment representing a manageable chunk of the object’s address range. At any point any segment can be actively held in memory (materialized) or be removed from memory (dematerialized). Dematerializing segments either destroys or caches data, depending on circumstances. Materializing a segment involves (re)generating its data through its population function or retrieving the data from a pre-existing cache. Segment management is entirely transparent to the end user.

UFO core has no way of tracking accesses to segments after they are materialized. Therefore, ensuring that written values are not forgotten at dematerialization requires caching. Dirty segments are detected by comparing the hash of their contents at the time of dematerialization with the hash after their most recent materialization. Hashes are computed using the 256-bit BLAKE3 algorithm (O’Connor et al., 2020). Dematerialization of dirty segments will first cause their contents to be stored in anonymous temporary persistent storage. Each UFO has its own file which remains in existence as long as the UFO is alive. All cache files are cleaned up at program termination by the operating system.

UFO core keeps count of how much memory is being used by materialized segments. UFO core carries two user-defined parameters: the high and low water marks. When the amount of memory taken up by materialized segments exceeds the high water mark, the UFO garbage collector is called and it starts dematerializing segments until the low water mark is reached.

The garbage collector walks the loaded segments queue (implemented as a circular buffer) starting from the longest residing segment. It dematerializes segments one by one until enough space has been freed. Dematerialization does not immediately destroy the area of memory. Instead, the kernel is signaled that the page is no longer in use and can be recycled. The kernel lowers the resident set size immediately and but recycles the memory at its convenience.

3. R Ufo Api

Figure 4. R UFO API architecture diagram.

We implemented language-specific UFO API for the R language. We picked the R language because it is used by data scientists in fields like computational biology, statistics, artificial intelligence, and machine learning, which deal with large volumes of data, often represented as either vectors or data frames (tables with uniformly sized vectors representing columns—à la CSV files). The R ecosystem contains a many larger-than-memory libraries that create object-oriented abstractions to hide the details of memory management from end users while transparently representing the data to the programmer as vectors or data frames

(Kane et al., 2013; ZJ and Poon, 2018; Privé et al., 2018; Bemis and Vitek, 2017; Wickham and Ruiz, 2018).

The R UFO API has two levels of services (Fig. 4). The R UFOs library ties the UFO core framework into the R runtime, providing an API to specific R vector back-ends. It provides a constructor that creates a UFO with a specific user-defined population function. It does so by plugging into R’s custom allocator API (and garbage collector), replacing calls to malloc and free with calls to the R UFO core event API. The UFO allocator returns an area of memory to the R runtime, which is the runtime populates with an appropriate header. In most cases, the R runtime will not pre-fill the vector, but if this happens, UFOs ignore specific writes and their population function generates the appropriate pre-fill values.

We provide four back-end implementations built on top of R UFOs. Binary file-backed vectors (and matrices) read data directly from a binary file. The data is located via seeking. CSV vectors each read a single column of a CSV file. The values are parsed on-the-fly from a fragment of a pre-scanned CSV file. From-to-by sequences lazily generate data from a simple formula, based on the index of an element. Empty vectors are pre-filled with a default value and can be used to store large intermediate results of computation.

The biggest difficulty in implementing R vectors, is that R operations do not allow custom allocation to be used in the results of arithmetic operations and many functions. For this reason, in addition to back-end implementations, R UFO API also provides a reimplementation of R operators that write results to UFOs, as well as a toolkit for chunking the execution of existing functions while aggregating the results into a UFO.

4. Performance

Figure 5. Performance evaluation.

We benchmark UFO performance measured against ALTREP and standard R vectors. ALTREP is a good candidate for comparison because it represents frameworks that create an object-oriented–like facade over complex functionality while appearing as simple vectors. ALTREP is integrated into the R runtime, giving it a performance edge over user-created libraries. We test UFOs in two modes: read/write mode and read-only mode. Read-only mode does not persist changes done to UFOs, which removes the need to calculate hashes of segment contents when loading and unloading them.

We use two identically implemented back-ends for UFOs and ALTREP. File-backed vectors read 4-byte integers from a binary file on disk by seeking to the position of the vector and reading one or more consecutive values. This back-end has a relatively high overhead of retrieving a single value, which can be amortized by populating entire regions at once. Sequence vectors represent from-to-by sequences calculated on the fly (see Fig. 3). Computing an individual element of the sequence is cheap. We measured the time it takes to create a 1GB vector (1K iterations), calculate the sum of its contents (1K iterations), and execute an identity function on each of its elements (10 iterations).

We ran the experiments on a machine with an Intel Core™ i7-10750H CPU @ 2.60GHz12 process, 32GB RAM and a and a Samsung SSD 970 EVO 500GB drive running 64-bit Ubuntu 20.10 with 5.8.0-53-generic Linux kernel. We show the results of the evaluation in Fig. 5. Each plot shows the results for either the creation, sum, or loop microbenchmark. The top row shows results for the file-backed vectors, the bottom one for sequences. The X-axis always shows vector implementations and Y-axis show execution time in nanoseconds. The results are plotted as a violin plots showing the distribution of execution times over multiple iterations.

We observe that UFOs and ALTREP have similar performance for vector creation and the execution time is negligibly small for both frameworks, with some outliers we attribute to initialization and garbage collection. The startup time is higher for R vectors implementing a sequence, because the vector must populated up front, as opposed to UFOs and ALTREP, which calculate these values on demand. This initialization cost for standard vectors could eventually be amortized over multiple passes over the vector.

Sums also yield similar performance for all frameworks. The lightweight calculation overhead involved in sequences especially washes away performance differences. For file-backed vectors UFOs and ALTREP also perform similarly. The R runtime calculates the sum of a vector using a fast arithmetic function. This function cooperates with ALTREP to chunk the vector into regions, which allows ALTREP to amortize the overhead of preparing a file for reading and seeking. While the R runtime does not similarly chunk the execution for UFOs, the UFO framework makes sure to read no less than 1MB of elements at-a-time and cache data, yielding a similar amortization. Thus, the performance for both frameworks is similar. When the hashing mechanism is turned off for read-only UFO vectors, a significant overhead cost is removed for UFOs, yielding a small, but visible improvement in performance.

An importance difference in performance between UFOs and ALTREP stems from the fact that ALTREP performs dynamic dispatch whenever values are accessed, be it a region or a single value. The R runtime attempts to turn individual value accesses into region accesses for ALTREP, but this can only works for specific operations. When the loop benchmark executes, it always executes a function on a single value from a vector, leading to repeated dispatch in ALTREP, and so, deteriorates performance significantly. UFOs also have set-up costs relating to loading data for an accessed value, however these costs are always amortized by loading an entire segment into memory. This gives UFOs an advantage over ALTREP’s dispatch and produces performance close to ordinary vectors when consecutive elements are accessed. However, this approach is costly if the access pattern is spread out, causing the UFO to load and unload a segment for each single value read.

5. Conclusions and future work

The UFO framework explores avenues of cooperating with the operating system to use memory in non-traditional ways. We implement a framework that uses user faults to lazily provide data to a language’s runtime object. This allows the implementation of structures that generate data from a variety of sources, but follow the memory layout of standard runtime objects, so they can be introspected safely. Nevertheless, they can implement complex back-ends and provide access to larger-than-memory data that never needs to materialize into memory fully. Implementing objects via userfaults also has an impact on performance as overhead is amortized over loading large segments of data and the host runtime can rely on direct memory accesses into userfault object.

Future work includes implementing a mechanism for supporting recursive calls between UFOs and reacting to specific memory access patterns to limit unnecessary memory usage. We would also like to explore the applicability of this approach outside of the Linux ecosystem and in other language runtimes.

Acknowledgements.
This work is supported by the Czech Ministry of Education, Youth and Sports from the Czech Operational Programme Research, Development, and Education, under grant agreement No. CZ.02.1.01/0.0/0.0/15_003/0000421 and the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 695412).

References

  • (1)
  • Becker (2020) Gabriel Becker. 2020. Alternative Representations of R Objects or ALTREP. In Proceesings of BioC2020.
  • Bemis and Vitek (2017) Kylie A Bemis and Olga Vitek. 2017. matter: an R package for rapid prototyping with larger-than-memory datasets on disk. Bioinformatics 33 (2017). Issue 19. https://doi.org/10.1093/bioinformatics/btx392
  • Kane et al. (2013) Michael Kane, John W. Emerson, and Stephen Weston. 2013. Scalable Strategies for Computing with Massive Data. Journal of Statistical Software 55, i14 (2013). https://doi.org/10.18637/jss.v055.i14
  • O’Connor et al. (2020) Jack O’Connor, Jean-Philippe Aumasson, Samuel Neves, and Zooko Wilcox-O’Hearn. 2020. ”BLAKE3: One function, fast everywhere (whitepaper)”. https://github.com/BLAKE3-team/BLAKE3-specs
  • Oracle (1997) Oracle. 1997. ”Java Remote Method Invocation Specification (whitepaper)”.
  • Privé et al. (2018) Florian Privé, Hugues Aschard, Andrey Ziyatdinov, and Michael G B Blum. 2018. Efficient analysis of large-scale genome-wide data with two R packages: bigstatsr and bigsnpr. Bioinformatics 34, 16 (03 2018), 2781–2787. https://doi.org/10.1093/bioinformatics/bty185
  • Rocklin (2015) Matthew Rocklin. 2015. Dask: Parallel computation with blocked algorithms and task scheduling. In Proceedings of the 14th Python in Science Conference, Vol. 126.
  • Wickham and Ruiz (2018) Hadley Wickham and Edgar Ruiz. 2018. dbplyr: A ’dplyr’ Back End for Databases. R package version 1, 1.2018 (2018).
  • ZJ and Poon (2018) Dai ZJ and Jacky Poon. 2018. disk.frame: Larger-than-RAM Disk-Based Data Manipulation Framework. R package version 0, 5.0 (2018).