The IBM-ILLINOIS Center for Cognitive Computing Systems Research (C3SR) is a long-term collaboration between IBM Research and the University of Illinois at Urbana-Champaign focused on developing advanced Cognitive Computing and Artificial Intelligence (AI) systems that are optimized across the vertical stacks of AI solutions, software, and hardware systems. In particular, C3SR aims to develop technologies to improve cognitive application developers’ productivity on heterogeneous infrastructure.
This effort requires system characterization and performance measurement at all levels of computer system abstraction, including processors (such as x86, POWER, and ARM cores), system communication links (such as PCIe, NVLink, and CAPI/OpenCAPI), special system accelerators (both discrete ones such as FPGAs and integrated ones such as Tensor Cores), libraries (such as CUDA and cuDNN), and frameworks (such as TensorFlow, Caffe, and PyTorch).
Instead of developing ad-hoc performance measurement solutions for each task, we decided to develop a common system characterization and benchmarking infrastructure and tooling for all of our tasks, providing uniformity across them.
Over time, this infrastructure has proven not only useful but has also greatly boosted our productivity. A number of interesting systems projects at C3SR have already benefited from having such an infrastructure and tooling. Therefore, we decided to open-source this infrastructure so that other research teams interested in systems work can also benefit from it. Hence the genesis of this C3SR project: SCOPE.
II Design Philosophy
This whitepaper describes the motivation and design of the initial release (v1.0) of the Scope benchmarking infrastructure. Scope is designed around three primary goals:
extensibility: It should be easy for external groups to develop independent benchmarks without requiring centralized coordination with the Scope project. This allows different teams to develop their own measurement tools to fit their own needs.
portability: The Scope infrastructure should support as many different systems as possible. Though individual benchmarks may have specific requirements, Scope itself should not be a barrier to running benchmarks on a particular system. Scope has been tested on POWER8/POWER9- and x86_64-based systems, but should support any system that has a C++11 compiler and the CUDA toolkit.
development silos: New groups of benchmarks should be able to be open- or closed-source. Each group of benchmarks may have its own software dependencies, compiler feature requirements, or other necessities, and those requirements should not be globally propagated to all benchmark code. This allows Scope to remain system-agnostic and useful to the widest possible audience.
The Scope infrastructure consists of three kinds of software components. First, the SCOPE repository (Section III) manages configuration and compilation and provides shared utility functions for the benchmarks. Second, scopes (Section IV) define groups of benchmarks, with their own optional dependencies and utilities. Third, the ScopePlot project (Section V) provides a Python package for plotting and manipulating Scope results. Figure 1 shows the relationship between Scope infrastructure components.
SCOPE and ScopePlot make heavy use of existing tools and libraries, notably Git source control, CMake compilation configuration, the Google Benchmark library, matplotlib, bokeh, and pandas.
II-A Licensing, Hosting, and Contributing
The Scope infrastructure is free and open source, licensed under the Apache 2.0 license. Scope welcomes contributions. See CONTRIBUTING.md in the Scope source tree for up-to-date information about contributing. Table I lists the URLs for Scope infrastructure components.
III SCOPE Repository
The SCOPE repository (github.com/c3sr/scope) is the entry point for building and running benchmarks. SCOPE is maintained as a Git repository to provide an open revision history and broad accessibility. SCOPE itself does not contain any benchmark code; instead, it has the following responsibilities:
retrieve benchmark code
configure the SCOPE binary compilation
provide common utilities
provide initialization hooks
III-A Retrieving Code
During the download stage, shown in Figure 2(a), the SCOPE source code is retrieved.
III-B Configuring the SCOPE Binary Compilation and Fetching Dependencies
Since the SCOPE repository does not contain any benchmark code, a user will typically provide benchmark code by including additional scopes (Section IV) during the build process. Scopes are semi-independent packages that contain benchmark code. For development isolation, scopes are maintained as separate projects and are included in SCOPE as CMake-aware projects in the scopes directory of the SCOPE source tree.
SCOPE includes several completed scopes (Table IV) as Git submodules. When scopes are added as submodules, it is easy to associate each SCOPE release with a specific revision of benchmark code, and the correct version of each scope can be downloaded automatically when the whole Scope project is downloaded. Other scopes can be manually added to the scopes directory.
Scope uses CMake to configure the Scope binary compilation during the configuration stage, shown in Figure 2(b). CMake simplifies the process of supporting a variety of compilers and systems as well as fetching dependencies. Through CMake, the Git submodules in SCOPE are downloaded and the fixed revisions are checked out. SCOPE then invokes CMake’s add_subdirectory command on all directories in the scopes directory to add those scopes to the SCOPE binary build. Each scope exports a CMake Object Library: a CMake target that represents a set of object files containing the implementation of that scope, as well as associated information about required dependencies for those object files.
Since benchmarks may only be compatible with particular systems, Scope allows conditional compilation of benchmarks. By selectively including or excluding object libraries during configuration, Scope may selectively include benchmark code and associated dependencies. Each scope’s CMakeLists.txt may define options for enabling or disabling the scope. For example, cmake -DENABLE_EXAMPLE=ON scope-source-directory directs CMake to include object files and dependencies from ExampleScope (Section IV-C) when building the Scope binary.
Scope relies on Hunter for CMake to fetch dependencies. Some dependencies, such as spdlog, are present in Scope so that all benchmarks may use them. Table II shows the libraries included in Scope v1.0.0. Other dependencies are only required for certain benchmarks. For example, CommScope requires libnuma for pinning allocations or processes to memory regions on Linux systems, even though Scope itself does not.
Hunter downloads, configures, and builds the C++ library dependencies that SCOPE relies on. Individual scopes may also use Hunter to retrieve their dependencies, or they may require the user to provide them in some other way (for example, as libraries present on the system). If a scope uses Hunter, the scope’s dependencies will only be retrieved if that scope was enabled in the configuration step. All downloads through Hunter occur at the end of the configuration step, so that the required library and header files are present for the build.
III-C Compiling SCOPE
During the build stage, shown in Figure 2(c), object files for all enabled scopes are produced and linked into a single binary.
In this step, every source file that was included in a scope submodule object library is compiled into an object file. The source files that implement the Scope infrastructure helpers are also compiled into object files, and then all of the objects are linked together in a single step to form the scope binary.
III-D Running SCOPE
Finally, running the produced SCOPE binary allows the user to select any subset of the included benchmarks and produce a data file at any location.
III-E Providing Common Utilities
Scope provides a set of utilities that scopes may use. The interfaces for these utilities are available to the scopes through C++ header files. When scopes include those headers, the implementations will be available when all objects are linked together.
The entire Google Benchmark library is provided to configure and register the benchmark code.
CUDA error checking is provided, as most extant benchmarks are CUDA benchmarks.
Logging is provided so that scopes may have a consistent output mechanism.
C++ functions for declaring new command line options.
C++ function for executing pre-benchmark initialization code.
Convenience CMake functions for integrating with SCOPE.
Scope also provides a tools/generate_sugar_files.py Python script for generating Sugar-compatible CMake files in each scope source tree. Scope uses Sugar to read the sugar.cmake files, which tell CMake where the CUDA and C++ source files for Scope and the benchmarks are. This script is provided so that the sugar.cmake files can be quickly regenerated during development of each scope. The script creates sugar.cmake files that export the CMake variables described in Table III, where <VAR> is replaced by the string passed to the “--var” flag.
|<VAR>_SOURCES||C/C++ source files|
|<VAR>_HEADERS||C/C++ header files|
|<VAR>_CUDA_SOURCES||CUDA source files|
|<VAR>_CUDA_HEADERS||CUDA header files|
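The grouping in Table III can be illustrated with a short Python sketch. This is not the code of tools/generate_sugar_files.py; the function name group_sources and the mapping from file extensions to groups are assumptions made for illustration, and the real script may classify files differently.

```python
import os
from collections import defaultdict

# Assumed extension-to-group mapping; the real script may classify differently.
SUFFIX_GROUPS = {
    ".c": "SOURCES", ".cc": "SOURCES", ".cpp": "SOURCES",
    ".h": "HEADERS", ".hh": "HEADERS", ".hpp": "HEADERS",
    ".cu": "CUDA_SOURCES",
    ".cuh": "CUDA_HEADERS",
}


def group_sources(filenames, var):
    """Group file names under <VAR>_SOURCES, <VAR>_HEADERS, and so on."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        suffix = os.path.splitext(name)[1].lower()
        group = SUFFIX_GROUPS.get(suffix)
        if group is not None:  # skip files that are not C/C++/CUDA code
            groups[f"{var}_{group}"].append(name)
    return dict(groups)


# Hypothetical file names standing in for one directory of a scope.
files = ["init.cpp", "init.hpp", "kernel.cu", "kernel.cuh", "README.md"]
for variable, members in sorted(group_sources(files, "example").items()):
    print(variable, members)
```

Here "example" plays the role of the string passed to the --var flag, so the four printed variable names follow the <VAR>_ pattern of Table III.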
III-F Providing CMake Functions
SCOPE provides several CMake functions to help scopes integrate with the SCOPE CMake build. The scope_add_library function is a wrapper around add_library that also sets the SCOPE_NEW_TARGET variable so that SCOPE can link against the CMake Object Library defined in the scope. The target_include_scope_directories function adds SCOPE’s utility include directories to the compilation of the scope so that the SCOPE utilities can be used. SCOPE also provides scope_status, scope_warning, and scope_fatal, which scopes may use in their CMakeLists.txt files to print messages that will be visible at configure time.
III-G Initialization Hooks and Command-line Options
SCOPE provides the ability for benchmarks to run arbitrary initialization when the binary is executed. Scopes may register clara::Opts to create new command line arguments accepted by the SCOPE binary. Scopes may also register arbitrary code to be executed before command line arguments are parsed, or after arguments are parsed but before any benchmarks are executed. Through these routines, benchmarks may perform any unguided or user-directed initialization desired before benchmark execution.
IV Design of SCOPE Submodules
SCOPE does not include any benchmark code; all benchmarks are provided through individual scopes. Scopes are structured directories in the Scope source tree, included in Scope through the CMake add_subdirectory command. As of this writing, eight scopes are either completed or in development, measuring various aspects of system compute and communication performance at different levels of abstraction. Table IV describes these scopes.
|Hardware||TCUScope||Completed||Nvidia GPU tensor cores|
|Data Transfer||CommScope||Released||Nvidia CPU-GPU communication|
| ||I/OScope||In Progress||Disk I/O operations|
| ||NCCLScope||In Progress||Nvidia’s NCCL library|
| ||InstrScope||In Progress||Instruction latencies and throughput|
| ||HistoScope||In Progress||Nvidia GPU histogramming|
| ||LinAlgScope||In Progress||Linear algebra operations|
IV-A Defining a CMake Object Library
Each scope needs to include a CMakeLists.txt at the top level of its directory. This allows it to be included in the Scope configuration through the add_subdirectory command in the Scope CMakeLists.txt. Fundamentally, each scope only needs to define a CMake object library target through the OBJECT form of the add_library command. ExampleScope defines the example_scope target. Scope suggests using Sugar and the provided tooling (Section III-E) to automate the process of providing source files to the scope_add_library command. The scope target may have its own dependencies or other constraints. ExampleScope marks itself as requiring C++11 through the target_compile_features CMake command. This tells CMake that objects in ExampleScope should be built with C++11 support in the compiler. ExampleScope also adds include directories, a CUDA language standard, and a requirement to link against the Google Benchmark library to the example_scope target. These requirements will be propagated to the entire Scope build as required.
IV-B Integration with Scope through Git Submodules
For development isolation, scopes are maintained as independent directories. If the scope is maintained as a Git repository, the scope can have its own versioning and revision history. Additionally, this allows the scope to be included in the SCOPE repository as a Git submodule. Git submodules can be automatically downloaded alongside Scope when Scope is downloaded, and pinned to the appropriate version.
IV-C Example Scope
The Scope project provides a template scope, ExampleScope, that demonstrates how a new scope can be structured. ExampleScope is available at https://github.com/c3sr/example_scope. ExampleScope demonstrates the following required or suggested structures:
IV-C1 CMakeLists.txt (required) and Sugar (optional)
This file defines a CMake object library and uses Sugar to parse the sugar.cmake files in the ExampleScope source tree.
IV-C2 Code Structure (optional)
ExampleScope places all of its source files in the src directory.
IV-C3 Benchmark Library (required)
All benchmarks are registered through the Google Benchmark library. This enables the Scope binary to filter, run, and report results in a consistent way.
IV-C4 Documentation (optional)
ExampleScope contains Markdown files in the docs directory of its source tree that describe each benchmark, its algorithm, and its implementation.
IV-C5 Initialization and Command Line Flags (optional)
ExampleScope uses clara::Opts to declare two new command-line arguments, and uses the SCOPE initialization hooks to cause SCOPE to exit during initialization if those options are used. Initialization code is placed in src/init.
V Scope Utilities
V-A ScopePlot Python Package
ScopePlot is a Python package that helps plot and manipulate results in the JSON files produced by SCOPE. The Google Benchmark-formatted JSON files (hereafter referred to as JSON files) produced by Scope are unmodified from the format produced by the Google Benchmark library, so ScopePlot is compatible with other tools that use that library. ScopePlot is freely available on the Python Package Index (PyPI) at https://pypi.org/project/scope-plot/. ScopePlot is also open-source, hosted at https://github.com/rai-project/scope_plot. ScopePlot uses Python’s distutils to manage installation. When ScopePlot is installed, it provides the scope_plot binary. The rest of this section describes notable scope_plot subcommands.
V-A1 spec Subcommand
scope_plot spec generates an arbitrary plot from a YAML specification file (hereafter referred to as a spec file). Examples of valid specification files can be found in the ScopePlot source tree. The specification file controls the plot type (line with error bars, bar plot, linear regression plot with error bars), the source JSON file for each data series, filters to extract the desired data from the JSON file, per-series data transformations, and plot styling and formatting.
V-A2 deps Subcommand
scope_plot deps can be used to help integrate the scope_plot command with GNU Makefiles. Similar to how make can be used to partially recompile code when certain files have changed, make can be used to regenerate plots when the underlying Benchmark JSON files have been updated. deps scans a spec file and emits the paths of the benchmark JSON files it depends on in make format, so that make target dependencies can be automatically generated as part of a build process.
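The idea behind deps can be sketched in a few lines of Python. This is a minimal illustration rather than ScopePlot's implementation: the function name make_rule and the spec keys series and input_file are hypothetical, and a plain dictionary stands in for a parsed YAML spec file.

```python
def make_rule(target, spec):
    """Return a Make rule line: '<target>: <JSON files the spec reads>'."""
    deps = []
    for series in spec.get("series", []):
        path = series.get("input_file")
        if path and path not in deps:  # keep first-seen order, drop duplicates
            deps.append(path)
    return f"{target}: {' '.join(deps)}"


# Stand-in for a parsed YAML spec file; the keys and paths are hypothetical.
spec = {
    "series": [
        {"label": "pinned", "input_file": "results/pinned.json"},
        {"label": "pageable", "input_file": "results/pageable.json"},
        {"label": "pinned (fit)", "input_file": "results/pinned.json"},
    ]
}
print(make_rule("bandwidth.pdf", spec))
```

A Makefile that includes a line of this form will rebuild the plot target only when one of the listed JSON files is newer than the plot.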
V-A3 bar Subcommand
scope_plot bar generates a bar plot from a Benchmark JSON file. It has a subset of the functionality of the spec subcommand, without requiring a full spec file. Command line options allow the user to specify the fields used for the x- and y-axis data, and the plot title.
V-A4 cat Subcommand
scope_plot cat is inspired by the Linux/Unix “cat” command, but specialized to Google Benchmark JSON files. When passed one or more Benchmark JSON files, it concatenates the benchmarks field of those files and dumps the content to the standard output stream. In this way, it preserves the structure of the JSON files when they are concatenated, where the standard “cat” would simply append the JSON contents together, yielding a malformed result.
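The difference between the two kinds of concatenation can be shown with a small Python sketch. It is not ScopePlot's code: cat_benchmarks is a hypothetical name, and keeping the context of the first file is an assumption, since the text above only states that the benchmarks fields are concatenated.

```python
import json


def cat_benchmarks(documents):
    """Merge parsed Google Benchmark JSON documents into a single document."""
    documents = list(documents)
    merged = {
        # Assumption: keep the context of the first file only.
        "context": documents[0].get("context", {}) if documents else {},
        "benchmarks": [],
    }
    for doc in documents:
        merged["benchmarks"].extend(doc.get("benchmarks", []))
    return merged


# Two small stand-ins for files written by a Google Benchmark binary.
first = '{"context": {"num_cpus": 4}, "benchmarks": [{"name": "A/1", "real_time": 10.0}]}'
second = '{"context": {"num_cpus": 4}, "benchmarks": [{"name": "B/1", "real_time": 20.0}]}'

# Plain concatenation, as the standard cat would do, is not valid JSON.
try:
    json.loads(first + second)
    naive_is_valid = True
except json.JSONDecodeError:
    naive_is_valid = False

merged = cat_benchmarks([json.loads(first), json.loads(second)])
print("naive concatenation parses:", naive_is_valid)
print(json.dumps(merged, indent=2))
```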
V-A5 filter_name Subcommand
scope_plot filter_name filters the benchmark outputs in the Benchmark JSON file to only keep benchmarks with a name that matches a provided regular expression.
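A name filter of this kind might look like the following Python sketch. The function below is an illustration, not ScopePlot's implementation; the benchmark names are hypothetical, and matching anywhere in the name (re.search) rather than against the whole name is an assumption.

```python
import re


def filter_name(document, pattern):
    """Return a copy of the document keeping only benchmarks whose name matches."""
    regex = re.compile(pattern)
    kept = [
        bench for bench in document.get("benchmarks", [])
        if regex.search(bench.get("name", ""))  # assumption: search, not full match
    ]
    filtered = dict(document)  # shallow copy so the input document is not modified
    filtered["benchmarks"] = kept
    return filtered


# Hypothetical benchmark entries in the Google Benchmark JSON layout.
document = {
    "context": {"num_cpus": 4},
    "benchmarks": [
        {"name": "Example/Copy/1024", "real_time": 1.5},
        {"name": "Example/Copy/4096", "real_time": 5.9},
        {"name": "Example/Scale/1024", "real_time": 2.1},
    ],
}
copies = filter_name(document, r"/Copy/")
print([bench["name"] for bench in copies["benchmarks"]])
```

Because the result has the same layout as the input, it can be passed on to other tools that read Google Benchmark JSON files.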
V-A6 Using ScopePlot as a Library
ScopePlot may also be used as a Python library to develop other JSON file manipulation and analysis tools. ScopePlot has an object model for JSON files and various methods for filtering them and converting them to pandas DataFrames.
V-B Scope Docker Images
|Dockerfile||Docker Hub Image||Description|
|amd64.cuda75.Dockerfile||c3sr/scope:amd64-cuda75-tag||x86_64, CUDA 7.5,|
|amd64.cuda80.Dockerfile||c3sr/scope:amd64-cuda80-tag||x86_64, CUDA 8.0,|
|amd64.cuda92.Dockerfile||c3sr/scope:amd64-cuda92-tag||x86_64, CUDA 9.2,|
|ppc64le.cuda80.Dockerfile||c3sr/scope:amd64-cuda80-tag||POWER, CUDA 8.0,|
|ppc64le.cuda92.Dockerfile||c3sr/scope:amd64-cuda92-tag||POWER, CUDA 9.2,|
Scope includes several Docker images, which can be used to run benchmarks. This allows users on supported platforms to run pre-packaged versions of the benchmark code without having to compile or configure it. Table V lists the Dockerfiles and corresponding Docker images publicly available on Docker Hub.
VI SCOPE Development and Maintenance
Scope and ScopePlot rely on continuous integration for testing and building Docker images. The continuous integration system is centered around Travis-CI. Whenever modifications to Scope or ScopePlot are pushed to GitHub, Travis-CI starts a series of parallel jobs. Figure 4 summarizes the continuous integration flow for Scope and ScopePlot.
Whenever a push is made to the Scope repository, Travis-CI is configured to start a series of parallel builds:
An x86-64 CUDA 9.2 build.
An x86-64 Docker CUDA 7.5 build.
An x86-64 Docker CUDA 8.0 build.
An x86-64 Docker CUDA 9.0 build.
Each of these builds incorporates CommScope and ExampleScope, the two completed scopes at the time of writing.
All of Travis’ build hardware is x86-64, so Travis is directly used to create x86-64-compatible Docker images. To generate POWER-compatible Docker images, Scope uses rai, a separate job submission system. Two additional Travis jobs are started, each of which simply submits a POWER build to rai on Oregon State University’s PowerCI infrastructure.
POWER Docker CUDA 8.0 build.
POWER Docker CUDA 9.2 build.
Each Docker build that succeeds is pushed to Docker Hub so that it is immediately available to the public. An image corresponding to each tag is retained indefinitely. Furthermore, an image for the most recent commit on each branch is available.
Whenever a push is made to the ScopePlot repository, Travis-CI is configured to test the ScopePlot package against Python 2.7, 3.4, 3.5, 3.6, and 3.7. If all of those tests pass, and the commit has a corresponding tag, a new version of ScopePlot is made available on the Python Package Index (PyPI) for installation with pip install scope_plot.
The Scope project arose out of a desire to lower the barrier to entry for system benchmarking in the IBM / University of Illinois Urbana-Champaign Center for Cognitive Computing Systems Research (C3SR). Scope does this by incorporating common libraries and providing convenience functions for writing new system benchmarks, easily supporting compilation on x86 and POWER platforms, and by providing a command-line tool for managing and plotting results. Scope is free and open-source, and welcomes contributors and collaborators.
The authors would like to acknowledge contribution and insight from the following people: I-Hsin Chung (IBM T. J. Watson Research Center), Sarah Hashash and Andrew Schuh (University of Illinois at Urbana-Champaign)
-  C3SR. (2018, Aug.) Center for Cognitive Computing Systems Research. [Online]. Available: http://www.c3sr.com/
-  Nvidia, CUDA C Programming Guide, Aug. 2018. [Online]. Available: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
-  (2018, Jun.) Git. https://github.com/git/git.
-  Kitware. (2018, Aug.) CMake. [Online]. Available: https://cmake.org
-  Google. (2018, Apr.) benchmark. https://github.com/google/benchmark.
-  J. D. Hunter, “Matplotlib: A 2d graphics environment,” Computing in science & engineering, vol. 9, no. 3, pp. 90–95, 2007.
-  M. Paprocki, B. Van de Ven, S. Bird, L. Canavan, R. Hafen, and A. Terrel. (2018, Jun.) Bokeh. [Online]. Available: bokeh.pydata.org
-  W. McKinney et al., “Data structures for statistical computing in python,” in Proceedings of the 9th Python in Science Conference, vol. 445. Austin, TX, 2010, pp. 51–56.
-  “Apache License,” Apache Software Foundation, Jan. 2004. [Online]. Available: https://www.apache.org/licenses/LICENSE-2.0
-  Kitware, Object Libraries. [Online]. Available: https://cmake.org/cmake/help/latest/command/add_library.html
-  R. Baratov. (2018, Jul.) Hunter. https://github.com/ruslo/hunter.
-  A. Kleen. (2018, Jun.) numactl. https://github.com/numactl/numactl.
-  G. Melman. (2018, Jan.) spdlog. https://github.com/gabime/spdlog.
-  V. Zverovich and J. Muller. (2017, Dec.) fmt. https://github.com/fmtlib/fmt.
-  (2018, Mar.) Clara. https://github.com/catchorg/clara.
-  R. Baratov. (2018, Jan.) Sugar. https://github.com/ruslo/sugar.
-  C. Pearson. (2018) CommScope. https://github.com/c3sr/comm_scope.
-  ——. (2018) ExampleScope. https://github.com/c3sr/example_scope.
-  YAML. [Online]. Available: yaml.org
-  D. Merkel, “Docker: lightweight linux containers for consistent development and deployment,” Linux Journal, vol. 2014, no. 239, p. 2, 2014.
-  Travis. (2018, Aug.) Travis-CI. [Online]. Available: https://travis-ci.com
-  A. Dakkak, C. Pearson, C. Li, and W.-m. Hwu, “Rai: a scalable project submission system for parallel programming courses,” in Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2017 IEEE International. IEEE, 2017, pp. 315–322.
-  Oregon State University Open Source Lab. (2018) PowerCI. [Online]. Available: https://osuosl.org/services/powerdev