Efficient Race Detection with Futures

01/03/2019 ∙ by Robert Utterback, et al. ∙ Georgetown University, Monmouth College, Washington University in St. Louis

This paper addresses the problem of provably efficient and practically good on-the-fly determinacy race detection in task-parallel programs that use futures. Prior work on determinacy race detection has mostly focused on either task-parallel programs that follow a series-parallel dependence structure or ones with unrestricted use of futures that generate arbitrary dependences. In this work, we consider a restricted use of futures and show that programs using futures in this way can be race detected more efficiently than programs with general use of futures. Specifically, we present two algorithms: MultiBags and MultiBags+. MultiBags targets programs that use futures in a restricted fashion and runs in time O(T_1 α(m,n)), where T_1 is the sequential running time of the program, α is the inverse Ackermann function, m is the total number of memory accesses, and n is the dynamic count of places at which parallelism is created. Since α is a very slowly growing function (upper bounded by 4 for all practical purposes), it can be treated as a close-to-constant overhead. MultiBags+ is an extension of MultiBags that targets programs with general use of futures. It runs in time O((T_1+k^2)α(m,n)), where T_1, α, m, and n are defined as before, and k is the number of future operations in the computation. We implemented both algorithms and empirically demonstrate their efficiency.
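
As a quick consequence of the two bounds above (a worked step added here for illustration, using only the symbols defined in the abstract), the MultiBags+ bound collapses to the MultiBags bound whenever the number of future operations k is at most the square root of T_1:

```latex
\[
k \le \sqrt{T_1}
\;\Longrightarrow\;
(T_1 + k^2)\,\alpha(m,n) \;\le\; 2\,T_1\,\alpha(m,n) \;=\; O\bigl(T_1\,\alpha(m,n)\bigr).
\]
```

Conversely, the k^2 term dominates the bound only once k exceeds the square root of T_1.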





This research was supported in part by National Science Foundation grants CCF-1150036, CCF-1527692, CCF-1733873, and XPS-1439062. We thank the referees and our shepherd for their excellent comments.




Appendix A Artifact Appendix

a.1. Abstract

This artifact contains source code for the compiler, runtime system, and benchmarks used in the PPoPP 2019 paper Efficient Race Detection with Futures, plus shell scripts that compile everything and run the benchmarks. The only hardware requirement is a modern multicore CPU; the software requirements are a relatively recent Linux distribution (tested on Ubuntu 16.04), the datamash package, and the GNU gold linker. To validate the results, run the test scripts and compare the output to figures 6, 7, and 8 in the paper.

a.2. Artifact check-list (meta-information)

  • Program: C/C++ code.

  • Compilation: Modified fork of clang++ with -O3 -flto flags. To fully reproduce the results, we recommend installing the GNU gold linker as ld.

  • Data set: The dedup benchmark uses publicly available data sets. Scripts in the repository will download and set up all data sets.

  • Run-time environment: Tested on Ubuntu 16.04, but expected to work on any modern Linux.

  • Hardware: Any modern multicore CPU; tested on an Intel® Xeon® CPU E5-2665 with hyperthreading disabled. Enabling hyperthreading may change results.

  • Metrics: Runtime (in seconds).

  • Output: Runtime and standard deviation for all benchmarks. Each benchmark is run in 12 configurations, which determine the kind of futures, the race detection algorithm, and the level of instrumentation/race detection: baseline, reachability only, reachability + memory instrumentation, and full race detection.

  • How much disk space required (approximately)?: 13GB.

  • How much time is needed to prepare workflow (approximately)?: 1.5 hours.

  • How much time is needed to complete experiments (approximately)?: 4 hours.

  • Publicly available?: Yes

  • Code/data licenses (if publicly available)?: MIT.

a.3. Description

a.3.1. How delivered

The project is available on GitLab at https://gitlab.com/wustl-pctg-pub/futurerd2.git.

a.3.2. Hardware dependencies

Any modern multicore CPU. It was tested on an Intel® Xeon® CPU E5-2665.

a.3.3. Software dependencies

The project was tested on Ubuntu 16.04, but it is expected to run correctly on other Linux distributions. To fully reproduce the results, link-time optimization should be used (-flto) with the GNU gold linker installed as ld. On our system we make /usr/bin/ld a shell script that forwards its arguments to gold whenever the USE_GOLD environment variable is set and to the original ld otherwise.
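
A wrapper of the kind described above might look roughly like the following sketch. It is not the authors' script: the paths /usr/bin/ld.gold and /usr/bin/ld.bfd, the overridable GOLD_LD and DEFAULT_LD variables, and the choose_linker helper are assumptions made for illustration. Only the USE_GOLD variable comes from the description.

```shell
#!/bin/sh
# Sketch of a wrapper to install as /usr/bin/ld (not the authors' actual script).
# Assumption: gold is available as ld.gold and the original linker as ld.bfd;
# adjust both paths to match your system.
GOLD_LD="${GOLD_LD:-/usr/bin/ld.gold}"
DEFAULT_LD="${DEFAULT_LD:-/usr/bin/ld.bfd}"

# Print the linker to use: gold when USE_GOLD is set (even to an empty
# value), the original linker otherwise.
choose_linker() {
    if [ -n "${USE_GOLD+set}" ]; then
        printf '%s\n' "$GOLD_LD"
    else
        printf '%s\n' "$DEFAULT_LD"
    fi
}

# Forward all arguments unchanged, but only when this file is invoked as "ld".
case "$0" in
    */ld|ld) exec "$(choose_linker)" "$@" ;;
esac
```

With such a wrapper in place, a build started with USE_GOLD set in its environment links with gold, while every other build on the machine keeps the default linker.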

The benchmark script requires GNU datamash, which can be installed using apt-get in Ubuntu 14+ or can be obtained from https://www.gnu.org/software/datamash. Bash 4+ should be used to run the scripts.

a.3.4. Data sets

All required datasets are downloaded by scripts included in the distribution.

a.4. Installation

The setup.sh script in the project repository will build our modified compiler, the modified Cilk Plus runtime, and all the benchmarks.

a.5. Experiment workflow

  • Clone the source code to your machine:

    $ git clone https://gitlab.com/wustl-pctg-pub/futurerd2.git
    $ cd futurerd2
  • Install GNU gold as your linker. Modern versions of the GNU binutils package include gold, though for our purposes the system ld should point to gold. Installing gold also installs a header called plugin-api.h, usually in either /usr/include or /usr/local/include. Find this file and replace the BINUTILS_PLUGIN_DIR variable in build-llvm-linux.sh with this path.

  • Install other software dependencies. In Ubuntu 14+, this is as simple as

    $ sudo apt-get install datamash zlib1g zlib1g-dev openssl

    and making sure you have Bash 4+.

  • Build the necessary components. The setup.sh script will build the compiler and download and unpack the necessary data sets.

  • Run the benchmark script (bench/run.sh). The script compiles the runtime library and race detection library, and compiles and runs each configuration of each benchmark. Tuning parameters can be found in bench/time.sh (which the run.sh script uses) — feel free to examine the script and change parameters, such as the number of iterations for each benchmark.

    $ cd bench
    $ ./run.sh
  • Observe the results. Once completed, full results can be found in the files times.ss.csv (benchmarks using the MultiBags race detection algorithm with structured futures), times.ns.csv (benchmarks using the MultiBags+ algorithm with structured futures), and times.nn.csv (benchmarks using the MultiBags+ algorithm with general futures).
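
The prerequisites listed in the steps above can be checked before running setup.sh with a small script along these lines. This is a hypothetical helper, not part of the artifact: the function names find_plugin_dir and is_gold are invented here, and it assumes that BINUTILS_PLUGIN_DIR should name the directory containing plugin-api.h (suggested by the variable's name, but not confirmed by the text).

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of the artifact); run before setup.sh.

# Print the first directory among the arguments that contains plugin-api.h.
find_plugin_dir() {
    for dir in "$@"; do
        if [ -f "$dir/plugin-api.h" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Succeed if the given "ld --version" output identifies GNU gold.
is_gold() {
    case "$1" in
        *"GNU gold"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report each prerequisite without aborting, so all problems show at once.
if plugin_dir=$(find_plugin_dir /usr/include /usr/local/include); then
    echo "plugin-api.h found in $plugin_dir (candidate for BINUTILS_PLUGIN_DIR)"
else
    echo "plugin-api.h not found in /usr/include or /usr/local/include"
fi

if is_gold "$(ld --version 2>/dev/null)"; then
    echo "system ld is gold"
else
    echo "system ld is not gold"
fi

command -v datamash >/dev/null 2>&1 || echo "datamash is not installed"
bash -c '[ "${BASH_VERSINFO[0]}" -ge 4 ]' 2>/dev/null || echo "Bash 4+ not found"
```

The script only reports what it finds; editing build-llvm-linux.sh and installing missing packages is still done by hand as described in the steps.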

a.6. Evaluation and expected result

Although absolute times will differ on your machine, you should see similar relative overhead for the benchmarks. Compare the results to figures 6, 7, and 8 in the paper.

a.7. Notes

Please send feedback or file issues at our GitLab repository (https://gitlab.com/wustl-pctg-pub/futurerd2).

a.8. Methodology

Submission, reviewing and badging methodology: