I Introduction and Motivational Use Case
As part of the Fermilab HEPCloud project , we have been constructing an intelligent decision support system (IDSS). HEPCloud is rapidly becoming the primary system for provisioning compute resources for all Fermilab-affiliated experiments. This provisioning is responsible for managing time allocations and monetary budget usage. It spans facilities including the High Performance Computing centers like Cori at National Energy Research Scientific Computing Center and commercial clouds like Google Compute Engine and Amazon Web Services.
Our IDSS, the Decision Engine (DE) , provides the automation of requests for computing resource allocations across all participating experiments and affiliated facilities. An overall goal of the DE is to use both administration-defined and management-defined policies to create resource scheduling requests on behalf of the HEPCloud facility. The DE is responsible for ensuring that policies are applied in a reliable, traceable and consistent manner. The policies that are carried out ultimately result in resource requests, and ensure that these requests match incoming job requirements.
In order to reliably meet peak demands, Fermilab must plan to provision enough resources to cover any forecasted peak. Using some other statistic such as median demand can be cost ineffective, since some resources may be underutilized during non-peak periods—even when resource sharing (enabled by HEP grids) is accounted for. Scientific productivity will be affected if the demand is underestimated, since there is a long lead time to significantly increase the use of local or remote resources. HEPCloud intends to mitigate these problems by intelligently extending the current Fermilab compute facility to execute jobs submitted by scientists on a diverse set of resources. The DE is a key component in this process. It provides the necessary real-time infrastructure to efficiently operate in an era of diverse resource needs, competitive cloud resource providers, and HPC facilities.
A typical workflow executed by a workload management systems (WMS) in provisioning computing resources for facility expansion is shown in figures 3 through 3. This multi-stage decision making can alternatively be combined into a single decision making stage, but, at the cost of adding complexity to the system. During each stage, the WMS periodically executes, the information gathering phase (first row of each of these diagrams), the decision making phase (decision block) and the result publication phase (publish block). First, the WMS queries different systems and services to identify computational jobs in various job queues that are in need for compute resources. Based on the the jobs and resources manifests, the WMS short lists candidate resources eligible to run these jobs. During the second stage, the WMS uses these shortlisted resources, their price performance metrics, their costing information, current occupancy and state. It ranks them based on a given criteria like figure of merit (cost–benefit) in this case. During the final stage the WMS applies administration-defined and management-defined policies to generate resource requests that are used by a provisioner to expand the facility.
Ii Architectural Overview
The primary drivers of the design were: (1) the need for a framework that enforces the processing stages defined and implemented by the program, and which provides for the injection of user-supplied code and expert knowledge; (2) the need for a configuration and assembly system that instantiates the appropriate user-supplied code, and that provides the necessary context-dependent information to realize different parameterizations of that code; and (3) a means to manage the data being processed and the varying timescales for the relevance and validity of those data.
The DE is a system that can manage and run algorithms of varying complexity for the purpose of requesting resources for computing jobs. The DE defines a Decision Channel as a grouping of tasks that generate a decision. A decision consists of a recommendation of one or more actions that should be executed (such as allocation of computing resources), actions that are directly executed (such as updating of monitoring systems), or both. The modularity provided by Decision Channels allows the DE to manage decision making processes as distinct units and allows different algorithms to be developed and tested independently by different domain experts.
Each Decision Channel task, implemented as a Python class, contains several modules, each of which adheres to a common protocol. We define four module types: Source, Transform, Logic Engine, and Publisher. A Decision Channel minimally consists of one of each of these kinds of modules as shown in Figure 4. Each module adheres to a specific contract that governs how the modules connect. For example, each module (except Sources) expresses the names of all the data products the module consumes, and (except for Publishers) the names of all the data products the module produces.
Each Source is scheduled periodically by the framework and is responsible for communicating with an external system to gather data that acts as input to the decision making process. A Transform module contains algorithms to convert input data into new data. A Transform consumes one or more data products (produced by one or more Sources, Transforms, or both) within a Decision Channel, and produces one or more new data products. The Logic Engine is a rule-based forward-chaining inference engine that operates on facts. Each fact has a name and an expression that evaluates to a boolean. The value of a fact is the value of the expression. Expressions can access and operate on data produced by Source and Transform modules. A rule consists of a condition composed of references to facts and boolean operations on their values. Actions are triggered when the rule evaluates to boolean “True”. Logic Engine rules can produce new facts that evaluate to the result of the rule’s boolean expression. This fact can be used by subsequent rules. In this manner, rules can be developed separately as blocks and chained together. Publishers consume data products produced by Sources and Transforms. They use remotely exposed APIs to publish the data products to the external systems.
This manuscript has been authored by Fermi Research Alliance, LLC under Contract No. DE-AC02-07CH11359 with the U.S. Department of Energy, Office of Science, Office of High Energy Physics. This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
-  Holzman, Burt et al., “HEPCloud, a New Paradigm for HEP Facilities: CMS Amazon Web Services Investigation,” Computing and Software for Big Science, vol. 1, no. 1, p. 1, Sep 2017.
-  Tiradani, Anthony et al., “Fermilab HEPCloud Facility Decision Engine Design,” 2017.