Automated service monitoring in the deployment of ARCHER2

03/21/2023
by Kieran Leach, et al.
The ARCHER2 service, a CPU-based HPE Cray EX system with 750,080 cores (5,860 nodes), was deployed throughout 2020 and 2021 and went into full service in December 2021. A key part of the work during this deployment was the integration of ARCHER2 into our local monitoring systems. As ARCHER2 was one of the very first large-scale EX deployments, this involved close collaboration and development work with the HPE team during a global pandemic, when collaboration and co-working were significantly more challenging than usual. The deployment included the creation of automated checks and visual representations of system status, which needed to be made available to external parties for diagnosis and interpretation. We describe how these checks were deployed and how the data gathered played a key role in the deployment of ARCHER2, the commissioning of the plant infrastructure, the conduct of HPL runs for submission to the Top500, and the contractual monitoring of the availability of the ARCHER2 service during its commissioning and early life.


