Turning Privacy Constraints into Syslog Analysis Advantage

01/21/2019
by Siavash Ghiasvand, et al.

The mean time between failures (MTBF) of HPC systems is rapidly decreasing, and current failure recovery mechanisms, e.g., checkpoint-restart, will no longer be able to recover systems from failures. Early failure detection is a new class of failure recovery methods that can benefit HPC systems with short MTBF. System logs (syslogs) are an invaluable source of information that gives deep insight into system behavior and makes early failure detection possible. Besides normal information, syslogs contain sensitive data that might endanger users' privacy. Even though analyzing various syslogs is necessary for creating a general failure detection/prediction method, privacy concerns discourage system administrators from publishing syslogs. Herein, we ensure user privacy by de-identifying syslogs, and then turn the constraint applied to protect users' privacy into an advantage for system behavior analysis. Results indicate a significant reduction in required storage space and 3 times shorter processing time.
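The abstract does not spell out how the syslogs are de-identified. As a rough illustration of the general idea only, the sketch below replaces two kinds of potentially sensitive tokens (IPv4 addresses and user names) with short salted hashes. The regular expressions, the `deidentify` and `pseudonym` names, the salt handling, and the 8-character token length are all assumptions made for this example and are not taken from the paper.

```python
import hashlib
import re

# Hypothetical patterns for fields treated as sensitive in this sketch.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
USER = re.compile(r"\b(user(?:name)?[= ])(\S+)", re.IGNORECASE)


def pseudonym(value: str, salt: str, length: int = 8) -> str:
    """Return a short, deterministic token for a sensitive value."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:length]


def deidentify(line: str, salt: str) -> str:
    """Replace IPv4 addresses and user names in one syslog line."""
    line = IPV4.sub(lambda m: "ip_" + pseudonym(m.group(0), salt), line)
    line = USER.sub(
        lambda m: m.group(1) + "u_" + pseudonym(m.group(2), salt), line
    )
    return line


if __name__ == "__main__":
    # Made-up log line, used only to show the call.
    raw = "sshd[1234]: Accepted publickey for user alice from 192.168.1.10 port 22"
    print(deidentify(raw, salt="example-salt"))
```

Because the same input value always maps to the same token under a given salt, repeated events involving one user or host remain recognizable as a pattern after de-identification, even though the original value is no longer present in the line.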


