Knowledge and Learning-based Adaptable System for Sensitive Information Identification and Handling

09/08/2021
by Akshar Kaul, et al.

Diagnostic data such as logs and memory dumps from production systems are often shared with development teams for root cause analysis of system crashes. Such diagnostic data invariably contains sensitive information, and sharing it can lead to data leaks. To handle this problem we present Knowledge and Learning-based Adaptable System for Sensitive InFormation Identification and Handling (KLASSIFI), an end-to-end system capable of identifying and redacting sensitive information present in diagnostic data. KLASSIFI is highly customizable: it can be adapted to different business use cases simply by changing its configuration. KLASSIFI keeps the output file useful by retaining the metadata that debugging tools rely on. Various optimizations have been made to improve the performance of KLASSIFI. Empirical evaluation shows that KLASSIFI is able to process large files (128 GB) in 84 minutes and that its performance scales linearly with varying factors. This points to the practicability of KLASSIFI.
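
To make the idea of configuration-driven redaction concrete, the following is a minimal sketch and not KLASSIFI's actual implementation. The abstract does not say how sensitive information is identified, so plain regular expressions are used here as a stand-in. The rule names, the patterns, the `redact` function, and the sample log line are all hypothetical. Each match is replaced with a mask of the same length, which is one simple way to leave byte offsets and the surrounding structure untouched.

```python
import re

# Hypothetical configuration: rule names and patterns are illustrative only
# and are not taken from KLASSIFI.
RULES = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def redact(text, rules=RULES, mask="X"):
    """Replace every rule match with a same-length mask.

    Keeping the length unchanged means offsets into the text stay valid,
    so surrounding fields (timestamps, keys, status codes) are unaffected.
    """
    for pattern in rules.values():
        text = pattern.sub(lambda m: mask * len(m.group(0)), text)
    return text


# Assumed sample log line, invented for this example.
line = "2021-09-08 ERROR user=alice@example.com host=10.0.0.12 code=500"
redacted = redact(line)
print(redacted)
```

Swapping in a different `RULES` dictionary changes what is treated as sensitive without touching the redaction logic, which mirrors in a very small way the configurability the abstract describes.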


