Pre-treatment of outliers and anomalies in plant data: Methodology and case study of a Vacuum Distillation Unit

06/17/2021
by Kamil Oster, et al.

Data pre-treatment plays a significant role in improving data quality, thus allowing accurate information to be extracted from raw data. One commonly used data pre-treatment technique is outlier detection, and the so-called 3σ method is common practice for identifying outliers. As shown in the manuscript, however, it does not identify all outliers, which can distort the overall statistics of the data. This problem can have a significant impact on further data analysis and can reduce the accuracy of predictive models. A plethora of outlier-detection techniques exists; however, aside from theoretical work, they all require case-study validation. Two types of outliers were considered: short-term outliers (erroneous data, noise) and long-term outliers (e.g. malfunctions lasting for longer periods). The data were taken from the vacuum distillation unit (VDU) of an Asian refinery and included 40 physical sensors (temperature, pressure and flow rate). We used a modified 3σ threshold method to identify the short-term outliers: the sensor data are divided into chunks determined by change points, and 3σ thresholds are calculated within each chunk, where the data follow a near-normal distribution. We have shown that this piecewise 3σ method offers a better approach to short-term outlier detection than the 3σ method applied to the entire time series. Nevertheless, it does not perform well for long-term outliers (which can represent another state in the data). For these, we used principal component analysis (PCA) with Hotelling's T² statistic to identify the long-term outliers. The results obtained with PCA were then subjected to the DBSCAN clustering method. The outliers (which were visually obvious and correctly detected by the PCA method) were also correctly identified by DBSCAN, which supports the consistency and accuracy of the PCA method.
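
The contrast between the global and the piecewise 3σ rule can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names are hypothetical, the change points are assumed to be supplied by a separate change-point detection step (the abstract does not name the algorithm), and the synthetic series stands in for a VDU sensor with two operating levels.

```python
from statistics import mean, pstdev

def three_sigma_outliers(values):
    """Flag values lying more than 3 standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [abs(v - mu) > 3 * sigma for v in values]

def piecewise_three_sigma_outliers(values, change_points):
    """Apply the 3-sigma rule separately within each chunk between change points.

    change_points: sorted indices at which a new chunk starts (hypothetical input,
    assumed to come from a separate change-point detection step not shown here).
    """
    bounds = [0, *change_points, len(values)]
    flags = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        if start < end:  # skip empty chunks (e.g. duplicate change points)
            flags.extend(three_sigma_outliers(values[start:end]))
    return flags

# Synthetic signal (not VDU data): two operating levels (~100 and ~200) with one spike.
series = [100 + (1 if i % 2 else -1) for i in range(50)]
series += [200 + (1 if i % 2 else -1) for i in range(50)]
series[25] = 110  # short-term outlier inside the first operating state

global_hits = [i for i, f in enumerate(three_sigma_outliers(series)) if f]
piecewise_hits = [i for i, f in enumerate(piecewise_three_sigma_outliers(series, [50])) if f]
print(global_hits)     # → []
print(piecewise_hits)  # → [25]
```

Because the level shift at index 50 inflates the whole-series σ to roughly 50, the spike at index 25 stays inside the global 3σ band, whereas within the first chunk σ is below 2 and the spike is flagged.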

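For the long-term outliers, the PCA step with Hotelling's T² described in the abstract can be sketched with NumPy. This is a hypothetical sketch rather than the authors' code: it assumes the PCA model is fitted on a reference window taken to be normal operation, keeps two principal components, and uses the large-sample χ² approximation for a 99% control limit (the abstract specifies none of these choices). For each sample i, T²ᵢ = Σⱼ tᵢⱼ² / λⱼ, where tᵢⱼ is the score on retained component j and λⱼ is its eigenvalue. The data are synthetic, not VDU measurements.

```python
import numpy as np

def fit_pca(X_ref, n_components):
    """Fit a PCA model on reference data assumed to represent normal operation."""
    mean, std = X_ref.mean(axis=0), X_ref.std(axis=0, ddof=1)
    Z = (X_ref - mean) / std  # autoscale each sensor
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))  # ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest eigenvalues
    return mean, std, eigvals[order], eigvecs[:, order]

def hotelling_t2(X, model):
    """T^2_i = sum_j t_ij^2 / lambda_j over the retained principal components."""
    mean, std, eigvals, loadings = model
    scores = ((X - mean) / std) @ loadings  # PCA scores t_ij
    return np.sum(scores**2 / eigvals, axis=1)

# Large-sample chi-square approximation of the 99% T^2 limit; with 2 retained
# components the chi-square(2) quantile has the closed form -2*ln(1 - 0.99).
T2_LIMIT_99 = -2.0 * np.log(0.01)  # about 9.21

# Synthetic demo data (hypothetical): 4 sensors driven by 2 latent process
# variables, with a long-term shift in sensors 0 and 1 for samples 300-349.
rng = np.random.default_rng(0)
n = 400
a, b = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([a, a, b, b]) + 0.1 * rng.normal(size=(n, 4))
X[300:350, :2] += 10.0  # long-term anomaly

model = fit_pca(X[:250], n_components=2)  # reference window: samples 0-249
t2 = hotelling_t2(X, model)
flagged = np.flatnonzero(t2 > T2_LIMIT_99)
print(f"{np.sum(t2[300:350] > T2_LIMIT_99)} of 50 shifted samples flagged")
```

The shift of 10 units is large relative to the reference variance along the first retained component, so the whole shifted segment is expected to exceed the limit, while only about 1% of normal samples should. For a DBSCAN cross-check of the kind the abstract describes, one option (an assumption, not taken from the paper) is scikit-learn's `sklearn.cluster.DBSCAN` applied to the PCA scores; its `fit_predict` method returns one cluster label per sample, with -1 marking noise, and `eps` and `min_samples` would need tuning to the data.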
