Large-Scale Detection of Non-Technical Losses in Imbalanced Data Sets

02/26/2016
by Patrick O. Glauner, et al.

Non-technical losses (NTL), such as electricity theft, cause significant harm to our economies: in some countries they may range up to 40% of the electricity distributed. Detecting NTLs requires costly on-site inspections, so accurate prediction of NTLs for customers using machine learning is crucial. To date, related research has largely ignored that the two classes of regular and non-regular customers are highly imbalanced and that NTL proportions may change. It has also mostly considered small data sets, which often does not allow the results to be deployed in production. In this paper, we present a comprehensive approach to assess three NTL detection models, Boolean rules, fuzzy logic and Support Vector Machines, for different NTL proportions in large real-world data sets of hundreds of thousands of customers. This work has produced appreciable results, which are about to be deployed in a leading industry solution. We believe that the considerations and observations made in this contribution are necessary for future smart meter research to report its effectiveness on large, imbalanced real-world data sets.
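The class imbalance described above matters for how detectors are scored. The sketch below is an illustration under stated assumptions, not the paper's evaluation code: it uses hypothetical labels (1 = NTL, 0 = regular customer) generated for several NTL proportions, and it picks ROC AUC as one example of a metric that is insensitive to the class ratio. The helper names `make_labels`, `accuracy` and `roc_auc` are made up for this example. A trivial detector that flags nobody looks better and better under accuracy as the NTL share shrinks, while its AUC stays at chance level.

```python
import random
from typing import List, Sequence


def make_labels(n: int, ntl_proportion: float, seed: int = 0) -> List[int]:
    """Hypothetical helper: n customer labels (1 = NTL, 0 = regular) with a fixed NTL share."""
    n_ntl = round(n * ntl_proportion)
    labels = [1] * n_ntl + [0] * (n - n_ntl)
    random.Random(seed).shuffle(labels)
    return labels


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of customers whose class is predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic, using average ranks for tied scores."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one NTL and one regular customer")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j over the run of customers whose score ties with order[i].
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, t in zip(ranks, y_true) if t == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    n = 10_000
    for p in (0.01, 0.05, 0.2, 0.4):
        y = make_labels(n, p)
        # A trivial detector: flags nobody and gives every customer the same score.
        flag_nobody = [0] * n
        constant_score = [0.0] * n
        print(f"NTL share {p:.2f}: accuracy {accuracy(y, flag_nobody):.3f}, "
              f"AUC {roc_auc(y, constant_score):.3f}")
```

For each NTL share the trivial detector's accuracy equals one minus that share (0.990 at 1% NTL), while its AUC stays at 0.500. This is one way to see why results reported on a single, arbitrary class ratio can be misleading.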
