Data-driven Thermal Anomaly Detection for Batteries using Unsupervised Shape Clustering

by Xiaojun Li, et al.

For electric vehicle (EV) and energy storage (ES) batteries, thermal runaway is a critical issue, as it can lead to uncontrollable fires or even explosions. Thermal anomaly detection can identify problematic battery packs that may eventually undergo thermal runaway. However, common challenges such as data unavailability, environment variations, and battery aging remain. We propose a data-driven method that detects battery thermal anomalies by comparing the shape similarity between thermal measurements. Based on their shapes, the measurements are continuously grouped into different clusters, and an anomaly is detected by monitoring deviations within the clusters. Unlike model-based or other data-driven methods, the proposed method is robust to data loss and requires minimal reference data for different pack configurations. Initial experimental results show that the method can be more accurate than the onboard BMS and can also detect unforeseen anomalies at an early stage.




I Introduction

I-A Background

Conventional anomaly detection methods for batteries usually depend on thresholds or lookup tables, often determined by lab testing of sample batteries, and may not apply to individual batteries operating under different conditions. On the other hand, advanced anomaly detection methods, such as machine learning-based algorithms, require significant computational resources. Traditional battery management systems (BMS) deployed on electric vehicles or energy storage systems are based on embedded microcontrollers, which lack the computing power and memory to execute these algorithms onboard.

For a cloud-based BMS, the gathered data is transmitted to a data center for further analysis, where advanced algorithms can be used. Within the fault diagnosis process, anomaly/fault detection is the most critical step. With timely detection, the BMS or the vehicle control unit (VCU) of the EV can take proper action and prevent a relatively small issue from developing into a serious problem. There are three mainstream approaches to battery fault/anomaly detection: knowledge-based, model-based, and data-driven [1].

The threshold-based method, which is the industry's standard practice, can be categorized as knowledge-based. In general, every battery manufacturer has its own heuristic “recipe” that combines testing data and engineering know-how, and it cannot be generalized easily. Model-based approaches often include a physical model and an estimator [2]. In [3], a lumped battery thermal model with time-variant internal resistance, together with an adaptive observer, is used to estimate the battery core temperature. In [4], a cell difference model with an extended Kalman filter is used to estimate the micro-short-circuit current, which can be used for short-circuit diagnosis. Some data-driven approaches rely on recurrent neural networks [5, 6]. For example, [5] uses a long short-term memory (LSTM) neural network to generate residual signals for battery surface temperatures, and a thermal fault is flagged when the residual exceeds a certain threshold. Model-based approaches need training data to find the optimal model parameters for each type of battery, which makes them difficult and time-consuming to implement. Besides, data anomalies such as long periods of unavailability and time shifts can cause the model to malfunction. Training recurrent neural networks requires comparatively less effort; however, neural networks also depend on a continuous influx of signals. For both approaches, battery degradation over time poses a challenge. In conclusion, the following common issues still pose significant challenges for battery anomaly detection.

Fig. 1: Typical issues of thermal measurements for cloud-BMS data. This figure is based on fleet EV battery data.
  • Loss of data/invalid data: To save cost, the wireless transmission modules in some BMSs or vehicles use less reliable networks such as 3G or even 2G. Due to network issues, loss of data or invalid data is common. As depicted in Fig. 1, it may happen intermittently or for an extended period. Data is also unavailable when the vehicle is shut down. Model-based approaches, such as Kalman filters and recurrent neural networks, have internal states that rely on continuous data input. They may tolerate intermittent data loss, but long-term data loss causes the model to lose track and give inaccurate results.

  • Asynchronous data: Because of communication or sensor issues, one or several signals may be sampled with a time delay. Residual-based methods, which evaluate all signals at exactly the same moment, are likely to generate false positives with asynchronous data.

  • Battery aging: As the battery ages and deteriorates, the thermal and cell voltage measurements deviate from their nominal ranges. Both residual-based and model-based methods need to adjust their thresholds and parameters to work correctly. However, estimating the battery's state of health (SOH) is itself a challenging task [7].

  • Environment variations: Environment variations, such as different geographic locations, sensor locations, and seasonal temperature changes, cause difficulties similar to those of battery aging, but they are even harder to factor into a model.

  • Lack of training data: Electric vehicles and energy storage systems have different configurations. It is challenging to train a single model that adequately identifies anomalies for all configurations. A model tailored to each configuration is more accurate but is challenging to implement. Besides, labeled data may not be available for some configurations.

I-B The proposed method

To address some of the challenges mentioned above, this paper proposes a data-driven, model-free approach that monitors the shape similarities across measurements to detect battery thermal anomalies. The proposed method does not require the data to be continuous, making it robust to data loss and invalid data. Because the measurements are compared with each other rather than with fixed values, the method is also less affected by battery aging and environment variations, since the cells are likely to deteriorate or be influenced by these variations as a whole. The asynchronous data issue is handled by the shape-based distance measure, which is invariant to signal shifting. Furthermore, the method can easily be applied to different configurations since it needs very little reference data.

Clustering is an unsupervised learning technique that does not require training with labeled data. Therefore, the method can detect unforeseen anomalies. Unlike existing cross-similarity approaches [8], the proposed method does not need feature extraction. Also, comparing shapes (which is scale-invariant) instead of values allows it to capture abnormalities at an early stage. Therefore, the new method has the potential to give early warnings.
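The early-warning argument can be illustrated with a small Python sketch. It is a toy example under stated assumptions, not data from this paper: the sensor values and the 55 °C alarm threshold are hypothetical, and zero-lag correlation of z-normalized signals stands in for the full shape-based distance described in Section II-A. A sensor that has only just started to rise stays far below a fixed threshold, yet its shape is already the opposite of its neighbors' shapes.

```python
import numpy as np


def z_normalize(x: np.ndarray) -> np.ndarray:
    """Remove offset and scale so that only the shape of the series remains."""
    std = x.std()
    return (x - x.mean()) / std if std > 0 else np.zeros_like(x, dtype=float)


def shape_corr(x: np.ndarray, y: np.ndarray) -> float:
    """Zero-lag correlation of z-normalized series: +1 same shape, -1 opposite."""
    return float(np.mean(z_normalize(x) * z_normalize(y)))


t = np.arange(25)                      # one 25-sample segment
normal_a = 30.0 - 0.05 * t             # hypothetical sensors, cooling slowly
normal_b = 31.0 - 0.04 * t
suspect = 30.0 + 0.20 * t              # rises by less than 5 degC in the segment

THRESHOLD_C = 55.0                     # hypothetical fixed over-temperature alarm
print(suspect.max() > THRESHOLD_C)     # False: the value check stays silent

print(round(shape_corr(normal_a, normal_b), 3))   # 1.0: normal sensors agree
print(round(shape_corr(normal_a, suspect), 3))    # -1.0: the shape already stands out
```

Because normalization removes each signal's offset and scale, the comparison reacts to how a signal evolves rather than to how hot it is.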

II The Anomaly Detection Method

Fig. 2: The Proposed Methodology

II-A The K-shape Clustering Algorithm

Proposed by [9], the K-shape clustering algorithm is designed for time-series analysis. It adopts a cross-correlation sequence ($CC_w$) to determine signal similarity. The shape-based distance (SBD) function is given as:

$$\mathrm{SBD}(\vec{x},\vec{y}) = 1 - \max_{w}\left(\frac{CC_w(\vec{x},\vec{y})}{\sqrt{R_0(\vec{x},\vec{x})\,R_0(\vec{y},\vec{y})}}\right) \qquad (1)$$

where $\vec{x}$ and $\vec{y}$ are the normalized time-series measurements, $w$ ranges over all possible shifts between them, and $R_0(\cdot,\cdot)$ is the cross-correlation at zero shift (the inner product). The overall K-shape algorithm has two steps per iteration, repeated until convergence or until the maximum number of iterations is reached. In the first step (assignment), each measurement is assigned to the cluster whose centroid is most similar to it. In the second step (update), the cluster centroids are updated; each new centroid is the shape that maximizes a Rayleigh quotient built from the aligned cluster members. The overall time complexity scales linearly with the number of measurements ($n$) and the number of clusters ($k$) but increases noticeably with the number of time steps ($m$). K-shape has been applied to time-series analysis and forecasting [10, 11, 12], including battery cell voltage monitoring [13].
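The distance and centroid steps of K-shape can be sketched in NumPy. This is a minimal illustration following the formulation in [9], not the authors' code: the helper names (`z_normalize`, `ncc_c`, `sbd`, `align`, `shape_extraction`) are hypothetical, and all measurements in a segment are assumed to have equal length. The assignment step would simply give each measurement the label of the centroid with the smallest `sbd`.

```python
import numpy as np


def z_normalize(x: np.ndarray) -> np.ndarray:
    """Zero mean, unit variance; a constant series maps to all zeros."""
    std = x.std()
    return (x - x.mean()) / std if std > 0 else np.zeros_like(x, dtype=float)


def ncc_c(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Coefficient-normalized cross-correlation for every shift (length 2m - 1)."""
    denom = np.sqrt(np.dot(x, x) * np.dot(y, y))
    cc = np.correlate(x, y, mode="full")   # index i corresponds to lag i - (m - 1)
    return cc / denom if denom > 0 else np.zeros_like(cc, dtype=float)


def sbd(x: np.ndarray, y: np.ndarray) -> float:
    """Shape-based distance: 0 for identical shapes, at most 2."""
    return 1.0 - float(ncc_c(x, y).max())


def align(reference: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Shift y (zero-padding the gap) to the lag that best matches reference."""
    m = len(y)
    lag = int(np.argmax(ncc_c(reference, y))) - (m - 1)
    if lag >= 0:
        return np.concatenate([np.zeros(lag), y[: m - lag]])
    return np.concatenate([y[-lag:], np.zeros(-lag)])


def shape_extraction(members: np.ndarray, centroid: np.ndarray) -> np.ndarray:
    """Centroid update: maximize the Rayleigh quotient of M = Q^T S Q."""
    m = members.shape[1]
    if np.any(centroid):                   # skip alignment on the first iteration
        members = np.array([align(centroid, x) for x in members])
    X = np.array([z_normalize(x) for x in members])
    S = X.T @ X
    Q = np.eye(m) - np.ones((m, m)) / m
    M = Q.T @ S @ Q
    _, eigvecs = np.linalg.eigh(M)         # eigenvalues in ascending order
    mu = eigvecs[:, -1]                    # top eigenvector maximizes the quotient
    # An eigenvector's sign is arbitrary; keep the one closer to the members.
    if np.sum((X - mu) ** 2) > np.sum((X + mu) ** 2):
        mu = -mu
    return z_normalize(mu)


# Example: a delayed copy of a signal has almost zero shape-based distance.
t = np.linspace(0, 2 * np.pi, 50)
x = z_normalize(np.sin(t))
delayed = z_normalize(np.roll(np.sin(t), 3))   # same shape, 3 samples late
print(round(sbd(x, delayed), 4))

members = np.array([np.sin(t), 2 * np.sin(t) + 1, np.roll(np.sin(t), 3)])
centroid = shape_extraction(members, np.zeros(50))
print(round(sbd(centroid, x), 4))
```

The delayed copy would produce a large pointwise residual, but its SBD is close to zero, which is why shifted (asynchronous) signals do not by themselves look anomalous. For the full iterative algorithm, `tslearn.clustering.KShape` [14] provides a maintained implementation.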

II-B Assumptions and Limitations

Alongside the advantages given above, the proposed method relies on some assumptions and has limitations.

  1. The battery pack initially operates normally: We use the original operational data to obtain the reference cluster membership. It is worth mentioning that the proposed method needs only a small amount of measurement data to extract this information. For example, in Section III, only 4 minutes of normal operation data is used.

  2. The anomaly does not affect all measurements in the same manner: In theory, an anomaly causes no deviation in the measurement clustering if it affects all measurements simultaneously and in the same way, without disrupting the existing cluster memberships. Under such circumstances, the proposed method would not be effective. However, this is unlikely to happen in real life: a typical anomaly first appears in only one or a few measurements and can therefore be detected by the proposed method.
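Assumption 2 follows from the normalization step. In the minimal sketch below (synthetic values, hypothetical helper names), applying the same affine change a·x + b to every sensor leaves all z-normalized shapes, and therefore any shape-based clustering, unchanged, whereas a change confined to one sensor alters that sensor's shape.

```python
import numpy as np


def z_normalize(x: np.ndarray) -> np.ndarray:
    std = x.std()
    return (x - x.mean()) / std if std > 0 else np.zeros_like(x, dtype=float)


def shapes(segment: np.ndarray) -> np.ndarray:
    """Z-normalize every sensor (row) of a segment."""
    return np.array([z_normalize(row) for row in segment])


t = np.arange(25, dtype=float)
# Hypothetical segment: three sensors warm up together, one cools down.
segment = np.array([30 + 0.1 * t, 31 + 0.1 * t, 29 + 0.1 * t, 30 - 0.1 * t])

# Common-mode change: the same affine map a*x + b applied to every sensor.
common = 1.5 * segment + 8.0
# Local change: only sensor 0 starts rising faster halfway through the segment.
local = segment.copy()
local[0, 12:] += 0.5 * (t[12:] - 12)

print(np.allclose(shapes(common), shapes(segment)))  # True: shapes unchanged
print(np.allclose(shapes(local), shapes(segment)))   # False: sensor 0 changed
```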

II-C Methodology

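Algorithm 1: Anomaly confirmation
Input: cluster membership and distance for the current segment, which has k clusters
Output: anomaly indicator and confidence level for the segment
    // initialization: anomaly indicator (no anomaly)
    // initialization: confidence level
    for each cluster j = 1, ..., k do
        if the membership of cluster j differs from the reference or previous membership then flag an anomaly
    end for
    for each cluster j = 1, ..., k do
        if the reference and previous distances of cluster j are nonzero then compare the distance of cluster j with them to update the confidence level // avoid zero-division
    end for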
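The two criteria of Algorithm 1, explained under stage 5 below, can be sketched in Python. This is a hypothetical illustration rather than the authors' implementation: the function names, the comparison of pairwise co-membership (so that arbitrary cluster numbering does not matter), the use of distance ratios against the reference and previous segments, the `eps` guard against zero division, and the `ratio_threshold=2.0` default are all assumptions. Each of `ref` and `prev` is a `(labels, point_dist)` pair, where `point_dist[i]` is the distance of measurement `i` to its cluster centroid.

```python
import numpy as np


def co_membership(labels) -> np.ndarray:
    """n x n boolean matrix, True where two measurements share a cluster.

    Comparing these matrices ignores the arbitrary cluster numbering, which
    can differ from one clustering run to the next.
    """
    labels = np.asarray(labels)
    return labels[:, None] == labels[None, :]


def cluster_distances(labels, point_dist) -> dict:
    """Total distance per cluster (sum of member-to-centroid distances),
    keyed by the cluster's set of member indices."""
    labels = np.asarray(labels)
    point_dist = np.asarray(point_dist, dtype=float)
    return {
        frozenset(np.flatnonzero(labels == c).tolist()): float(point_dist[labels == c].sum())
        for c in np.unique(labels)
    }


def confirm_anomaly(labels, point_dist, ref, prev, ratio_threshold=2.0, eps=1e-9):
    """Return (anomaly, criterion, suspects, score) for one segment."""
    # Criterion 1: count, per measurement, the pairings that changed
    # with respect to the reference and the previous segment.
    now = co_membership(labels)
    changes = sum((now != co_membership(other[0])).sum(axis=1) for other in (ref, prev))
    if changes.max() > 0:
        # The measurement that left its group changes the most pairings.
        suspects = np.flatnonzero(changes == changes.max()).tolist()
        return True, "membership", suspects, float(changes.max())

    # Criterion 2: the partition is unchanged, so clusters match by member set.
    d_now = cluster_distances(labels, point_dist)
    d_ref = cluster_distances(*ref)
    d_prev = cluster_distances(*prev)
    suspects, score = [], 0.0
    for members, d in d_now.items():
        accumulated = d / max(d_ref[members], eps)   # vs. reference: gradual drift
        incremental = d / max(d_prev[members], eps)  # vs. previous: abrupt change
        cluster_score = max(accumulated, incremental)
        score = max(score, cluster_score)
        if cluster_score > ratio_threshold:
            suspects.extend(sorted(members))
    return bool(suspects), "distance", sorted(suspects), score


# Example: 4 measurements; the reference partition is {0, 1} and {2, 3}.
ref = ([0, 0, 1, 1], [0.02, 0.03, 0.02, 0.02])
prev = ([1, 1, 0, 0], [0.02, 0.03, 0.02, 0.03])   # same partition, labels swapped
print(confirm_anomaly([0, 1, 1, 1], [0.02] * 4, ref, prev))
print(confirm_anomaly([0, 0, 1, 1], [0.10, 0.08, 0.02, 0.02], ref, prev))
```

In the first call, measurement 1 leaves its group and is reported under the membership criterion; in the second, the partition is intact but cluster {0, 1} fits its centroid much worse than before, so the distance criterion fires.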

As depicted in the flowchart in Fig. 2, the proposed anomaly detection method consists of the following stages:

  1. Segmentation/buffering: For offline implementation, existing data is split into smaller segments, each of which is clustered separately. For online implementation, data is continuously collected and buffered into a segment, which is then processed. Because the time complexity of K-shape grows noticeably as the number of samples increases, the segment size should be limited based on the sampling rate and data type. For example, temperature measurements tend to change more slowly than voltage measurements, and a thermal fault typically takes longer to develop than a voltage anomaly does. Therefore, temperature measurements may need larger segments than voltage measurements.

  2. Cleaning: In this stage, invalid data points, such as those outside the sensor's measurement range, are removed. There is no need to fill in missing data, because the method does not require continuous data.

  3. Preprocessing: This stage includes filtering and data normalization. Segments in which none of the signals shows noticeable dynamics (changes) are excluded from the clustering process.

  4. K-shape: In this stage, the K-shape algorithm described in Section II-A is applied to the segment.

  5. Anomaly confirmation: Details of the confirmation algorithm are given in Algorithm 1, which checks two criteria. First, if one or more measurements change their cluster membership, an anomaly is indicated. Second, if there is no change in cluster membership, we check whether there is a significant increase in the fitting error, i.e., the distance between the measurements and their centroids. The following total distance function [14] is used:

    $$D_j = \sum_{\vec{x}_i \in P_j} \mathrm{SBD}(\vec{x}_i, \vec{\mu}_j)$$

    where $P_j$ is the set of measurements in cluster $j$, $\vec{\mu}_j$ is its centroid, and SBD is the shape-based distance of equation (1). As depicted in Fig. 2, in each iteration, every cluster of the current segment is compared to the corresponding reference cluster and previous cluster. The comparison with the reference cluster captures accumulated changes associated with anomalies that develop gradually, for example, a thermal anomaly due to increased internal resistance. The comparison with the previous cluster captures incremental changes associated with anomalies that develop abruptly, such as a thermal anomaly caused by a short circuit.
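Stages 1 to 3 can be sketched for the online case as follows. The sketch is a minimal illustration under assumptions: the 25-sample segment length matches Section III and the valid temperature range comes from Table I, but the buffer class, the choice to drop a whole time step when any reading is invalid (so that all series keep the same length without filling gaps), the 3-sample moving-average filter, and the 0.5 °C minimum swing are hypothetical.

```python
import numpy as np

SEGMENT_LEN = 25               # samples per segment, about 4.1 min at ~0.1 Hz
VALID_RANGE = (-50.0, 100.0)   # temperature sensor range from Table I, in degC
MIN_SWING = 0.5                # hypothetical: smallest peak-to-peak change, in degC


class SegmentBuffer:
    """Stages 1-2 (online): buffer valid time steps and emit full segments."""

    def __init__(self, length: int = SEGMENT_LEN, valid_range=VALID_RANGE):
        self.length = length
        self.lo, self.hi = valid_range
        self.rows = []

    def push(self, sample):
        """Add one time step (one reading per sensor); return a segment when full."""
        sample = np.asarray(sample, dtype=float)
        # Stage 2: drop a time step with a missing or out-of-range reading.
        # Gaps are not filled; the segment simply spans a longer wall-clock time.
        if np.isnan(sample).any() or np.any((sample <= self.lo) | (sample >= self.hi)):
            return None
        self.rows.append(sample)
        if len(self.rows) < self.length:
            return None
        segment = np.array(self.rows).T    # shape: (n_sensors, length)
        self.rows = []
        return segment


def preprocess(segment: np.ndarray, window: int = 3, min_swing: float = MIN_SWING):
    """Stage 3: smooth each sensor, skip static segments, then z-normalize."""
    kernel = np.ones(window) / window
    smoothed = np.array([np.convolve(row, kernel, mode="valid") for row in segment])
    swing = smoothed.max(axis=1) - smoothed.min(axis=1)
    if swing.max() < min_swing:            # no sensor shows noticeable dynamics
        return None
    mean = smoothed.mean(axis=1, keepdims=True)
    std = smoothed.std(axis=1, keepdims=True)
    std[std == 0] = 1.0                    # a flat sensor stays flat instead of dividing by 0
    return (smoothed - mean) / std


# Example with synthetic readings from 3 sensors; one reading is out of range.
buffer = SegmentBuffer()
ready = []
for k in range(30):
    reading = [25.0 + 0.2 * k, 25.1 + 0.2 * k, 24.9 + 0.3 * k]
    if k == 5:
        reading[1] = 150.0                 # invalid value: this time step is dropped
    segment = buffer.push(reading)
    if segment is not None:
        ready.append(segment)
```

The "valid" convolution mode shortens each series by `window - 1` samples, so a 25-sample segment reaches the clustering stage as 23 samples per sensor; any constant-length filter would work equally well here.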

III Experimental results

III-A Database and data sources

The R&D team at Gotion has built a cloud-based battery data platform. The platform receives and cleans battery data from different sources, such as fleet vehicles, onboard tests, and lab tests, and then uploads it to a time-series database. There are also software modules for data visualization, battery simulation, and analysis. Most of the algorithms are implemented in Python. The proposed method is applied to battery data collected from fleet vehicle batteries between 2019 and 2021. The data sampling rate is about 0.1 Hz. The data covers different battery chemistries, pack configurations, and vehicle types.

Fig. 3: Case I: (a) Thermal measurement data with the over-temperature fault. (b) Zoomed-in view of the fault occurrence; note that the proposed method detects the temperature anomaly 90 min before it surpasses the threshold. (c) Plot of the measurement shapes for the segment in which the anomaly is detected.

TABLE I: Description of the Vehicle Battery Data
Field name       Data type   Value range      Source
cell voltage     double      (0, 5) V         BMS
temperature      double      (-50, 100) °C    BMS
pack current     double      (-500, 500) A    BMS
battery faults   bool        {0, 1}           BMS
battery status   enum        -                BMS

III-B Initial results and discussion

In this section, the testing results for two different battery packs are presented and compared with the fault detection of the onboard BMS. The proposed algorithm is implemented in Python and deployed on a cloud computing platform (AWS). For the following results, the segment size is set to 25 samples, equivalent to about 4.1 min. Note that both cases contain intermittent and long-term data losses. In the following figures, the data gaps are connected for better visualization.

III-B1 Case I

In the first case, the proposed method is applied to a battery pack that undergoes an over-temperature anomaly. As depicted in Fig. 3(a), on Oct 30th, the battery temperature near sensor #13 rises significantly to over 70 °C. Both the proposed method and the BMS flag the anomaly, but the new method does so more than 90 min earlier. The detailed timing difference between the two methods is illustrated in Fig. 3(b). As the figure shows, the BMS reports an over-temperature fault at 3:45 PM, when the maximum temperature exceeds 55 °C. The proposed method, on the other hand, sends an anomaly warning around 2:15 PM, just as sensor #13 starts to depart from the rest of the measurements. Fig. 3(c) shows a shape plot of the segment in which the anomaly is detected: the rising shape of signal #13 clearly stands out from the rest. As signal #13 continues to rise, its shape in the following segments becomes less steep and therefore less distinct from the others. As a result, the confidence level is largest at the onset of the anomaly, which explains why the new method can send early warnings. In conclusion, Case I validates that the proposed method can send early warnings for battery over-temperature faults.

III-B2 Case II

The second case is from an EV with a two-pack configuration, in which the two battery packs are installed at different locations inside the vehicle. As a result, their thermal measurements behave very differently. As Fig. 4 shows, the temperature difference between the two packs grows noticeably between 8:00 and 9:00 AM and between 1:00 and 2:00 PM, when a large current is discharged from the battery. In both periods, the onboard BMS reports a thermal fault even though there is no real anomaly, because it uses hard thresholds from a look-up table. As also depicted in Fig. 4, the proposed method successfully recognizes two cluster groups and does not report any anomaly, since no discrepancy is found within the cluster groups. This case shows that, compared with the BMS, the proposed method is more robust to variations caused by pack design and sensor location.

Fig. 4: Case II: (a) thermal measurements and (b) current for a two-pack battery. Notice the temperature difference grows under large current loads.

IV Conclusion and Future work

In this paper, we identify common issues in cloud-based battery anomaly/fault detection. A method based on unsupervised shape clustering is then proposed for detecting battery thermal anomalies. The proposed method does not depend on a model or on large training datasets. It also has several unique advantages, such as resilience to data loss and the capability of early detection. Two test cases based on real vehicle data are studied. In one case, the new method is found to be more accurate than the BMS when applied to a multi-pack vehicle. In the other case, where a battery over-temperature fault occurs, the proposed method flags the anomaly at a very early stage, more than 90 min ahead of the BMS.

Future work includes the following. First, the method is currently being tested on large datasets, and its rates of false positives and false negatives need to be investigated and compared with those of the BMS. Second, detection accuracy can be further improved by making use of other measurement signals, such as BMS status and cell voltages.


  • [1] X. Hu, K. Zhang, K. Liu, X. Lin, S. Dey, and S. Onori, “Advanced Fault Diagnosis for Lithium-Ion Battery Systems: A Review of Fault Mechanisms, Fault Features, and Diagnosis Procedures,” IEEE Industrial Electronics Magazine, vol. 14, no. 3, pp. 65–91, 2020.
  • [2] G. L. Plett, Battery Management Systems, Volume I: Battery Modeling, ser. Artech House Power Engineering Series. Artech House, 2015.
  • [3] X. Lin, H. E. Perez, J. B. Siegel, A. G. Stefanopoulou, Y. Li, R. D. Anderson, Y. Ding, and M. P. Castanier, “Online parameterization of lumped thermal dynamics in cylindrical lithium ion batteries for core temperature estimation and health monitoring,” IEEE Transactions on Control Systems Technology, vol. 21, no. 5, pp. 1745–1755, 2013.
  • [4] W. Gao, Y. Zheng, M. Ouyang, J. Li, X. Lai, and X. Hu, “Micro-short-circuit diagnosis for series-connected lithium-ion battery packs using mean-difference model,” IEEE Transactions on Industrial Electronics, vol. 66, no. 3, 2019.
  • [5] O. Ojo, H. Lang, Y. Kim, X. Hu, B. Mu, and X. Lin, “A Neural Network-Based Method for Thermal Fault Detection in Lithium-Ion Batteries,” IEEE Transactions on Industrial Electronics, vol. 68, no. 5, pp. 4068–4078, 2020.
  • [6] D. Li, Z. Zhang, P. Liu, Z. Wang, and L. Zhang, “Battery Fault Diagnosis for Electric Vehicles Based on Voltage Abnormality by Combining the Long Short-Term Memory Neural Network and the Equivalent Circuit Model,” IEEE Transactions on Power Electronics, vol. 36, no. 2, pp. 1303–1315, Feb. 2021.
  • [7] A. Nuhic, T. Terzimehic, T. Soczka-Guth, M. Buchholz, and K. Dietmayer, “Health diagnosis and remaining useful life prognostics of lithium-ion batteries using data-driven methods,” Journal of Power Sources, vol. 239, pp. 680–688, 2013.
  • [8] M. Schmid, H.-G. Kneidinger, and C. Endisch, “Data-Driven Fault Diagnosis in Battery Systems through Cross-Cell Monitoring,” IEEE Sensors Journal, pp. 1–1, Aug. 2020.
  • [9] J. Paparrizos and L. Gravano, “K-shape: Efficient and accurate clustering of time series,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, vol. 2015-May. Association for Computing Machinery, May 2015, pp. 1855–1870.
  • [10] D. Bega, M. Gramaglia, M. Fiore, A. Banchs, and X. Costa-Perez, “DeepCog: Cognitive Network Management in Sliced 5G Networks with Deep Learning,” in Proceedings - IEEE INFOCOM, vol. 2019-April, 2019.
  • [11] P. Gianniou, X. Liu, A. Heller, P. S. Nielsen, and C. Rode, “Clustering-based analysis for residential district heating data,” Energy Conversion and Management, vol. 165, pp. 840–850, Jun. 2018.
  • [12] E. Calikus, S. Nowaczyk, A. Sant’Anna, H. Gadd, and S. Werner, “A data-driven approach for discovering heat load patterns in district heating,” Applied Energy, vol. 252, p. 113409, Oct. 2019.
  • [13] S. N. Haider, Q. Zhao, and X. Li, “Data driven battery anomaly detection based on shape based clustering for the data centers class,” Journal of Energy Storage, vol. 29, Jun. 2020.
  • [14] R. Tavenard, J. Faouzi, G. Vandewiele, F. Divo, G. Androz, C. Holtz, M. Payne, R. Yurchak, M. Rußwurm, K. Kolar, and E. Woods, “Tslearn, a machine learning toolkit for time series data,” Journal of Machine Learning Research, vol. 21, 2020.