Gaussian Process Regression for In-situ Capacity Estimation of Lithium-ion Batteries

by Robert R. Richardson, et al.
University of Oxford

Accurate on-board capacity estimation is of critical importance in lithium-ion battery applications. Battery charging/discharging often occurs under a constant current load, and hence voltage vs. time measurements under this condition may be accessible in practice. This paper presents a data-driven diagnostic technique, Gaussian Process regression for In-situ Capacity Estimation (GP-ICE), which estimates battery capacity using voltage measurements over short periods of galvanostatic operation. Unlike previous works, GP-ICE does not rely on interpreting the voltage-time data as Incremental Capacity (IC) or Differential Voltage (DV) curves. This overcomes the need to differentiate the voltage-time data (a process which amplifies measurement noise), and the requirement that the range of voltage measurements encompasses the peaks in the IC/DV curves. GP-ICE is applied to two datasets, consisting of 8 and 20 cells respectively. In each case, within certain voltage ranges, as little as 10 seconds of galvanostatic operation enables capacity estimates with approximately 2-3




I Introduction

Lithium-ion batteries experience capacity fade during use through a complex interplay of physical and chemical processes [1, 2]. Knowledge of the present battery capacity is necessary to ensure reliable operation and facilitate corrective action when appropriate. Battery capacity estimates are also an essential input for optimal battery sizing algorithms, for applications such as microgrids [3] and hybrid energy storage systems [4]. Therefore, accurate online capacity estimation is an important function of the battery management system.

There are several different approaches to capacity estimation [5, 6]. The most common of these involve parameter estimation of battery equivalent circuit models [7, 8, 9, 10] or electrochemical models [11, 12, 13, 14]. These approaches have been successfully applied in many studies; however, they all require the provision of an accurate battery model. Moreover, for high fidelity models, parameter identifiability can be a major challenge [15].

Incremental capacity (IC) and differential voltage (DV) analysis have also been used for capacity estimation. These techniques have conventionally been used for detailed cell analyses, such as understanding degradation mechanisms [16, 17]; however, recent studies have considered the use of portions of the IC/DV curve for online capacity estimation [18, 19, 20, 21, 22]. In particular, Berecibar et al. [21] demonstrated cell capacity estimation using a selection of features of IC/DV curves as inputs. They demonstrated their approach using three different regression techniques: linear regression, multilayer perceptrons and support vector machines (SVMs), with the latter two methods showing the best results. Although their approach showed good performance, the use of features derived from IC/DV curves as inputs to a regression problem has a number of drawbacks. Firstly, differentiating the voltage-time data amplifies the noise in the measurement, even when sophisticated smoothing algorithms are employed. In particular, the magnitudes of the peaks were found to be especially sensitive to noise. This induces a loss of accuracy in the subsequent regression problem, since the inputs are derived from the differentiated data. Secondly, since the inputs are the values and locations of the peaks, the voltage range must encompass the voltages at which these peaks occur. In some cases, one of these peaks may be located at a high State of Charge (SoC) and another at a low SoC, and hence identifying all the inputs would require covering a large voltage range and a long measurement duration. Lastly, the selection of the features is a cumbersome pre-processing step, since these are likely to vary between cells of different chemistries.

The present work overcomes these issues by dispensing with the interpretation of the voltage data as IC or DV curves and instead operating directly on the voltage vs. time data itself. This is achieved by first smoothing the voltage curve using a Savitzky-Golay (SG) filter [23], and then using the time values at equispaced voltages as the inputs to the regression problem. (Savitzky-Golay filtering is often used when differentiating noisy data; differentiation is not our objective here, but we nonetheless use this filter since it reduces measurement noise, which is advantageous in any case.) Full details of this procedure, which we term Gaussian Process regression for In-situ Capacity Estimation (GP-ICE), are given in Section II. Furthermore, GP-ICE uses Gaussian processes (GPs) [24] rather than SVMs or neural networks for the regression step. GPs have previously been used in relatively few studies on battery diagnostics/prognostics [25, 26, 27]; however, they possess a variety of desirable attributes. Firstly, they are non-parametric, and hence permit a model expressivity (i.e. a number of parameters) that is naturally calibrated to the requirements of the data. (Support vector machines are, like GPs, non-parametric, but they do not provide confidence estimates in their predictions.) Secondly, GPs are a Bayesian method, and hence handle uncertainty in a principled manner. An important aspect of diagnostics is not only estimating the capacity values but also expressing the uncertainty associated with these estimates. Bayesian methods give rise to credible intervals with probabilistic upper and lower bounds, which are essential for making informed decisions.

The remainder of this article is organised as follows. Section II describes the novel capacity estimation algorithm, whilst the details of Gaussian process regression are provided in the appendix. Section III gives details of the two datasets used for validation, and Section IV presents and analyses the results of our method applied to these datasets. Section V discusses the practical applicability of the method, and elaborates on its advantages and disadvantages.

II Method

II-A Overview

An overview of the general methodology is given below. The process is also depicted in Fig. 1, and a detailed flow diagram is included in Fig. 10 at the end of this document. For simplicity, the following description assumes that charging (rather than discharging) data are used, although the procedure is equally applicable in either case.


Assume we have a database of cells. Each cell has been cycled to varying states of health, and this cycling may have occurred under varying conditions (e.g. with different C-rates, depths of discharge (DoDs), and temperatures). At various stages throughout the life of each cell, a full constant-current charge cycle has been applied at a fixed pre-specified current and a fixed pre-specified ambient temperature, and the voltage vs. time data from this cycle are recorded. From here on we refer to these data as a Galvanostatic Voltage (GV) curve. The GV curve is smoothed using a Savitzky-Golay (SG) filter (or any other simple, efficient smoothing algorithm), and the voltage-time ($V$-$t$) data at 1 s intervals are acquired; a subset of these points will be used as the input data for a single sample. Since a full charge/discharge cycle is applied, the capacity of the cell at this C-rate is known from the charge passed during the cycle. We denote this known capacity as $y$, since it will be the target value for this GV curve in the regression step. Note that each cell can have a different number of GV curves, and the order of these curves is not important. Hence, the end result is just a labelled set of training data, consisting of a large set of smoothed GV curves (a subset of which will form the inputs), and an associated set of known cell capacities (the outputs). The total number of GV curves across all cells is the sample size, $N$, of the database.
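For a constant-current cycle, the known capacity follows from coulomb counting. The relation below is a sketch of that standard definition; the symbols $Q$, $I$ and $t_{\mathrm{end}}$ (the duration of the full charge) are notation assumed here and are not taken from the source.

```latex
Q = \int_{0}^{t_{\mathrm{end}}} I \,\mathrm{d}t = I \, t_{\mathrm{end}}
\quad \text{(constant current } I\text{)}
```

With $I$ in amperes and $t_{\mathrm{end}}$ in hours, $Q$ is in Ah. For example, a current of 0.74 A sustained for one hour corresponds to 0.74 Ah.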


The procedure for estimating the cell capacity using a short online diagnostic test is described next. Assume we have a cell with an unknown capacity and unknown SoC, and we wish to estimate the capacity.

  1. Allow the cell to rest for a sufficient period to minimise electrical/thermal effects from the previous cycle. In practice the minimum duration of this rest will depend on the cell chemistry and the nature of the previous cycle.

  2. Apply the pre-specified constant current for some duration, $\tau$, and measure the voltage throughout. In practice $\tau$ would be dictated by the duration of time one can afford to take, or the duration of time a device happens to be charged for by the user. The voltage range of this test will span from some lower voltage, $V_{\text{lo}}$, which is the cell voltage when the charge is first applied, to some higher voltage, $V_{\text{hi}}$, which is the cell voltage at the instant the constant current is removed.

  3. Smooth this voltage vs. time data using an SG (or similar) smoothing filter, as before.

  4. Identify the values of the time at $n$ equispaced voltage points between $V_{\text{lo}}$ and $V_{\text{hi}}$, and denote these values by the vector $\mathbf{x}^*$. That is, for a chosen $n$, $\mathbf{x}^*$ consists of the $n$ time values at which the voltage reaches each of these equispaced points. We will later use $\mathbf{x}^*$ as the independent variable in the regression model, as shown in Fig. 1.

  5. For each of the GV curves in the offline database, identify the corresponding input vectors, $\mathbf{x}_i$, given by the time taken to go from the lower voltage to each of the equispaced voltages. Since the cell capacities for each GV curve in the offline database are known, each time vector, $\mathbf{x}_i$, has an associated capacity, which we denote $y_i$.

  6. Hence, for each GV curve in the training set, there is an input vector, $\mathbf{x}_i$, and an output scalar, $y_i$. These are used as the inputs and outputs to a GP regression model for predicting the capacity, as described next.

Note on step 4: Intuitively, the inverse of this procedure (i.e. using the voltages sampled at uniformly spaced times as the inputs) might seem to be more logical. However, the former approach is chosen here because using a fixed voltage range prevents the voltage from entering regions where there is no training data. For instance, if a large $\tau$ is used in the test case, it might happen that this extends beyond the upper voltage region of the GV curve for a training case with smaller capacity. For example, in Fig. 1, the test case (leftmost subplot) could, if the entire voltage range was used, span a duration that is clearly longer than any possible measurement on the second training case (second subplot from right).

Fig. 1: Overview of GP-ICE method. The time values at equispaced voltage points between $V_{\text{lo}}$ and $V_{\text{hi}}$ are used as inputs to the regression model. The test inputs are shown as red crosses, and the training inputs are shown as blue crosses. The training inputs all have an associated known capacity. The figure shows just three GV curves for training, but in practice the model is trained on several hundred GV curves, obtained from multiple cells at different states of health (see Table I).
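The input-extraction steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, a centred moving average stands in for the Savitzky-Golay filter, the voltage curve is assumed to be non-decreasing after smoothing, and the $n$ voltage points are assumed to exclude the lower voltage itself (where the elapsed time is zero by construction).

```python
from bisect import bisect_left


def moving_average(values, window=5):
    """Centred moving average; a simple stand-in for a Savitzky-Golay filter."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def time_at_voltage(times, voltages, v_target):
    """Interpolate the time at which a non-decreasing voltage curve reaches v_target."""
    if not voltages[0] <= v_target <= voltages[-1]:
        raise ValueError("target voltage lies outside the measured range")
    j = bisect_left(voltages, v_target)
    if j == 0:
        return times[0]
    v0, v1 = voltages[j - 1], voltages[j]  # v0 < v_target <= v1
    frac = (v_target - v0) / (v1 - v0)
    return times[j - 1] + frac * (times[j] - times[j - 1])


def gp_ice_inputs(times, voltages, v_lo, v_hi, n=4):
    """Elapsed times from v_lo to each of n equispaced voltages up to v_hi.

    Assumption: the points are v_lo + k * (v_hi - v_lo) / n for k = 1..n.
    """
    t_lo = time_at_voltage(times, voltages, v_lo)
    inputs = []
    for k in range(1, n + 1):
        v_k = min(v_lo + (v_hi - v_lo) * k / n, v_hi)  # guard against rounding
        inputs.append(time_at_voltage(times, voltages, v_k) - t_lo)
    return inputs
```

The same function would be applied to the online test curve and to every GV curve in the offline database, so that test and training inputs refer to the same voltage points.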

II-B Regression

The goal of a regression problem is to learn the mapping from inputs $\mathbf{x}$ to outputs $y$, given a labelled training set of input-output pairs $\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, where $N$ is the number of training examples. In the present case, the inputs are the time vectors for each GV curve, and the outputs are the corresponding measured capacities, as discussed in the previous section. The underlying model is assumed to take the form $y = f(\mathbf{x}) + \varepsilon$, where $f(\mathbf{x})$ represents a latent function and $\varepsilon$ is an independent and identically distributed noise contribution. The learned model can then be used to make predictions at a test index $\mathbf{x}^*$ (the vector of time values obtained online) for the unknown capacity, $y^*$.

In the present work, Gaussian process regression with a Matérn (5/2) kernel function (see Appendix) is used to achieve this mapping. A full description of the mathematical machinery behind GPs is given in the appendix. The method was implemented in Matlab using the GPML toolbox [28].
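As a rough illustration of GP regression with a Matérn 5/2 kernel, the NumPy sketch below computes the predictive mean and variance for fixed hyperparameters. It is not the GPML/Matlab implementation used in this work: the function names and default hyperparameter values are assumptions, the prior mean is taken as the average of the training targets, and no hyperparameter optimisation is performed.

```python
import numpy as np


def matern52(a, b, length_scale=1.0, signal_var=1.0):
    """Matern 5/2 covariance between the rows of a (m x d) and b (k x d)."""
    diff = a[:, None, :] - b[None, :, :]
    r = np.sqrt((diff ** 2).sum(axis=-1))
    s = np.sqrt(5.0) * r / length_scale
    return signal_var * (1.0 + s + s ** 2 / 3.0) * np.exp(-s)


def gp_predict(x_train, y_train, x_test, noise_var=1e-4,
               length_scale=1.0, signal_var=1.0):
    """Predictive mean and variance (including noise) at x_test.

    Assumption: a constant prior mean equal to the mean training target.
    """
    y_mean = y_train.mean()
    k = matern52(x_train, x_train, length_scale, signal_var)
    k += noise_var * np.eye(len(x_train))
    k_s = matern52(x_train, x_test, length_scale, signal_var)
    chol = np.linalg.cholesky(k)  # k = chol @ chol.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - y_mean))
    v = np.linalg.solve(chol, k_s)
    mean = y_mean + k_s.T @ alpha
    # The prior variance at a single point equals signal_var for this kernel.
    var = signal_var - (v ** 2).sum(axis=0) + noise_var
    return mean, var
```

Here each row of `x_train` would be one time vector and `y_train` the matching capacities. With inputs measured in seconds, a length scale of 1.0 would be a poor choice; in practice the hyperparameters would be fitted, for example by maximising the marginal likelihood.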

A leave-one-out validation scheme was used, whereby each cell is used once as a test set while the data from the remaining cells form the training set. The performance was evaluated using the root-mean-squared percentage error (RMSPE) in the capacity estimation, defined as

$$\mathrm{RMSPE} = 100 \times \sqrt{\frac{1}{N_T} \sum_{i=1}^{N_T} \left( \frac{\hat{y}_i - y_i}{y_i} \right)^2} \tag{1}$$

where $\hat{y}_i$ is the estimated capacity, $y_i$ is the true value, and $N_T$ is the total number of test points. Because percentage errors are normalised, they can be used to compare forecast performance across datasets with different absolute cell capacities, as is the case in this study [29].
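The validation scheme and error metric can be sketched as follows. The function names and the `(cell_id, x, y)` sample layout are hypothetical choices made for illustration, and the RMSPE is assumed to be expressed in percent.

```python
from math import sqrt


def rmspe(y_true, y_pred):
    """Root-mean-squared percentage error, in percent."""
    terms = [((p - t) / t) ** 2 for t, p in zip(y_true, y_pred)]
    return 100.0 * sqrt(sum(terms) / len(terms))


def leave_one_cell_out(samples):
    """Yield (held_out_cell, train, test) splits from (cell_id, x, y) samples."""
    for cell in sorted({cell_id for cell_id, _, _ in samples}):
        train = [s for s in samples if s[0] != cell]
        test = [s for s in samples if s[0] == cell]
        yield cell, train, test
```

Splitting by cell rather than by individual GV curve keeps every curve from the test cell out of the training set, which matches the scheme described above.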

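To quantify the reliability of the uncertainty estimates, we use the calibration score (CS), defined as the frequency of actual results lying within a given credibility interval. For instance, for a $\pm 2\sigma$ credibility interval, the calibration score is defined as:

$$\mathrm{CS}_{2\sigma} = \frac{1}{N_T} \sum_{i=1}^{N_T} \mathbb{1}\left[ \, |y_i - \hat{y}_i| < 2\sigma_i \, \right] \tag{2}$$

where $y_i$ is the true capacity, $\hat{y}_i$ and $\sigma_i$ are the predictive mean and standard deviation for test point $i$, $N_T$ is the number of test points, and $\mathbb{1}[\cdot]$ equals 1 when its argument is true and 0 otherwise. For a Gaussian predictive distribution, the interval corresponding to $\pm 2\sigma$ is a 95.4% credibility interval. Hence, the frequency of actual results lying in these intervals should be approximately 0.954: greater or less than this implies that the model is under- or over-confident respectively.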
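A minimal sketch of the calibration score is given below. The function name and the parameter `k` (the half-width of the interval in predictive standard deviations) are assumptions made for illustration.

```python
def calibration_score(y_true, y_mean, y_std, k=2.0):
    """Fraction of true values within k predictive standard deviations of the mean."""
    inside = [abs(t - m) <= k * s for t, m, s in zip(y_true, y_mean, y_std)]
    return sum(inside) / len(inside)
```

With `k=2.0` the target frequency is about 0.954. For a Gaussian predictive distribution, a 50% interval corresponds to `k` of roughly 0.674.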

III Datasets

Two different datasets are considered in this work: (i) the Oxford dataset, consisting of our own in-house aging experiments and (ii) the NASA dataset, obtained from an open-access repository provided by the NASA Ames Research Centre. An overview of each dataset is given in Table I.

| Dataset             | Oxford                                | NASA                                   |
|---------------------|---------------------------------------|----------------------------------------|
| Manufacturer        | Kokam                                 | LG Chem.                               |
| Form factor         | Pouch                                 | 18650                                  |
| # cells             | 8                                     | 20                                     |
| # samples           | 519                                   | 842                                    |
| Capacity range (Ah) | 0.74 to 0.43                          | 2.10 to 0.80                           |
| Cycling             | All cells cycled with the same regime | 5 groups, each with a different regime |

TABLE I: Dataset overview. The ‘# samples’ row indicates the total number of voltage-time curves, across all cells. The ‘Capacity range’ row indicates the values of the maximum initial capacity and minimum final capacity respectively, across all cells.

III-A Oxford

The Oxford data was obtained from the Oxford Battery Degradation Dataset [30]. This consists of aging experiments applied to 8 commercial Kokam pouch cells of 740 mAh nominal capacity, with graphite negative electrode and lithium cobalt oxide (LCO)/lithium nickel cobalt oxide (NCO) positive electrode. Cycling was conducted using a Biologic MPG 205 potentiostat, and the cells were housed in a Binder MK53 thermal chamber at a constant ambient temperature of 40 °C.

All 8 cells were cycled by repeatedly discharging using the ARTEMIS urban drive cycle [31] and recharging at a constant current of 2C. After every 100 cycles, a characterisation test was carried out including a full charge-discharge cycle at 1C; these were the GV curves for this dataset. Fig. 2b shows the complete set of GV curves for Cell 1 over its entire lifecycle. Similar sets of curves were observed for the other cells. Each of these curves represents a single sample from which the inputs to the regression problem are sampled, as discussed in Section II. A total of 519 charge curves were measured across all cells (i.e. approximately 65 curves per cell).

The cell capacity was calculated by integrating the 1C charge curves. The calculated capacities for all 8 cells are plotted as a function of cycle number in Fig. 2a. The end of life (EoL) was deemed to occur if the cell terminal voltage dropped below 0 V during the discharge cycle. The EoL typically occurred at 8,000 cycles (Fig. 2a) although one of the cells failed much earlier than this (5,000 cycles). Another cell (light green line in Fig. 2a) entered a change of regime around 5,000 cycles where a sudden drop in capacity occurred – this provides an interesting challenge for the capacity estimation algorithm as discussed in Section IV.

Fig. 2: Oxford dataset. a, Capacity evolution of the tested cells. b, Evolution of the voltage curves for Cell 1 over the life of the cell. The colours range from dark to light as the cycle number increases.

III-B NASA

The NASA dataset was obtained from the NASA Ames Prognostics Center of Excellence Randomized Battery Usage Repository [32]. The data in this repository was first used in Ref. [14] for an investigation into capacity fade under randomized load profiles. The data are randomised in order to better represent practical battery usage. The tests were conducted with LG Chem. 18650 Li-cobalt cells with 2.1 Ah nominal capacity. The remainder of this subsection describes the cycling and characterisation procedure based on the documentation provided with the downloaded datasets [32].

Group 1 (Cells 1, 2, 7, 8): Repeatedly charged to 4.2V using a randomly selected duration between 0.5 hours and 3 hours, and then discharged to 3.2V using a randomized sequence of discharging currents between 0.5A and 4A. Reference characterisation carried out every 50 cycles.
Group 2 (Cells 3-6): Same as Group 1 except charging cycle is not randomized.
Group 3 (Cells 9-12): Operated using a sequence of charging/discharging currents between -4.5A and 4.5A. Each loading period lasted 5 minutes. Reference characterisation carried out after 1500 periods (about 5 days).
Group 4 (Cells 13-16): Repeatedly charged to 4.2V and then discharged to 3.2V using a randomized sequence of discharging currents between 0.5A and 5A. A customized probability distribution designed to be skewed towards selecting higher currents was used to select a new load setpoint every 1 minute during discharging operation.
Group 5 (Cells 17-20): Same as Group 4 except the probability distribution was designed to be skewed towards selecting lower currents.
TABLE II: NASA data load profiles. Each group of cells underwent a different loading procedure. Full details of these procedures are described in the repository documentation [32].
Fig. 3: NASA dataset. a, Capacity evolution of the 5 groups of tested cells. Each group consists of four cells cycled with similar profiles. b, Evolution of the voltage curves for an exemplary cell from each group. The colours range from dark to light red as the cycle number increases.

For this study we used the data from the first 20 cells in the repository, which were all cycled at room temperature throughout the duration of the experiments. The cells are grouped into 5 groups of 4, with each group undergoing a different randomized loading procedure as described in Table II. In all cases a characterisation test was periodically carried out, whereby a 2A charge-discharge cycle was applied; the discharge curves were used as the GV curves in this case, to demonstrate the applicability of our method using either charge or discharge data. A total of 842 GV curves were measured across all cells (i.e. approximately 42 curves per cell).

The cell capacity was calculated by integrating the 2A charge curves. The calculated capacities for the cells in all 5 groups are plotted against the cycle count in Fig. 3a. The full set of GV curves for a selected cell from each group is plotted in Fig. 3b, beneath the corresponding capacity plots. Fig. 3 shows that the evolution of the capacity is quite different for each group of cells. Later results demonstrate that the GP-ICE method is robust in that it provides accurate estimates in spite of this path dependence of the capacity fade.

IV Results

IV-A Oxford dataset

Fig. 4 shows results for selected cells from the Oxford dataset for two combinations of online measurement duration, $\tau$, and lower voltage, $V_{\text{lo}}$. For each plot, the model is tested on the cell shown and trained on all other cells. Note that for the test set, we do not actually carry out a separate online diagnostic test as described in Section II; rather the relevant portion of the data was simply selected from the full GV curve, as though it had come from a short diagnostic test. Fig. 4a shows that reasonable performance can be achieved using a relatively short measurement duration of just 50 s. Where the predictions are less accurate, the error bars are appropriately wide and generally extend to encompass the true values. For instance, Cell 2 exhibits an unusual drop in capacity at 5,000 cycles, a behaviour which is not manifested by any of the other cells (which were used for training in this case). Hence, the estimates made for Cell 2 after 5,000 cycles are slightly erratic, but their uncertainty is accurately reflected by their correspondingly larger error bars. On the other hand, Fig. 4b shows that consistently high performance can be achieved if a large $\tau$ is used. The estimates for all cells in this case have an RMSPE value below 1%. Interestingly, the method performs well for Cell 2 even in the regime beyond 5,000 cycles, and expresses high confidence in these estimates. In practice the provision of such confidence estimates has significant implications. For instance, in an online setting, as capacity measurements are received sequentially from diagnostic tests of varying duration, a Kalman filter [33] (or other probabilistic filter) could effectively discount the uncertain measurements and retain the certain ones. This would enable a more robust diagnosis over multiple cycles.
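To illustrate how a probabilistic filter could weight sequential capacity estimates by their uncertainty, the sketch below shows one step of a scalar Kalman filter. It is not part of the method evaluated here: the function name, the random-walk model for the capacity between tests and the `process_var` parameter are all assumptions made for illustration.

```python
def kalman_update(mean, var, z, z_var, process_var=0.0):
    """One scalar Kalman step: random-walk prediction, then fusion of measurement z.

    mean, var : current capacity estimate and its variance
    z, z_var  : new capacity estimate from a diagnostic test and its predictive variance
    """
    var_pred = var + process_var  # capacity assumed to drift slowly between tests
    gain = var_pred / (var_pred + z_var)
    new_mean = mean + gain * (z - mean)
    new_var = (1.0 - gain) * var_pred
    return new_mean, new_var
```

A measurement with a large predictive variance yields a gain close to zero and barely moves the running estimate, whereas a confident measurement dominates it.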

Fig. 5 shows the overall results, where each cell is used once as the test set. Fig. 5a shows actual vs. predicted capacities across all cells for a selection of measurement durations, $\tau$, and lower voltages, $V_{\text{lo}}$. It is apparent that larger $\tau$ values (lower rows on the grid of plots in Fig. 5a) have higher accuracy, whereas differences in $V_{\text{lo}}$ (columns of the same grid) have a less consistent effect on the RMSPE. This is shown explicitly in Figs. 5b and c, which show the overall RMSPE values plotted against $\tau$ and $V_{\text{lo}}$ respectively. For all starting voltages there is a clear decreasing trend in RMSPE as $\tau$ is increased, as would be expected.

For the longest measurement duration (the bottom row of the grid of plots in Fig. 5a), the capacity is accurately estimated even at extreme values. For instance, the lone data-point at just under 0.5 Ah lies very close to the red line despite not having other nearby training examples from which to learn. One of the advantages of Bayesian methods such as GPs over deterministic methods is that they can generalise better from relatively small datasets such as the one used here by properly expressing their uncertainty about the underlying model.

On the other hand, when shorter measurement durations are used (such as the middle and upper rows of plots) this outlier is over-estimated. However, in most cases where the estimates are inaccurate, the error bars are correspondingly larger, hence accurately conveying the model’s uncertainty (as indicated by the grey error bars generally crossing the red line in Fig. 5a). To evaluate the accuracy of the uncertainty estimates we calculate the calibration score (Eq. 2) for two different intervals, corresponding to 50% and 95.4% credibility intervals, respectively. The calibration scores for specific combinations of measurement duration and lower voltage are quoted within the subplots in Fig. 5a. Averaged across all combinations, the scores are slightly less than the corresponding nominal credibility levels, which indicates that the model is slightly over-confident in its estimates. This is most likely due to the fact that the model assumes that the inputs are uncorrelated, when in fact they come from a GV curve with sequential structure. However, these uncertainties are still quite reasonable, especially in comparison to non-probabilistic approaches (such as the previously used neural networks or SVMs [21]), which implicitly assign equal credibility to all estimates.

Fig. 4: Selected results for the Oxford dataset. The red lines indicate the measured capacity and the black markers with error bars indicate the GP-ICE estimates. Panels a and b use two different combinations of test duration and starting voltage.
Fig. 5: Overall results for the Oxford dataset. RMSPE values are based on the entire dataset with each cell used once as the test set. a, Actual vs. predicted capacities for different starting voltages and measurement durations. The red line represents equality of the actual and predicted values. The closer the datapoints lie to this line, the smaller the difference between the actual and predicted value. The grey lines indicate credibility intervals for each datapoint. The quoted CS values indicate the associated calibration score; the closer these scores are to 0.954 the more accurate the uncertainty estimates. b, RMSPE vs. measurement duration for different starting voltages. c, RMSPE vs. starting voltage for different measurement durations. The RMSPE clearly decreases with measurement duration but shows relatively little dependence on the starting voltage.
Fig. 6: Selected results for the NASA dataset. The red lines indicate the measured capacity and the black markers with error bars indicate the GP-ICE estimates. Panels a and b use two different combinations of test duration and starting voltage.
Fig. 7: Overall results for the NASA dataset. RMSPE values are based on the entire dataset with each cell used once as the test set. a, Actual vs. predicted capacities for different starting voltages and measurement durations. The red line represents equality of the actual and predicted values. The closer the datapoints lie to this line, the smaller the difference between the actual and predicted value. The grey lines indicate credibility intervals for each datapoint. The quoted CS values indicate the associated calibration score; the closer these scores are to 0.954 the more accurate the uncertainty estimates. b, RMSPE vs. measurement duration for different starting voltages. c, RMSPE vs. starting voltage for different measurement durations. The RMSPE generally decreases with measurement duration, but notably is also strongly affected by the starting voltage.

IV-B NASA dataset

Figs. 6 and 7 show selected and overall results respectively for the NASA dataset, analogous to Figs. 4 and 5 from the previous section. The NASA dataset presents a greater challenge for capacity estimation since it includes cells used in 5 different cycling regimes. Moreover, even within each group the cells are not cycled with identical load profiles, but rather with statistically similar profiles generated by the same probabilistic algorithm, as discussed in Section III-B. Hence the GV curves used for training are more likely to differ from those used for testing than in the Oxford dataset. Nonetheless, the method performs respectably, although in general with less accuracy than for the Oxford dataset.

Fig. 6 shows results for selected cells for two combinations of measurement duration and lower voltage. In this case, the capacity estimates are in general less accurate than before, and the confidence intervals larger. However, the confidence intervals do accurately reflect the model uncertainty, and hence the error bars encompass the true values in most cases. Again, Fig. 6b shows that surprisingly accurate estimates can be obtained with a relatively short measurement: in this case, one of just 10 s duration. However, this relies on using an appropriate lower voltage. Indeed, the most striking aspect of these results is the strong dependence on the starting voltage, as discussed next.

Fig. 7 shows the overall results for this dataset. As in the previous case, increased measurement duration is shown to generally improve the capacity estimate (Fig. 7b). The average calibration scores are also reasonable: they are very slightly less than the nominal levels, 0.5 and 0.954, indicating that the model is only slightly over-confident in its estimates. In contrast to the previous case, the model performance is strongly dependent on the lower voltage, as shown in Fig. 7c (these differences in behaviour are probably attributable to the different cell chemistries of the two datasets). This figure shows that there is a cliff in the RMSPE vs. lower voltage curve at around 3.5 V. For starting voltages above this value, very good performance is achieved regardless of the measurement duration. This indicates that voltages in the higher range are more informative than those in the lower range for these cells. Such insights have obvious implications for informing battery management systems on strategies for online capacity estimation.

IV-C Comparison with IC/DV

Lastly, GP-ICE is compared with an approach based on incremental capacity (IC) and differential voltage (DV) peak tracking. For the latter approach, which we denote IC+DV, the location and magnitude of the largest peak in both the IC and DV curves were identified and used as inputs to the regression step. This results in 4 inputs (i.e. 2 inputs from each curve). For the regression step the same GP model was used as for the GP-ICE method, and so any differences in performance are due to the differences in the quality of the input data (i.e. smoothed voltage data for GP-ICE vs. peak values of differentiated voltage data for IC+DV). Since the total number of inputs is the same as that used in the GP-ICE method, the computational requirements are identical in each case. This IC+DV approach is similar to that used in [21], except that in that case a neural network/SVM was used for the regression step (also, in that work, various combinations of peak features were considered, not just the most prominent peaks). For the GP-ICE models, 6 different combinations of measurement duration, $\tau$, and lower voltage, $V_{\text{lo}}$, were selected, and numbered as shown in Table III.

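| GP-ICE model             | 1   | 2   | 3     | 4   | 5   | 6     |
|--------------------------|-----|-----|-------|-----|-----|-------|
| Measurement duration (s) | 10  | 450 | 1,450 | 10  | 450 | 1,450 |
| Lower voltage (V)        | 3.5 | 3.5 | 3.5   | 3.7 | 3.7 | 3.7   |

TABLE III: GP-ICE model denotations for 6 combinations of measurement duration and lower voltage.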
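For comparison, the kind of peak feature used by the IC+DV approach can be sketched as below. This is an assumption-laden illustration, not the procedure used for the results reported here: the function name is hypothetical, central differences stand in for whatever smoothing and differentiation scheme was actually applied, and only the IC peak is shown (the DV peak would be found analogously from dV/dQ).

```python
def ic_peak(voltages, charges):
    """Location and magnitude of the largest dQ/dV peak, by central differences."""
    best_v, best_ic = None, float("-inf")
    for i in range(1, len(voltages) - 1):
        dv = voltages[i + 1] - voltages[i - 1]
        if dv <= 0:
            continue  # skip points where noise makes the voltage non-increasing
        ic = (charges[i + 1] - charges[i - 1]) / dv
        if ic > best_ic:
            best_v, best_ic = voltages[i], ic
    return best_v, best_ic
```

Because each IC value is a ratio of small differences, noise in the voltage measurement is magnified in the peak magnitude. GP-ICE avoids this differentiation step by working on the smoothed voltage-time data directly.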

The results are shown in Fig. 8 and Table IV. Fig. 8 is a boxplot showing the spread in performance across all the tested cells, where the red lines indicate the median cell RMSPE. Table IV shows the overall RMSPE value when evaluated across all cells. Bold numbers in this table indicate the best performing model for each dataset.

It is clear from these results that an appropriately selected GP-ICE test outperforms the IC+DV approach. For the Oxford dataset, the IC+DV test is outperformed by either a 450 s GP-ICE test at a lower voltage of 3.5 V or a 1,450 s GP-ICE test at either lower voltage. In the best case (1,450 s, 3.7 V), GP-ICE achieves an RMSPE of 0.49% compared to 1.11% for IC+DV, a reduction by a factor of 2.26. For the NASA dataset, IC+DV is outperformed by a test of any duration (as little as 10 s) provided the starting voltage is sufficiently high (3.7 V). In the best case (1,450 s, 3.7 V), GP-ICE achieves an RMSPE of 2.48% compared to 6.55% for IC+DV, a reduction by a factor of 2.64.

In other cases, GP-ICE performs worse than IC+DV, most notably for the lower starting voltage in the NASA dataset and for the shorter test durations in the Oxford dataset. However, it is worth reiterating that the IC+DV approach relies on coverage of a large voltage range to capture the peaks in both the IC and DV curves, and hence these measurements could require a large and variable duration. For example, in the NASA dataset, a full GV curve takes up to 2 hrs, and so even if the peaks were separated by half this time, it would require a 1 hr test to capture both peaks. Such a test would encompass the voltage ranges of several of the better performing GP-ICE tests.

Lastly, for the GP-ICE method exactly 4 equispaced time samples were used as input regardless of the duration of the GV curve considered; however, it is possible that the performance could be improved by increasing this number. We tested this hypothesis with a sensitivity analysis w.r.t. the number of inputs for different values of the measurement duration and starting voltage (Fig. 9). For the shorter measurement duration, performance stopped improving appreciably beyond a threshold number of inputs for either dataset. For the Oxford dataset, minor improvements continued to a higher number of inputs for the longer measurement duration. Hence, some additional information could be extracted from the longer duration GV curves by increasing the number of inputs beyond 4. For the NASA dataset, there was likewise little improvement from adding inputs even for the longer measurement duration; this is most likely due to the lower charge rate (C/2) used for the NASA cells, meaning that even the longer test encompasses a relatively small voltage range.

Fig. 8: Boxplots of overall model performance showing the spread in RMSPE values across all the tested cells for a, Oxford dataset, b, NASA dataset
Model     IC+DV   1       2       3      4      5      6
Oxford    1.11    2.10    1.10    0.74   6.55   2.10   0.49
NASA      6.55    21.95   13.91   8.14   3.31   3.12   2.48
TABLE IV: Overall model performance in RMSPE (%). The values quoted are based on the entire datasets with each cell used once as the test set. For each dataset the RMSPE of the best performing model is shown in bold.
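The evaluation protocol behind Table IV (each cell used once as the test set) can be sketched as below. This is an assumption-laden illustration: `rmspe` uses the usual definition of root-mean-square percentage error, the overall score is assumed to pool the predictions from all held-out cells, and `leave_one_cell_out` and the `fit_predict` callback are hypothetical names.

```python
import numpy as np

def rmspe(y_true, y_pred):
    """Root-mean-square percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.sqrt(np.mean(((y_pred - y_true) / y_true) ** 2))

def leave_one_cell_out(cells, fit_predict):
    """Hold out each cell in turn and score predictions on it.

    cells       : dict mapping cell id -> (X, y), X of shape (n_samples, n_inputs)
    fit_predict : callable (X_train, y_train, X_test) -> predicted capacities
    Returns per-cell RMSPE and the RMSPE pooled over all held-out predictions.
    """
    per_cell, all_true, all_pred = {}, [], []
    for test_id, (X_test, y_test) in cells.items():
        X_train = np.vstack([X for cid, (X, _) in cells.items() if cid != test_id])
        y_train = np.concatenate([y for cid, (_, y) in cells.items() if cid != test_id])
        y_pred = fit_predict(X_train, y_train, X_test)
        per_cell[test_id] = rmspe(y_test, y_pred)
        all_true.append(y_test)
        all_pred.append(y_pred)
    overall = rmspe(np.concatenate(all_true), np.concatenate(all_pred))
    return per_cell, overall
```

The per-cell values correspond to the spread shown in the Fig. 8 boxplots, and the pooled value to a single Table IV entry. Any regressor can be passed as `fit_predict`, including a GP model.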
Fig. 9: Sensitivity of model accuracy to input dimensionality for different values of measurement duration and starting voltage. The blue, red and yellow lines indicate the three starting voltages considered. a, Oxford dataset: convergence is reached at a different number of datapoints for each of the two durations, b, NASA dataset: convergence is reached at the same number of datapoints for both durations.

V Discussion

This section briefly discusses issues related to the selection of inputs for the GP-ICE algorithm, the physical processes contributing to the observed correlations, and the applicability of the approach in a practical setting. Lastly, it compares the GP-ICE approach to related work.

V-A Selection of model inputs

Firstly, we discuss how the particular inputs to the GP-ICE algorithm, namely time values at equispaced voltages, were selected. As mentioned in Section I, this choice was originally motivated by the observation that correlations existed between capacity and selected features of IC and DV curves in earlier works [18, 19, 20, 21]. It was therefore natural to ask whether the capacity is also correlated with other portions of the curve, which do not necessarily correspond to such IC/DV peaks. The particular choice of inputs used in GP-ICE has a number of desirable characteristics. Firstly, by taking values spanning the whole of the measured segment, the method places no restrictions on what range of voltages must be encompassed in the online test, whilst at the same time taking full advantage of whatever range it happens to include. Secondly, equispaced measurements are expected to give the best reflection of the overall curve for a given number of inputs. Of course, it is possible that other design choices may improve on this performance. In fact, the problem of estimating capacity from voltage curves could well be framed within the context of functional data analysis (FDA) [34], which is the study of information on curves or functions. In that case the GV curve would be treated as a functional input, and the processes of smoothing and regression would implicitly be achieved in a single principled step. An interesting area of future work would be to compare the performance of FDA against the present approach.
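The input selection discussed in this subsection (smooth the voltage segment, then read off time values at equispaced voltages) might look roughly like the sketch below. Several details are assumptions made for illustration: the function names, the window length and polynomial order, the hand-rolled Savitzky-Golay smoother with simple edge padding (a library routine would normally be used), and the requirement that the smoothed voltage is monotonically increasing so that it can be inverted by interpolation.

```python
import numpy as np

def savgol_smooth(y, window=11, order=3):
    """Savitzky-Golay smoothing: least-squares polynomial fit in a sliding window."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    design = np.vander(offsets, order + 1, increasing=True)
    weights = np.linalg.pinv(design)[0]      # weights giving the fitted centre value
    padded = np.pad(y, half, mode="edge")    # crude edge handling (assumption)
    return np.convolve(padded, weights[::-1], mode="valid")

def gp_ice_inputs(t, v, n_inputs=4, window=11, order=3):
    """Time values (s, relative to segment start) at n equispaced voltages."""
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    v_smooth = savgol_smooth(v, window, order)
    # Equispaced voltages spanning the whole measured segment.
    v_grid = np.linspace(v_smooth[0], v_smooth[-1], n_inputs)
    # Invert V(t) by interpolation; requires v_smooth to be increasing.
    return np.interp(v_grid, v_smooth, t - t[0])
```

For a 450 s segment sampled at 1 Hz, `gp_ice_inputs(t, v)` returns a length-4 vector that would serve as one input row for the regression model. No differentiation of the voltage data is involved at any point.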

V-B Physical explanation

Li-ion cells undergo three primary modes of degradation: loss of lithium inventory (LLI), loss of active positive electrode material (LAM_PE) and loss of active negative electrode material (LAM_NE) [17]. These modes have observable effects on the IC/DV curves (and by extension the voltage-time data), and hence can be exploited by the GP-ICE method to infer cell capacity. Whilst elucidating the physical processes that give rise to capacity loss is an important area of study, this has been considered by several other works (e.g. [17, 35]) and is therefore not the primary concern of the present paper. Rather, this work aims to highlight that raw voltage measurements can be used to infer the capacity without necessarily knowing the exact mechanisms through which this occurs. This is in fact core to the advantage of GP-ICE: since it does not rely on cell-specific knowledge such as the expected locations and numbers of peaks in IC/DV curves, it could be directly applied to other cell chemistries without modification. Of course, there is no guarantee of equivalent results to those obtained here: the performance depends on how strongly the galvanostatic voltage-time data are correlated with the cell capacity, something which may vary from cell to cell and across voltage ranges, as the earlier results show. However, the important point is that there is no need to encode any cell-specific information in our model; the capacity estimation is achieved automatically in any case. This generality also opens up the possibility of applying the method to portions of constant-current data within otherwise dynamic drive cycles. This is likely to be non-trivial due to dynamics in the cell; however, if long enough portions of constant current are available, then it may give satisfactory results.

V-C Practical application

There are many practical scenarios in which GP-ICE could be applicable. For instance, in EV applications, the vast majority of charging stations output a power of less than kW [36], which would equate to  C for a typical EV battery pack. Nonetheless, the effect of C-rate on performance could be considered in future work to establish whether the method would be feasible using higher power charging/discharging. It is probable that higher pre-specified C-rates may result in lower performance – since higher C-rates result in some of the subtler features of the OCV curve being “smoothed out” by the cell impedance – but it is not clear to what extent this would be the case. Another important consideration is the application of the technique under variable ambient temperature conditions. The present results apply to a single temperature for each dataset; however, variations in temperature can result in significant changes to the measured impedance and OCV [37] and so accounting for this variation will be essential for the method to be applied in different ambient temperature conditions. This could be achieved provided appropriate training data are available encompassing the relevant range of temperatures. We emphasise that this would not necessarily require a large increase in experimental effort: for instance, to include additional temperatures it is merely necessary to repeat the reference charge/discharge measurement step under each of the required temperatures for each cell. The most time intensive portion of the test – namely ageing the cells by repeated operation under various drive cycles – could remain unchanged. Also, it should be noted that these limitations apply equally to a number of other approaches to capacity estimation, including Incremental Capacity and Differential Voltage analysis.

V-D Related work

Lastly, we briefly compare the present approach with other recent studies related to feature extraction of online measurements for battery SOH estimation. We consider here only the most relevant studies; the reader is referred to the review studies [5, 6, 38] for details of other approaches.

You et al. [39] presented an approach which uses a Recurrent Neural Network trained on partial charge curves for estimating cell capacity. This is similar to our approach but with some key differences. Specifically, our GP-ICE approach: (i) employs a Gaussian process method for the regression step, which provides confidence in the capacity estimates, (ii) uses Savitzky-Golay filtering as a preprocessing step to improve signal to noise ratio, (iii) selects a subset of the smoothed data in order to minimise computational overhead, which is a necessary requirement given the higher computational overhead of GPs compared to neural networks. Moreover, our method shows how the performance of the capacity estimates varies as a function of the starting voltage and measurement duration, something which has not been demonstrated in previous work. On the other hand, the method of [39] exploits the sequential nature of the charge curves, unlike our approach, which ignores any correlation between the inputs. An interesting area of future work could involve accounting for correlations between the inputs by encoding recurrent behaviour into the kernel of the GP function (such as in the method presented in [40]) in order to achieve the benefits of both of these approaches.

Differential Thermal Voltammetry (DTV) is another approach to capacity estimation that has been introduced very recently [41, 42, 43]. DTV tracks battery degradation through phase transitions, and the resulting entropic heat, occurring in the electrodes, by means of temperature vs. time measurements under relatively high current loads. In some respects, DTV is similar to Differential Voltage Analysis but using temperature, rather than voltage, measurements. The key advantage over Differential Voltage Analysis is that DTV is applicable using higher currents and hence enables shorter diagnostic tests. DTV could in fact be complementary to the GP-ICE approach presented here: e.g. GP-ICE could be applied using measurements of temperature rather than voltage, combining the advantages of both approaches.

VI Conclusions

This paper has introduced GP-ICE, a technique for estimating battery capacity using small portions of voltage-time data under constant current (galvanostatic) operation. The primary novel aspects of our approach are as follows:

  1. Operates on raw voltage data: GP-ICE dispenses with the interpretation of galvanostatic voltage (GV) data as incremental capacity or differential voltage curves, and instead involves directly performing regression using the voltage/time data as inputs.

  2. Automatic input extraction: To enable automatic identification of inputs for a new cell, GP-ICE uses a two-step process of (i) smoothing the voltage data and (ii) sampling voltages from the smoothed data to obtain the inputs to the regression model.

  3. Bayesian non-parametric regression: GP-ICE utilises a probabilistic paradigm, unlike previous works. It therefore adapts to the complexity of the data and avoids over-fitting, whilst also providing accurate estimates of uncertainty in its predictions.

Features (1) and (2) above have a number of benefits, including mitigating the inaccuracy introduced by differentiating the voltage-time data, enabling capacity estimates using arbitrary portions of the voltage curve, and overcoming the need for cumbersome analysis of the voltage-time data for a new cell to identify the features of interest. Feature (3) is also important: through the use of a Bayesian non-parametric regression technique, Gaussian process regression, the model adapts to the complexity of the data and avoids over-fitting.

Concretely, GP-ICE was shown to outperform IC/DV peak tracking by a factor of approximately 2.5 in terms of RMSPE, whilst also providing the various aforementioned advantages such as greater flexibility, shorter diagnostic test requirements, and the provision of accurate estimates of uncertainty in its predictions. It also provides insight into which voltage ranges are most informative, and hence may inform a BMS as to when best to perform a diagnostic test.

Future work should consider accounting for variable ambient temperatures and/or higher pre-specified C-rates – this should be feasible provided training data under the relevant temperatures/C-rates are acquired during each reference charge/discharge step during the ageing experiments.

[Gaussian process regression]

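A Gaussian process (GP) [24] defines a probability distribution over functions, and is denoted as:

$$f(x) \sim \mathcal{GP}\big(m(x),\, k(x, x')\big),$$

where $m(x)$ and $k(x, x')$ are the mean and covariance functions respectively, denoted by

$$m(x) = \mathbb{E}[f(x)], \qquad k(x, x') = \mathbb{E}\big[(f(x) - m(x))(f(x') - m(x'))\big].$$

For any finite collection of input points, say $X = \{x_1, \ldots, x_n\}$, this process defines a probability distribution $p(f(x_1), \ldots, f(x_n))$ that is jointly Gaussian, with some mean $m(X)$ and covariance $K(X, X)$ given by $K_{ij} = k(x_i, x_j)$.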

Gaussian process regression is a way to achieve non-parametric regression with Gaussian processes. The key idea is that, rather than postulating a parametric form for the function and estimating the parameters (as in parametric regression), we instead assume that the function is a sample from a Gaussian process as defined above.

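In this work, we use the Matérn covariance function:

$$k(x, x') = \sigma_f^2\, \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\frac{\sqrt{2\nu}\, r}{\ell}\right)^{\nu} K_\nu\!\left(\frac{\sqrt{2\nu}\, r}{\ell}\right), \qquad r = \lVert x - x' \rVert,$$

with smoothness hyperparameter $\nu$ (larger $\nu$ implies smoother functions), signal variance $\sigma_f^2$ and length scale $\ell$; $K_\nu$ is the modified Bessel function. This kernel was chosen as it is suitable for functions with varying degrees of smoothness, although similar performance was observed using other common kernels, including the Squared Exponential [24]. The mean function is commonly defined as $m(x) = 0$, and for convenience this convention is followed here.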
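As an illustration of the Matérn covariance, the sketch below implements the special case with smoothness 5/2, for which the Bessel-function expression reduces to a closed form. The choice of 5/2 is an assumption made here for convenience (the smoothness value used in this work is not restated in this appendix), and the function and parameter names are hypothetical.

```python
import numpy as np

def matern52(x1, x2, length_scale=1.0, signal_std=1.0):
    """Matern covariance with smoothness 5/2 between two sets of points.

    x1 : array of shape (n1, d), x2 : array of shape (n2, d)
    Returns the (n1, n2) covariance matrix.
    """
    x1 = np.atleast_2d(np.asarray(x1, dtype=float))
    x2 = np.atleast_2d(np.asarray(x2, dtype=float))
    diff = x1[:, None, :] - x2[None, :, :]
    r = np.sqrt(np.sum(diff ** 2, axis=-1))   # Euclidean distances
    s = np.sqrt(5.0) * r / length_scale
    # Closed form: sigma^2 * (1 + s + s^2 / 3) * exp(-s)
    return signal_std ** 2 * (1.0 + s + s ** 2 / 3.0) * np.exp(-s)
```

Calling `matern52(X, X)` on an input matrix `X` with one row per GV segment gives the covariance matrix with entries `k(x_i, x_j)`. The diagonal equals `signal_std ** 2`, and the covariance decays with distance at a rate set by `length_scale`.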

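Now, if we observe a labelled training set of input-output pairs $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{n}$, predictions can be made at test indices $x_*$ by computing the conditional distribution $p(y_* \mid x_*, \mathcal{D})$. This can be obtained analytically by the standard rules for conditioning Gaussians [44], and (assuming a zero mean for notational simplicity) results in a Gaussian distribution given by:

$$p(y_* \mid x_*, \mathcal{D}) = \mathcal{N}\big(y_*;\, m_*,\, \sigma_*^2\big),$$

where

$$m_* = K(X, x_*)^{\top} K(X, X)^{-1} y, \qquad \sigma_*^2 = K(x_*, x_*) - K(X, x_*)^{\top} K(X, X)^{-1} K(X, x_*),$$

with $X = \{x_1, \ldots, x_n\}$ the training inputs and $y = (y_1, \ldots, y_n)^{\top}$ the training outputs.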

The values of the covariance hyperparameters $\theta$ may be optimised by minimising the negative log marginal likelihood, defined as $\mathrm{NLML} = -\log p(y \mid X, \theta)$. Minimising the NLML automatically performs a trade-off between bias and variance, and hence ameliorates over-fitting to the data. Given an expression for the NLML and its derivative w.r.t. $\theta$ (both of which can be obtained in closed form), the value of $\theta$ can be estimated using any standard gradient-based optimizer. In the present case, the GPML toolbox [28] implementation of conjugate gradients was used.
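The prediction and NLML computations described in this appendix can be sketched in a few lines of NumPy. This is not the GPML toolbox code used in this work. It assumes a zero mean, a Matérn kernel with smoothness 5/2, and a small Gaussian observation-noise term `sn` added to the diagonal of the training covariance. All of these, and the function names, are illustrative choices.

```python
import numpy as np

def matern52(a, b, ell, sf):
    """Matern-5/2 covariance between rows of a (n, d) and b (m, d)."""
    r = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(-1))
    s = np.sqrt(5.0) * r / ell
    return sf ** 2 * (1.0 + s + s ** 2 / 3.0) * np.exp(-s)

def gp_fit_predict(X, y, X_star, ell=1.0, sf=1.0, sn=1e-3):
    """Return predictive mean, predictive variance and NLML for a zero-mean GP."""
    n = len(X)
    K = matern52(X, X, ell, sf) + sn ** 2 * np.eye(n)      # training covariance
    L = np.linalg.cholesky(K)                              # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))    # K^{-1} y
    K_s = matern52(X, X_star, ell, sf)                     # train/test covariance
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = sf ** 2 - np.sum(v ** 2, axis=0)                 # k(x*, x*) = sf^2 here
    # NLML = 0.5 y^T K^{-1} y + 0.5 log|K| + (n/2) log(2 pi)
    nlml = 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * n * np.log(2 * np.pi)
    return mean, var, nlml
```

In practice the returned `nlml` would be minimised over `(ell, sf, sn)` with a gradient-based optimiser, and `mean` and `var` would then be evaluated at the optimised values. The Cholesky factorisation is used instead of an explicit matrix inverse for numerical stability.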

Appendix A Supplementary material

Fig. 10: GP-ICE flow diagram. Note that the data used in these plots was generated for illustration purposes. See Section II.A for further details.


This work was funded by an RCUK Engineering and Physical Sciences Research Council grant, ref. EP/K002252/1.


  • [1] M. Broussely, P. Biensan, F. Bonhomme, P. Blanchard, S. Herreyre, K. Nechev, and R. Staniewicz, “Main aging mechanisms in Li ion batteries,” Journal of power sources, vol. 146, no. 1, pp. 90–96, 2005.
  • [2] S.-K. Jung, H. Gwon, J. Hong, K.-Y. Park, D.-H. Seo, H. Kim, J. Hyun, W. Yang, and K. Kang, “Understanding the degradation mechanisms of LiNi0.5Co0.2Mn0.3O2 cathode material in lithium ion batteries,” Advanced Energy Materials, vol. 4, no. 1, 2014.
  • [3] H. Khorramdel, J. Aghaei, B. Khorramdel, and P. Siano, “Optimal battery sizing in microgrids using probabilistic unit commitment,” IEEE Transactions on Industrial Informatics, vol. 12, no. 2, pp. 834–843, 2016.
  • [4] J. Shen, S. Dusmez, and A. Khaligh, “Optimization of sizing and battery cycle life in battery/ultracapacitor hybrid energy storage systems for electric vehicle applications,” IEEE Transactions on Industrial Informatics, vol. 10, no. 4, pp. 2112–2121, 2014.
  • [5] A. Farmann, W. Waag, A. Marongiu, and D. U. Sauer, “Critical review of on-board capacity estimation techniques for lithium-ion batteries in electric and hybrid electric vehicles,” Journal of Power Sources, vol. 281, pp. 114–130, 2015.
  • [6] J. Zhang and J. Lee, “A review on prognostics and health monitoring of li-ion battery,” Journal of Power Sources, vol. 196, no. 15, pp. 6007–6014, 2011.
  • [7] G. L. Plett, “Extended Kalman filtering for battery management systems of LiPB-based HEV battery packs: Part 3. State and parameter estimation,” Journal of Power sources, vol. 134, no. 2, pp. 277–292, 2004.
  • [8] ——, “Sigma-point kalman filtering for battery management systems of lipb-based HEV battery packs: Part 2: Simultaneous state and parameter estimation,” Journal of power sources, vol. 161, no. 2, pp. 1369–1384, 2006.
  • [9] W. Waag and D. U. Sauer, “Adaptive estimation of the electromotive force of the lithium-ion battery after current interruption for an accurate state-of-charge and capacity determination,” Applied Energy, vol. 111, pp. 416–427, 2013.
  • [10] X. Hu, R. Xiong, and B. Egardt, “Model-based dynamic power assessment of lithium-ion batteries considering different operating conditions,” IEEE Transactions on Industrial Informatics, vol. 10, no. 3, pp. 1948–1959, 2014.
  • [11] N. A. Chaturvedi, R. Klein, J. Christensen, J. Ahmed, and A. Kojic, “Algorithms for advanced battery-management systems,” IEEE Control Systems, vol. 30, no. 3, pp. 49–68, 2010.
  • [12] S. J. Moura, N. A. Chaturvedi, and M. Krstic, “PDE estimation techniques for advanced battery management systems,” in American Control Conference (ACC), 2012.   IEEE, 2012, pp. 566–571.
  • [13] G. K. Prasad and C. D. Rahn, “Model based identification of aging parameters in lithium ion batteries,” Journal of power sources, vol. 232, pp. 79–85, 2013.
  • [14] B. Bole, C. S. Kulkarni, and M. Daigle, “Adaptation of an electrochemistry-based li-ion battery model to account for deterioration observed under randomized use,” in Proceedings of Annual Conference of the Prognostics and Health Management Society, Fort Worth, TX, USA, vol. 29, 2014, pp. 1–9.
  • [15] A. M. Bizeray, J.-H. Kim, S. R. Duncan, and D. A. Howey, “Identifiability and parameter estimation of the single particle lithium-ion battery model,” arXiv preprint arXiv:1702.02471, 2017.
  • [16] M. Dubarry, C. Truchot, and B. Y. Liaw, “Synthesize battery degradation modes via a diagnostic and prognostic model,” Journal of power sources, vol. 219, pp. 204–216, 2012.
  • [17] C. R. Birkl, M. R. Roberts, E. McTurk, P. G. Bruce, and D. A. Howey, “Degradation diagnostics for lithium ion cells,” Journal of Power Sources, vol. 341, pp. 373–386, 2017.
  • [18] C. Weng, Y. Cui, J. Sun, and H. Peng, “On-board state of health monitoring of lithium-ion batteries using incremental capacity analysis with support vector regression,” Journal of Power Sources, vol. 235, pp. 36–44, 2013.
  • [19] C. Weng, X. Feng, J. Sun, and H. Peng, “State-of-health monitoring of lithium-ion battery modules and packs via incremental capacity peak tracking,” Applied Energy, vol. 180, pp. 360–368, 2016.
  • [20] M. Berecibar, M. Garmendia, I. Gandiaga, J. Crego, and I. Villarreal, “State of health estimation algorithm of LiFePO4 battery packs based on differential voltage curves for battery management system application,” Energy, vol. 103, pp. 784–796, 2016.
  • [21] M. Berecibar, F. Devriendt, M. Dubarry, I. Villarreal, N. Omar, W. Verbeke, and J. Van Mierlo, “Online state of health estimation on NMC cells based on predictive analytics,” Journal of Power Sources, vol. 320, pp. 239–250, 2016.
  • [22] M. Berecibar, M. Dubarry, I. Villarreal, N. Omar, and J. Van Mierlo, “Degradation mechanisms detection for hp and he nmc cells based on incremental capacity curves,” in Vehicle Power and Propulsion Conference (VPPC), 2016 IEEE.   IEEE, 2016, pp. 1–5.
  • [23] A. Savitzky and M. J. Golay, “Smoothing and differentiation of data by simplified least squares procedures.” Analytical chemistry, vol. 36, no. 8, pp. 1627–1639, 1964.
  • [24] C. E. Rasmussen and C. K. I. Williams, Gaussian processes for machine learning. MIT Press, 2006.
  • [25] B. Saha, K. Goebel, S. Poll, and J. Christophersen, “Prognostics methods for battery health monitoring using a bayesian framework,” IEEE Transactions on instrumentation and measurement, vol. 58, no. 2, pp. 291–296, 2009.
  • [26] D. Liu, J. Pang, J. Zhou, Y. Peng, and M. Pecht, “Prognostics for state of health estimation of lithium-ion batteries based on combination gaussian process functional regression,” Microelectronics Reliability, vol. 53, no. 6, pp. 832–839, 2013.
  • [27] R. R. Richardson, M. A. Osborne, and D. A. Howey, “Gaussian process regression for forecasting battery state of health,” Journal of Power Sources, vol. 357, pp. 209–219, 2017.
  • [28] C. E. Rasmussen and H. Nickisch, “Gaussian processes for machine learning (GPML) toolbox,” J. Mach. Learn. Res., vol. 11, pp. 3011–3015, Dec. 2010.
  • [29] D. A. Swanson, J. Tayman, and T. Bryan, “Mape-r: a rescaled measure of accuracy for cross-sectional subnational population forecasts,” Journal of Population Research, vol. 28, no. 2-3, pp. 225–243, 2011.
  • [30] C. Birkl, “Oxford battery degradation dataset 1,” 2017.
  • [31] M. André, M. Keller, Å. Sjödin, M. Gadrat, I. Mc Crae, and P. Dilara, “The ARTEMIS European tools for estimating the transport pollutant emissions,” in Proc. 18th International Emission Inventories Conference, 2009, pp. 1–10.
  • [32] B. Bole, C. Kulkarni, and M. Daigle, “Randomized battery usage data set,” NASA AMES prognostics data repository, 2014.
  • [33] M. S. Grewal, Kalman filtering.   Springer, 2011.
  • [34] J. O. Ramsay, Functional data analysis.   Wiley Online Library, 2006.
  • [35] M. Lewerenz, A. Marongiu, A. Warnecke, and D. U. Sauer, “Differential voltage analysis as a tool for analyzing inhomogeneous aging: A case study for LiFePO4|graphite cylindrical cells,” Journal of Power Sources, vol. 368, pp. 57–67, 2017.
  • [36] D. Sbordone, I. Bertini, B. Di Pietra, M. C. Falvo, A. Genovese, and L. Martirano, “EV fast charging stations and energy storage technologies: A real implementation in the smart micro grid paradigm,” Electric Power Systems Research, vol. 120, pp. 96–108, 2015.
  • [37] R. R. Richardson and D. A. Howey, “Sensorless battery internal temperature estimation using a kalman filter with impedance measurement,” IEEE Transactions on Sustainable Energy, vol. 6, no. 4, pp. 1190–1199, 2015.
  • [38] L. Ungurean, G. Cârstoiu, M. V. Micea, and V. Groza, “Battery state of health estimation: a structured review of models, methods and commercial devices,” International Journal of Energy Research, vol. 41, no. 2, pp. 151–181, 2017.
  • [39] G.-w. You, S. Park, and D. Oh, “Diagnosis of electric vehicle batteries using recurrent neural networks,” IEEE Transactions on Industrial Electronics, 2017.
  • [40] M. Al-Shedivat, A. G. Wilson, Y. Saatchi, Z. Hu, and E. P. Xing, “Learning scalable deep kernels with recurrent structure,” arXiv preprint arXiv:1610.08936, 2016.
  • [41] B. Wu, V. Yufit, Y. Merla, R. F. Martinez-Botas, N. P. Brandon, and G. J. Offer, “Differential thermal voltammetry for tracking of degradation in lithium-ion batteries,” Journal of Power Sources, vol. 273, pp. 495–501, 2015.
  • [42] Y. Merla, B. Wu, V. Yufit, N. P. Brandon, R. F. Martinez-Botas, and G. J. Offer, “Novel application of differential thermal voltammetry as an in-depth state-of-health diagnosis method for lithium-ion batteries,” Journal of Power Sources, vol. 307, pp. 308–319, 2016.
  • [43] T. Shibagaki, Y. Merla, and G. J. Offer, “Tracking degradation in lithium iron phosphate batteries using differential thermal voltammetry,” Journal of Power Sources, vol. 374, pp. 188–195, 2018.
  • [44] K. P. Murphy, Machine learning: a probabilistic perspective.   MIT press, 2012.