Adaptive Explainable Continual Learning Framework for Regression Problems with Focus on Power Forecasts

by Yujiang He, et al.

Compared with traditional deep learning techniques, continual learning enables deep neural networks to learn continually and adaptively. As the amount of data keeps increasing in applications, deep neural networks have to learn new tasks while overcoming the forgetting of knowledge obtained from old tasks. In this article, two continual learning scenarios are proposed to describe the potential challenges in this context. Besides, based on our previous work on the CLeaR framework (short for Continual Learning for Regression tasks), the work will be further developed to enable models to extend themselves and learn data successively. Research topics include, but are not limited to, developing continual deep learning algorithms, strategies for non-stationarity detection in data streams, and explainable and visualizable artificial intelligence. Moreover, the framework- and algorithm-related hyperparameters should be dynamically updated in applications. Forecasting experiments will be conducted on power generation and consumption data collected from real-world applications. A series of comprehensive evaluation metrics and visualization tools will help analyze the experimental results. The proposed framework is expected to be generally applicable to other constantly changing scenarios.





1 Introduction

Training successful deep neural networks usually depends on a massive amount of data. With a one-time design and training process, deep neural network models can be easily deployed to solve specific problems. In recent years, much research has shown that this traditional data-driven training method can quickly optimize the parameters of deep neural networks with the help of massive datasets and supercomputing resources. Models can reach or even exceed human-level cognitive skills in many application scenarios. However, even this classic training method has three main disadvantages:

  1. A training dataset with sufficient meaningful samples is the prerequisite for training a successful model. Data collection and preprocessing are extremely time- and money-consuming, and this exhausting process can prolong the preparation phase of a practical project. Sometimes the start of model training must be postponed due to a lack of data.

  2. In this training setting, it is always assumed that the underlying data generation process is static. Based on this assumption, we can evaluate a model’s generalization by comparing the errors among training, validation, and test datasets in the training phase. However, this static assumption does not always hold in a constantly evolving and developing world. In this article, the context is defined as a non-stationary scenario, where the generative probabilistic distribution of the input data or the target data changes over time. The changes can be grouped into three families, i.e., short periodical, long periodical, or non-periodical [12]. Periodical changes could be due to insufficient samples in the training dataset, which restricts the model from obtaining information about the entire sample space. Non-periodical changes could be caused by changes in the objective environment, broken physical devices, or unavoidable measurement errors. Deep neural network models should continually learn data with these periodical changes to improve their cognition, as in the human learning process. Because non-periodical changes are hardly predictable or repetitive, we need to detect them and make correct decisions for processing them.

  3. The structure of a deep neural network model is generally fixed after deployment. This setting is unrealistic and inflexible in real-world applications because new targets can appear as the application environment keeps evolving. The model should extend its structure by increasing the number of outputs in this case. A new target can be a new label in classification tasks or a new predicted object in regression tasks. To take power forecasts in the context of smart power grids as an example, we can train models to provide forecasts regarding energy supply and demand for managing a regional power grid. With the extension of the power grid, new power generators and consumers will inevitably be added to the list of forecasting targets. Training an individual model for the new target might also be a solution. However, in this case, we have to reconsider the first problem, i.e., we cannot start training until sufficient samples are collected.

One of the potential solutions for addressing these issues is Continual Learning (CL), also known as Continuous Learning, Incremental Learning, Sequential Learning, or Lifelong Learning, which is carried out to solve multiple related tasks and leads to a long-term form of machine learning models. While the term is not well consolidated, the idea is to enable models to continually and adaptively learn about the world and overcome catastrophic forgetting. The knowledge of such models can develop incrementally and become more sophisticated. Catastrophic forgetting refers to the phenomenon that models forget the knowledge for solving old tasks while learning new tasks. This forgetting problem points to a more general problem of traditional neural networks, the so-called stability-plasticity dilemma: models must find a trade-off between the accumulation and integration of new knowledge and the retention of old knowledge. Numerous valuable research works have focused on CL algorithms, application scenarios, evaluation metrics for classification tasks, etc. However, the necessity of CL for regression tasks seems to have been ignored.

This article can be viewed as an abstract of my thesis for a Ph.D. degree. The contributions of the planned thesis mainly include:

  1. To present the necessity and importance of CL for regression tasks;

  2. To give an overview of the relevant research literature, including but not limited to CL algorithms, detection of novelty and non-stationarity in data streams, and explainable artificial intelligence (AI);

  3. To explore the applicability of well-known CL algorithms for regression problems;

  4. To analyze the shortcomings of common experimental setups as well as restrictions of general evaluation metrics;

  5. To summarize relevant research challenges being faced for our proposed CL framework [13] and develop it further;

  6. To develop visualization utilities and propose comprehensive evaluation metrics to make CL explainable;

  7. To evaluate the framework in power forecasting experiments with real-world datasets.

The remainder of this article starts with an overview of the requirements and relevant research questions of CL for regression problems. In Section 2, I will propose a visualizable CL framework for regression problems and introduce its application in the two proposed CL scenarios with instances. Then I will present three experimental datasets, which can be used to design power forecasting experiments that assess the proposed solutions. The article ends with a brief conclusion.

2 Continual Learning for Regression

2.1 Continual Learning Scenarios

CL has been widely applied to classification problems for learning new tasks sequentially and retaining the obtained knowledge. In [20], three CL scenarios for classification problems are proposed to focus on object recognition:

  • New Instances: new samples of the previously known classes become available in subsequent batches with new poses or conditions. In other words, these new samples carry novel information but still belong to the same labels. Models need to keep extending and accumulating knowledge regarding the learned labels.

  • New Classes: new samples belong to unknown classes. In this case, the model should be able to identify objects of the new classes while retaining the accuracy on the old ones.

  • New Instances and Classes: new samples belong to both known and new classes.

Therefore, a new task in the context of classification problems can be defined as learning new instances belonging to known labels or learning to recognize new labels. However, one obvious difference between classification and regression problems is the models’ targets, which are discrete labels in classification and continuous values in regression.

Two CL scenarios for regression are proposed in my previous work [11]:

  • Data-domain incremental (DDI) scenario

    refers to the situation where the underlying data generation process changes over time due to the non-stationarity of the data stream. Either a change of the probability distribution of the input data or of the target given the input can trigger updating a model trained on data from the out-of-date generation process. In the updating phase, the model learns to extract latent representations of the input data from the changed generation process. Besides, the model needs to adjust its weights to find a new proper mapping from the new latent representations to the targets. The non-stationarity could result from insufficient samples in the pre-training process or from external objective factors.

  • Target-domain incremental (TDI) scenario

    refers to the situation where the structure of the network model is extended as the number of prediction targets increases. Assume a multi-output deep neural network is used to forecast several independent targets based on the same input data; the network owns a shared hidden sub-network for learning non-linear latent representations and multiple output sub-networks for prediction. The model adds a new sub-network when a new target appears. The TDI scenario is a joint research topic among multi-task learning, transfer learning, and continual learning. On the one hand, the obtained knowledge of the shared network can be transferred to train the additional sub-network quickly, even without sufficient samples. On the other hand, CL algorithms can avoid decreasing the prediction accuracy on previously handled tasks while learning the new task, by utilizing the free weights of the shared network that are unimportant for other targets.

For example, renewable power generation can be predicted based on regional weather conditions. Figure 1 illustrates the two proposed CL scenarios applied to power forecasts of regional renewable energy generators. As described above, the model has a shared network to learn the latent space and several prediction sub-networks for the targets.

Figure 1: An illustration of a neural network applied to learn new tasks in both CL application scenarios. Red dots and lines indicate new samples and new sub-networks. The cases marked as 1.a and 1.b correspond to the DDI scenario, where the statistical properties of the input data and the mapping function with given inputs change, respectively. The right case, marked as 2, corresponds to the TDI scenario, where new targets are added to the prediction list.

The weather conditions are time-variant features that fluctuate periodically over time. Nevertheless, a gradual change can exist in the weather data due to climate change, the dynamic behavior of noise from the weather prediction model, or other foreseeable factors. Such a smooth change is usually referred to as concept drift [10]. The case in Fig. 1 marked as 1.a presents this challenge, which will negatively impact prediction accuracy, especially when sufficient samples are unavailable for pre-training.

Case 1.b corresponds to the power generation capability, which is time-dependent and can be affected by, for example, upgrading or aging of the device in the long term, or by changes in the environment. Besides, residential power demand forecasting is another example that needs to be considered in this scenario. Generally, we predict the overall power demand of a residential area in a low-voltage power grid rather than the power demand of every single consumer in the region. The mapping is sensitive to changes in these consumers’ power demand or consumption habits. Sometimes we have to update the prediction model due to such unpredictable factors. In the DDI scenario, models should continually collect data and accumulate knowledge by learning newly collected data. Regarding the TDI scenario in Fig. 1, a prediction sub-network for an additional photovoltaic generator is added to the prediction model.

In the proposed setting, a new task can thus be defined as (1) learning the non-stationarity of the data stream, including the input data generation and the output data generation with given inputs, or (2) integrating new sub-networks into the existing model for predicting new targets without any negative effect on the prediction accuracy of the other known targets. The red dots and lines in Fig. 1 correspond to the new tasks in the two scenarios, respectively.
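The shared-network-plus-heads architecture underlying both scenarios can be sketched in a few lines. The following is a minimal, hypothetical NumPy illustration (class name, target names, and weight shapes are invented for this sketch; a real model would be a trained network rather than random weights). Handling a new target in the TDI scenario amounts to attaching a new output head to the shared sub-network:

```python
import numpy as np

rng = np.random.default_rng(0)

class MultiHeadRegressor:
    """A shared hidden sub-network with one output head per prediction target."""

    def __init__(self, n_in, n_hidden):
        self.W_shared = rng.normal(0, 0.1, (n_in, n_hidden))
        self.heads = {}  # target name -> output weights of its sub-network

    def add_head(self, name):
        # TDI scenario: extend the model with a sub-network for a new target
        self.heads[name] = rng.normal(0, 0.1, (self.W_shared.shape[1], 1))

    def forward(self, x):
        h = np.tanh(x @ self.W_shared)  # shared latent representation
        return {name: (h @ W).ravel() for name, W in self.heads.items()}

model = MultiHeadRegressor(n_in=7, n_hidden=16)  # e.g. 7 weather features
model.add_head("wind_farm_1")
model.add_head("pv_plant_1")                     # a new target appears later

x = rng.normal(size=(5, 7))
preds = model.forward(x)
print(sorted(preds))  # ['pv_plant_1', 'wind_farm_1']
```

In a CL setting, the point of this structure is that `W_shared` carries transferable knowledge, so a freshly added head can be trained quickly even with few samples for the new target.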

2.2 Research Questions

In the common CL experimental setting for classification tasks, the dataset contains disjoint subsets, each of which is assumed to be separately sampled from an independent and identical distribution. One subset represents a task, and the dataset as a whole is therefore not independently and identically distributed, which differs from traditional supervised learning. Neural network models need to learn these unseen, independent tasks sequentially, with the identification information of the tasks given. Some CL algorithms allow models to revisit the previously learned tasks without restriction while learning a new one. This is called the replay CL strategy, which will be introduced in the remainder of this article. The replayed data can be a subset of samples of the previous tasks stored in raw format, or it can be generated by a generative model. Although this setting is feasible for evaluating and comparing diverse CL algorithms, it cannot represent real-world cases.

First, the appearance of a new task is usually unpredictable in real-world applications, which means that prior knowledge of the new task is unavailable. A detection mechanism should be in place to identify the appearance and the type of new tasks. Second, data streams in real applications are infinite and contain both known and unknown tasks, which might not appear separately and in order. The model can either identify a new task and update itself immediately, or store the samples until they are sufficient for an update, depending on the adopted updating strategy. Third, unrestricted retraining on old tasks enables the model to remember the obtained knowledge but might make it prone to overfitting. Besides, storing all old tasks might burden the storage overhead and violate privacy laws.

Farquhar et al. introduce five core desiderata for evaluating CL algorithms and designing classification experiments [5]. In previous work [13], we give five suggestions for designing CL regression experiments:

  1. New tasks resemble the previous tasks;

  2. The neural network model uses a single output for predicting the corresponding target and learning the changes in the DDI scenario;

  3. New tasks appear unpredictably in the DDI scenario, and their appearance should be detected by the model rather than announced to it;

  4. Out of consideration for privacy laws, revisiting previous datasets is restricted;

  5. Experiments contain more than two tasks, in either the DDI or the TDI scenario.

These suggestions will guide the development of updating methods and the design of experiments. Furthermore, the following research questions should be answered in the proposed thesis.

2.2.1 Question 1: When to trigger an update?

The trigger condition is the prerequisite for CL and determines the starting point of an update. According to the definition of a new task in Section 2.1, this question is more valuable to research in the DDI scenario, where new tasks appear unpredictably, because new tasks in the TDI scenario depend on the objective requirements of projects and are added manually. To answer this question, my research will focus on novelty detection (concept shift and drift) using deterministic and probabilistic methods. For example, the trigger condition can depend on the number of newly collected novel samples. Besides, an update could also be triggered based on the estimated entropy of new samples. The design of update trigger conditions is the first significant step affecting the updating results and the model’s future performance.

2.2.2 Question 2: How to update models?

Updating methods are the core of learning tasks sequentially and continually. CL algorithms can be roughly categorized into three groups [3], depending on how data is stored and reused:

  • Regularization-based approaches: The goal of these algorithms is to reduce storage demand and prioritize privacy. Instead of revisiting old data, a penalty regularization term is added to the loss function to consolidate the weights that are important for previous tasks while learning a new task. Delange et al. further divide these approaches into data-focused approaches [19, 31, 25] and prior-focused approaches [16, 30, 1].

  • Replay approaches: These approaches prevent forgetting by replaying previous samples that are either stored in raw format or generated by a generative model, known as rehearsal approaches [26, 2] and pseudo-rehearsal approaches [28, 18], respectively. A subset of these previous samples can be fed as the model’s inputs, combined with new samples, for continually learning the new task while constraining the optimization of the loss function.

  • Parameter isolation approaches: The idea of these approaches is to assign a subset of the model’s parameters specifically to a new task. For example, one can adjust the model’s structure by growing a new branch to learn a new task if no constraint is imposed on the model’s size [27]. Alternatively, the part of the network shared by all tasks can stay static, and the parameters of previous tasks are masked out while learning new tasks [21, 6].

Some previous works solve the forgetting problem using Bayesian neural networks, such as [17, 22, 23]; these can also be grouped into one of the above families.
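As a concrete instance of the regularization-based family, an elastic-weight-consolidation-style penalty can be written as L_total = L_task + (λ/2) Σ_i F_i (θ_i − θ*_i)², where θ* are the weights after the previous task and F estimates each weight's importance. The NumPy sketch below is illustrative only; the Fisher values and loss numbers are placeholders, not from any cited implementation:

```python
import numpy as np

def ewc_penalty(theta, theta_old, fisher, lam):
    """Quadratic penalty that anchors weights important for old tasks."""
    return 0.5 * lam * np.sum(fisher * (theta - theta_old) ** 2)

theta_old = np.array([1.0, -0.5, 2.0])  # weights after the previous task
fisher    = np.array([10.0, 0.1, 5.0])  # per-weight importance estimates
theta     = np.array([1.1,  0.5, 2.0])  # candidate weights for the new task

task_loss = 0.2  # placeholder for the ordinary loss on the new task
total = task_loss + ewc_penalty(theta, theta_old, fisher, lam=1.0)
print(round(total, 3))  # 0.3
```

Note how the large Fisher value on the first weight penalizes its small drift (0.1) as much as the large drift (1.0) of the unimportant second weight; this asymmetry is what lets important weights stay consolidated while free weights absorb the new task.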

Note that not all well-known CL can directly be applied to regression tasks. For example, Li et al. use Knowledge Distillation loss [14] in their Learning Without Forgetting [19] (LWF) algorithm to consolidate the obtained knowledge for previous tasks. The loss function is a variant of cross-entropy loss, which is inapplicable for regression tasks. Thus, I plan to review these well-known algorithms and then analyze their advantages and applicability. Moreover, further work proposes novel CL algorithms based on the current CL and the proposed experimental setup. Another interesting topic is ensemble CL, which investigates the collaboration of various CL algorithms to improve models’ performance.

2.2.3 Question 3: How to evaluate the updated models?

Common evaluation metrics for regression tasks, such as Mean Square Error (MSE), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE), can assess the fitting and prediction ability of a model. However, more specific metrics are required for evaluating updated models comprehensively in the CL setting. In [12, 13], the models are evaluated in terms of fitting error, prediction error, and forgetting ratio. We also consider training time as a significant evaluation factor [11], especially in real-time applications. Besides, algorithm ranking metrics are proposed according to different desiderata in [4], including accuracy, forward/backward transfer, model size efficiency, sample storage size efficiency, and computational efficiency. Díaz-Rodríguez et al. fuse these desiderata into a single CL score for ranking purposes. A series of wide-ranging evaluation metrics can make CL explainable, which is the basis for visualizing the updating process and dynamically adjusting hyperparameters.
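To make these notions concrete, the snippet below sketches one possible forgetting measure: the relative growth of the old-task error after an update. The definition is a simplified illustration, not the exact forgetting ratio used in [12, 13]:

```python
import numpy as np

def mse(y, y_hat):
    """Mean square error between truth and prediction."""
    return float(np.mean((y - y_hat) ** 2))

def forgetting_ratio(err_old_before, err_old_after):
    """Illustrative: relative growth of the old-task error after an update."""
    return (err_old_after - err_old_before) / err_old_before

y_old  = np.array([1.0, 2.0, 3.0])  # ground truth of the old task
before = np.array([1.1, 2.0, 2.9])  # predictions before the update
after  = np.array([1.3, 2.2, 2.7])  # predictions after the update

e0, e1 = mse(y_old, before), mse(y_old, after)
print(round(forgetting_ratio(e0, e1), 2))  # 10.0
```

A ratio near zero means the update preserved old-task performance; a large positive value, as here, signals catastrophic forgetting that an expert (or an automatic rollback rule) should react to.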

2.2.4 Question 4: How to explain the updating process?

The training process of either typical supervised learning or CL is generally a black box, whose results are untransparent and incomprehensible to humans. Explainable artificial intelligence, also called XAI in the literature, refers to techniques that help humans understand and trust the results of machine learning models. It has been applied in many sectors, such as medical diagnosis [15] and text analysis [24].

Due to stochasticity in re-learning, some updates could fail and lead to worse predictive ability. XAI can visualize the updating process and interpret the reasons for the failures. Experts can monitor the updating process and analyze the updated model based on the given evaluation criteria. Furthermore, they can more easily decide to accept successful updates or reject failed ones, and take follow-up actions, for example, rolling back a failed model to its previous version, assembling multiple updated models for ensemble learning, or adjusting hyperparameters dynamically for a further update.

3 Visualizable Continual Learning Framework for Regression Tasks

The V-CLeaR framework, which is short for Visualizable Continual Learning for Regression tasks, is shown in Fig. 2. It consists of three main parts: (1) a preprocessing block, (2) the CLeaR framework, and (3) an explainability utility. The preprocessing block is responsible for processing the incoming data, including cleaning exceptions, filling missing values, and scaling. Besides, due to concept drift in the data stream, the parameters of the used scaler might have to be updated, for example, the maximum and minimum of a min-max scaler or the mean and variance of a standard scaler.
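Such a scaler update can be as simple as widening the tracked minimum and maximum as new data arrives. A minimal sketch (the class name and interface are invented for illustration and do not correspond to the framework's actual components):

```python
class StreamingMinMaxScaler:
    """Min-max scaler whose parameters track a drifting data stream."""

    def __init__(self):
        self.min = None
        self.max = None

    def partial_fit(self, x):
        # widen the observed range with each new batch
        lo, hi = min(x), max(x)
        self.min = lo if self.min is None else min(self.min, lo)
        self.max = hi if self.max is None else max(self.max, hi)

    def transform(self, x):
        span = self.max - self.min
        return [(v - self.min) / span for v in x]

scaler = StreamingMinMaxScaler()
scaler.partial_fit([2.0, 4.0, 6.0])
scaler.partial_fit([0.0, 8.0])      # drift widens the observed range
print(scaler.transform([4.0]))      # [0.5]
```

In practice, re-scaling after a parameter update also means that previously buffered samples must be re-normalized, which is one reason the scaler parameters are treated as framework-related hyperparameters.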

Figure 2: An illustration of the visualizable continual learning framework for regression tasks. The framework consists of three parts: (1) preprocessing block, (2) CLeaR framework, and (3) explainability utility. The dashed line means that the truth Y is optional.

The CLeaR framework is a continual learning framework based on buffered data [13], which is grouped into novelty and familiarity by a deterministic or probabilistic novelty detector and stored in the corresponding buffer. Novelty from the infinite data stream indicates what the trained model cannot predict accurately and should continually learn. Familiarity is defined as data that the model is already familiar with. It can be obtained from the infinite data stream, taken from historical samples stored in raw format, or generated by a generative model. The storage and usage of the buffered data depend on the adopted CL strategies.

Figures 3 and 4 illustrate examples of a CLeaR instance in both proposed CL scenarios.

Figure 3: An illustration of the CLeaR instance in the DDI scenario. Threshold_a, Buffer_a, Threshold_p_1, and Buffer_p_1 are the framework-related hyperparameters for the autoencoder and predictor 1, respectively. Every sub-network that needs to be updated in the application owns its own series of framework-related components and hyperparameters.

Figure 4: An illustration of the CLeaR instance in the TDI scenario. It is similar to the CLeaR instance in the DDI scenario. Note that each predictor that needs to be updated in applications should own an independent series of framework-related components and hyperparameters.

In this instance, the model consists of an autoencoder as the shared network for extracting latent representations of the input and fully connected networks as the predictors. The two kinds of sub-networks are used for detecting changes in the input distribution and the target distribution, respectively. Novelty or familiarity is determined by comparing the MSE between the prediction (or reconstruction) and the corresponding ground truth to a preset, dynamically adjustable threshold. The novelty buffer has a limited size, while the familiarity buffer is unlimited. Updating a sub-network is triggered when the corresponding novelty buffer is filled. After updating the sub-network, the corresponding threshold is adjusted depending on the updating results, and its buffers are emptied. The core of this framework is the flexibility and customizability of its modules, including the novelty detector, the storage of the data, the available CL strategies, and the type of neural network models. Users can select the optimal components of the framework for their own applications.
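The detect-buffer-update cycle described above can be summarized in a short Python sketch. Everything below (the class name, the multiplicative threshold adjustment, and the buffer handling) is a simplified assumption for illustration, not the actual CLeaR implementation:

```python
class CLeaRBuffers:
    """Sketch of the CLeaR cycle: sort samples into novelty/familiarity
    buffers and adjust the detection threshold after each update."""

    def __init__(self, threshold, novelty_capacity):
        self.threshold = threshold
        self.novelty, self.familiarity = [], []
        self.capacity = novelty_capacity
        self.updates = 0

    def step(self, sample, error):
        # compare the sample's error (e.g. MSE) to the current threshold
        buf = self.novelty if error > self.threshold else self.familiarity
        buf.append(sample)
        if len(self.novelty) == self.capacity:
            self._update()

    def _update(self):
        self.updates += 1        # a real instance retrains the sub-network here
        self.threshold *= 1.1    # assumed rule: relax threshold after an update
        self.novelty.clear()     # empty both buffers after the update
        self.familiarity.clear()

sim = CLeaRBuffers(threshold=0.1, novelty_capacity=2)
for i, err in enumerate([0.05, 0.2, 0.3, 0.02]):
    sim.step(i, err)
print(sim.updates, round(sim.threshold, 2))  # 1 0.11
```

In the full framework, each updatable sub-network (autoencoder or predictor) would own one such threshold-and-buffer pair, and the post-update threshold rule would depend on the evaluated updating results rather than a fixed factor.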

The explainability utility is designed as a visualization tool focusing on visualizing the updating process and explaining the updated model using the proposed evaluation metrics. The updating process is also supervised by experts, who can input instructions to assist the model in making decisions for the next move. Here a decision is defined as anything that affects the following actions of the CLeaR instance. For example, the CL-algorithm-related hyperparameters are adjusted for re-updating when the current updating results are not ideal, or the framework-related hyperparameters are changed to make a trade-off between forgetting and prediction in a future update. Besides, considering factors such as storage and computational overhead, experts can decide to store or drop the updated models.

The development of the V-CLeaR framework can answer the four research questions listed in Section 2.2.

4 Datasets & Experiments

In the proposed thesis, I plan three experiments in the context of power forecasts based on three real-world public datasets to assess the framework’s performance. In the remainder of this section, I will briefly introduce the selected datasets and the experimental setup.

4.1 Wind Power Generation Forecasts

The EuropeWindFarm dataset [9] contains the day-ahead power generation of 45 wind farms (off- and onshore) scattered over the European continent, as shown in Fig. 5.

Figure 5: The locations of the European wind farms.

The dataset contains hourly averaged wind power generation time series for two consecutive years and the corresponding day-ahead meteorological forecasts provided by the European Centre for Medium-Range Weather Forecasts (ECMWF) weather model. The meteorological features contain (1) wind speed at 100 m height, (2) wind speed at 10 m height, (3) wind direction (zonal) at 100 m height, (4) wind direction (meridional) at 100 m height, (5) air pressure, (6) air temperature, and (7) humidity. All features are scaled between 0 and 1. Additionally, the power generation time series is normalized with the wind farm’s respective nominal capacity to enable a scale-free comparison and to mask the original characteristics of the wind farm. The dataset is pre-filtered to discard any period longer than 24 hours in which no energy was produced, as this indicates a wind farm malfunction.

V-CLeaR instances can be built with the weather features as input to predict the wind power generation at the corresponding time points [13, 8]. One model is trained per prediction target, i.e., per wind power generator. This experiment can simulate the DDI scenario.

4.2 Solar Power Generation Forecasts

The GermanSolarFarm dataset [7, 8] contains 21 photovoltaic (PV) facilities in Germany, as shown in Fig. 6.

Figure 6: The locations of the German solar farms.

Their installed nominal power ranges between 100 kW and 8500 kW. The PV facilities range from PV panels installed on rooftops to fully-fledged solar farms. Historical numerical weather prediction (NWP) data and the produced power in a three-hour resolution for 990 days are available for each facility. The weather prediction series in the dataset are scaled between 0 and 1 using min-max normalization. Besides, there are three temporal features, the hour of the day, the month of the year, and the season of the year, which are normalized to the range of 0 to 1 using sine and cosine encoding. The target variable, i.e., the measured power generation, is normalized using the nominal output capacity of the corresponding PV facility. This allows the forecasting performance to be compared without taking the size of the PV facilities into account.
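The sine/cosine encoding of cyclical temporal features mentioned above can be sketched as follows; the shift of the raw sine/cosine values from [-1, 1] into [0, 1] is an assumption made here to match the dataset's stated normalization range:

```python
import math

def cyclical_encode(value, period):
    """Map a periodic feature (e.g. hour of day) to a point on the unit
    circle, then shift both coordinates into [0, 1]."""
    angle = 2 * math.pi * value / period
    return ((math.sin(angle) + 1) / 2, (math.cos(angle) + 1) / 2)

print(cyclical_encode(0, 24))   # (0.5, 1.0)  -> midnight
print(cyclical_encode(6, 24))   # (1.0, 0.5)  -> 6 a.m.
```

The point of the encoding is continuity at the period boundary: hour 23 and hour 0 end up close together on the circle, whereas a plain linear normalization would place them at opposite ends of the range.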

The experimental setup is the same as the setup of the wind power generation forecasts experiment. Namely, the NWP features are used as the input of V-CLeaR instances for forecasting each generator in the DDI scenario.

4.3 Power Supply and Demand Forecasts in a Regional Power Grid

The regional power grid dataset [12] is collected from a real-world German regional flexibility market, including two years of NWP data, the low-/medium-voltage power generation (e.g., wind and solar power generation) and consumption (e.g., residential and industrial consumption) measurements over the same period, and the geographic and electrical information of the power grid, as shown in Fig. 7.

Figure 7: An illustration of the regional power grid. The green block represents the electrical substation of the city and the blue circles represent the regional power consumers and generators, i.e., the prediction targets.

The NWP data contains 13 numerical weather features, forecast 24 hours ahead with a 15-minute resolution. The power data contains historical samples of 11 renewable power generators, 55 local energy consumers, and 36 low-voltage residential consumers. Like the NWP data, the power data ranges from March 1, 2019, to March 31, 2021, with a 15-minute resolution. Both NWP and power data are scaled between 0 and 1 using min-max normalization.

The power grid information records all information regarding the regional energy market, such as the parameters of the energy generators and consumers, the topological structure of the power grid, and the connection points to higher or lower power grid levels. It can help create a virtual power grid using the open-source Python library pandapower [29] to analyze the power grid’s state and optimize power supply and demand.

Because all generators are located in the same region, the NWP features are viewed as the identical input for predicting all power targets in the power grid. Therefore, we can build a multi-output neural network with a shared sub-network that extracts common representations, as shown in Fig. 1, to assess the V-CLeaR framework in the DDI and TDI scenarios. Additionally, the virtual power grid can help analyze the effect of continually updated prediction models on power grid management and optimization.

5 Conclusion

In conclusion, as a starting point of the dissertation, this proposal presents the existing research questions related to continual deep learning for regression tasks. Based on these questions and on requirements from real-world applications, the article proposes an explainable neural-network-based CL framework for solving data-domain and target-domain incremental regression tasks. Currently, the work is in the process of developing the modules of this framework and evaluating the functionality of each module in the application scenario of power forecasts. The V-CLeaR framework is expected to be modularized, so that users can utilize a single module or customize the framework for their requirements. Our previous works have proven the applicability and necessity of the proposed framework.

6 Acknowledgment

This work was supervised by Prof. Dr. Bernhard Sick and supported within the Digital-Twin-Solar (03EI6024E) project, funded by BMWi: Deutsches Bundesministerium für Wirtschaft und Energie/German Federal Ministry for Economic Affairs and Energy.


  • [1] A. Chaudhry, P. K. Dokania, T. Ajanthan, and P. H. Torr (2018) Riemannian walk for incremental learning: understanding forgetting and intransigence. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 532–547. Cited by: 1st item.
  • [2] M. De Lange and T. Tuytelaars (2020) Continual prototype evolution: learning online from non-stationary data streams. arXiv preprint arXiv:2009.00919. Cited by: 2nd item.
  • [3] M. Delange, R. Aljundi, M. Masana, S. Parisot, X. Jia, A. Leonardis, G. Slabaugh, and T. Tuytelaars (2021) A continual learning survey: defying forgetting in classification tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1. External Links: Document Cited by: §2.2.2.
  • [4] N. Díaz-Rodríguez, V. Lomonaco, D. Filliat, and D. Maltoni (2018) Don’t forget, there is more than forgetting: new metrics for continual learning. arXiv preprint arXiv:1810.13166. Cited by: §2.2.3.
  • [5] S. Farquhar and Y. Gal (2018) Towards robust evaluations of continual learning. arXiv preprint arXiv:1805.09733. Cited by: §2.2.
  • [6] C. Fernando, D. Banarse, C. Blundell, Y. Zwols, D. Ha, A. A. Rusu, A. Pritzel, and D. Wierstra (2017) Pathnet: evolution channels gradient descent in super neural networks. arXiv preprint arXiv:1701.08734. Cited by: 3rd item.
  • [7] A. Gensler, J. Henze, N. Raabe, and V. Pankraz (2016) GermanSolarFarm Data Set. External Links: Link Cited by: §4.2.
  • [8] A. Gensler, J. Henze, B. Sick, and N. Raabe (2016) Deep learning for solar power forecasting—an approach using autoencoder and lstm neural networks. In 2016 IEEE international conference on systems, man, and cybernetics (SMC), pp. 002858–002865. Cited by: §4.1, §4.2.
  • [9] A. Gensler (2016) EuropeWindFarm Data Set. External Links: Link Cited by: §4.1.
  • [10] C. Gruhl, B. Sick, and S. Tomforde (2021) Novelty detection in continuously changing environments. Future Generation Computer Systems 114, pp. 138–154. Cited by: §2.1.
  • [11] Y. He, J. Henze, and B. Sick (2020) Continuous learning of deep neural networks to improve forecasts for regional energy markets. IFAC-PapersOnLine 53 (2), pp. 12175–12182. Cited by: §2.1, §2.2.3.
  • [12] Y. He, Z. Huang, and B. Sick (2021) Toward application of continuous power forecasts in a regional flexibility market. Note: In press Cited by: item 2, §2.2.3, §4.3.
  • [13] Y. He and B. Sick (2021) CLeaR: an adaptive continual learning framework for regression tasks. arXiv preprint arXiv:2101.00926. Cited by: item 5, §2.2.1, §2.2.3, §2.2, §3, §4.1.
  • [14] G. Hinton, O. Vinyals, and J. Dean (2015) Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531. Cited by: §2.2.2.
  • [15] A. Holzinger, C. Biemann, C. S. Pattichis, and D. B. Kell (2017) What do we need to build explainable ai systems for the medical domain?. arXiv preprint arXiv:1712.09923. Cited by: §2.2.4.
  • [16] J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, et al. (2017) Overcoming catastrophic forgetting in neural networks. Proceedings of the national academy of sciences 114 (13), pp. 3521–3526. Cited by: 1st item.
  • [17] R. Kurle, B. Cseke, A. Klushyn, P. van der Smagt, and S. Günnemann (2020) Continual learning with bayesian neural networks for non-stationary data. In International Conference on Learning Representations, External Links: Link Cited by: §2.2.2.
  • [18] F. Lavda, J. Ramapuram, M. Gregorova, and A. Kalousis (2018) Continual classification learning using generative models. arXiv preprint arXiv:1810.10612. Cited by: 2nd item.
  • [19] Z. Li and D. Hoiem (2017) Learning without forgetting. IEEE transactions on pattern analysis and machine intelligence 40 (12), pp. 2935–2947. Cited by: 1st item, §2.2.2.
  • [20] V. Lomonaco and D. Maltoni (2017) Core50: a new dataset and benchmark for continuous object recognition. In Conference on Robot Learning, pp. 17–26. Cited by: §2.1.
  • [21] A. Mallya and S. Lazebnik (2018) Packnet: adding multiple tasks to a single network by iterative pruning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7765–7773. Cited by: 3rd item.
  • [22] T. P. Minka, R. Xiang, and Y. Qi (2009) Virtual vector machine for bayesian online classification. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pp. 411–418. Cited by: §2.2.2.
  • [23] C. V. Nguyen, Y. Li, T. D. Bui, and R. E. Turner (2018) Variational continual learning. In International Conference on Learning Representations, Cited by: §2.2.2.
  • [24] M. A. Qureshi and D. Greene (2019) Eve: explainable vector based embedding technique using wikipedia. Journal of Intelligent Information Systems 53 (1), pp. 137–165. Cited by: §2.2.4.
  • [25] A. Rannen, R. Aljundi, M. B. Blaschko, and T. Tuytelaars (2017) Encoder based lifelong learning. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1320–1328. Cited by: 1st item.
  • [26] S. Rebuffi, A. Kolesnikov, G. Sperl, and C. H. Lampert (2017) Icarl: incremental classifier and representation learning. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp. 2001–2010. Cited by: 2nd item.
  • [27] A. A. Rusu, N. C. Rabinowitz, G. Desjardins, H. Soyer, J. Kirkpatrick, K. Kavukcuoglu, R. Pascanu, and R. Hadsell (2016) Progressive neural networks. arXiv preprint arXiv:1606.04671. Cited by: 3rd item.
  • [28] H. Shin, J. K. Lee, J. Kim, and J. Kim (2017) Continual learning with deep generative replay. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 2994–3003. Cited by: 2nd item.
  • [29] L. Thurner, A. Scheidler, F. Schäfer, J. Menke, J. Dollichon, F. Meier, S. Meinecke, and M. Braun (2018) Pandapower—an open-source python tool for convenient modeling, analysis, and optimization of electric power systems. IEEE Transactions on Power Systems 33 (6), pp. 6510–6521. Cited by: §4.3.
  • [30] F. Zenke, B. Poole, and S. Ganguli (2017) Continual learning through synaptic intelligence. In International Conference on Machine Learning, pp. 3987–3995. Cited by: 1st item.
  • [31] J. Zhang, J. Zhang, S. Ghosh, D. Li, S. Tasci, L. Heck, H. Zhang, and C. J. Kuo (2020) Class-incremental learning via deep model consolidation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 1131–1140. Cited by: 1st item.