
A Benchmark dataset for predictive maintenance

The paper describes the MetroPT data set, an outcome of an eXplainable Predictive Maintenance (XPM) project with an urban metro public transportation service in Porto, Portugal. The data was collected in 2022 with the aim of evaluating machine learning methods for online anomaly detection and failure prediction. By capturing several analogue sensor signals (pressure, temperature, current consumption), digital signals (control signals, discrete signals), and GPS information (latitude, longitude, and speed), we provide a dataset that can be easily used to evaluate online machine learning methods. Because it comes from real operation and its failures are documented in maintenance reports, the dataset can serve as a benchmark for predictive maintenance models.


Background & Summary

Faults in public transport vehicles during regular operation cause considerable damage, especially when they interrupt a trip. The negative impact affects not only the operating company but also its clients, whose trust in the transport service is undermined. In this context, the early detection of such faults can avoid the cancellation of trips and the withdrawal of the affected vehicle from service, and is therefore of enormous value. In 2017 alone, more than 170 trips were cancelled for this reason.

The Air Production Unit (APU) installed on the roof of Metro vehicles feeds units that perform different functions. One of these units is the secondary suspension, responsible for keeping the vehicle height level regardless of the number of passengers on board. The APU is in heavy demand throughout the day. Because there is no redundancy, its failure results in the immediate removal of the train for repair. These failures are typically undetectable by traditional maintenance criteria (predefined thresholds).

From the operational point of view, the objective of Predictive Maintenance is to reduce operational problems, reduce the number of unforeseen stops and the stopping time, and change the maintenance paradigm: from reactive to predictive.

The main goal of the MetroPT dataset is to become a benchmark dataset for Predictive Maintenance. It is a real-world dataset in which the ground truth of anomalies is known from the company’s maintenance reports. It allows fair comparisons between Machine Learning algorithms developed to detect anomalies based on sensor data collected as a continuous data flow.


Figure 1: Air Production Unit (APU)

In the last few years, many works on Predictive Maintenance (PdM) have been published, driven by the development of machine and deep learning techniques. A recent survey on Predictive Maintenance was published by [4]. It covers the main issues in data-driven PdM. Another survey, describing advances in machine learning and deep learning techniques for PdM in the railway industry, was published by [2]. A recent manuscript by [5] identifies three key research lines for the PdM domain that are expected to attract the focus of researchers: failure prediction, remaining useful life (RUL), and root cause analysis (RCA). In fact, the final goal of PdM is to elaborate a maintenance plan when a failure is identified. The components involved in the failure and its severity are relevant information for the recovery plan.

Regarding the present dataset, two recent works address failure prediction. In the first, presented by [1], the authors constructed a rule-based system to produce alerts about the state of the compressor. The second work, presented by [3], explores the use of deep learning autoencoders to produce alerts. In both cases the results are satisfactory, but there is ample room to improve accuracy and explanation.

Data Records

The signal acquisition system installed in one APU of the vehicle (APU01) collects data from eight analogue sensors (see Figure 2) measuring pressure, temperature and electric current consumed, placed in different components of the APU (Figure 1), as well as eight digital signals collected directly from the APU (see Figure 3) and GPS information (see Figure 4).

The data acquisition rate is 1 Hz, and the information is sent to the remote server every 5 minutes using the GSM network. The data collection of the two units began on 12 March 2020 and is continuously operational to date. Every day, for each APU, a report is generated with the information of the sensor signals.

Figure 2: Analog Sensors

The considered analogue sensors (Figure 2) are the following:

  • TP2 [6] - Measures the pressure on the compressor.

  • TP3 [6] - Measures the pressure generated at the pneumatic panel.

  • H1 [6] - This valve is activated when the pressure read by the pressure switch of the command is above the operating pressure of 10.2 bar.

  • DV pressure [6] - Measures the pressure drop generated when the air dryer towers discharge the water. When it is equal to zero, the compressor is working under load.

  • Motor Current [7] - Measures the motor’s current, which should present the following values: (i) close to 0A when the compressor is off; (ii) close to 4A when the compressor is working offloaded; and (iii) close to 7A when the compressor is operating under load.

  • Oil Temperature [8] - Measures the temperature of the oil present in the compressor.
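
The three nominal motor current levels listed above can be turned into a coarse operating-state label. The following is a minimal sketch, assuming a nearest-nominal-value rule; that rule and the function name `compressor_state` are illustrative assumptions and not part of the dataset description.

```python
# Nominal motor currents from the sensor description:
# ~0 A (off), ~4 A (working offloaded), ~7 A (operating under load).
NOMINAL_CURRENT_A = {"off": 0.0, "offloaded": 4.0, "under_load": 7.0}


def compressor_state(motor_current_a: float) -> str:
    """Return the state whose nominal current is closest to the reading.

    Nearest-value assignment is an assumption of this sketch, not a rule
    stated by the dataset authors.
    """
    return min(
        NOMINAL_CURRENT_A,
        key=lambda state: abs(motor_current_a - NOMINAL_CURRENT_A[state]),
    )


readings = [0.1, 3.8, 6.9, 5.4]
print([compressor_state(a) for a in readings])
```

Readings that fall between two nominal levels, such as 5.4 A here, are assigned to the closer one. A real analysis would probably need tolerance bands and a separate label for ambiguous values.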

The digital sensors (Figure 3) only assume two different values: zero when inactive or one when a specific event activates them. The considered digital sensors were the following:

  • COMP - The electrical signal of the air intake valve on the compressor. It is active when there is no admission of air on the compressor, meaning that the compressor is off or working offloaded.

  • DV electric - The electrical signal that commands the compressor outlet valve. When it is active, the compressor is working under load; when it is not active, the compressor is off or working offloaded.

  • TOWERS - Defines which tower is drying the air and which tower is draining the humidity removed from the air. When it is not active, it means that tower one is working, and when it is active, it means that tower two is working.

  • MPG - Activates the intake valve to start the compressor under load when the pressure in the APU is below 8.2 bar. Consequently, it activates the COMP sensor, which follows the same behaviour as the MPG sensor.

  • LPS - Is activated when the pressure is lower than 7 bar.

  • Oil Level - Detects the oil level on the compressor and is active (equal to one) when the oil is below the expected values.
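
The pressure thresholds quoted for MPG (8.2 bar) and LPS (7 bar) suggest a simple consistency check between a pressure reading and the digital flags. The sketch below is only an illustration: it assumes a single pressure value in bar is the relevant input for both flags, and it checks only the direction stated in the text (pressure below threshold implies active flag). The function name `missing_activations` is hypothetical.

```python
# Thresholds taken from the digital sensor descriptions.
MPG_THRESHOLD_BAR = 8.2  # MPG: pressure in the APU below 8.2 bar
LPS_THRESHOLD_BAR = 7.0  # LPS: pressure lower than 7 bar


def missing_activations(pressure_bar: float, mpg: int, lps: int) -> list[str]:
    """List flags that should be active for this pressure but read zero.

    Assumption: one pressure reading governs both flags. The check is
    one-directional, so an active flag above the threshold is not reported.
    """
    missing = []
    if pressure_bar < MPG_THRESHOLD_BAR and mpg == 0:
        missing.append("MPG")
    if pressure_bar < LPS_THRESHOLD_BAR and lps == 0:
        missing.append("LPS")
    return missing


print(missing_activations(9.0, mpg=0, lps=0))  # no threshold crossed
print(missing_activations(7.5, mpg=0, lps=0))  # MPG expected but inactive
print(missing_activations(6.5, mpg=1, lps=0))  # LPS expected but inactive
```

The check is deliberately one-directional because the control logic may keep a flag active after the pressure has recovered, and the description does not say when the flags are released.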

Figure 3: Digital Sensors

Regarding the GPS Information, the train was equipped with a secondary GPS antenna to collect: latitude, longitude, speed and signal quality.

Figure 4: GPS Information
Figure 5: GPS Information. The train trajectory is interrupted when the train is inside a tunnel, due to loss of the GPS signal.

The data set available was collected from January to June 2022. It contains a single file with all variables and GPS coordinates, with almost data points collected from the air compressor.

Technical Validation

The ground truth was provided by the company using maintenance reports. According to the reported information, the dataset has three catastrophic failures. Two failures are related to air leaks in the system, and another is an oil leak.

  • Regarding the air leaks, the first one (Figure 6) was caused by a malfunction of the pneumatic pilot valve, which opened the drain pipes during the operation of the compressor.

    Figure 6: APU Control Failure - Air escaping by Drain Pipes
  • The second problem (Figure 7) was an air leak on a pipe that feeds several clients of the system, such as brakes and suspension. In the first case, the train recovered from the malfunction. In the second case, the train needed to move to the maintenance building.

    Figure 7: Air Leak on clients
  • Regarding the oil leak (Figure 8), due to the hardware design there was no oil-related signal to warn the train driver. The oil leak caused severe damage to the engine of the compressor. Subsequently, because the compressor was inoperable, a drop in air pressure was observed and the train had to be removed from the tracks.

    Figure 8: Oil Leak on Compressor

Nr.  Type      Component   Start           End
1    Air Leak  Air Dryer   28-02-22 21:53  01-03-22 02:00
2    Air Leak  Clients     23-03-22 14:54  23-03-22 15:24
3    Oil Leak  Compressor  30-05-22 12:00  02-06-22 06:18

Table 1: Failures reported on Maintenance Reports.
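
To use these intervals as ground truth, each sample timestamp can be labelled according to whether it falls inside a reported failure. The following is a minimal sketch with the intervals transcribed from Table 1. Treating both interval ends as inclusive and the helper name `failure_label` are assumptions made for illustration.

```python
from datetime import datetime
from typing import Optional, Tuple

# (type, component, start, end) transcribed from Table 1 (dates are day-month-year, 2022).
FAILURES = [
    ("Air Leak", "Air Dryer", datetime(2022, 2, 28, 21, 53), datetime(2022, 3, 1, 2, 0)),
    ("Air Leak", "Clients", datetime(2022, 3, 23, 14, 54), datetime(2022, 3, 23, 15, 24)),
    ("Oil Leak", "Compressor", datetime(2022, 5, 30, 12, 0), datetime(2022, 6, 2, 6, 18)),
]


def failure_label(timestamp: datetime) -> Optional[Tuple[str, str]]:
    """Return (type, component) if the timestamp lies in a reported failure.

    Interval ends are treated as inclusive, which is an assumption.
    """
    for failure_type, component, start, end in FAILURES:
        if start <= timestamp <= end:
            return failure_type, component
    return None


print(failure_label(datetime(2022, 3, 23, 15, 0)))  # inside failure 2
print(failure_label(datetime(2022, 4, 1, 12, 0)))   # normal operation
```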

The dataset can be used for two primary purposes: i) predicting failures and ii) identifying the components involved in the failure. For the first task, predicting failures, the goal is to predict when the failure starts and how long it lasts. For validation purposes, a failure is a time interval: start-end. We use the following evaluation protocol:

  • Goal: minimize the number of false positives and false negatives (# FP + # FN) (see Figure 9).

  • Requirement: the company requires the failure to be detected at least two hours before the train becomes non-operational, so that it can be removed from the tracks safely.

Figure 9: Evaluation Protocol
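
One possible reading of this protocol at the interval level is sketched below. It assumes that a predicted alarm interval overlapping no reported failure counts as a false positive, and a reported failure overlapped by no alarm counts as a false negative. These definitions, the function names, and the example alarm times are assumptions for illustration, and the exact definitions in Figure 9 may differ.

```python
from datetime import datetime, timedelta


def overlaps(a, b) -> bool:
    """True if closed intervals a=(start, end) and b=(start, end) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def count_errors(predicted, actual):
    """Return (#FP, #FN) under the interval-overlap assumption above."""
    fp = sum(1 for p in predicted if not any(overlaps(p, a) for a in actual))
    fn = sum(1 for a in actual if not any(overlaps(p, a) for p in predicted))
    return fp, fn


def meets_lead_time(first_alarm, non_operational_at, min_lead=timedelta(hours=2)):
    """Check the two-hour requirement for a given alarm time."""
    return non_operational_at - first_alarm >= min_lead


failures = [(datetime(2022, 3, 23, 14, 54), datetime(2022, 3, 23, 15, 24))]
alarms = [  # hypothetical model output
    (datetime(2022, 3, 23, 12, 30), datetime(2022, 3, 23, 15, 0)),
    (datetime(2022, 4, 10, 8, 0), datetime(2022, 4, 10, 9, 0)),
]
fp, fn = count_errors(alarms, failures)
print(fp + fn)
# The time the train becomes non-operational is a hypothetical input here.
print(meets_lead_time(alarms[0][0], datetime(2022, 3, 23, 15, 24)))
```

Under this reading, an early warning that ends before the failure interval begins would be counted as a false positive, so a tolerance window before each failure may be needed in practice.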

Our goal is to discover problems as early as possible after they manifest, i.e., to increase the overlap between the prediction and the ground truth. The second task is to identify the type of failure and the component in which it occurs. Finally, it is crucial to compute the remaining useful life of the components, to help the management team decide when to remove the train without disrupting the service.
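
The overlap between a prediction and the ground truth can be quantified as the fraction of the failure interval covered by the predicted interval. This is a minimal sketch: the source does not specify an overlap measure, so the coverage fraction, the function name `overlap_fraction`, and the example prediction are assumptions.

```python
from datetime import datetime, timedelta


def overlap_fraction(pred_start, pred_end, true_start, true_end) -> float:
    """Fraction of the true failure interval covered by the prediction."""
    true_length = true_end - true_start
    if true_length <= timedelta(0):
        raise ValueError("failure interval must have positive duration")
    overlap = min(pred_end, true_end) - max(pred_start, true_start)
    # Disjoint intervals give a negative difference, which is clipped to zero.
    return max(overlap, timedelta(0)) / true_length


# Failure 1 from Table 1 against a hypothetical predicted interval.
coverage = overlap_fraction(
    datetime(2022, 2, 28, 23, 0), datetime(2022, 3, 1, 3, 0),
    datetime(2022, 2, 28, 21, 53), datetime(2022, 3, 1, 2, 0),
)
print(round(coverage, 3))
```

A value of 1.0 means the prediction covers the whole reported failure, and 0.0 means the two intervals do not intersect.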


This work was supported by the CHIST-ERA grant CHIST-ERA-19-XAI-012, and project CHIST-ERA/0004/2019 funded by FCT.


  • [1] M. Barros, B. Veloso, P. M. Pereira, R. P. Ribeiro, and J. Gama (2020) Failure detection of an air production unit in operational context. In IoT Streams for Data-Driven Predictive Maintenance and IoT, Edge, and Mobile for Embedded Machine Learning, pp. 61–74.
  • [2] N. Davari, B. Veloso, G. d. A. Costa, P. M. Pereira, R. P. Ribeiro, and J. Gama (2021) A survey on data-driven predictive maintenance for the railway industry. Sensors 21 (17), pp. 5739.
  • [3] N. Davari, B. Veloso, R. P. Ribeiro, P. M. Pereira, and J. Gama (2021) Predictive maintenance based on anomaly detection using deep learning for air production unit in the railway industry. In 2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA), pp. 1–10.
  • [4] A. Esteban, A. Zafra, and S. Ventura (2022) Data Mining in Predictive Maintenance Systems: A Taxonomy and Systematic Review. WIREs Data Mining and Knowledge Discovery (May), pp. 1–45.
  • [5] J. Gama, R. P. Ribeiro, and B. Veloso (2022) Data-driven predictive maintenance. IEEE Intelligent Systems (01), pp. 1–2.
  • [6] IFM Pressure transmitter pt5414. Accessed on 7 July 2022.
  • [7] LEM AC current transducer at-b420l. Accessed on 7 July 2022.
  • [8] WIKA Thermocouple tc12-m. Accessed on 7 July 2022.