Bluetooth Low Energy RSSI measurements with ground-truth distances.
Digital contact tracing approaches based on Bluetooth low energy (BLE) have the potential to efficiently contain and delay outbreaks of infectious diseases such as the ongoing SARS-CoV-2 pandemic. In this work we propose a novel machine learning based approach to reliably detect subjects that have spent enough time in close proximity to be at risk of being infected. Our study is an important proof of concept that will aid the battery of epidemiological policies aiming to slow down the rapid spread of COVID-19.
In our experiments we use three different epidemiological models (linear, box and sigmoid) to convert the proximity values into infectiousness scores. Each model takes the contact distance, measured in cm, as its argument. All three models are monotonically decreasing functions of the distance, so the infectiousness score decreases as the distance increases.
The main use of epidemiological models in our experiments is to generate ground-truth labels for our data, which consist of time series of RSS values and the corresponding distances (the latter are not available in real settings). To generate the labels, we integrate the infectiousness scores over the contact time (equation (5)).
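The labeling step described above can be sketched as follows. This is a minimal illustration, not the formulas used in the study: the three model shapes, their constants (`cutoff_cm`, `midpoint_cm`, `slope_cm`) and the function names are assumptions, chosen only so that each score does not increase with distance. The integral over the contact time is approximated by a sum over samples taken at 1 Hz.

```python
import math

# Hypothetical model shapes: the exact formulas and constants are assumptions,
# chosen only so that each score falls (or stays flat) as the distance in cm grows.
def linear_model(d_cm, cutoff_cm=200.0):
    return max(0.0, 1.0 - d_cm / cutoff_cm)

def box_model(d_cm, cutoff_cm=200.0):
    return 1.0 if d_cm <= cutoff_cm else 0.0

def sigmoid_model(d_cm, midpoint_cm=200.0, slope_cm=20.0):
    return 1.0 / (1.0 + math.exp((d_cm - midpoint_cm) / slope_cm))

def integrated_risk(distances_cm, model, dt_s=1.0):
    """Approximate the time integral of the infectiousness score by a Riemann sum."""
    return sum(model(d) for d in distances_cm) * dt_s

# Example encounter: 60 s at 1 m followed by 60 s at 3 m, sampled at 1 Hz.
encounter = [100.0] * 60 + [300.0] * 60
risk_box = integrated_risk(encounter, box_model)
risk_linear = integrated_risk(encounter, linear_model)
```

Under these assumed shapes, only the first minute of the example encounter contributes to the box and linear scores, because the second minute lies beyond the assumed 2 m cutoff.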
For every epidemiological model there exists a reference encounter: for any encounter with a lower risk, no infection is expected. For instance, for COVID-19 it is assumed that physical proximity between two people of less than 2 meters over a time period of 900 seconds (15 minutes) results in a high risk of being infected [European Centre for Disease Prevention and Control, 2020]. Inserting the corresponding reference sequence, a constant distance of 2 meters held for 900 seconds, into equation (5) results in a local threshold.
By selecting the epidemiological model and the infectiousness threshold, we can determine which time series of distance measurements should be considered dangerous and which should not: an encounter is dangerous if its integrated infectiousness exceeds the threshold.
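As a rough sketch of the local threshold, the reference sequence from the text (2 m held for 900 s) can be passed through the same integration as any other encounter. The sigmoid-shaped `infectiousness` function and its constants are assumptions made for illustration only; the reference sequence itself follows the values quoted above.

```python
import math

def infectiousness(d_cm, midpoint_cm=200.0, slope_cm=20.0):
    # Hypothetical sigmoid-shaped model; shape and constants are assumptions.
    return 1.0 / (1.0 + math.exp((d_cm - midpoint_cm) / slope_cm))

def integrated_risk(distances_cm, dt_s=1.0):
    # Riemann-sum approximation of the integral over the contact time.
    return sum(infectiousness(d) for d in distances_cm) * dt_s

# Reference sequence from the text: 2 m (200 cm) held for 900 s, sampled at 1 Hz.
reference = [200.0] * 900
local_threshold = integrated_risk(reference)

def is_high_risk(distances_cm):
    """An encounter is high risk if its integrated score exceeds the threshold."""
    return integrated_risk(distances_cm) > local_threshold

close_contact = [100.0] * 600    # 10 min at 1 m
distant_contact = [300.0] * 900  # 15 min at 3 m
```

With this assumed model, the shorter encounter at 1 m is labeled high risk while the longer encounter at 3 m is not, which shows how distance and duration trade off in the integrated score.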
An alternative approach is to label the data with a global threshold. For that we need an estimate of the expected number of contact persons newly infected by previously infected persons. This number can be computed from the basic reproduction number.
One can then choose the threshold such that, out of the total number of recorded proximity histories, the number of high-risk encounters matches the expected number of new infections.
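One way to picture the global threshold is as a cut through the ranked risk scores. The sketch below assumes the expected number of new infections is already known and passed in as `expected_infections`; how it follows from the basic reproduction number is not reproduced here. The function name and the midpoint rule are illustrative choices, and the input list is assumed to be non-empty.

```python
def global_threshold(risk_scores, expected_infections):
    """Pick a threshold so that about `expected_infections` scores exceed it."""
    k = int(round(expected_infections))
    ranked = sorted(risk_scores, reverse=True)
    if k <= 0:
        return ranked[0]          # nothing exceeds the maximum score
    if k >= len(ranked):
        return float("-inf")      # every encounter counts as high risk
    # Midpoint between the k-th and (k+1)-th largest score.
    return 0.5 * (ranked[k - 1] + ranked[k])

# Illustrative integrated risk scores for six recorded proximity histories.
scores = [12.0, 430.0, 55.0, 980.0, 7.0, 610.0]
tau = global_threshold(scores, expected_infections=2)
high_risk = [s for s in scores if s > tau]
```

If several encounters share the score at the cut, the number of high-risk labels can differ from the target, so the match is exact only when the scores around the cut are distinct.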
Given an epidemiological model and the true distances, we can label encounters as “high risk” or “low risk”. Since the true distances are not available in real settings, we aim to train a machine learning model to predict these labels from the raw RSS measurements of the BLE signal (for practical reasons, we resampled the RSS values to 1 Hz). To simplify the learning task, we extract features from the RSS data and provide them as input to the ML algorithm. In particular, we tested the following three feature sets:
sum: total sum of the received RSS values, resulting in one-dimensional features.
dur_max_mean: duration, maximum and mean of the received RSS values, resulting in three-dimensional features.
freq: amplitudes of the first frequencies of the received RSS values, resulting in one feature dimension per retained frequency.
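The three feature sets could be computed along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, the duration is taken to be the number of 1 Hz samples times the sampling interval, the number of retained frequencies (`n_freqs`) is a free parameter because the original value is not given here, and the frequency amplitudes come from a naive discrete Fourier transform. The RSS values are made up for illustration.

```python
import cmath

def feature_sum(rss):
    return [sum(rss)]

def feature_dur_max_mean(rss, dt_s=1.0):
    # Duration is assumed to be sample count times the sampling interval.
    duration = len(rss) * dt_s
    return [duration, max(rss), sum(rss) / len(rss)]

def feature_freq(rss, n_freqs=4):
    """Amplitudes of the first n_freqs DFT coefficients (naive O(n * k) DFT)."""
    n = len(rss)
    amplitudes = []
    for k in range(n_freqs):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(rss))
        amplitudes.append(abs(coeff))
    return amplitudes

# Illustrative RSS time series sampled at 1 Hz (values are made up).
rss = [-70.0, -68.0, -65.0, -68.0, -70.0, -72.0, -75.0, -72.0]
features = {
    "sum": feature_sum(rss),
    "dur_max_mean": feature_dur_max_mean(rss),
    "freq": feature_freq(rss),
}
```

The first frequency amplitude is the magnitude of the plain sum, so the `freq` set contains the information of the `sum` set up to sign.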
We input these features into a linear regression model in order to obtain a predicted “risk” score: the inner product of a vector of parameters with the vector of extracted features, plus a bias term. The resulting predicted risk score is then compared to a threshold. If the predicted risk exceeds the threshold, the encounter that produced the sequence of RSS measurements is considered “high risk”.
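The prediction step can be illustrated with a few lines of code. The weights, bias and thresholds below are hand-picked, hypothetical values for a three-dimensional dur_max_mean feature vector; in practice the parameters would be fitted to the labeled training data, which this sketch does not cover.

```python
def predict_risk(features, weights, bias):
    """Linear model: inner product of weights and features plus a bias term."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * x for w, x in zip(weights, features)) + bias

def classify(features, weights, bias, threshold):
    score = predict_risk(features, weights, bias)
    return "high risk" if score > threshold else "low risk"

# Hypothetical, hand-picked parameters (not fitted values from the study).
weights = [0.5, 2.0, 1.0]
bias = 300.0
features = [600.0, -60.0, -70.0]  # duration (s), max RSS, mean RSS

score = predict_risk(features, weights, bias)
label_strict = classify(features, weights, bias, threshold=450.0)
label_loose = classify(features, weights, bias, threshold=400.0)
```

The same score yields different labels under the two thresholds, which is why the evaluation reported later relies on a threshold-independent metric.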
A measurement campaign was performed to test and validate the proposed infection risk estimation model. This section describes the setup of the experiment.
The measurements on the 1st of April and the 7th of April were each performed using 48 Samsung A40 smartphones, all of the same type, carried by 48 protected soldiers. Tests were carried out at five different locations within the Julius Leber barracks in Berlin: three rooms within a conference center and two outdoor locations, with ten subjects each. All test subjects were equipped with face masks so that there was no risk of infection.
The floor of the test areas was marked (Fig. 2). These markings consisted of a 5 m × 5 m grid with lines spaced 50 cm apart. From the starting point (box within a box) to the ending point (multiplication sign), the test subjects had to walk through the markings and stay on each marker for a predetermined amount of time (2, 4, 6, or 10 min). The markings are numbered from 1 to 9 on the green path and from 2 to 10 on the black path (Fig. 2, right). Two cameras were installed at each location to video record the test so that the exact locations of the test subjects could be checked afterwards.
The test was carried out in four runs. During the runs, the test subjects were instructed not to move too much, to hold the positions of the mobile phones relatively stable, and to stand within the square.
RSS data was collected via a prototype of the PEPP-PT App. The RSS data, recorded at a random and potentially varying frequency between 0.1 Hz and 10 Hz, was resampled to 1 Hz. Ground-truth distance data was derived from the predefined movement pattern on the grid. The labeling was additionally verified with the help of video footage taken at the test area. For every pair of soldiers we collected multiple data points, where each data point comprised two aligned sequences:
A time series of distances (from which the ground truth risk can be derived).
A time series of BLE RSS values, recorded by mobile phones held by the soldiers.
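The resampling to 1 Hz mentioned above is not specified in detail, so the following is only one plausible scheme, stated as an assumption: samples are averaged within one-second bins, and a bin without samples repeats the previous value. The function name is hypothetical, and timestamps are assumed to be sorted, non-negative and given in seconds.

```python
def resample_to_1hz(timestamps_s, rss_values):
    """Average irregular samples into 1-second bins; forward-fill empty bins."""
    if not timestamps_s:
        return []
    start = int(timestamps_s[0])
    n_bins = int(timestamps_s[-1]) - start + 1
    bins = [[] for _ in range(n_bins)]
    for t, v in zip(timestamps_s, rss_values):
        bins[int(t) - start].append(v)
    resampled, last = [], None
    for samples in bins:
        if samples:
            last = sum(samples) / len(samples)
        # The first bin always holds a sample, so `last` is set before any gap.
        resampled.append(last)
    return resampled

# Illustrative recording: a burst of samples, a two-second gap, then two more.
timestamps = [0.0, 0.4, 0.9, 3.2, 3.7]
values = [-70.0, -72.0, -74.0, -60.0, -62.0]
rss_1hz = resample_to_1hz(timestamps, values)
```

Forward-filling keeps the RSS sequence aligned with a distance sequence sampled on the same one-second grid; interpolation would be an equally defensible choice.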
For training and testing, the time series data was separated into two folds according to the room in the test area in which the data was collected. Data collected in rooms 1 and 2 (indoor) and room 4 (outdoor) was combined in the training set. Data collected in rooms 3 (indoor) and 5 (outdoor) was combined in the validation set. In previous tests, multiple combinations of indoor and outdoor rooms were tested to investigate a possible covariate shift between indoor and outdoor scenarios. No significant effects could be detected; therefore, the aforementioned mixed split was used.
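The room-based split can be written down directly from the room numbers given above. The record layout (a dictionary with a `room` key) and the function name are assumptions made for this sketch.

```python
# Room assignment as described in the text.
TRAIN_ROOMS = {1, 2, 4}       # rooms 1 and 2 indoor, room 4 outdoor
VALIDATION_ROOMS = {3, 5}     # room 3 indoor, room 5 outdoor

def split_by_room(data_points):
    """Split data points into training and validation sets by room number."""
    train = [p for p in data_points if p["room"] in TRAIN_ROOMS]
    validation = [p for p in data_points if p["room"] in VALIDATION_ROOMS]
    return train, validation

# Hypothetical records; real data points would also carry the two sequences.
data_points = [{"id": i, "room": room} for i, room in enumerate([1, 3, 4, 5, 2, 3])]
train_set, validation_set = split_by_room(data_points)
```

Splitting by room rather than at random keeps all encounters recorded in one location on the same side of the split, so the validation score reflects performance in unseen rooms.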
We trained a machine learning model to predict the ground-truth risk using only features extracted from the RSS time series data. Since the labels are not balanced (i.e., there are more negative than positive events), we use the area under the receiver operating characteristic (ROC) curve (AUC) to evaluate the performance of our model. The AUC measures how well the data can be separated using our classifier. An AUC value of 0.5 indicates no predictive power and a value of 1 indicates perfect predictive power.
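For intuition, the AUC can be computed as the fraction of (positive, negative) pairs in which the positive example receives the higher predicted score, with ties counted as one half. The sketch below uses this pairwise definition; the function name and the example labels and scores are illustrative.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Imbalanced toy example: four low-risk (0) and two high-risk (1) encounters.
labels = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
auc = roc_auc(labels, scores)
```

Because the value depends only on the ranking of the scores, it is unaffected by the class imbalance and by the particular decision threshold.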
The obtained results are presented in Fig. 3. The columns correspond to the different epidemiological models (linear, box, sigmoid), whereas the rows represent the different combinations of features that we feed into the linear regression. Given the critical risk threshold derived by applying the respective risk model to the reference sequence (1), we display the achieved AUC for every combination of risk model and feature combination.
An encounter between two individuals is labeled as “high risk” if its integrated infectiousness score exceeds a predefined critical risk threshold. This threshold can either be set locally, i.e., for each encounter, or globally based on the basic reproduction number.
In order to evaluate the reliability of our results, we tested the model on data recorded with the same experimental setup, but on different dates (7th April 2020 and 14th April 2020). In the experiments conducted on the 14th of April, participants used different smartphone models and the phone holding positions were varied (“hand”, “ear”, “pocket”). Figure 4 compares the AUC values of the two measurement campaigns for the three epidemiological models (linear, box, sigmoid) and the three sets of features (sum, dur_max_mean, freq). As can be seen, the performance of the proposed infection risk estimation method is comparable for the experiments conducted on the 1st of April and the 7th of April. For the experiments conducted on the 14th of April, however, the feature set dur_max_mean distinctly outperforms all other tested feature combinations. This suggests that this combination of features approximates the ground-truth risk more robustly than the other investigated feature combinations.