Obesity prevalence has been continuously rising for the past forty years [euac2014] and is now one of the world’s biggest health challenges. Given that the disease is largely preventable, researchers have sought appropriate policy measures to limit the development of overweight and obesity, especially in children, since children who are overweight or obese are likely to remain obese in adulthood [lobstein2015].
Many large-scale public health actions are limited to indiscriminate blanket policies and single-element strategies [oude2009], that often fail to address the problem effectively. This has been attributed to the complex nature of the disease [finegood2010], implying that effective interventions must be evidence-based, adapted to the local context and address multiple obesogenic factors of the environment [oude2009, mackenbach2014], even on a local neighborhood level [Fagerberg2019].
Developing effective multi-level interventions addressing childhood obesity therefore requires data that link conditions in the local environment to children’s obesogenic behaviors such as low levels of physical activity, unhealthy eating habits, as well as insufficient sleep.
Most of the existing evidence on obesogenic behaviors of children rely on diet and physical activity recall questionnaires [richardson2001], which can lead to inaccurate measurements [dhurandhar2015] and often do not provide sufficient detail about the interaction of children with their environment (such as use of available opportunities for physical activity, or visits to different types of food outlets).
On the other hand, the availability and widespread use of wearable and portable devices, such as smartphones and smartwatches, provide an excellent opportunity for obtaining objective measurements of population behavior. This has not yet been exploited to its full extent for evidence-based policy decision support addressing childhood obesity.
In this paper we present an overview of the BigO system111http://bigoprogram.eu, which has been developed with the aim to objectively measure the obesogenic behaviors of children and adolescent populations in relation to the local environment using a smartphone and smartwatch application. Specifically, BigO provides policy makers the tools to (i) measure the behavior of a sample of the local population (targeting ages 9-18) regarding their physical activity, eating and sleep patterns, (ii) aggregate data at geographical level to avoid revealing any individual information about participants, (iii) quantify the conditions of the local urban environment, (iv) visualize and explore the collected data, (v) perform inferences about the strength of relationships between the environment and obesogenic behaviors and, finally, (vi) predict and monitor the impact of policy interventions addressing childhood obesity.
The following sections present an outline of the BigO system as well as the challenges that we have identified from its preliminary application to data collection pilots in schools and clinics, involving over 4,200 participants to date (April 2020).
Ii System Overview
A conceptual overview of the BigO system is shown in Figure 1. Data is collected from smartwatch and smartphone sensors through the “myBigO” app, available for Android and iOS operating systems [maramis2019]. It is then stored at the BigO DBMS, consisting of Apache Cassandra (for time series data) and MongoDB (for all other application data) databases. Data processing is carried out by the BigO analytics engine, built on top of Apache Spark.
The processing steps involve the extraction of individual and population-level behavioral indicators, the extraction of environment indicators, as well as statistical data analysis and machine learning model training. The processing outputs support the operation of the end-user interfaces which include the Public Health Authorities portal, School portal, Clinical Portal and the Community portal.
Ii-a Data acquisition
Administrative and operational data (e.g., number of exercise sessions per week at school, availability of school lunches, class start/end times) are collected at school and clinic level through the portals. Furthermore, data about body mass index range, age and sex of participants is also collected through the portal. No directly identifiable information (such as names or emails) is stored anywhere in the system.
Regarding individual participants, data is collected through a smartphone and smartwatch application. Collected data includes (i) triaxial accelerometer signal, (ii) GPS location data, (iii) meal pictures, (iv) food advertisement pictures, (v) meal self-reported data, (vi) answers to a one-time questionnaire (when the myBigO app is started for the first time) and (v) answers to recurring mood questionnaires.
The biggest challenges in data collection come from the battery power requirements of the accelerometer and location sensors. Accelerometer is sampled at a low sampling rate, which is device dependent and is usually in the range of 5-15Hz. Location data is sampled every minute. To preserve battery, the data acquisition module of the mobile application is compatible with the “doze” mode of the Android operating system. It stops data capturing whenever a device is inactive (doze mode) and restarts whenever the device becomes active again. During this time, acceleration is assumed to only be affected by gravity (not any kind of user motion) and location data is fixed to the last known position [diou2019JIAOS]
Ii-B Extraction of behavioral and environment indicators
Collected raw data are processed to extract behavioral indicators. This can take place at the mobile phone (to avoid transmitting raw data) or centrally, at the BigO servers. In both cases, raw data is considered sensitive and cannot be accessed directly. Aggregated behavioral indicators that result from the raw data are used instead [diou2019JIAOS]. There are three levels of granularity of behavioral indicators, namely (i) base indicators, which describe the behavior of an individual at fine temporal granularity (ii) individual indicators, which aggregate indicators across time to summarize the behavior of an individual and (iii) population indicators, which aggregate the behavior across individuals in a geographical region. Examples are provided in Tables I and II.
Base indicators are extracted through signal processing and machine learning algorithms [papapanagiotou2020a], such as [gu2017robust] and [genovese2017smartwatch] for step counting, [reiss2012creating] for activity type detection, [shafique2016travelo] for transportation mode detection and [luo2017improved] for detecting visited points of interest (POIs). Regarding the visited POIs, the information stored is the type of POI, from a pre-defined POI hierarchy, such as “restaurant”, “fast food or takeaway” and “sports facility”.
|Type of behavior||Base behavior indicator|
|Physical activity||Activity counts|
|Number of steps|
|Physical activity type|
|Location and transportation||Types of visited locations|
|Transportation mode used|
|Eating habits||Visits to food-related locations|
|Meals (self-reported, with pictures)|
|Sleep||Sleep duration (only for smartwatch users)|
Certain base indicators (such as activity counts) and individual indicators are only used at the clinical portal, which is accessible by health professionals to obtain information about their patients. In all other cases, aggregated information is used, for privacy protection purposes. There are two types of aggregation, leading to two different analysis types:
Habits: In this analysis we are interested in the overall behavior of participants living in a region
Use of resources: In this analysis we are interested in the behavior of participants visiting a region, but only during their visits to that region
A region can either be an administrative region or a geohash222A geohash is a public domain geocode system encoding rectangular geographical regions as alphanumeric strings.
In addition to the measured behavior of individuals, each geographical region is characterized by the local urban and socio-economic context. These are quantified by a set of variables called Local Extrinsic Conditions (LECs) in BigO, which are obtained through official statistics, or through GIS databases. Table II shows some examples of LECs used in BigO.
|Variable||How it is computed|
|Examples of behavior indicators|
|Percentage of visits to the region that include at least one visit to a fast food or a takeaway restaurant||Compute the percentage of visits to the region that include at least one visit to a fast food or takeaway restaurant. For the computation of the indicator, only visits that are 10 minutes or more are considered.|
Percentage of residents that have an average activity level that is sedentary
|For each resident of the region, first compute their number of steps for each minute of recorded data, for individuals with more than 20 hours of recorded data. This indicator is the percentage of residents of the region that walk, on average, less than 450 steps per hour.|
|Examples of Local Extrinsic Conditions|
|Average number of restaurants within 100m radius from locations within the region||Create a 30m point grid inside the region. For each point, compute the number of restaurants in a 100m radius. This value is the average across all points inside the region.|
|Number of athletics/sports facilities in the region||Using publicly available data sources compute the number of athletics/sports facilities inside the region.|
Ii-C Data analysis for causal inference and prediction
The collected behavior and environment indicators can be used to (i) infer associations and possible causal relations between LECs and obesogenic behaviors and (ii) predict and monitor the impact of interventions on the measured population behaviors.
Given the complexity and multifactorial nature of obesity [butland2007tackling], it is important to account for confounding factors and to avoid spurious correlations. To this end, we start from the Foresight’s Obesity System Map [butland2007tackling] and construct Directed Acyclic Graphs (DAGs) indicating causal relations between variables. An example DAG for physical activity is shown in Figure 2.
The associations between variables are quantified using explainable statistical models, such as linear and generalized linear models. Self-selection and other sources of bias will need to be quantified during the analysis as well. For prediction, non-linear machine learning models can be used as well, such as Support Vector Machines, or Neural Networks. The analysis of BigO data is still work in progress, however some preliminary results on prediction are presented in[sarafis2020].
Iii Data Collection Pilots
BigO has been deployed in clinics and in schools in Athens, Larissa and Thessaloniki in Greece, the Stockholm area in Sweden and Dublin in Ireland. Children join as citizen-scientists and contribute data about their behavior and environment through the myBigO app. The planned data collection time is two weeks per child. In schools, data is used in school projects, with the help of the BigO School portal, which provides data visualizations for participating school classes. In the clinic, the data is used by clinicians to monitor the behavior of patients, through the Clinical portal.
In total, children from 33 schools and 2 clinics have contributed data so far. Ethical approvals have been obtained, as well as the necessary consent from all participants. By April of 2020, BigO had reached out to 20,000 children and their parents, out of which over 4,200 registered in the system. Not all children provide the same amount of data. Reasons for children providing fewer data than expected include technical issues from the user side (e.g. smartwatch not properly paired with phone), technical issues with the smartphone (e.g., some smartphone manufacturers do not permit background apps to run) as well as low participant compliance. The current estimate is that monitoring data is received for approximately 68% of the app usage time, while approx 25% of the registered users do not provide accelerometer or GPS data (only self-reports and pictures). The currently collected data volume includes approximately 107 years of accelerometery data, 73 years of GPS data and 75,000 meal pictures. Note that the actual monitoring time is higher (since no data is recorded when the device is idle).
Iv Discussion and Conclusions
There are some noteworthy observations that result from the experience in organizing and deploying the BigO pilots.
On the technical side, there are significant obstacles to overcome when depending on off-the-shelf devices such as smartphones and smartwatches for data acquisition. Besides battery consumption, certain mobile phone manufacturers introduce custom modifications to the device operating system which can prevent background recording applications to run. Users must be aware of these restrictions and disable them for the myBigO app, a process which is device-dependent and not always straightforward.
Regarding recruitment, it seems that engaging teachers and clinicians is an effective way to invite the participation of children and their parents (who must give their consent). So far, the BigO portals have been used by 68 teachers and 17 clinicians, leading to an acceptance rate of approximately 21% for the children that were reached out to participate in BigO. This approach is now challenged by the recent school lockdowns due to the SARS-CoV-2 pandemic, but recruitment is expected to resume once schools open again.
It is clear that scaling up such data collection actions requires the active engagement of the local school and education authorities. In BigO, the pilots were carried through the initiative of participating researchers and schools/clinics that decided to join, without the direct support from the local authorities. Our vision is that by demonstrating that such citizen-science activities are effective for collecting data to formulate evidence-based policies, BigO will motivate local authorities to adopt such systems as part of their decision-making process. While data collection in the BigO pilots continues, the next steps include the analysis of the collected data and the dissemination of results to all relevant regional and national government bodies.