Bayesian Nonparametric Monotone Regression

by   Ander Wilson, et al.
Colorado State University

In many applications there is interest in estimating the relation between a predictor and an outcome when the relation is known to be monotone or otherwise constrained due to the physical processes involved. We consider one such application–inferring time-resolved aerosol concentration from a low-cost differential pressure sensor. The objective is to estimate a monotone function and make inference on the scaled first derivative of the function. We proposed Bayesian nonparametric monotone regression which uses a Bernstein polynomial basis to construct the regression function and puts a Dirichlet process prior on the regression coefficients. The base measure of the Dirichlet process is a finite mixture of a mass point at zero and a truncated normal. This construction imposes monotonicity while clustering the basis functions. Clustering the basis functions reduces the parameter space and allows the estimated regression function to be linear. With the proposed approach we can make closed-formed inference on the derivative of the estimated function including full quantification of uncertainty. In a simulation study the proposed method performs similar to other monotone regression approaches when the true function is wavy but performs better when the true function is linear. We apply the method to estimate time-resolved aerosol concentration with a newly-developed portable aerosol monitor. The R package bnmr is made available to implement the method.



There are no comments yet.


page 1

page 2

page 3

page 4


Nonparametric inference under a monotone hazard ratio order

The ratio of the hazard functions of two populations or two strata of a ...

Convergence Rates for Bayesian Estimation and Testing in Monotone Regression

Shape restrictions such as monotonicity on functions often arise natural...

Adaptive Inference in Multivariate Nonparametric Regression Models Under Monotonicity

We consider the problem of adaptive inference on a regression function a...

Dirichlet Process Mixtures of Generalized Linear Models

We propose Dirichlet Process mixtures of Generalized Linear Models (DP-G...

A unified study of nonparametric inference for monotone functions

The problem of nonparametric inference on a monotone function has been e...

Causal Inference with the Instrumental Variable Approach and Bayesian Nonparametric Machine Learning

We provide a new flexible framework for inference with the instrumental ...

Generalized nearly isotonic regression

The problem of estimating a piecewise monotone sequence of normal means ...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

In environmental, biomedical, and engineering applications a common objective is to estimate the relation between a predictor and an outcome when there is prior knowledge that the relation is monotone or otherwise shape-constrained. In this paper we consider one such application that relates to measuring airborne particles at fine-temporal resolution using a recently-developed portable monitor. At the center of this problem is estimation of a function that is known to be monotone due to the physical processes involved in the monitor and making inference on the scaled first derivative of the estimated monotone function which is equal to estimated aerosol concentration.

Measuring air pollution with high temporal and spatial resolution is critical to both conducting air pollution research and protecting the public’s health. In an ideal world, we would be able to use a large number of monitors to measure personal air pollution exposure in cohort studies of health effects or to deploy in networks to warn of potential risks such as those from exposure to wildfire smoke. However, the large size and high cost of air quality monitors has historically prohibited widespread use. Hence, there is a need to develop smaller, more affordable monitors and the accompanying data science tools to make meaningful inference on the readouts of these monitors.

In this paper we consider inference for data generated by the recently-developed Mobile Aerosol Reference Sampler (MARS). MARS was designed to be an affordable, portable monitor for measuring fine particulate matter () concentrations in environmental and occupational health studies (Tryner et al., 2019). The MARS device is built on the Ultrasonic Personal Aerosol Sampler (UPAS) platform which has also been previously described in the literature (Volckens et al., 2017). MARS uses a piezoelectric microblower to pull air through a PM cyclone inlet and a 25mm filter. A high-resolution pressure sensor measures the time-resolved pressure drop across the sampling filter. As particles accumulate on the filter the pressure drop across the filter increases. This pressure drop should be positive and increase monotonically in time during measurement. Deviations from monotonicity only occur (1) in the first few minutes of use when a new filter is stretching out or (2) if there is a change in air density or particle source. In the experimental data used in this paper, particle source remained constant and only minor changes in air density occurred. Time-resolved PM concentration can be inferred from the time-resolved rate of change in pressure drop after the latter is normalized to the total PM mass collected on the filter. Specifically, when the derivative is scaled so that the area under the derivative function is equal to total PM mass collected on the filter divided by volumetric flow rate, then the scaled first derivative is a measure of PM concentration as a function of time (Novick et al., 1992; Dobroski et al., 1997). Hence, the objective is to estimate the pressure drop as a function of time and then make inference on the scaled first derivative of pressure drop.

Several approaches have been proposed to estimate monotone functions. Early works include estimation of shape-constrained piecewise linear functions (Hildreth, 1954; Brunk, 1955). Mammen (1991) proposed monotone kernel smoother methods and Mammen et al. (2001) proposed monotone projections of unconstrained smooth estimators. A large number of spline based approaches have been proposed including cubic smoothing splines (Wang and Li, 2008), constrained regression splines (Ramsay, 1988; Meyer, 2008; Meyer et al., 2011; Powell et al., 2012), penalized splines (Meyer, 2012), and piecewise linear splines (Neelon and Dunson, 2004). Several recent papers have proposed monotone Bernstein polynomial (BP) regression (Chang et al., 2005, 2007; Curtis and Ghosh, 2011; Wang and Ghosh, 2011; Wilson et al., 2014; Ding and Zhang, 2016).

In this paper we take a BP approach to constrained regression. Monotonicity can be imposed with BPs by imposing a linear order constraint on the regression coefficients. An alternative but equivalent approach is to linearly transform the regression coefficients and then impose a positivity constraint on all of the transformed regression coefficients with the exception of the intercept, which is unconstrained

(Wang and Ghosh, 2011). Curtis and Ghosh (2011)

proposed a variable selection approach to monotone regression with BPs that puts a variable selection prior on the transformed regression coefficients akin to a mixture of a mass point at 0 and a normal distribution truncated below at 0. The approach is appealing because it imposes monotonicity, allows for data-driven tuning of the model by selecting excess basis functions out of the model and allows for no association when all coefficients are selected out of the model.

The approach we present here, which we refer to as Bayesian nonparametric monotone regression (BNMR), is similar to that of Curtis and Ghosh (2011)

in that we use a BP expansion and a variable selection prior that imposes monotonicity. In contrast, our approach both selects some regression coefficients to be zero and clusters other regression coefficients. By clustering regression coefficients we create a reduced set of combination basis functions that are each the sum of multiple BPs and assigned a single regression coefficient. This has two distinct advantages over only variable selection. First, when all regression coefficients are clustered together into a single combination basis function the approach is equivalent to performing linear regression with the slope constrained to be non-negative. This improves performance when the true regression function is in fact linear. Second, when the true regression function is nonlinear our approach requires a reduced number non-zero regression coefficients each corresponding to the combination of a mutually exclusive set of basis functions. In a simulation study we show that our approach is able to match the flexibility of alternative approaches but uses a smaller number of parameters. As a result our Markov chain Monte Carlo (MCMC) approach samples from the full conditional of a truncated multivariate normal distribution of smaller dimension which can reduce autocorrelation in the resulting chain. Hence, the proposed method allows for flexible monotone regression while allowing the model to be null when there is no association between predictor and outcome and allowing the function to be linear when there is no evidence of nonlinearity. This results in comparable performance to other approaches for smooth nonlinear functions but improved inference when the true relation is linear.

We apply the proposed approach to evaluate 12 samples collected using MARS in a controlled laboratory chamber. We compare estimated time-resolved PM inferred with the proposed method based on 30-second measurements of pressure drop across the MARS filter to minute-resolution measurements of PM in the chamber reported by a tapered element oscillating microbalance (TEOM) (1405 TEOM, ThermoFisher Scientific, Waltham, MA, USA), which is a regulatory-grade PM monitor.

2 Methods

2.1 Model formulation

Our primary interest is estimating the regression function


where is an unknown monotone function. Without loss of generality, and consistent with our application, we assume that is monotone increasing. We also assume that is scaled to the unit interval.

We parameterized using a BP expansion. The BP basis function of order is


The regression function expressed as a weighted combination of BPs is


where are regression coefficients and . The first regression coefficient parameterizes the intercept. Figure (a)a shows the BP basis used in the data analysis.

(a) Bernstein polynomial basis
(b) Transformed Bernstein polynomial basis
(c) Selected transformed Bernstein polynomial basis used with BISOREG
(d) Linear combination of transformed Bernstein polynomial basis used with BNMR
Figure 5: Various representations of the Bernstein polynomial (BP) basis functions. Panel (a)a shows the 51 BP basis functions of order (). Panel (b)b shows the transformed BP basis represented as as described in  2.1. This transformation is used for both BNMR and BISOREG. Panel (c)c shows the posterior mode group of basis functions selected to be included into the model with BISOREG. This is a subset of the transformed basis functions shown in Panel (a)a. Panel (d)d shows the posterior model combination of basis functions included with BNMR. This includes the intercept and three basis functions which are each a linear combination of one to three of the basis functions shown in panel (b)b and subsequently linear combinations of the basis functions shown in Panel (a)a. Results from all 12 runs are shown in the supplemental material.

The regression function in (3) is monotone increasing if for all . Following Curtis and Ghosh (2011), it is convenient to reparameterize the regression coefficients. Let where and the -matrix is such that and for :


The regression function is then


Figure (b)b shows the transformed basis used in the data analysis.

Using this reparameterization is monotone increasing when for all . Further, is linear with the form then including no association when .

We assign a prior to , , that is a finite mixture of a mass point at zero denoted by the Dirac measure and a distribution

with positive support. This approach selects some regression coefficients to be 0, effectively removing those basis functions from the model. In the non-zero probability event that all regression coefficients are zero there is no association between

and . We then let the positive distribution be a Dirichlet process (DP) with base measure , where implies a truncated normal with support , mean

, and variance

. By using a base measure with support over we ensure that the non-zero regression coefficients are positive. This imposes monotonicity of . Further, the clustering property of the DP allows for all regression coefficients to be equal, in the same cluster, allowing for positive probability that is linear. The selection and clustering of the regression coefficients does not, however, impact smoothness. The estimated function is guaranteed to be smooth and differentiable.

The full model is


The above model is equivalent to a DP with base measure that is a finite mixture


Several papers have used similar DP constructions that combine a DP with a finite mixture of a mass point and a non-truncated normal distribution (Herring, 2010; Canale et al., 2018; Cassese et al., 2019)

or a gamma distribution

(Liu et al., 2015).

We complete the specification by assigning the prior , a normal mean zero variance prior to the intercept , and .

2.2 Posterior computation

The model in (6) can be efficiently sampled with a Gibbs sampler. This is accomplished by first integrating out from the model. The Gaussian likelihood and truncated normal base measure allows for to be marginalized out of the model as well. The posterior can be simulated using a Polya Urn scheme (Blackwell and MacQueen, 1973; West et al., 1994; Bush and MacEachern, 1996).

Let be the transformed BP basis expansion for observation and be the design matrix with row equal to . Let

denote the vector

with the element omitted, the vector with the element omitted, and denote only the element of . Similarly, let be the matrix with the column omitted and be only the column of . Finally, we denote by the categorical indicator where if and the number of coefficients in cluster where is the number in the null cluster with .

The full conditional for , , is categorical. The conditional probability that the regression coefficient is equal to 0 is


where is the number of regression coefficients in cluster , the null cluster, excluding , is a normalizing constant, and denotes a normal density function. In contrast to standard DP models, the zero cluster is allowed to be empty in this model. The conditional probability that is allocated to an existing non-zero cluster is

Finally, the conditional probability that is allocated to a new cluster is

where and .

In the situation where is assigned to new cluster a value for can be sampled from its univariate truncated normal full conditional. The full conditional for a single regression coefficient where (no other coefficient takes that value) is truncated above 0 and has mean and variance as specified above. We use the hybrid univariate truncated normal sampler of Li and Ghosh (2015) to sample from this full conditional.

The -vector contains three types of elements: the unconstrained intercept, parameters that are selected to be 0, and parameters that are non-zero and are constrained to be greater than 0. The non-zero values take on unique values where is the unconstrained intercept. Using this notation the linear predictor where is a transformation matrix that maps to according to . The vector has a truncated multivariate normal full conditional with mean and variance where is is a diagonal matrix with in the first diagonal location for the intercept and in all other diagonal locations for the constrained coefficients. These are the same as the typical mean and variance for a normal-normal model full conditional. The first element is not truncated and the remaining elements are truncated below at 0. We simulate from the full conditional of as a multivariate block using the hybrid multivariate sampler approach of Li and Ghosh (2015).

The Gibbs sampler is completed with standard updates of using a mixture of gammas (Escobar and West, 1995) and using the standard gamma full conditional.

2.3 Details on tuning

Care must be given when specifying the prior, particularly for the choice of values for the mean and standard deviation of the base measure

and . This is challenging because the plausible values for the regression coefficients depends on the number of non-zero regression coefficients in the model and how many basis functions each coefficient is applied to (cluster size). We do not know either of these quantities a priori. We have taken the approach of scaling the outcome to have mean zero and variance one and then setting and . This puts reasonable mass on values between zero and one which represents plausible values for a variety of basis configurations. We have found that this choice performs well across a variety of simulated and real datasets. We use this setting in all simulation and data analysis results presented in this paper. However, results can be sensitive to this choice. Supplemental Section 1.2 includes an additional simulation study that compares sensitivity to different values of and . We show that as

increases the posterior probability of no association decreases and the number of clusters (unique non-zero regression coefficients) increases. However, the model fit as measured by RMSE on

and the derivative of is less sensitive to this choice.

The user must also specify the order of the BP (). This should be selected, in theory, based on sample size and the differentiability of the function being estimated (Mclain and Ghosh, 2009). In practice, methods such as reversible jump MCMC or Kullback-Leibler distance have been used to attempt to estimate the dimension of basis expansions to be used in nonparametric regression (e.g. Dias and Gamerman, 2002; Dias and Garcia, 2007; Meyer et al., 2011) while penalization can be used to regularize a rich basis to avoid over fitting (Crainiceanu et al., 2005). It has additionally been noted that shape constraints, including monotonicity, reduce sensitivity to the dimension of the basis expansion (Meyer, 2008). We follow the approach of Curtis and Ghosh (2011) and use a rich basis and let the prior select or cluster redundant predictors. In this paper, we use in all results shown in the main text but show in the supplement that a smaller value of results in lower RMSE when the true function is closer to linear and a higher value of is preferred when the true function is more wiggily. If the practitioner has prior knowledge of the shape of the underlying function, beyond monotonicity, this could be incorporated into the selection of a priori.

2.4 Inference on the derivative and aerosol concentration

The proposed approach allows for coherent estimation and inference on not only the function but on the derivatives of . This includes full quantification of the uncertainty in the derivatives and guaranteed smoothness in the derivatives. This is particularly critical in our application where the first derivative of is proportional to the time-resolved aerosol concentration. For a BP of order the first derivative is a BP of order . Specifically the first derivative is


For the derivative the regression coefficient , which corresponds to the intercept, is not included. Hence, the derivative can be identified in closed form from the posterior sample of . Inference on the derivative can be made directly by using the posterior sample of .

From a theoretical perspective the total aerosol mass accumulated on the filter should be the flow rate through the filter times the concentration integrated over time. Here, flow rate is constant and therefore . In our model . We therefore scale the derivative to the total filter mass by replacing in (11) with . We then estimate aerosol concentration as


In practice, we scale each draw of from the posterior and then construct a posterior sample of .

2.5 Alternative spline approach

The proposed prior can be applied to other basis expansions and achieve some, but not all, of the same properties. Using the same prior structure with a transformed -spline or -splines without the transformation matrix can still achieve monotonicity (de Boor, 1978; Ramsay, 1988). The proposed prior will also allow estimation of no association when all regression coefficients are clustered at zero. However, the clustering will not result in shrinkage toward a linear response.

Using a spline basis with compact support may result in more flexibility than the BP approach presented here. This could be particularly appealing when the function being estimated has sharp change points. In addition, the derivative of many common splines including -splines and -splines can be represented as a spline itself and inference on the derivative can be made using a similar approach. However, splines lose flexibility and smoothness in the derivative. For example, the standard cubic spline has a quadratic derivative while a quadratic spline has a piecewise linear derivative. This may not be sufficiently flexible in many cases, as seen in the data analysis in Section 4. In contrast, the BP uses a higher order polynomial and therefore has a higher order derivative which imposes smoothness not only in the function being estimated but in all derivatives of that function.

3 Simulation

We compare the proposed approach, BNMR, to alternative methods for monotone regression in a simulation study. We generated 500 datasets from four designs each taking the form for with . We generate and consider four shapes for the function :

  1. Flat: .

  2. Linear: .

  3. Wavy: .

  4. Flat-nonlinear: for and for .

The flat, linear, and wavy functions mirror those from Curtis and Ghosh (2011). We simulated data sets of size and .

We compared BNMR to alternative monotone regression methods that have available R software that includes variance estimates. The comparison methods are: constrained generalized additive models (CGAM, Meyer, 2013; Meyer and Liao, 2018), Bayesian constrained generalized additive models (BCGAM, Meyer et al., 2011; Oliva-Aviles and Meyer, 2018), and Bayesian isotonic regression (BISOREG, Curtis and Ghosh, 2011; Curtis, 2018)

. In addition we compare with the unconstrained methods ordinary least squares (OLS), local polynomial regression (LOESS), and an unconstrained Bernstein polynomial model (UBP). For BNMR and BISOREG we set

and consider other values in the Supplemental Material. For UBP we select using deviance information criterion (DIC, Spiegelhalter et al., 2002).

We evaluate the model performance by the root mean squared error (RMSE) on the function and the pointwise 95% interval coverage both evaluated at 100 evenly spaced points spanning the range of . For the Bayesian methods (BNMR, BISOREG, BCGAM, and UBP) we consider the posterior probability that is linear and that is flat (no association). For the CGAM and OLS we report the mean

-value for testing the null hypothesis that there is no association.

Table 1 shows results from the simulation study. At , BNMR had the lowest RMSE on

among all the monotone regression methods on all four scenarios (within one standard error with BCGAM and BISOREG on the flat scenario). The only method to have lower RMSE was OLS on the linear scenario. BNMR, BISOREG, CGAM, and UBP all had pointwise 95% interval coverage between 0.95 and 0.98 on all scenarios. Each of the other methods had interval coverage of 0.9 or less on at least one scenario. At

, BNMR had the lowest RMSE for the flat and flat-nonlinear scenarios (within one standard error with BCGAM and BISOREG on the flat scenario). CGAM and BCGAM had the lowest RMSE of the constrained methods on the wavy scenario with BNMR and BISOREG slightly higher. OLS and UBP had the lowest RMSE on the linear scenario. UBP selected , a cubic regression function, in almost all datasets for the linear scenario.

Model RMSE Coverage Pr( flat ) RMSE Coverage Pr( flat )
Scenario 1: Flat
BCGAM 2.22 0.95 0.00 0.65 0.96 0.00
BISOREG 2.19 0.97 0.86 0.61 0.97 0.95
BNMR 2.08 0.96 0.94 0.60 0.96 0.99
CGAM 3.57 0.98 0.49 1.17 0.99 0.50
LOESS 5.15 0.93 NA 1.61 0.95 NA
OLS 3.11 0.95 0.50 0.93 0.96 0.53
UBP 5.23 0.93 0.00 1.60 0.95 0.00
Scenario 2: Linear
BCGAM 6.42 0.84 0.00 2.58 0.84 0.00
BISOREG 5.99 0.96 0.00 2.42 0.95 0.00
BNMR 4.72 0.98 0.00 2.14 0.96 0.00
CGAM 5.79 0.96 0.00 2.30 0.95 0.00
LOESS 5.15 0.93 NA 1.61 0.95 NA
OLS 3.11 0.95 0.00 0.93 0.96 0.00
UBP 5.28 0.93 0.00 1.60 0.95 0.00
Scenario 3: Wavy
BCGAM 6.57 0.84 0.00 2.17 0.88 0.00
BISOREG 5.98 0.95 0.00 2.25 0.95 0.00
BNMR 5.30 0.96 0.00 2.25 0.94 0.00
CGAM 5.67 0.96 0.00 2.15 0.96 0.00
LOESS 6.44 0.89 NA 2.14 0.93 NA
OLS 8.02 0.56 0.00 7.22 0.19 0.00
UBP 6.38 0.90 0.00 2.30 0.90 0.00
Scenario 4: Flat-nonlinear
BCGAM 5.33 0.90 0.00 1.93 0.89 0.00
BISOREG 5.60 0.95 0.00 2.12 0.96 0.00
BNMR 4.95 0.96 0.00 1.82 0.96 0.00
CGAM 5.29 0.96 0.00 1.91 0.97 0.00
LOESS 5.70 0.91 NA 1.88 0.93 NA
OLS 16.11 0.32 0.00 16.24 0.09 0.00
UBP 5.42 0.93 0.00 1.91 0.91 0.00
Table 1: Simulation results comparing estimation of with each method. The table shows RMSE and 95% interval coverage both evaluated pointwise on a grid of 100 evenly spaced points. The columns labeled Pr( flat ) are the posterior probability of a flat response or for OLS and CGAM the mean -value for rejecting the null of association. Additional simulation results including standard errors for the RMSE and interval widths are included in the supplemental material.

Both BNMR and BISREG had high, greater than 0.86, posterior probabilities of a flat response (no association) in the flat scenario. BCGAM does not include a flat response in the parameter space and therefore has a posterior, and prior, probability of 0. The average

-values for the test of no association for CGAM and OLS were between 0.49 and 0.53.

BNMR is the only method that allows the estimated function to be linear with slope greater than 0. However, in the linear scenario this did not occur. The mean posterior probability of a linear function was 0.00. The response is linear when all regression coefficients are non-zero and take the same value. Figure 6 shows the number of non-zero regression coefficient in the model and the number of unique values those regression coefficients take. Both BNMR and BISOREG include only a small number of non-zero regression coefficients, effectively selecting out of the model the majority of the basis functions. Because not all basis functions are included the estimated regression function is never truly linear. Despite not being exactly linear, BNMR has lower RMSE than any of the other nonparametric methods on the linear scenario.

A key difference between BNMR and BISOREG is that all non-zero regression coefficients in BISOREG take unique values while with BNMR the non-zero regression coefficients are clustered together and take fewer unique values. On average, there were more non-zero regression coefficients included into the model with BNMR but fewer, less than two, unique regression coefficients. This is true for both the linear and nonlinear scenarios and for BP expansions of order ranging from 20 to 100 (shown in supplemental material). As a result, the proposed approach requires estimating only a small number of unique regression coefficients regardless of the size of the basis expansion or the wiggliness of the regression function.

Figure 6: Simulation results for the number of non-zero regression coefficients (dashed line) and the number of unique values of the non-zero regression coefficient (solid line) for BISOREG (triangle) and BNMR (). The number of unique non-zero values is always equal to the total number of non-zero values in BISOREG.

In our application we are interested in the derivative of the monotone function. The BP basis used by BISOREG, BNMR and UBP allows straight forward inference on the derivatives of . The other methods do not naturally allow for this inference. We calculate a pointwise approximation of the derivative for the other methods by calculating change in divided by change in for each pair of neighboring observations on an equally spaced grid. We do not evaluate coverage for these methods. Table 2 shows the RMSE and 95% interval coverage (for BISOREG, BNMR and UBP only) for the derivative of . BNMR had lowest RMSE for all scenarios at the smaller sample size and the flat scenario at the larger sample size. BNMR, BISOREG, and UBP all suffered from poor interval coverage in several of the scenarios. The coverage is pointwise and in the flat-nonlinear scenario, which has an sharp “elbow” change-point, both methods fail to cover in the elbow, highlighting a limitation of the smooth BP basis.

Model RMSE Coverage RMSE Coverage
Scenario 1: Flat
BCGAM 3.24 NA 0.85 NA
BISOREG 4.90 0.00 0.61 0.00
BNMR 1.12 1.00 0.19 1.00
CGAM 22.80 NA 9.92 NA
LOESS 53.81 NA 18.49 NA
UBP 55.26 0.92 16.61 0.93
Scenario 2: Linear
BCGAM 61.68 NA 39.98 NA
BISOREG 65.54 0.96 44.91 0.94
BNMR 39.57 1.00 38.21 0.95
CGAM 68.85 NA 40.79 NA
LOESS 53.81 NA 18.49 NA
UBP 56.90 0.91 16.60 0.93
Scenario 3: Wavy
BCGAM 64.03 NA 32.05 NA
BISOREG 70.78 0.95 46.53 0.91
BNMR 55.00 0.97 45.13 0.85
CGAM 69.21 NA 37.04 NA
LOESS 87.54 NA 35.18 NA
UBP 97.38 0.78 52.52 0.68
Scenario 4: Flat-nonlinear
BCGAM 62.08 NA 28.82 NA
BISOREG 92.18 0.46 65.44 0.45
BNMR 61.45 0.68 46.91 0.61
CGAM 66.95 NA 34.57 NA
LOESS 65.11 NA 28.99 NA
UBP 62.59 0.88 28.39 0.66
Table 2: Simulation results comparing estimation of the derivative with each method. The table shows RMSE and 95% interval coverage both evaluated pointwise on a grid of 100 evenly spaced points. Intervals for the derivative with BCGAM, CGAM and LOESS are not available. Additional simulation results including standard errors for the RMSE and interval widths are included in the supplemental material.

The supplemental material includes additional simulation results, including standard errors for the estimates in Tables 1 and 2, interval widths, and results on sensitivity to the choice of prior and as well as the order of the BP . In addition we show results for computation time as a function of sample size and order of the BP.

4 Analysis of Real-Time PM Concentration Inferred from Pressure Drop

4.1 Overview of the data analysis

We use data from 12 samples collected using three MARS devices during four laboratory experiments. These experiments are described in detail by Tryner et al. (2019). During each experiment, one of four different types of aerosol—urban PM (NIST SRM 1648A Urban PM), ammonium sulfate ((NH)SO), Arizona road dust, or match smoke—is nebulized into a controlled chamber containing all three MARS. Each MARS samples PM onto a new polytetrafluoroethylene (PTFE) filter at a flow rate of 1 L min for between 7.5 and 13 hours while pressure drop across the filter is recorded every 30 seconds. Each filter is weighed before and after the experiment to measure the total mass of PM accumulated. A TEOM measures the PM concentration in the chamber every minute as a previously-validated point of comparison.

We use BNMR to estimate time-resolved PM concentration using the MARS pressure drop data from the 12 samples. Prior to analysis we removed the first 30 minutes of pressure drop as: 1) there was no PM in the chamber at that time and 2) the new filter was stretching during that time and a decreasing trend is observed due to the stretching process. We also removed the final five minutes when there was 1) no PM in the chamber and 2) the sampler was shutting down resulting in spurious noise in the pressure drop function. Then, we fit BNMR to the time-series of measured pressure drop for each sample. From the fitted model we then estimate the scaled first derivative of pressure drop at each time-point for which the TEOM recorded PM as described in Section 2.4. For comparison, we perform the same procedure with BISOREG and UBP. We also estimate pressure drop and the scaled pointwise approximation of the derivative with LOESS, CGAM, and BCGAM. We omit OLS because of obvious nonlinearities in the pressure drop data.

For each method we visualize and compare the performance with respect to estimating the pressure drop function and inferring real-time PM from the scaled derivative of pressure drop. We show results from one of the 12 samples in the main text. The supplemental material includes estimated pressure drop and estimated real-time PM concentration for all 12 samples.

4.2 Estimation of the pressure drop function

Figure 13

shows the data and estimates from all six methods for a single sample. Each panel show estimates from a single method along with 0.95 confidence or credible intervals. The fits are near identical visually over most of the range. However, there are differences in the lower tail. BNMR and BISOREG tend to level-off between 0 and 100 minutes. In contrast CGAM and especially BCGAM tend to over-smooth over the same time period.

(a) UBP
(c) BNMR
(e) CGAM
Figure 13: Estimated pressure drop from the MARS data for one run. Each panel shows the estimates and 95% intervals for each method separately. Results from all 12 runs are shown in the supplemental material.

Comparing UBP to BNMR and BISOREG, which all use a BP basis, highlights an important difference between the constrained methods and the unconstrained method. Specifically, UBP experiences instability in the tails, while BNMR and BISOREG which impose monotonicity and further regulate with a selection prior (BISOREG) or a selection and clustering prior (BNMR), are more stable in the tails.

To formally compare model fit for the pressure drop function we performed five-fold cross-validation for each sample. Table 3 shows cross-validation results for all five methods across all 12 samples. LOESS had the lowest cross-validation RMSE at 0.81 followed by UBP at 1.21. Hence, the unconstrained methods provided a better fit then any of the monotone methods. The best performing monotone methods were BISOREG at 1.27 and BNMR 1.29. CGAM and BCGAM had higher RMSEs of 1.47 and 1.96, respectively.

Cross-Validation Regression
Model RMSE Intercept Slope
BCGAM 1.96 0.57 37.27 0.97
BISOREG 1.27 0.75 46.89 0.91
BNMR 1.29 0.75 44.59 0.92
CGAM 1.47 0.72 -2.03 1.09
LOESS 0.81 0.81 51.60 0.88
UBP 1.21 0.63 84.77 0.69
Table 3: Summary of the model fit for each method in the data analysis. The table shows the cross-validation RMSE from the five-fold cross validation. For BNMR and BISOREG the table additionally shows the comparison of the scaled derivative to the time-resolved measurement of PM from a TEOM. The results show the , intercept, and slope from the regression of the TEOM PM on the MARS estimated PM obtained from the estimated first derivative of pressure drop.

LOESS outperforms the other methods in terms of cross-validation RMSE on the pressure drop function for two reasons. First, LOESS does not impose monotonicity and several of the pressure drop measurements show minor deviations from the largely monotone trend. The small waves result from small fluctuations in the air temperature measured by the device, which lead to small fluctuations in air density and thus small fluctuations in the mass flow rate through the filter. The second reason that LOESS has lower cross-validation RMSE is that three of samples show sharp change-points in the pressure drop functions (similar to the “elbow” in simulation scenario 4) and LOESS is the only method that did not over-smooth these points (see supplemental Figure 9). UBP can also estimate the non-monotone trend but struggles with the “elbow.” However, the non-monotonicity in LOESS and UBP results in negative estimates of aerosol concentration, which are not physically possible.

The monotone methods smooth over the non-monotone areas of the data. This results in valid estimates of PM because the derivative is always non-negative. It is also consistent with the theoretical framework for measuring time-resolved PM from pressure drop using MARS as the pressure drop function should be monotone. However, this comes at a cost because the oscillation appears as autocorrelation in the residuals. This is not accounted for in the model as we assume independent and identically distributed residuals and could result in some bias in the intervals but results in a rational estimate of time-resolved PM.

4.3 Inference on time-resolved PM with the scaled derivative

Our primary interest is estimating PM concentration using the scaled first derivative of the estimated pressure drop function. We scale each derivative by the total mass collected on the filter as described in Section 2.4. Figure 16 shows the estimates for the scaled first derivative with both BNMR and BISOREG. For comparison, the PM concentration measured with the TEOM is included. Both BNMR and BISOREG estimate the larger pattern in PM concentration but do not fully capture localized features.

(a) BNMR
Figure 16: Estimated PM concentration from the MARS data. Panel (a)a shows the posterior mean and 95% interval from BNMR and Panel (b)b shows the posterior mean and 95% interval from BISOREG. The dashed line in each panel is the PM concentration measured with the TEOM. Results from all 12 runs are shown in the supplemental material. Results with other methods are also shown in the supplemental material.

To more formally compare the estimates of PM concentration we regressed the one minute TEOM measurements on the estimated concentrations at those same time points obtained using each method (Table 3). The mean across all 12 samples was 0.75 with BNMR and BISOREG. Hence, these two approaches provide similar estimates of real-time concentration.

Estimates of PM from the other methods (BCGAM, CGAM, LOESS, and UBP) are presented in the supplement. All of these methods are being used beyond their original intention and suffer from shortcomings when estimating the derivative of a function. BCGAM and CGAM use a quadratic spline and result in piecewise linear derivatives which are not suitable to estimate the time-resolved PM. LOESS and UBP are non-monotone and result in negative estimates of PM over some time segments. In addition, BCGAM, CGAM, and LOESS do not allow for the straight forward inference. When comparing to estimated PM from these methods to the measurements from the TEOM, LOESS was the best performing method with an of 0.81 despite having negative estimates of PM for a substantial period of time. The other approaches had higher ranging from 0.57 to 0.72.

4.4 Posterior visualization and MCMC performance

To better illustrate how BNMR works and compare the variable selection and clustering approach of BNMR to the variable selection only approach of BISOREG, we show the basis functions used in one sample in Figure 5. Panel (a)a shows the BP basis of order 50 as used in the simulation and data analysis. Panel (b)b shows the transformed BP basis as described in Section 2.1. We estimated the posterior mode subset of basis functions used with BNMR and BISOREG. Panel (c)c shows the posterior mode subset of basis functions included into the model with BISOREG. This is a subset of the full basis expansion shown in Panel (b)b. Panel (d)d shows the posterior mode combination of basis functions used by BNMR. This includes and intercept and three additional combination basis functions. Each combination basis function is a cluster of one to three of the original transformed basis function in Panel (b)b. BISOREG uses an intercept and nine additional basis functions. As a result only three unique non-zero slope parameters are estimated with BNMR compared to the nine used by BISOREG. The posterior mode basis functions are shown for all 12 samples in supplemental material. BNMR uses between three and five combination basis function at its posterior mode with each combination basis function being a cluster of one to five of the original basis functions.

Finally, we compare the MCMC performance of BNMR and BISOREG which both use the same BP basis but have different priors and MCMC approaches. Supplemental Table 5 shows the mean effective sample size and autocorrelation in the posterior sample of . BNMR had a larger average effective sample size than BISOREG (1164 verse 1066 from a posterior sample of 5000 after thinning by 10 from an original sample of 50000) and had lower autocorrelation at lag 1 (0.273 verse 0.375). In part, this efficiency gain can be attributed to the clustering which results in a smaller number of unique regression coefficients being sampled from a truncated multivariate normal distribution. However, there are numerous other differences in the priors and algorithms that likely also contribute to differences in efficiency.

5 Discussion

We propose BNMR to estimate a smooth monotone regression function. Our method is motivated by data generated from the MARS aerosol monitor. This affordable monitor measures the pressure drop across a filter. As particles accumulate on the filter the pressure drop increases. The time-resolved PM concentration is inferred from the first derivative of pressure drop scaled by the total mass collected on the filter. Hence, our objective is to estimate a smooth monotone function and make inference on the scaled derivative of that function.

Our proposed approach uses a BP expansion with a Dirichlet process prior that performs both variable selection and clustering on the regression coefficients for the basis expansion. This formulation enables flexible monotone regression while allowing the model to be null when there is no association between predictor and outcome and allowing the function to be linear when there is no evidence of nonlinearity. Further, we can make coherent, closed-form inference on not only the function being estimated but the derivatives of that function and the scaled derivative of the function.

Our simulation study showed that BNMR performs similarly to other approaches for smooth nonlinear functions but offers improved inference at smaller sample sizes and when the true function is linear. By both clustering and selecting basis functions, BNMR is self-tuning and results in a smaller parameter space than methods that use variable selection alone.

Our proposed method builds on a substantial body of research on statistical methods to measure or estimate exposure to PM, PM components, other environmental pollutants. This includes methods to infer exposures from existing monitoring networks, deployment of networks of portable devices, smartphones, and personal monitors (Calder, 2008; Rundel et al., 2015; Das and Ghosal, 2017; Huang et al., 2018; Finazzi and Paci, 2019).

Supplemental Material

The supplemental material includes replicates of Figures 5, 13, and 16 for all 12 runs. It also includes additional simulation results and information on computation time and efficiency. The methods can be implemented with the R package bnmr available at Data available on request from the authors.


This work was supported by the U.S. National Aeronautics and Space Administration and the Robert Wood Johnson Foundation through the Earth and Space Air Prize and by the U.S. Centers for Disease Control, National Institute for Occupational Safety and Health (OH010662 and OH011598).

This work utilized the RMACC Summit supercomputer, which is supported by the National Science Foundation (awards ACI-1532235 and ACI-1532236), the University of Colorado Boulder and Colorado State University. The RMACC Summit supercomputer is a joint effort of the University of Colorado Boulder and Colorado State University.


  • Blackwell and MacQueen (1973) Blackwell, D. and MacQueen, J. B. (1973). Ferguson Distributions Via Polya Urn Schemes Author. The Annals of Statistics, 1(2):353–355.
  • Brunk (1955) Brunk, H. D. (1955). Maximum Likelihood Estimates of Monotone Parameters. The Annals of Mathematical Statistics, 26(4):607–616.
  • Bush and MacEachern (1996) Bush, C. A. and MacEachern, S. N. (1996). A Semiparametric Bayesian Model for Randomised Block Designs. Biometrika, 83(2):275–285.
  • Calder (2008) Calder, C. A. (2008). A dynamic process convolution approach to modeling ambient particulate matter concentrations. Environmetrics, 19(1):39–48.
  • Canale et al. (2018) Canale, A., Durante, D., and Dunson, D. B. (2018). Convex mixture regression for quantitative risk assessment. Biometrics, 74(4):1331–1340.
  • Cassese et al. (2019) Cassese, A., Zhu, W., Guindani, M., and Vannucci, M. (2019). A Bayesian nonparametric spiked process prior for dynamic model selection. Bayesian Analysis, 14(2):553–572.
  • Chang et al. (2007) Chang, I.-S., Chien, L.-C., Hsiung, C. A., Wen, C.-C., and Wu, Y.-J. (2007). Shape restricted regression with random Bernstein polynomials. In Complex Datasets and Inverse Problems, volume 54 of Institute of Mathematical Statistics Lecture Notes - Monograph Series, pages 187–202. Institute of Mathematical Statistics, Beachwood, Ohio, USA.
  • Chang et al. (2005) Chang, I.-S., Hsiung, C. A., Wu, Y.-J., and Yang, C.-C. (2005). Bayesian Survival Analysis Using Bernstein Polynomials. Scandinavian Journal of Statistics, 32(3):447–466.
  • Crainiceanu et al. (2005) Crainiceanu, C., Ruppert, D., and Wand, M. P. (2005). Bayesian Analysis for Penalized Spline Regression Using WinBUGS. Journal of Statistical Software, 14(14).
  • Curtis (2018) Curtis, S. M. (2018). bisoreg: Bayesian Isotonic Regression with Bernstein Polynomials.
  • Curtis and Ghosh (2011) Curtis, S. M. and Ghosh, S. K. (2011). A variable selection approach to monotonic regression with Bernstein polynomials. Journal of Applied Statistics, 38(5):961–976.
  • Das and Ghosal (2017) Das, P. and Ghosal, S. (2017).

    Analyzing ozone concentration by Bayesian spatio-temporal quantile regression.

    Environmetrics, 28(4):e2443.
  • de Boor (1978) de Boor, C. (1978). A practical guide to splines. Springer-Verlag, New York.
  • Dias and Gamerman (2002) Dias, R. and Gamerman, D. (2002). A Bayesian Approach to Hybrid Splines Non-Parametric Regression. Journal of Statistical Computation and Simulation, 72(4):285–297.
  • Dias and Garcia (2007) Dias, R. and Garcia, N. L. (2007). Consistent estimator for basis selection based on a proxy of the Kullback–Leibler distance. Journal of Econometrics, 141(1):167–178.
  • Ding and Zhang (2016) Ding, J. and Zhang, Z. (2016). Bayesian regression on non-parametric mixed-effect models with shape-restricted Bernstein polynomials. Journal of Applied Statistics, 43(14):2524–2537.
  • Dobroski et al. (1997) Dobroski, H., Tuchman, D. P., and Vinson, R. P. (1997). Differential Pressure as a Means of Estimating Respirable Dust Mass on Collection Filters. Applied Occupational and Environmental Hygiene, 12(12):1047–1051.
  • Escobar and West (1995) Escobar, M. D. and West, M. (1995). Bayesian Density Estimation and Inference Using Mixtures. Journal of the American Statistical Association, 90(430):577–588.
  • Finazzi and Paci (2019) Finazzi, F. and Paci, L. (2019). Quantifying personal exposure to air pollution from smartphone-based location data. Biometrics, (July 2018):1356–1366.
  • Herring (2010) Herring, A. H. (2010). Nonparametric Bayes Shrinkage for Assessing Exposures to Mixtures Subject to Limits of Detection. Epidemiology, 21(Supplement):S71–S76.
  • Hildreth (1954) Hildreth, C. (1954). Point Estimates of Ordinates of Concave Functions. Journal of the American Statistical Association, 49(267):598–619.
  • Huang et al. (2018) Huang, G., Chen, L.-J., Hwang, W.-H., Tzeng, S., and Huang, H.-C. (2018).

    Real-time PM 2.5 mapping and anomaly detection from AirBoxes in Taiwan.

    Environmetrics, 29(8):e2537.
  • Li and Ghosh (2015) Li, Y. and Ghosh, S. K. (2015). Efficient Sampling Methods for Truncated Multivariate Normal and Student- t Distributions Subject to Linear Inequality Constraints. Journal of Statistical Theory and Practice, 9(4):712–732.
  • Liu et al. (2015) Liu, F., Wang, C., and Liu, P. (2015). A Semi-parametric Bayesian Approach for Differential Expression Analysis of RNA-seq Data. Journal of Agricultural, Biological, and Environmental Statistics, 20(4):555–576.
  • Mammen (1991) Mammen, E. (1991). Estimating a Smooth Monotone Regression Function. The Annals of Statistics, 19(2):724–740.
  • Mammen et al. (2001) Mammen, E., Marron, J. S., Turlach, B. A., and Wand, M. P. (2001). A General Projection Framework for Constrained Smoothing. Statistical Science, 16(3):232–248.
  • Mclain and Ghosh (2009) Mclain, A. C. and Ghosh, S. K. (2009). Estimation of Time Transformation Models with Bernstein Polynomials.
  • Meyer (2008) Meyer, M. C. (2008). Inference using shape-restricted regression splines. The Annals of Applied Statistics, 2(3):1013–1033.
  • Meyer (2012) Meyer, M. C. (2012). Constrained penalized splines. Canadian Journal of Statistics, 40(1):190–206.
  • Meyer (2013) Meyer, M. C. (2013). Semi-parametric additive constrained regression. Journal of Nonparametric Statistics, 25(3):715–730.
  • Meyer et al. (2011) Meyer, M. C., Hackstadt, A. J., and Hoeting, J. A. (2011). Bayesian estimation and inference for generalised partial linear models using shape-restricted splines. Journal of Nonparametric Statistics, 23(4):867–884.
  • Meyer and Liao (2018) Meyer, M. C. and Liao, X. (2018). cgam: Constrained Generalized Additive Model.
  • Neelon and Dunson (2004) Neelon, B. and Dunson, D. B. (2004). Bayesian isotonic regression and trend analysis. Biometrics, 60(2):398–406.
  • Novick et al. (1992) Novick, V., Monson, P., and Ellison, P. (1992). The effect of solid particle mass loading on the pressure drop of HEPA filters. Journal of Aerosol Science, 23(6):657–665.
  • Oliva-Aviles and Meyer (2018) Oliva-Aviles, C. and Meyer, M. C. (2018). bcgam: Bayesian Constrained Generalised Linear Models.
  • Powell et al. (2012) Powell, H., Lee, D., and Bowman, A. (2012). Estimating constrained concentration-response functions between air pollution and health. Environmetrics, 23(3):228–237.
  • Ramsay (1988) Ramsay, J. O. (1988). Monotone Regression Splines in Action. Statistical Science, 3(4):425–441.
  • Rundel et al. (2015) Rundel, C. W., Schliep, E. M., Gelfand, A. E., and Holland, D. M. (2015). A data fusion approach for spatial analysis of speciated PM 2.5 across time. Environmetrics, 26(8):515–525.
  • Spiegelhalter et al. (2002) Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and Van Der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B, 64(4):583–616.
  • Tryner et al. (2019) Tryner, J., Quinn, C., Windom, B. C., and Volckens, J. (2019). Design and evaluation of a portable PM 2.5 monitor featuring a low-cost sensor in line with an active filter sampler. Environmental Science: Processes & Impacts, 21(8):1403–1415.
  • Volckens et al. (2017) Volckens, J., Quinn, C., Leith, D., Mehaffy, J., Henry, C. S., and Miller-Lionberg, D. (2017). Development and evaluation of an ultrasonic personal aerosol sampler. Indoor Air, 27(2):409–416.
  • Wang and Ghosh (2011) Wang, J. and Ghosh, S. (2011). Shape Restricted Nonparametric Regression Based on Multivariate Bernstein Polynomials. Technical report, North Carolina State University Department of Statistics, Raleigh, NC.
  • Wang and Li (2008) Wang, X. and Li, F. (2008). Isotonic smoothing spline regression. Journal of Computational and Graphical Statistics, 17(1):21–37.
  • West et al. (1994) West, M., Müller, P. M., and Escobar, M. D. (1994). Hierarchical priors and mixture models, with application in regression and density estimation. In Freeman, P. and Smith, A. F., editors, Aspects of Uncertainty: A Tribute to D.V. Lindley, pages 363–386. Wiley Series in Probability and Statistics.
  • Wilson et al. (2014) Wilson, A., Rappold, A. G., Neas, L. M., and Reich, B. J. (2014). Modeling the effect of temperature on ozone-related mortality. The Annals of Applied Statistics, 8(3):1728–1749.