1 Introduction
In environmental, biomedical, and engineering applications a common objective is to estimate the relation between a predictor and an outcome when there is prior knowledge that the relation is monotone or otherwise shapeconstrained. In this paper we consider one such application that relates to measuring airborne particles at finetemporal resolution using a recentlydeveloped portable monitor. At the center of this problem is estimation of a function that is known to be monotone due to the physical processes involved in the monitor and making inference on the scaled first derivative of the estimated monotone function which is equal to estimated aerosol concentration.
Measuring air pollution with high temporal and spatial resolution is critical to both conducting air pollution research and protecting the public’s health. In an ideal world, we would be able to use a large number of monitors to measure personal air pollution exposure in cohort studies of health effects or to deploy in networks to warn of potential risks such as those from exposure to wildfire smoke. However, the large size and high cost of air quality monitors has historically prohibited widespread use. Hence, there is a need to develop smaller, more affordable monitors and the accompanying data science tools to make meaningful inference on the readouts of these monitors.
In this paper we consider inference for data generated by the recentlydeveloped Mobile Aerosol Reference Sampler (MARS). MARS was designed to be an affordable, portable monitor for measuring fine particulate matter () concentrations in environmental and occupational health studies (Tryner et al., 2019). The MARS device is built on the Ultrasonic Personal Aerosol Sampler (UPAS) platform which has also been previously described in the literature (Volckens et al., 2017). MARS uses a piezoelectric microblower to pull air through a PM cyclone inlet and a 25mm filter. A highresolution pressure sensor measures the timeresolved pressure drop across the sampling filter. As particles accumulate on the filter the pressure drop across the filter increases. This pressure drop should be positive and increase monotonically in time during measurement. Deviations from monotonicity only occur (1) in the first few minutes of use when a new filter is stretching out or (2) if there is a change in air density or particle source. In the experimental data used in this paper, particle source remained constant and only minor changes in air density occurred. Timeresolved PM concentration can be inferred from the timeresolved rate of change in pressure drop after the latter is normalized to the total PM mass collected on the filter. Specifically, when the derivative is scaled so that the area under the derivative function is equal to total PM mass collected on the filter divided by volumetric flow rate, then the scaled first derivative is a measure of PM concentration as a function of time (Novick et al., 1992; Dobroski et al., 1997). Hence, the objective is to estimate the pressure drop as a function of time and then make inference on the scaled first derivative of pressure drop.
Several approaches have been proposed to estimate monotone functions. Early works include estimation of shapeconstrained piecewise linear functions (Hildreth, 1954; Brunk, 1955). Mammen (1991) proposed monotone kernel smoother methods and Mammen et al. (2001) proposed monotone projections of unconstrained smooth estimators. A large number of spline based approaches have been proposed including cubic smoothing splines (Wang and Li, 2008), constrained regression splines (Ramsay, 1988; Meyer, 2008; Meyer et al., 2011; Powell et al., 2012), penalized splines (Meyer, 2012), and piecewise linear splines (Neelon and Dunson, 2004). Several recent papers have proposed monotone Bernstein polynomial (BP) regression (Chang et al., 2005, 2007; Curtis and Ghosh, 2011; Wang and Ghosh, 2011; Wilson et al., 2014; Ding and Zhang, 2016).
In this paper we take a BP approach to constrained regression. Monotonicity can be imposed with BPs by imposing a linear order constraint on the regression coefficients. An alternative but equivalent approach is to linearly transform the regression coefficients and then impose a positivity constraint on all of the transformed regression coefficients with the exception of the intercept, which is unconstrained
(Wang and Ghosh, 2011). Curtis and Ghosh (2011)proposed a variable selection approach to monotone regression with BPs that puts a variable selection prior on the transformed regression coefficients akin to a mixture of a mass point at 0 and a normal distribution truncated below at 0. The approach is appealing because it imposes monotonicity, allows for datadriven tuning of the model by selecting excess basis functions out of the model and allows for no association when all coefficients are selected out of the model.
The approach we present here, which we refer to as Bayesian nonparametric monotone regression (BNMR), is similar to that of Curtis and Ghosh (2011)
in that we use a BP expansion and a variable selection prior that imposes monotonicity. In contrast, our approach both selects some regression coefficients to be zero and clusters other regression coefficients. By clustering regression coefficients we create a reduced set of combination basis functions that are each the sum of multiple BPs and assigned a single regression coefficient. This has two distinct advantages over only variable selection. First, when all regression coefficients are clustered together into a single combination basis function the approach is equivalent to performing linear regression with the slope constrained to be nonnegative. This improves performance when the true regression function is in fact linear. Second, when the true regression function is nonlinear our approach requires a reduced number nonzero regression coefficients each corresponding to the combination of a mutually exclusive set of basis functions. In a simulation study we show that our approach is able to match the flexibility of alternative approaches but uses a smaller number of parameters. As a result our Markov chain Monte Carlo (MCMC) approach samples from the full conditional of a truncated multivariate normal distribution of smaller dimension which can reduce autocorrelation in the resulting chain. Hence, the proposed method allows for flexible monotone regression while allowing the model to be null when there is no association between predictor and outcome and allowing the function to be linear when there is no evidence of nonlinearity. This results in comparable performance to other approaches for smooth nonlinear functions but improved inference when the true relation is linear.
We apply the proposed approach to evaluate 12 samples collected using MARS in a controlled laboratory chamber. We compare estimated timeresolved PM inferred with the proposed method based on 30second measurements of pressure drop across the MARS filter to minuteresolution measurements of PM in the chamber reported by a tapered element oscillating microbalance (TEOM) (1405 TEOM, ThermoFisher Scientific, Waltham, MA, USA), which is a regulatorygrade PM monitor.
2 Methods
2.1 Model formulation
Our primary interest is estimating the regression function
(1) 
where is an unknown monotone function. Without loss of generality, and consistent with our application, we assume that is monotone increasing. We also assume that is scaled to the unit interval.
We parameterized using a BP expansion. The BP basis function of order is
(2) 
The regression function expressed as a weighted combination of BPs is
(3) 
where are regression coefficients and . The first regression coefficient parameterizes the intercept. Figure (a)a shows the BP basis used in the data analysis.
The regression function in (3) is monotone increasing if for all . Following Curtis and Ghosh (2011), it is convenient to reparameterize the regression coefficients. Let where and the matrix is such that and for :
(4) 
The regression function is then
(5) 
Figure (b)b shows the transformed basis used in the data analysis.
Using this reparameterization is monotone increasing when for all . Further, is linear with the form then including no association when .
We assign a prior to , , that is a finite mixture of a mass point at zero denoted by the Dirac measure and a distribution
with positive support. This approach selects some regression coefficients to be 0, effectively removing those basis functions from the model. In the nonzero probability event that all regression coefficients are zero there is no association between
and . We then let the positive distribution be a Dirichlet process (DP) with base measure , where implies a truncated normal with support , mean, and variance
. By using a base measure with support over we ensure that the nonzero regression coefficients are positive. This imposes monotonicity of . Further, the clustering property of the DP allows for all regression coefficients to be equal, in the same cluster, allowing for positive probability that is linear. The selection and clustering of the regression coefficients does not, however, impact smoothness. The estimated function is guaranteed to be smooth and differentiable.The full model is
(6)  
The above model is equivalent to a DP with base measure that is a finite mixture
(7)  
Several papers have used similar DP constructions that combine a DP with a finite mixture of a mass point and a nontruncated normal distribution (Herring, 2010; Canale et al., 2018; Cassese et al., 2019)
or a gamma distribution
(Liu et al., 2015).We complete the specification by assigning the prior , a normal mean zero variance prior to the intercept , and .
2.2 Posterior computation
The model in (6) can be efficiently sampled with a Gibbs sampler. This is accomplished by first integrating out from the model. The Gaussian likelihood and truncated normal base measure allows for to be marginalized out of the model as well. The posterior can be simulated using a Polya Urn scheme (Blackwell and MacQueen, 1973; West et al., 1994; Bush and MacEachern, 1996).
Let be the transformed BP basis expansion for observation and be the design matrix with row equal to . Let
denote the vector
with the element omitted, the vector with the element omitted, and denote only the element of . Similarly, let be the matrix with the column omitted and be only the column of . Finally, we denote by the categorical indicator where if and the number of coefficients in cluster where is the number in the null cluster with .The full conditional for , , is categorical. The conditional probability that the regression coefficient is equal to 0 is
(8) 
where is the number of regression coefficients in cluster , the null cluster, excluding , is a normalizing constant, and denotes a normal density function. In contrast to standard DP models, the zero cluster is allowed to be empty in this model. The conditional probability that is allocated to an existing nonzero cluster is
Finally, the conditional probability that is allocated to a new cluster is
where and .
In the situation where is assigned to new cluster a value for can be sampled from its univariate truncated normal full conditional. The full conditional for a single regression coefficient where (no other coefficient takes that value) is truncated above 0 and has mean and variance as specified above. We use the hybrid univariate truncated normal sampler of Li and Ghosh (2015) to sample from this full conditional.
The vector contains three types of elements: the unconstrained intercept, parameters that are selected to be 0, and parameters that are nonzero and are constrained to be greater than 0. The nonzero values take on unique values where is the unconstrained intercept. Using this notation the linear predictor where is a transformation matrix that maps to according to . The vector has a truncated multivariate normal full conditional with mean and variance where is is a diagonal matrix with in the first diagonal location for the intercept and in all other diagonal locations for the constrained coefficients. These are the same as the typical mean and variance for a normalnormal model full conditional. The first element is not truncated and the remaining elements are truncated below at 0. We simulate from the full conditional of as a multivariate block using the hybrid multivariate sampler approach of Li and Ghosh (2015).
The Gibbs sampler is completed with standard updates of using a mixture of gammas (Escobar and West, 1995) and using the standard gamma full conditional.
2.3 Details on tuning
Care must be given when specifying the prior, particularly for the choice of values for the mean and standard deviation of the base measure
and . This is challenging because the plausible values for the regression coefficients depends on the number of nonzero regression coefficients in the model and how many basis functions each coefficient is applied to (cluster size). We do not know either of these quantities a priori. We have taken the approach of scaling the outcome to have mean zero and variance one and then setting and . This puts reasonable mass on values between zero and one which represents plausible values for a variety of basis configurations. We have found that this choice performs well across a variety of simulated and real datasets. We use this setting in all simulation and data analysis results presented in this paper. However, results can be sensitive to this choice. Supplemental Section 1.2 includes an additional simulation study that compares sensitivity to different values of and . We show that asincreases the posterior probability of no association decreases and the number of clusters (unique nonzero regression coefficients) increases. However, the model fit as measured by RMSE on
and the derivative of is less sensitive to this choice.The user must also specify the order of the BP (). This should be selected, in theory, based on sample size and the differentiability of the function being estimated (Mclain and Ghosh, 2009). In practice, methods such as reversible jump MCMC or KullbackLeibler distance have been used to attempt to estimate the dimension of basis expansions to be used in nonparametric regression (e.g. Dias and Gamerman, 2002; Dias and Garcia, 2007; Meyer et al., 2011) while penalization can be used to regularize a rich basis to avoid over fitting (Crainiceanu et al., 2005). It has additionally been noted that shape constraints, including monotonicity, reduce sensitivity to the dimension of the basis expansion (Meyer, 2008). We follow the approach of Curtis and Ghosh (2011) and use a rich basis and let the prior select or cluster redundant predictors. In this paper, we use in all results shown in the main text but show in the supplement that a smaller value of results in lower RMSE when the true function is closer to linear and a higher value of is preferred when the true function is more wiggily. If the practitioner has prior knowledge of the shape of the underlying function, beyond monotonicity, this could be incorporated into the selection of a priori.
2.4 Inference on the derivative and aerosol concentration
The proposed approach allows for coherent estimation and inference on not only the function but on the derivatives of . This includes full quantification of the uncertainty in the derivatives and guaranteed smoothness in the derivatives. This is particularly critical in our application where the first derivative of is proportional to the timeresolved aerosol concentration. For a BP of order the first derivative is a BP of order . Specifically the first derivative is
(11) 
For the derivative the regression coefficient , which corresponds to the intercept, is not included. Hence, the derivative can be identified in closed form from the posterior sample of . Inference on the derivative can be made directly by using the posterior sample of .
From a theoretical perspective the total aerosol mass accumulated on the filter should be the flow rate through the filter times the concentration integrated over time. Here, flow rate is constant and therefore . In our model . We therefore scale the derivative to the total filter mass by replacing in (11) with . We then estimate aerosol concentration as
(12) 
In practice, we scale each draw of from the posterior and then construct a posterior sample of .
2.5 Alternative spline approach
The proposed prior can be applied to other basis expansions and achieve some, but not all, of the same properties. Using the same prior structure with a transformed spline or splines without the transformation matrix can still achieve monotonicity (de Boor, 1978; Ramsay, 1988). The proposed prior will also allow estimation of no association when all regression coefficients are clustered at zero. However, the clustering will not result in shrinkage toward a linear response.
Using a spline basis with compact support may result in more flexibility than the BP approach presented here. This could be particularly appealing when the function being estimated has sharp change points. In addition, the derivative of many common splines including splines and splines can be represented as a spline itself and inference on the derivative can be made using a similar approach. However, splines lose flexibility and smoothness in the derivative. For example, the standard cubic spline has a quadratic derivative while a quadratic spline has a piecewise linear derivative. This may not be sufficiently flexible in many cases, as seen in the data analysis in Section 4. In contrast, the BP uses a higher order polynomial and therefore has a higher order derivative which imposes smoothness not only in the function being estimated but in all derivatives of that function.
3 Simulation
We compare the proposed approach, BNMR, to alternative methods for monotone regression in a simulation study. We generated 500 datasets from four designs each taking the form for with . We generate and consider four shapes for the function :

Flat: .

Linear: .

Wavy: .

Flatnonlinear: for and for .
The flat, linear, and wavy functions mirror those from Curtis and Ghosh (2011). We simulated data sets of size and .
We compared BNMR to alternative monotone regression methods that have available R software that includes variance estimates. The comparison methods are: constrained generalized additive models (CGAM, Meyer, 2013; Meyer and Liao, 2018), Bayesian constrained generalized additive models (BCGAM, Meyer et al., 2011; OlivaAviles and Meyer, 2018), and Bayesian isotonic regression (BISOREG, Curtis and Ghosh, 2011; Curtis, 2018)
. In addition we compare with the unconstrained methods ordinary least squares (OLS), local polynomial regression (LOESS), and an unconstrained Bernstein polynomial model (UBP). For BNMR and BISOREG we set
and consider other values in the Supplemental Material. For UBP we select using deviance information criterion (DIC, Spiegelhalter et al., 2002).We evaluate the model performance by the root mean squared error (RMSE) on the function and the pointwise 95% interval coverage both evaluated at 100 evenly spaced points spanning the range of . For the Bayesian methods (BNMR, BISOREG, BCGAM, and UBP) we consider the posterior probability that is linear and that is flat (no association). For the CGAM and OLS we report the mean
value for testing the null hypothesis that there is no association.
Table 1 shows results from the simulation study. At , BNMR had the lowest RMSE on
among all the monotone regression methods on all four scenarios (within one standard error with BCGAM and BISOREG on the flat scenario). The only method to have lower RMSE was OLS on the linear scenario. BNMR, BISOREG, CGAM, and UBP all had pointwise 95% interval coverage between 0.95 and 0.98 on all scenarios. Each of the other methods had interval coverage of 0.9 or less on at least one scenario. At
, BNMR had the lowest RMSE for the flat and flatnonlinear scenarios (within one standard error with BCGAM and BISOREG on the flat scenario). CGAM and BCGAM had the lowest RMSE of the constrained methods on the wavy scenario with BNMR and BISOREG slightly higher. OLS and UBP had the lowest RMSE on the linear scenario. UBP selected , a cubic regression function, in almost all datasets for the linear scenario.Model  RMSE  Coverage  Pr( flat )  RMSE  Coverage  Pr( flat ) 

Scenario 1: Flat  
BCGAM  2.22  0.95  0.00  0.65  0.96  0.00 
BISOREG  2.19  0.97  0.86  0.61  0.97  0.95 
BNMR  2.08  0.96  0.94  0.60  0.96  0.99 
CGAM  3.57  0.98  0.49  1.17  0.99  0.50 
LOESS  5.15  0.93  NA  1.61  0.95  NA 
OLS  3.11  0.95  0.50  0.93  0.96  0.53 
UBP  5.23  0.93  0.00  1.60  0.95  0.00 
Scenario 2: Linear  
BCGAM  6.42  0.84  0.00  2.58  0.84  0.00 
BISOREG  5.99  0.96  0.00  2.42  0.95  0.00 
BNMR  4.72  0.98  0.00  2.14  0.96  0.00 
CGAM  5.79  0.96  0.00  2.30  0.95  0.00 
LOESS  5.15  0.93  NA  1.61  0.95  NA 
OLS  3.11  0.95  0.00  0.93  0.96  0.00 
UBP  5.28  0.93  0.00  1.60  0.95  0.00 
Scenario 3: Wavy  
BCGAM  6.57  0.84  0.00  2.17  0.88  0.00 
BISOREG  5.98  0.95  0.00  2.25  0.95  0.00 
BNMR  5.30  0.96  0.00  2.25  0.94  0.00 
CGAM  5.67  0.96  0.00  2.15  0.96  0.00 
LOESS  6.44  0.89  NA  2.14  0.93  NA 
OLS  8.02  0.56  0.00  7.22  0.19  0.00 
UBP  6.38  0.90  0.00  2.30  0.90  0.00 
Scenario 4: Flatnonlinear  
BCGAM  5.33  0.90  0.00  1.93  0.89  0.00 
BISOREG  5.60  0.95  0.00  2.12  0.96  0.00 
BNMR  4.95  0.96  0.00  1.82  0.96  0.00 
CGAM  5.29  0.96  0.00  1.91  0.97  0.00 
LOESS  5.70  0.91  NA  1.88  0.93  NA 
OLS  16.11  0.32  0.00  16.24  0.09  0.00 
UBP  5.42  0.93  0.00  1.91  0.91  0.00 
Both BNMR and BISREG had high, greater than 0.86, posterior probabilities of a flat response (no association) in the flat scenario. BCGAM does not include a flat response in the parameter space and therefore has a posterior, and prior, probability of 0. The average
values for the test of no association for CGAM and OLS were between 0.49 and 0.53.BNMR is the only method that allows the estimated function to be linear with slope greater than 0. However, in the linear scenario this did not occur. The mean posterior probability of a linear function was 0.00. The response is linear when all regression coefficients are nonzero and take the same value. Figure 6 shows the number of nonzero regression coefficient in the model and the number of unique values those regression coefficients take. Both BNMR and BISOREG include only a small number of nonzero regression coefficients, effectively selecting out of the model the majority of the basis functions. Because not all basis functions are included the estimated regression function is never truly linear. Despite not being exactly linear, BNMR has lower RMSE than any of the other nonparametric methods on the linear scenario.
A key difference between BNMR and BISOREG is that all nonzero regression coefficients in BISOREG take unique values while with BNMR the nonzero regression coefficients are clustered together and take fewer unique values. On average, there were more nonzero regression coefficients included into the model with BNMR but fewer, less than two, unique regression coefficients. This is true for both the linear and nonlinear scenarios and for BP expansions of order ranging from 20 to 100 (shown in supplemental material). As a result, the proposed approach requires estimating only a small number of unique regression coefficients regardless of the size of the basis expansion or the wiggliness of the regression function.
In our application we are interested in the derivative of the monotone function. The BP basis used by BISOREG, BNMR and UBP allows straight forward inference on the derivatives of . The other methods do not naturally allow for this inference. We calculate a pointwise approximation of the derivative for the other methods by calculating change in divided by change in for each pair of neighboring observations on an equally spaced grid. We do not evaluate coverage for these methods. Table 2 shows the RMSE and 95% interval coverage (for BISOREG, BNMR and UBP only) for the derivative of . BNMR had lowest RMSE for all scenarios at the smaller sample size and the flat scenario at the larger sample size. BNMR, BISOREG, and UBP all suffered from poor interval coverage in several of the scenarios. The coverage is pointwise and in the flatnonlinear scenario, which has an sharp “elbow” changepoint, both methods fail to cover in the elbow, highlighting a limitation of the smooth BP basis.
Model  RMSE  Coverage  RMSE  Coverage 

Scenario 1: Flat  
BCGAM  3.24  NA  0.85  NA 
BISOREG  4.90  0.00  0.61  0.00 
BNMR  1.12  1.00  0.19  1.00 
CGAM  22.80  NA  9.92  NA 
LOESS  53.81  NA  18.49  NA 
UBP  55.26  0.92  16.61  0.93 
Scenario 2: Linear  
BCGAM  61.68  NA  39.98  NA 
BISOREG  65.54  0.96  44.91  0.94 
BNMR  39.57  1.00  38.21  0.95 
CGAM  68.85  NA  40.79  NA 
LOESS  53.81  NA  18.49  NA 
UBP  56.90  0.91  16.60  0.93 
Scenario 3: Wavy  
BCGAM  64.03  NA  32.05  NA 
BISOREG  70.78  0.95  46.53  0.91 
BNMR  55.00  0.97  45.13  0.85 
CGAM  69.21  NA  37.04  NA 
LOESS  87.54  NA  35.18  NA 
UBP  97.38  0.78  52.52  0.68 
Scenario 4: Flatnonlinear  
BCGAM  62.08  NA  28.82  NA 
BISOREG  92.18  0.46  65.44  0.45 
BNMR  61.45  0.68  46.91  0.61 
CGAM  66.95  NA  34.57  NA 
LOESS  65.11  NA  28.99  NA 
UBP  62.59  0.88  28.39  0.66 
The supplemental material includes additional simulation results, including standard errors for the estimates in Tables 1 and 2, interval widths, and results on sensitivity to the choice of prior and as well as the order of the BP . In addition we show results for computation time as a function of sample size and order of the BP.
4 Analysis of RealTime PM Concentration Inferred from Pressure Drop
4.1 Overview of the data analysis
We use data from 12 samples collected using three MARS devices during four laboratory experiments. These experiments are described in detail by Tryner et al. (2019). During each experiment, one of four different types of aerosol—urban PM (NIST SRM 1648A Urban PM), ammonium sulfate ((NH)SO), Arizona road dust, or match smoke—is nebulized into a controlled chamber containing all three MARS. Each MARS samples PM onto a new polytetrafluoroethylene (PTFE) filter at a flow rate of 1 L min for between 7.5 and 13 hours while pressure drop across the filter is recorded every 30 seconds. Each filter is weighed before and after the experiment to measure the total mass of PM accumulated. A TEOM measures the PM concentration in the chamber every minute as a previouslyvalidated point of comparison.
We use BNMR to estimate timeresolved PM concentration using the MARS pressure drop data from the 12 samples. Prior to analysis we removed the first 30 minutes of pressure drop as: 1) there was no PM in the chamber at that time and 2) the new filter was stretching during that time and a decreasing trend is observed due to the stretching process. We also removed the final five minutes when there was 1) no PM in the chamber and 2) the sampler was shutting down resulting in spurious noise in the pressure drop function. Then, we fit BNMR to the timeseries of measured pressure drop for each sample. From the fitted model we then estimate the scaled first derivative of pressure drop at each timepoint for which the TEOM recorded PM as described in Section 2.4. For comparison, we perform the same procedure with BISOREG and UBP. We also estimate pressure drop and the scaled pointwise approximation of the derivative with LOESS, CGAM, and BCGAM. We omit OLS because of obvious nonlinearities in the pressure drop data.
For each method we visualize and compare the performance with respect to estimating the pressure drop function and inferring realtime PM from the scaled derivative of pressure drop. We show results from one of the 12 samples in the main text. The supplemental material includes estimated pressure drop and estimated realtime PM concentration for all 12 samples.
4.2 Estimation of the pressure drop function
Figure 13
shows the data and estimates from all six methods for a single sample. Each panel show estimates from a single method along with 0.95 confidence or credible intervals. The fits are near identical visually over most of the range. However, there are differences in the lower tail. BNMR and BISOREG tend to leveloff between 0 and 100 minutes. In contrast CGAM and especially BCGAM tend to oversmooth over the same time period.
Comparing UBP to BNMR and BISOREG, which all use a BP basis, highlights an important difference between the constrained methods and the unconstrained method. Specifically, UBP experiences instability in the tails, while BNMR and BISOREG which impose monotonicity and further regulate with a selection prior (BISOREG) or a selection and clustering prior (BNMR), are more stable in the tails.
To formally compare model fit for the pressure drop function we performed fivefold crossvalidation for each sample. Table 3 shows crossvalidation results for all five methods across all 12 samples. LOESS had the lowest crossvalidation RMSE at 0.81 followed by UBP at 1.21. Hence, the unconstrained methods provided a better fit then any of the monotone methods. The best performing monotone methods were BISOREG at 1.27 and BNMR 1.29. CGAM and BCGAM had higher RMSEs of 1.47 and 1.96, respectively.
CrossValidation  Regression  

Model  RMSE  Intercept  Slope  
BCGAM  1.96  0.57  37.27  0.97 
BISOREG  1.27  0.75  46.89  0.91 
BNMR  1.29  0.75  44.59  0.92 
CGAM  1.47  0.72  2.03  1.09 
LOESS  0.81  0.81  51.60  0.88 
UBP  1.21  0.63  84.77  0.69 
LOESS outperforms the other methods in terms of crossvalidation RMSE on the pressure drop function for two reasons. First, LOESS does not impose monotonicity and several of the pressure drop measurements show minor deviations from the largely monotone trend. The small waves result from small fluctuations in the air temperature measured by the device, which lead to small fluctuations in air density and thus small fluctuations in the mass flow rate through the filter. The second reason that LOESS has lower crossvalidation RMSE is that three of samples show sharp changepoints in the pressure drop functions (similar to the “elbow” in simulation scenario 4) and LOESS is the only method that did not oversmooth these points (see supplemental Figure 9). UBP can also estimate the nonmonotone trend but struggles with the “elbow.” However, the nonmonotonicity in LOESS and UBP results in negative estimates of aerosol concentration, which are not physically possible.
The monotone methods smooth over the nonmonotone areas of the data. This results in valid estimates of PM because the derivative is always nonnegative. It is also consistent with the theoretical framework for measuring timeresolved PM from pressure drop using MARS as the pressure drop function should be monotone. However, this comes at a cost because the oscillation appears as autocorrelation in the residuals. This is not accounted for in the model as we assume independent and identically distributed residuals and could result in some bias in the intervals but results in a rational estimate of timeresolved PM.
4.3 Inference on timeresolved PM with the scaled derivative
Our primary interest is estimating PM concentration using the scaled first derivative of the estimated pressure drop function. We scale each derivative by the total mass collected on the filter as described in Section 2.4. Figure 16 shows the estimates for the scaled first derivative with both BNMR and BISOREG. For comparison, the PM concentration measured with the TEOM is included. Both BNMR and BISOREG estimate the larger pattern in PM concentration but do not fully capture localized features.
To more formally compare the estimates of PM concentration we regressed the one minute TEOM measurements on the estimated concentrations at those same time points obtained using each method (Table 3). The mean across all 12 samples was 0.75 with BNMR and BISOREG. Hence, these two approaches provide similar estimates of realtime concentration.
Estimates of PM from the other methods (BCGAM, CGAM, LOESS, and UBP) are presented in the supplement. All of these methods are being used beyond their original intention and suffer from shortcomings when estimating the derivative of a function. BCGAM and CGAM use a quadratic spline and result in piecewise linear derivatives which are not suitable to estimate the timeresolved PM. LOESS and UBP are nonmonotone and result in negative estimates of PM over some time segments. In addition, BCGAM, CGAM, and LOESS do not allow for the straight forward inference. When comparing to estimated PM from these methods to the measurements from the TEOM, LOESS was the best performing method with an of 0.81 despite having negative estimates of PM for a substantial period of time. The other approaches had higher ranging from 0.57 to 0.72.
4.4 Posterior visualization and MCMC performance
To better illustrate how BNMR works and compare the variable selection and clustering approach of BNMR to the variable selection only approach of BISOREG, we show the basis functions used in one sample in Figure 5. Panel (a)a shows the BP basis of order 50 as used in the simulation and data analysis. Panel (b)b shows the transformed BP basis as described in Section 2.1. We estimated the posterior mode subset of basis functions used with BNMR and BISOREG. Panel (c)c shows the posterior mode subset of basis functions included into the model with BISOREG. This is a subset of the full basis expansion shown in Panel (b)b. Panel (d)d shows the posterior mode combination of basis functions used by BNMR. This includes and intercept and three additional combination basis functions. Each combination basis function is a cluster of one to three of the original transformed basis function in Panel (b)b. BISOREG uses an intercept and nine additional basis functions. As a result only three unique nonzero slope parameters are estimated with BNMR compared to the nine used by BISOREG. The posterior mode basis functions are shown for all 12 samples in supplemental material. BNMR uses between three and five combination basis function at its posterior mode with each combination basis function being a cluster of one to five of the original basis functions.
Finally, we compare the MCMC performance of BNMR and BISOREG which both use the same BP basis but have different priors and MCMC approaches. Supplemental Table 5 shows the mean effective sample size and autocorrelation in the posterior sample of . BNMR had a larger average effective sample size than BISOREG (1164 verse 1066 from a posterior sample of 5000 after thinning by 10 from an original sample of 50000) and had lower autocorrelation at lag 1 (0.273 verse 0.375). In part, this efficiency gain can be attributed to the clustering which results in a smaller number of unique regression coefficients being sampled from a truncated multivariate normal distribution. However, there are numerous other differences in the priors and algorithms that likely also contribute to differences in efficiency.
5 Discussion
We propose BNMR to estimate a smooth monotone regression function. Our method is motivated by data generated from the MARS aerosol monitor. This affordable monitor measures the pressure drop across a filter. As particles accumulate on the filter the pressure drop increases. The timeresolved PM concentration is inferred from the first derivative of pressure drop scaled by the total mass collected on the filter. Hence, our objective is to estimate a smooth monotone function and make inference on the scaled derivative of that function.
Our proposed approach uses a BP expansion with a Dirichlet process prior that performs both variable selection and clustering on the regression coefficients for the basis expansion. This formulation enables flexible monotone regression while allowing the model to be null when there is no association between predictor and outcome and allowing the function to be linear when there is no evidence of nonlinearity. Further, we can make coherent, closedform inference on not only the function being estimated but the derivatives of that function and the scaled derivative of the function.
Our simulation study showed that BNMR performs similarly to other approaches for smooth nonlinear functions but offers improved inference at smaller sample sizes and when the true function is linear. By both clustering and selecting basis functions, BNMR is selftuning and results in a smaller parameter space than methods that use variable selection alone.
Our proposed method builds on a substantial body of research on statistical methods to measure or estimate exposure to PM, PM components, other environmental pollutants. This includes methods to infer exposures from existing monitoring networks, deployment of networks of portable devices, smartphones, and personal monitors (Calder, 2008; Rundel et al., 2015; Das and Ghosal, 2017; Huang et al., 2018; Finazzi and Paci, 2019).
Supplemental Material
The supplemental material includes replicates of Figures 5, 13, and 16 for all 12 runs. It also includes additional simulation results and information on computation time and efficiency. The methods can be implemented with the R package bnmr available at github.com/AnderWilson/bnmr. Data available on request from the authors.
Acknowledgement
This work was supported by the U.S. National Aeronautics and Space Administration and the Robert Wood Johnson Foundation through the Earth and Space Air Prize and by the U.S. Centers for Disease Control, National Institute for Occupational Safety and Health (OH010662 and OH011598).
This work utilized the RMACC Summit supercomputer, which is supported by the National Science Foundation (awards ACI1532235 and ACI1532236), the University of Colorado Boulder and Colorado State University. The RMACC Summit supercomputer is a joint effort of the University of Colorado Boulder and Colorado State University.
References
 Blackwell and MacQueen (1973) Blackwell, D. and MacQueen, J. B. (1973). Ferguson Distributions Via Polya Urn Schemes Author. The Annals of Statistics, 1(2):353–355.
 Brunk (1955) Brunk, H. D. (1955). Maximum Likelihood Estimates of Monotone Parameters. The Annals of Mathematical Statistics, 26(4):607–616.
 Bush and MacEachern (1996) Bush, C. A. and MacEachern, S. N. (1996). A Semiparametric Bayesian Model for Randomised Block Designs. Biometrika, 83(2):275–285.
 Calder (2008) Calder, C. A. (2008). A dynamic process convolution approach to modeling ambient particulate matter concentrations. Environmetrics, 19(1):39–48.
 Canale et al. (2018) Canale, A., Durante, D., and Dunson, D. B. (2018). Convex mixture regression for quantitative risk assessment. Biometrics, 74(4):1331–1340.
 Cassese et al. (2019) Cassese, A., Zhu, W., Guindani, M., and Vannucci, M. (2019). A Bayesian nonparametric spiked process prior for dynamic model selection. Bayesian Analysis, 14(2):553–572.
 Chang et al. (2007) Chang, I.S., Chien, L.C., Hsiung, C. A., Wen, C.C., and Wu, Y.J. (2007). Shape restricted regression with random Bernstein polynomials. In Complex Datasets and Inverse Problems, volume 54 of Institute of Mathematical Statistics Lecture Notes  Monograph Series, pages 187–202. Institute of Mathematical Statistics, Beachwood, Ohio, USA.
 Chang et al. (2005) Chang, I.S., Hsiung, C. A., Wu, Y.J., and Yang, C.C. (2005). Bayesian Survival Analysis Using Bernstein Polynomials. Scandinavian Journal of Statistics, 32(3):447–466.
 Crainiceanu et al. (2005) Crainiceanu, C., Ruppert, D., and Wand, M. P. (2005). Bayesian Analysis for Penalized Spline Regression Using WinBUGS. Journal of Statistical Software, 14(14).
 Curtis (2018) Curtis, S. M. (2018). bisoreg: Bayesian Isotonic Regression with Bernstein Polynomials.
 Curtis and Ghosh (2011) Curtis, S. M. and Ghosh, S. K. (2011). A variable selection approach to monotonic regression with Bernstein polynomials. Journal of Applied Statistics, 38(5):961–976.

Das and Ghosal (2017)
Das, P. and Ghosal, S. (2017).
Analyzing ozone concentration by Bayesian spatiotemporal quantile regression.
Environmetrics, 28(4):e2443.  de Boor (1978) de Boor, C. (1978). A practical guide to splines. SpringerVerlag, New York.
 Dias and Gamerman (2002) Dias, R. and Gamerman, D. (2002). A Bayesian Approach to Hybrid Splines NonParametric Regression. Journal of Statistical Computation and Simulation, 72(4):285–297.
 Dias and Garcia (2007) Dias, R. and Garcia, N. L. (2007). Consistent estimator for basis selection based on a proxy of the Kullback–Leibler distance. Journal of Econometrics, 141(1):167–178.
 Ding and Zhang (2016) Ding, J. and Zhang, Z. (2016). Bayesian regression on nonparametric mixedeffect models with shaperestricted Bernstein polynomials. Journal of Applied Statistics, 43(14):2524–2537.
 Dobroski et al. (1997) Dobroski, H., Tuchman, D. P., and Vinson, R. P. (1997). Differential Pressure as a Means of Estimating Respirable Dust Mass on Collection Filters. Applied Occupational and Environmental Hygiene, 12(12):1047–1051.
 Escobar and West (1995) Escobar, M. D. and West, M. (1995). Bayesian Density Estimation and Inference Using Mixtures. Journal of the American Statistical Association, 90(430):577–588.
 Finazzi and Paci (2019) Finazzi, F. and Paci, L. (2019). Quantifying personal exposure to air pollution from smartphonebased location data. Biometrics, (July 2018):1356–1366.
 Herring (2010) Herring, A. H. (2010). Nonparametric Bayes Shrinkage for Assessing Exposures to Mixtures Subject to Limits of Detection. Epidemiology, 21(Supplement):S71–S76.
 Hildreth (1954) Hildreth, C. (1954). Point Estimates of Ordinates of Concave Functions. Journal of the American Statistical Association, 49(267):598–619.

Huang et al. (2018)
Huang, G., Chen, L.J., Hwang, W.H., Tzeng, S., and Huang, H.C. (2018).
Realtime PM 2.5 mapping and anomaly detection from AirBoxes in Taiwan.
Environmetrics, 29(8):e2537.  Li and Ghosh (2015) Li, Y. and Ghosh, S. K. (2015). Efficient Sampling Methods for Truncated Multivariate Normal and Student t Distributions Subject to Linear Inequality Constraints. Journal of Statistical Theory and Practice, 9(4):712–732.
 Liu et al. (2015) Liu, F., Wang, C., and Liu, P. (2015). A Semiparametric Bayesian Approach for Differential Expression Analysis of RNAseq Data. Journal of Agricultural, Biological, and Environmental Statistics, 20(4):555–576.
 Mammen (1991) Mammen, E. (1991). Estimating a Smooth Monotone Regression Function. The Annals of Statistics, 19(2):724–740.
 Mammen et al. (2001) Mammen, E., Marron, J. S., Turlach, B. A., and Wand, M. P. (2001). A General Projection Framework for Constrained Smoothing. Statistical Science, 16(3):232–248.
 Mclain and Ghosh (2009) Mclain, A. C. and Ghosh, S. K. (2009). Estimation of Time Transformation Models with Bernstein Polynomials.
 Meyer (2008) Meyer, M. C. (2008). Inference using shaperestricted regression splines. The Annals of Applied Statistics, 2(3):1013–1033.
 Meyer (2012) Meyer, M. C. (2012). Constrained penalized splines. Canadian Journal of Statistics, 40(1):190–206.
 Meyer (2013) Meyer, M. C. (2013). Semiparametric additive constrained regression. Journal of Nonparametric Statistics, 25(3):715–730.
 Meyer et al. (2011) Meyer, M. C., Hackstadt, A. J., and Hoeting, J. A. (2011). Bayesian estimation and inference for generalised partial linear models using shaperestricted splines. Journal of Nonparametric Statistics, 23(4):867–884.
 Meyer and Liao (2018) Meyer, M. C. and Liao, X. (2018). cgam: Constrained Generalized Additive Model.
 Neelon and Dunson (2004) Neelon, B. and Dunson, D. B. (2004). Bayesian isotonic regression and trend analysis. Biometrics, 60(2):398–406.
 Novick et al. (1992) Novick, V., Monson, P., and Ellison, P. (1992). The effect of solid particle mass loading on the pressure drop of HEPA filters. Journal of Aerosol Science, 23(6):657–665.
 OlivaAviles and Meyer (2018) OlivaAviles, C. and Meyer, M. C. (2018). bcgam: Bayesian Constrained Generalised Linear Models.
 Powell et al. (2012) Powell, H., Lee, D., and Bowman, A. (2012). Estimating constrained concentrationresponse functions between air pollution and health. Environmetrics, 23(3):228–237.
 Ramsay (1988) Ramsay, J. O. (1988). Monotone Regression Splines in Action. Statistical Science, 3(4):425–441.
 Rundel et al. (2015) Rundel, C. W., Schliep, E. M., Gelfand, A. E., and Holland, D. M. (2015). A data fusion approach for spatial analysis of speciated PM 2.5 across time. Environmetrics, 26(8):515–525.
 Spiegelhalter et al. (2002) Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and Van Der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B, 64(4):583–616.
 Tryner et al. (2019) Tryner, J., Quinn, C., Windom, B. C., and Volckens, J. (2019). Design and evaluation of a portable PM 2.5 monitor featuring a lowcost sensor in line with an active filter sampler. Environmental Science: Processes & Impacts, 21(8):1403–1415.
 Volckens et al. (2017) Volckens, J., Quinn, C., Leith, D., Mehaffy, J., Henry, C. S., and MillerLionberg, D. (2017). Development and evaluation of an ultrasonic personal aerosol sampler. Indoor Air, 27(2):409–416.
 Wang and Ghosh (2011) Wang, J. and Ghosh, S. (2011). Shape Restricted Nonparametric Regression Based on Multivariate Bernstein Polynomials. Technical report, North Carolina State University Department of Statistics, Raleigh, NC.
 Wang and Li (2008) Wang, X. and Li, F. (2008). Isotonic smoothing spline regression. Journal of Computational and Graphical Statistics, 17(1):21–37.
 West et al. (1994) West, M., Müller, P. M., and Escobar, M. D. (1994). Hierarchical priors and mixture models, with application in regression and density estimation. In Freeman, P. and Smith, A. F., editors, Aspects of Uncertainty: A Tribute to D.V. Lindley, pages 363–386. Wiley Series in Probability and Statistics.
 Wilson et al. (2014) Wilson, A., Rappold, A. G., Neas, L. M., and Reich, B. J. (2014). Modeling the effect of temperature on ozonerelated mortality. The Annals of Applied Statistics, 8(3):1728–1749.
Comments
There are no comments yet.