1 Introduction
Malaria is a vectorborne infectious disease affecting a large population worldwide, especially in tropical and subtropical regions
(Guerra et al., 2008). It is estimated that there were 216 million cases of malaria resulting in 445,000 deaths globally in 2016 (WHO, 2017), and thus it remains a major public health problem. Effective malaria interventions have been developed, including insecticidetreated mosquito nets (ITNs) (Lengeler, 1998), indoor residual spraying (IRS) (Pluess et al., 2010), and Artemisininbased combination therapy (ACT) (Eastman and Fidock, 2009). However, these interventions are too costly to be given to everyone in need. The malaria research community has made great strides in mapping disease prevalence
Bhatt et al. (2015, 2017); Kang et al. (2018); Hay et al. (2009), modeling its transmission (Griffin et al., 2010, 2014, 2016; Bhadra et al., 2011), developing and testing effective interventions (Okell et al., 2014; Stuckey et al., 2014; Parham and Hughes, 2015; Walker et al., 2016), and implementing these interventions in practice. We proposed to build on this work to develop a realtime recommendation engine (RE) for precision interventions to help policymakers decide how to best allocate limited resources.Efforts have been made to recommend optimal interventions to different areas to reduce malaria incidence. Okell et al. (2014), Stuckey et al. (2014) and Parham and Hughes (2015) conducted costeffectiveness analysis of different combinations of interventions for malaria control and evaluated how they vary depending on malaria transmission related factors. However they only evaluated the costeffectiveness of several predesigned intervention combinations. Also, they either only recommended one most costefficient combination of interventions for the whole population in the study area or just recommend a small number of allocation strategies only tailored to environmental factors. They did not provide individualized intervention recommendations that are tailored to small areas or allow these recommendations change over time based on accumulative information. Walker et al. (2016) used a dynamic mathematical model to capture the effect of interventions under different malaria transmission settings and recommended intervention choices at national, provincial and pixel level. They only recommended what interventions should be implemented but did not specify the optimal amount of each intervention for each pixel. Also, they did not allow for recommendations to update over time. Bent et al. (2017) used agentbased exploration techniques to estimate the optimal policy over an analytical policy search space with outcomes predicted by the OpenMalaria Platform (Smith et al., 2008). They recommended three most costeffective policies in terms of combination of coverage of ITN and IRS for the specific target population, but generalization of insights to expansive environments is yet to explore and likely to be computationally prohibitive following this approach.
Our proposed recommendation engine (RE) can be formalized as a sequence of decision rules, one per stage of intervention, that map current information to a resource allocation vector specifying the resources allocated to each unit (such as a health zone). The optimal RE should optimize the expectation of some longterm cumulative outcome, e.g. the average malaria prevalence over a given time horizon, and also account for the need for interpretability and fairness. It is related to the idea of dynamic treatment regimes, which are sequential decision rules of giving treatment recommendation to individual patient at each stage based on uptodate patient information (Murphy, 2003; Robins, 2004; Chakraborty and Moodie, 2013; Schulte et al., 2014). However, there are unique challenges in the spatiotemporal resource allocation problem that distinguish it from typical DTR problems and prevent the direct application of existing DTR methods. First, methods for dynamic treatment regimes typically assume that individuals to receive treatment are independent and a large number of independent trajectories of the individuals are available in the observed data. In our allocation problem, the districts or zones to receive resources are spatially dependent so that we have to treat them as a single allocation objective over the spatial domain at each time point. As a result, there is only single observation available at each time point without independent replications. Second, most of the existing methods for dynamic treatment regimes can only solve problems with discrete treatment space including only a small number of treatment options. However, in our problem the allocation to each region is the proportion of residents to be given a bednet, and so the action space is continuous and high dimensional. Chen et al. (2016), Laber and Zhao (2015), Rich et al. (2016) dealt with continuousvalued treatments, but the samples in their methods are assumed independent.
An optimal policy for spatiotemporal resource allocation (e.g., as found using dynamic programming) is a massively complex mathematical function of many geographic and epidemiologic inputs, requiring extensive computation due to the large number of spatial locations under consideration, and is essentially a black box that generates no new knowledge. In contrast, we propose an approach that estimates the optimal RE within an interpretable class of regimes and permits straightforward computation, and is amenable to visualization, scrutiny, and stakeholder input. We build a spatiotemporal model for the progression and transmission of the disease and then draw posterior samples from the fitted system dynamic model to simulate future prevalence for any RE. The mean outcome for each regime within a prespecified class of interpretable policies is estimated and the optimal RE is maximizer over the class. The idea is related to the policysearch method for dynamic treatment regimes which models the mean outcome as a function of each regime and chooses the maximizer as the optimal regime (Robins et al., 2008; Orellana et al., 2010; Zhang et al., 2012; Zhao et al., 2012; Zhang et al., 2013; Zhao et al., 2015; Zhang et al., 2017).
However, due to the challenges mentioned above, existing methods cannot be implemented directly to the continuous resource allocation problem. Guan et al. (2019) used policysearch with gcomputation to estimate optimal personalized recommendations for recall intervals for patients with periodontal disease. They built a nonparametric Bayesian model for disease progression process, specified a class of policies on a clinically interpretable risk score, and estimated the optimal policy within this class. However, their observed data included a large number of patients with independent longitudinal profiles, and their decision space was discrete involving only two treatment options. Laber et al. (2018) developed a spatiotemporal treatment allocation strategy to slow the spread of white nose syndrome in bats. They used a statistical spatial gravity model to forecast the spread of the disease and developed an algorithm to identify a subset of locations to receive treatment at each time point aiming to optimize longrun control of the disease. While they solved a similar problem, their settings are simpler than what we consider here. Their spatial gravity model is only used to predict the locations that might be affected by the disease at future time points, whereas we need to predict the disease prevalence at each location in the future. Also, they only decided whether or not to apply treatments at each location at each time point so that the decision for each location is binary, whereas we aim to decide the amount of resources to be assigned to each location so that the action space for each location is continuous. Because of these differences, we build a more comprehensive spatiotemporal model to predict disease prevalence and define a new class of policies that can accommodate a continuous action space while remaining interpretable.
In Section 2, we formalize the optimal resource allocation problem using a decision theoretic framework. In Section 3, we propose a Bayesian spatiotemporal model to estimate the disease dynamics. In Sections 4 and 5, we construct a class of resource allocation policies and combine the spatotemporal model with a sequential optimization algorithm to estimate the optimal allocation policy subject to the cost constraint. In Section 6, we evaluate the performance of the proposed method and compare it with some naive policies using simulation experiments. In Section 7, we apply the proposed method to malaria in the Democratic Republic of the Congo (DRC). Finally, we give a brief discussion in Section 8.
2 Problem statement
Assume the spatial domain is divided into health zones. For health zone , the environmental covariates are observed at baseline and remain constant over time. At each time point , in health zone , the malaria disease rate, , is estimated by the sample proportion of individuals with malaria in the malaria field survey conducted at time . Following Bhatt et al. (2015), we use as the response. For settings with many health zone disease rates near zero, a binomial, Poisson, or negative binomial model might be preferable. The intervention resource allocated to health zone at time point is denoted . In our analysis, is the bednet coverage, i.e. the sample proportion of individuals sleeping under a bednet. The data available after time are , where , for , and represent the collection of covariates, malaria disease rate and resource allocation respectively for all health zones.
We define to be a summary of the information collected at time , and a resource allocation policy, , is a function from to the action space . Let be the class of policies under consideration which we assume are parameterized by . Hence, the policy is determined by the parameter vector which maps the currently available information to a resource allocation recommendation . Define to be the set of possible resource allocation actions with realized history information so that .
We formalize the optimal resource allocation policy using potential outcomes. Define to be the potential disease rate at time under the sequence of allocations up to time . Define to be the potential outcome at time if the resources were allocated under the resource allocation policy up to time
. The loss value associated with a policy is defined as the expectation of a loss function of the potential outcomes under the policy, such as the mean malaria prevalence over the next five years (assuming resources are allocated once per year),
. The optimal policy within the prespecified class is then defined as . In order to estimate the optimal policy using the observed data, we make the following assumptions (Robins, 2004; Schulte et al., 2014): (1) sequential ignorability, for , where is the set of all possible resource allocation actions up to time ; (2) consistency, , where is the sequence of observed resource allocation actions up to visit ; (3) positivity, let denote the conditional treatment density given realized history information , and then there exists so that for all and . With these assumptions, the loss value function to be optimized can be expressed using gcomputation and the datagenerating model, which is the malaria transmission model given in Section 3.3 Bayesian spatiotemporal model
We extend the hierarchical Bayesian spatiotemporal model proposed in Mugglin et al. (2002) to model the spread of the disease. We assume that
(1) 
where is measurement error and is a spatiotemporal process. Let be the number of neighbors of zone and be the index set of its neighbors. We assume at each time and each health zone that
(2) 
so that the progression of the disease rate depends on its previous disease rate, its neighbors’ previous disease rate, environmental covariates and the intervention level. The coefficients , , , , , can be interpreted as the intercept, the main effect of resource allocation, the main effect of the previous disease status, the interaction effect of resource allocation and the previous disease status, the main effect of previous neighborhood disease status, the interaction effect of resource allocation and previous neighborhood disease status, respectively. The innovation process is modeled as a Gaussian Markov random field (GMRF) that is independent in time and spatially correlated. Specifically,
(3) 
where is diagonal with diagonal elements ; is the adjacency matrix with , if zone is a neighbor of zone , and otherwise;
is the variance parameter, and
controls spatial dependence. An alternative to this model is the intrinsic model with which would provide some computational savings at the expense of model flexibilityEquivalently, the spatiotemporal process model can be written in matrix form as the Gaussian firstorder autoregressive process:
Spatiotemporal dependence is determined by the propagator matrix that depends on and is parameterized by , , and such that its element is
The term varies depending on the resource allocation and environmental covariates X to control the latent process to switch among growth, recession, or stable phases. The latent process is able to incorporate the interaction between the spatial and temporal correlation through the propagator matrix . Different levels of interventions will affect the environmental effects on the disease rate and also how the disease spreads in space and time.
Here we assume a firstorder autoregressive model in time. For the motivating malaria analysis this should be sufficient because the time from exposure to manifestation is much less than the yearly time step of the data. In other settings, higherorder autoregressive models can be fit to account for incubation periods or other delayed effects.
4 Recommendation engine
We use policysearch to estimate an optimal resource allocation policy within a class of policies that are interpretable to domain experts. We use a global utility function to parameterize the class of policies under consideration. The global utility function summarizes the current available information via the parameter and is a function of the allocation . We consider the class of policies of the form subject to some cost constraints, where is a class of global utility functions s.t. measures the “goodness” of allocation in state under parameter . The class is chosen to capture salient features of the decision problem, e.g., cost, fairness, spread dynamics, and logistics. Thus, the resultant class of policies is interpretable and the estimated optimal policy, say indexed by , can be plugged into this utility function to identify features driving optimal malaria control. In practice, the total available resources are limited by annual logistical or budgetary constraints. For example, constraints are placed on the bednet distribution in each region, , and on the total number of bednets distributed so that , where is the population in health zone at time .
To construct the global utility function for each health zone , we first define a priority score that represents its priority level of receiving resources so that the zone with higher priority score should be allocated more resources. The priority score is constructed based on all available data before resource allocation at time and depends on the policy parameter , i.e., . Then we define a local utility as a function of that calculates the local utility for health zone with priority can gain if resources are allocated to the zone. The utility function should be monotonically increasing with the resource allocation since additional resources will bring additional benefits. The global utility function is constructed using the total local utility aggregated over the whole spatial domain.
4.1 Priority score
The priority score for each zone and time is defined to depend on userspecified risk factors constructed from currently available data . There is great flexibility in constructing the risk factors. They can include the standardized environmental covariates , such as temperature and precipitation, or the zone’s current disease status , the gradient of disease statue , or even nonlinear summaries of the characteristics of the zone. The priority score is then defined as , where () determine the weights given to each risk factor in the priority score.
4.2 Local utility function
Intuitively, a reasonable local utility function that depends on both the resource allocation and the priority score should satisfy the following assumptions:

for all , i.e. the utility is 0 if no resource is assigned.

is increasing in for all , i.e., more resources will bring more benefits.

is increasing in for , i.e., the same level of resources assigned to the health zone with higher priority score will bring more benefits. This property assures the health zone with higher priority score tend to receive more resources in order to maximize the overall utility.
A natural and simple utility function satisfying the above properties is the linear utility function . This utility function assumes that the individual utility increases linearly with resource allocation, i.e., an additional unit of resource will bring constant marginal utility that is set to be the priority of the individual zone.
Equity or fairness is an important factor to consider when deciding the resource allocation in healthcare (Daniels et al., 2000; Gibson et al., 2004; Guindo et al., 2012). Fairness concerns dictate that everyone should have the same right to the resources and the priority should be given to the zones with more severe situation (Nord et al., 1999; Nord, 2015). Similar to the idea proposed in Nord et al. (1999), a convex utility function can accommodate some fairness concerns. One additional unit of resources will bring more utility if the zone has been assigned less resources. In other words, the marginal utility value should decrease with the resources allocation. So the utility function that also satisfy the following additional property can account for fairness:

for and all , i.e., the marginal utility value is decreasing when allocated resource is increasing.
An example of the utility function that satisfies all above properties is the quadratic utility function .
Figure 1 illustrates the two utility functions with different values of priority scores. Both utility functions increase with , but the gradient of the linear function in terms of is constant while the gradient of the quadratic function is decreasing.
4.3 Global utility function
The global utility is defined as the summation of all local utilities but also with a penalty on the differences of resource allocations among neighborhood health zones:
where , , and indicates that zone and zone are neighbors. The penalty term with a positive weight can smooth the resource allocation and account for fairness. A large penalty encourages spatial clusters of zone to be given intense treatments which could be more effective than scattered sites with intensive treatments. As the weight of the penalty term is to be optimized, this additional term allows for a more flexible class of policies to be considered.
The recommendation engine suggests allocations that maximize the global utility subject to the resource constraints on the total number of bednets distributed to the whole region:
(4)  
Other nuances in the resource allocation decision making can be easily taken into consideration by adding more constraints to (4). For example, if it is agreed that allocating bednets to health zones with malaria prevalence is illadvised, we can simply add a constraint that if .
5 Policysearch
The optimal priority score weights minimizes an optimality criterion , such as the estimated expected cumulative (over space and time) malaria prevalence over the next five years. Therefore, although the policy only gives the resource allocation of one time point, the policy optimizes longterm outcomes. Given the posterior samples of the parameters in the Bayesian spatiotemporal model, the future malaria prevalence can be simulated to construct an estimator of the expected cumulative prevalence over space and time . The plugin estimator of the optimal weights is . For numerical stability, we replace with . Then is defined as the minimizer of .
A Krigingbased optimization method (Picheny et al., 2013) is used to estimate which minimizes . At the first step, we evaluate the value for the initial points of
from a Latin hypercube design generated using “optimumLHS” function in the R package “LHS.” We do a linear transformation of the initial design by setting the range of
and . The value corresponding to each pointin the design is estimated using Monte Carlo simulation given the posterior samples of the parameters. By using a different posterior sample of the model parameters for each simulated trajectory in the Monte Carlo simulation, trajectories are samples from the full posterior predictive distribution of future prevalence and thus our policy accounts for both parametric and aleatoric uncertainty. Using the initial training data
, a Gaussian process regression model is fit to predict the value corresponding to a new using kriging. The algorithm sequentially selects next weight to visit that optimizes the expected improvement (EI) defined in Jones et al. (1998) and updates the Gaussian process model parameters at each iteration. We use the the R package “DiceOptim” (Roustant et al., 2012) to implement the optimization procedure.At each of the future years, if new data is available, the Bayesian spatiotemporal model can be refitted and the posterior distribution of the parameters can be updated. The resource allocation decision for the future years can be made by reoptimizing the priority score weights based on the updated posterior samples.
Each posterior sample of the parameters and predicted malaria prevalence in the MCMC procedure corresponds to a different optimal . We can quantify the uncertainty of by applying the optimization procedure for each posterior draw to get the posterior distribution for .
6 Simulation
6.1 Generative model
For the simulation, we assume that there are health zones that are arranged as a square grid with grid spacing 1 between adjacent sites, and include only one environmental covariate simulated from the Gaussian process with mean zero and variance one, and correlation , where is the distance between the centroids of zone and zone . We simulate the baseline latent process for the health zones from a Gaussian process such that and the corresponding logit transformation of the disease rates are simulated from for . We simulate the disease spread for years following the generative model such that for , ,
where , and resource allocation is assumed increasing over time (to mimic the real malaria data) such that are simulated from and are truncated at and . The disease rates for the five years are simulated from for and . We simulate datasets using this generative model.
6.2 Policy estimation
We consider three risk factors in the policy: the environmental covariate , logit of disease rate at previous time point , mean logit of disease rates of neighborhood zones at previous time point . The priority score of zone at time is then . We assume that the number of individuals in different health zones are the same and give the constraint that . We consider three scenarios with different values of resource constraint level and .
We consider the following resource allocation policies for comparison:

Linear utility (Linear): our proposed policy with linear local utility function and a spatial penalty term to smooth the resource allocation.

Quadratic utility (Quad): our proposed policy with quadratic local utility function and a spatial penalty term to smooth the resource allocation.

Highest rate (Highest_rate): assign a bednet to each individual in the zones with highest disease rates and no bednets to the remaining zones.

Even: assign the same percentage of bednets to each healthzone.
For each simulated dataset, we use all information simulated up to year as the training data to fit our proposed Bayesian spatiotemporal model using MCMC sampling with iterations. For the first two policies, the optimality criterion for each is estimated using Monte Carlo simulations given the posterior samples. Sequential optimization is used to estimate the optimal policy that minimizes the expected mean malaria prevalence in the future five years within the prespecified class. The loss value associated with estimated policy are approximated by Monte Carlo simulations given the true generative model. For the last two policies, there is no need to fit the model and no parameters to estimate. We just use Monte Carlo simulations given the true generative model to approximate the expected mean malaria prevalence in the future five years under the policy. The approximated loss values associated with the four policies for th simulated dataset are denote as respectively.
We use the two naive policies “Highest_rate” and “Even” as two baseline policies and show the improvement of our proposed policies in terms of loss value compared with the baseline policy. For each of the simulated dataset, we compute the improvement as , and , . Figure 2(a) plots the sampling distribution of the improvement of our proposed policies with different utility functions. Under this simulation setting, “Highest_rate” policy is preferred over “Even” policy. But we can see our proposed policies have significant improvement compared with either of the naive policies, especially when there are moderate level of total resources (). The policy with the linear utility function works slightly better than the policy with quadratic utility function when the total resource level is low (). This indicates more extreme resource allocation might improve the overall benefits under this specific simulation setting. We conduct another simulation assuming the disease transmission model is misspecified to check the robustness of our method. We assume the true model is more complicated than our proposed model by adding a quadratic effect of the resource allocation on the disease progression, i.e.
We simulate the disease spread for years with the latent process
and all other settings are the same as the simulation above. We can see from Figure 2(b) that although our proposed model simplified the true disease spread model, our proposed policies still perform significantly better than naive policies for most cases. Under this simulation setting, the policy with the quadratic utility function works much better than the policy with the linear utility function as this simulation setting prefers more even resource allocation.
From the simulation results, we can see under different simulation settings and even when our proposed disease progression model is misspecified, our proposed policies are significantly better than naive policies and also consider fairness by allocating the resources more smoothly.
7 Application to the DRC data
We illustrate our method using data in the Democratic Republic of the Congo (DRC) primarily based on Demographic Health Surveys (DHS). DHS are crosssectional, populationbased cluster household surveys. In each survey, clusters are randomly chosen to representative of the national population. Within each cluster, households are randomly selected to participate in the survey. Two DHS program surveys  one in 2007 and another in 2014  were conducted to study malaria prevalence and treatment allocations in DRC. In the survey, structured questionnaires are administered to selected households to collect malariarelated information, such as their treatment status including bednet use. Also, dried blood spots were collected to test the malaria status.
Bhatt et al. (2015) made use of the data from the DHS program surveys and five additional nonDHS program surveys and built a Bayesian hierarchical model to construct a malaria endemicity map across African from 2000 to 2015 in terms of Plasmodium falciparum parasite rate (PfPR). Interventions coverage levels from 2000 to 2015, including insecticide treated nets (ITN), indoor residual spraying (IRS) and Artemisinin based combination therapy (ACT), are also estimated. We download the surface data of PfPR and ITN for DRC from https://map.ox.ac.uk/countryprofiles/#!/COD. In the surface data, PfPR and ITN rate are estimated at a 5km by 5km resolution. Using the smoothed surfaces as data may bias our parameter estimates, but they greatly expand the spatiotemporal coverage of our data which is needed to build a disease progression model.
As malaria intervention resources are allocated in health zone level in DRC, we map the surface data of PfPR () and ITN coverage rate () to each of the 515 health zones in DRC by taking the average values of the small cells lying in each health zone as the value of PfPR or ITN coverage rate in the corresponding health zone. We make the assumption that the populations at each 5km by 5km cell within a health zone are the same so that the mean rate of the cells can be used as the rate of the health zone. As there are almost no ITN usage before 2007, we only use the mapped data from 2007 to 2015 to train the Bayesian spatiotemporal model. In the model, we include two environmental covariates ( and ): annual average temperature and annual average precipitation. As the annual average temperature and annual average precipitation are relatively stable in a certain area, we assume the annual average temperature and annual average precipitation are constant over time in each health zone. We download the monthly worldwide average temperature and precipitation data for 19702000 from http://worldclim.org/version2 and map them to health zone level. We use the standardized mean annual average temperature and precipitation in 19702000 as the constant environment covariate values used in the model training and prediction.
The collected and processed data are fitted to the model in (1), (2) and (3). We use 5,000 iterations in Gibbs sampling and discard first burnin 2,000 samples to obtain 3,000 posterior samples. Table 1
summarizes the posterior mean and 95% credible interval of the parameters in the model. Temperature, previous disease status and previous neighborhood disease status all have significant effects on the disease spread and the intervention ITN can significantly decrease the disease rate. Environmental covariates and previous disease status also have interaction effects with ITN on the disease progression.
Response  Mean  95% CI  Mean  95% CI  

Intercept ()  0.131  (0.192, 0.070)  0.011  (0.010, 0.012)  
ITN ()  0.302  (0.366, 0.234)  0.142  (0.139, 0.146)  
Temperature ()  0.033  (0.020, 0.045)  0.999  (0.998, 0.999)  
Precipitation ()  0.003  (0.012, 0.018)  
Temperature*ITN ()  0.097  (0.125, 0.069)  
Precipitation*ITN()  0.053  (0.088, 0.019)  
Previous ()  0.917  (0.902, 0.932)  
Previous*ITN ()  0.096  (0.062, 0.129)  
Prev_neighbor ()  0.040  (0.016, 0.065)  
Prev_neighbor*ITN ()  0.092  (0.145, 0.039) 
We consider four risk factors in the priority score in the policy: the standardized annual average temperature () and the standardized annual average precipitation (,), logit of disease rate at previous time point and mean logit of disease rate of neighborhood zones at previous time point . We evaluate the policies with either the linear local utility function or the quadratic local utility function and use the resource constraint level .
We randomly draw 100 posterior samples of the parameters and estimate the optimal policy in terms of corresponding to each posterior draw to get the posterior distribution of . Unlike typical MCMC sampling which suffers from autocorrelation, the samples of should be independent draws from the posterior and so fewer samples are needed for these parameters than are needed in typical MCMC sampling. Figure 3 plots the posterior distribution of when using either the linear utility function or the quadratic utility function. The posterior mean weights for all risk factors are positive, which suggests the priority of being allocated resources for each health zone tends to be positively correlated with temperature, precipitation, current disease status and current neighborhood disease status. For both utility functions, temperature and current disease rate of the health zone seem to be most important factors in determining the risk score and priority. The posterior distribution of weight for the spatial penalty term concentrates more towards to zero when using the quadratic utility function compared with using the linear utility function. This suggests that the spatial penalty term does help to better allocate the resources in terms of smoothness and efficiency when the linear utility function is used, but does little to help when the quadratic utility function is used as the quadratic utility function is able to smooth the resource allocation inherently.
We also estimate one optimal policy averaging over the uncertainty of the parameters. The estimated optimal policy with the linear local utility function is with the priority score
and the weight for the spatial penalty . The estimated optimal policy with the quadratic local utility function is with the priority score
and the weight for the spatial penalty
. The estimated optimal weights suggest that health zones with higher temperature, more precipitation, higher disease rate in the zone and neighborhood zones tend to have higher priority to be allocated more resources. The loss value corresponding to the two optimal polices are 0.135 and 0.136 respectively and the loss values corresponding to “Highest_rate” policy and “Even” policy are 0.140 and 0.149, all with standard error
. The proposed policies improve the value by about and compared to the two naive policies. This is a substantial improvement when considering the number of disease cases that can be eliminated.Figure 4 illustrates the resource allocation next year using the two estimated optimal policies or using “Highest_rate” rate policy. The estimated optimal policy using the quadratic utility function allocates the resources most smoothly while the “Highest_rate” policy only give extreme resource allocation ( or ).
We refit the model using data in years 20072012, 20072013 and 20072014 as training data and estimate the risk factor weights with linear utility function for resource allocation recommendation in year 2013, 2014 and 2015 respectively. The posterior distribution of optimal weights are given in Figure 5. They show that the updated estimated optimal policy is stable across years suggesting that a dynamic model would not lead to dramatic improvements for this analysis. Of course, for cases where treatment is applied annually there is ample time to update the policy, and so updating the policy using all available data would be advisable.
8 Discussion
We develop a recommender system for spatiotemporal resource allocation to maximize the efficacy of malaria control efforts. Our proposed statistical framework deals with the challenges of spatial dependence and continuous action space. We used a hierarchical Bayesian spatiotemporal model to approximate the system dynamics of the disease transmission involving the effect of environmental covariates and allocated resources, and construct a flexible and interpretable class of allocation policies that is also computationally feasible for searching the optimal resource allocation policy with the continuous action space. The simulation experiments suggest the proposed method performs well, and it is shown to be able to improve the resource allocation efficacy compared with naive polices in both simulation studies and the application to DRC data.
There are some limitations in our studies which provide possible directions for the future work. We make the simplifying assumption that the decision maker will completely comply with the recommended allocation when evaluating the future potential outcomes for each regime. If partial compliance is possible, a compliance model for the distribution of the decision maker’s actual action conditioning on the recommended action can be added to the current framework. We assume that a stream of malaria prevalence data for each health zone and each year is available, however, realisticly only partial data may be available due to the difficulty of continuous survellience. To deal with this problem, our proposed Bayesian model could easily inpute the missing data conditional on all the observed data. Our proposed recommendation engine relies on the postulated spatiotemporal model for the malaria transmission. A more flexible semi or nonparametric model can be constructed to improve the robustness of the method to model misspecification. In the current framework, we only consider one intervention (ITN) at a time when optimizing the resource allocation. Our method can be extended to consider several interventions simultaneously and recommend the optimum allocation policy for all the related resources. As malaria intervention resources are allocated at health zone level in DRC, we build the spatiotemporal model based on the use of data aggregated at the district level and define recommendation engine to decide how to allocate resources to a finite number of regions. If applying treatment to points rather than a finite number of regions is more of interest and data are available at point locations, then a geostatistical model would be preferable to avoid bias in estimating covariate effects and a completely new framework is required to define the point level treatment allocation policy, and it is a topic for future work.
When we apply our method to the DRC data, the data we use to fit the model is the generated PfPR and ITN surface data estimated in Bhatt et al. (2015) using the Bayesian hierarchical model based on very sparse survey data instead of the real yearly collected health zone level data. Also, we make the assumption that the populations at each 5km by 5km cell within a health zone are the same so that the mean rate of the cells can be used as the rate of the health zone. As a result, we only use the data as the illustration purpose instead of accurately reflecting the situation in DRC.
References
 Novel exploration techniques (NETs) for malaria policy interventions. arXiv preprint arXiv:1712.00428. Cited by: §1.
 Malaria in Northwest India: Data analysis via partially observed stochastic differential equation models driven by Lévy noise. Journal of the American Statistical Association 106 (494), pp. 440–451. Cited by: §1.
 Improved prediction accuracy for disease risk mapping using Gaussian process stacked generalization. Journal of The Royal Society Interface 14 (134), pp. 20170520. Cited by: §1.
 The effect of malaria control on Plasmodium falciparum in Africa between 2000 and 2015. Nature 526 (7572), pp. 207–211. Cited by: §1, §2, §7, §8.
 Statistical methods for dynamic treatment regimes. Springer. Cited by: §1.
 Personalized dose finding using outcome weighted learning. Journal of the American Statistical Association 111 (516), pp. 1509–1521. Cited by: §1.
 Benchmarks of fairness for health care reform: a policy tool for developing countries. Bulletin of the World Health Organization 78, pp. 740–750. Cited by: §4.2.
 Artemisininbased combination therapies: a vital tool in efforts to eliminate malaria. Nature Reviews Microbiology 7 (12), pp. 864. Cited by: §1.
 Setting priorities in health care organizations: criteria, processes, and parameters of success. BMC Health Services Research 4 (1), pp. 25. Cited by: §4.2.
 Potential for reduction of burden and local elimination of malaria by reducing Plasmodium falciparum malaria transmission: a mathematical modelling study. The Lancet Infectious Diseases 16 (4), pp. 465–472. Cited by: §1.
 Estimates of the changing ageburden of Plasmodium falciparum malaria disease in subSaharan Africa. Nature communications 5, pp. 3136. Cited by: §1.
 Reducing Plasmodium falciparum malaria transmission in Africa: a modelbased evaluation of intervention strategies. PLoS medicine 7 (8), pp. e1000324. Cited by: §1.
 Bayesian nonparametric policy search with application to periodontal recall intervals. Journal of the American Statistical Association, pp. 1–13. Cited by: §1.
 The limits and intensity of Plasmodium falciparum transmission: implications for malaria control and elimination worldwide. PLoS medicine 5 (2), pp. e38. Cited by: §1.
 From efficacy to equity: literature review of decision criteria for resource allocation and healthcare decisionmaking. Cost Effectiveness and Resource Allocation 10 (1), pp. 9. Cited by: §4.2.
 A world malaria map: plasmodium falciparum endemicity in 2007. PLoS medicine 6 (3), pp. e1000048. Cited by: §1.
 Efficient global optimization of expensive blackbox functions. Journal of Global optimization 13 (4), pp. 455–492. Cited by: §5.
 Spatiotemporal mapping of Madagascar’s Malaria Indicator Survey results to assess Plasmodium falciparum endemicity trends between 2011 and 2016. BMC medicine 16 (1), pp. 71. Cited by: §1.
 Treebased methods for optimal treatment allocation. Biometrika 102 (3), pp. 501–514. Cited by: §1.
 Optimal treatment allocations in space and time for online control of an emerging infectious disease. Journal of the Royal Statistical Society: Series C (Applied Statistics) 67 (4), pp. 743–789. Cited by: §1.
 Insecticide treated bednets and curtains for malaria control. Cochrane database of systematic reviews (2). Cited by: §1.
 Hierarchical statistical modelling of influenza epidemic dynamics in space and time. Statistics in Medicine 21 (18), pp. 2703–2721. Cited by: §3.
 Optimal dynamic treatment regimes. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 65 (2), pp. 331–355. Cited by: §1.
 Incorporating societal concerns for fairness in numerical valuations of health programmes. Health economics 8 (1), pp. 25–39. Cited by: §4.2.
 Costvalue analysis of health interventions: introduction and update on methods and preference data. Pharmacoeconomics 33 (2), pp. 89–95. Cited by: §4.2.
 Contrasting benefits of different artemisinin combination therapies as firstline malaria treatments using modelbased costeffectiveness analysis. Nature communications 5, pp. 5606. Cited by: §1, §1.
 Dynamic regime marginal structural mean models for estimation of optimal dynamic treatment regimes, part I: main content. The international journal of biostatistics 6 (2). Cited by: §1.
 Climate influences on the costeffectiveness of vectorbased interventions against malaria in elimination scenarios. Phil. Trans. R. Soc. B 370 (1665), pp. 20130557. Cited by: §1, §1.
 Quantilebased optimization of noisy computer experiments with tunable precision. Technometrics 55 (1), pp. 2–13. Cited by: §5.
 Indoor residual spraying for preventing malaria. Cochrane Database Syst Rev 4 (4). Cited by: §1.
 Optimal individualized dosing strategies: a pharmacologic approach to developing dynamic treatment regimens for continuousvalued treatments. Biometrical Journal 58 (3), pp. 502–517. Cited by: §1.
 Optimal structural nested models for optimal sequential decisions. In Proceedings of the second seattle Symposium in Biostatistics, pp. 189–326. Cited by: §1, §2.
 Estimation and extrapolation of optimal treatment and testing strategies. Statistics in medicine 27 (23), pp. 4678–4721. Cited by: §1.
 DiceKriging, DiceOptim: Two R packages for the analysis of computer experiments by Krigingbased metamodeling and optimization. Cited by: §5.
 Qand Alearning methods for estimating optimal dynamic treatment regimes. Statistical science: a review journal of the Institute of Mathematical Statistics 29 (4), pp. 640. Cited by: §1, §2.
 Towards a comprehensive simulation model of malaria epidemiology and control. Parasitology 135 (13), pp. 1507–1516. Cited by: §1.
 Modeling the cost effectiveness of malaria control interventions in the highlands of western Kenya. PloS one 9 (10), pp. e107700. Cited by: §1, §1.
 Estimating the most efficient allocation of interventions to achieve reductions in Plasmodium falciparum malaria burden and transmission in Africa: a modelling study. The Lancet Global Health 4 (7), pp. e474–e484. Cited by: §1, §1.
 World Malaria Report 2017. Technical report World Health Organization. Cited by: §1.
 A robust method for estimating optimal treatment regimes. Biometrics 68 (4), pp. 1010–1018. Cited by: §1.
 Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions. Biometrika 100 (3), pp. 681–694. Cited by: §1.
 Estimation of optimal treatment regimes using lists. Journal of the American Statistical Association (justaccepted). Cited by: §1.
 New statistical learning methods for estimating optimal dynamic treatment regimes. Journal of the American Statistical Association 110 (510), pp. 583–598. Cited by: §1.
 Estimating individualized treatment rules using outcome weighted learning. Journal of the American Statistical Association 107 (499), pp. 1106–1118. Cited by: §1.
Comments
There are no comments yet.