In order to estimate the hyperparameters θ of the CRISP model, would like to find θ∗ that maximizes the loglikelihood log (12
). However, since this is intractable, we propose to use the Monte Carlo ExpectationMaximization (EM) algorithm
Bishop2006 . We will use EM to refine
θ in successive iterations. Let
θold be the value of
θ computed in the previous iteration. Then, in the E step of the current iteration, we will estimate the expected completedata loglikelihood

∑ZTP(ZTO,θold)⋅log(P(ZT,Oθ)). 

(27) 
We will use the blockGibbs sampling procedure described in Algorithm 1 to approximate the posterior distribution P(ZTO,θold) over the latent infection status of individuals u. If the samples drawn from the posterior P(ZTO,θold) are Z1T,…,ZmT, then in the M step, we will compute θ that maximizes the expected completedata loglikelihood

θnext 
=argmaxθm∑i=1log(P(ZiT,Oθ)) 

(28) 


=argmaxθm∑i=1T−1∑t=1∑ulog(P(ziu,t+1,OZit,θ)), 

(29) 
where ziu,t+1 is the infection state for individual u at time t+1 in sample ZiT. If ti0 denotes the number of initial S states in the sample infection trace ziu, we note that by virtue of (4) only the first ti0 terms depend on θ which reduces the above maximization term to only

m∑i=1∑uti0−1∑t=1log(f(ui,t,Zitθ))+log(1−f(ui,ti0,Zitθ)). 

We use stochastic gradient descent to compute the
θ values that maximize the above expression. We also note that for numerical stability, we reparameterize
pj via
wj as
pj=exp(wj)/(1+exp(wj)) which allows for an unconstrained optimization over
w.