1 Introduction
Multiple systems estimation, a generalization of the classic markrecapture approach (Petersen, 1896; Schwarz and Seber, 1999), is a class of methods that are used to estimate the size of hardtoreach populations in many different contexts, including, in recent years, those comprised of human trafficking or slavery victims. The method is typically applied to wildlife populations (Williams et al., 2002) and more recently to hidden populations like those comprised of injection drug users (King et al., 2013). In the administrative or law enforcement context, multiple systems estimation is an approach to read across from lists of observed or identified individuals from the study population to gain an estimate of the total population of interest; original contributions in the subject have been made by Bales et al. (2015) and Cruyff et al. (2017). Typically, a mathematical model is posited for the pattern of incidences across the lists, and the “dark figure” of unobserved cases is estimated. A discussion of the history of the methods and a survey of a range of applications is provided, for example, by Bird and King (2018).
Multiple systems estimation provides a method to quantify the number of victims in a study population, including those that are not directly observed or detected. The method therefore plays an especially important role as it assists with policy making decisions to help combat human trafficking and slavery. For example, as set out in Bales et al. (2015), a multiple systems estimate constructed from data collated by a Government agency was a key component of the strategy (Home Office, 2014) leading to the UK Modern Slavery Act 2015.
A specific challenge posed by data on human trafficking is that it is not uncommon for there to be sparse overlap between the observed administrative lists. This is in contrast with the applications such as wildlife populations, where the researcher may have a degree of control over sample sizes and can continue capturing from the study population until sufficient overlap is observed between the capture occasions. Of course, in the original markrecapture setup where there are only two capture occasions, it is the overlap between the lists which allows the inference to be conducted at all.
However, in the human trafficking context, it appears to be the norm rather than the exception that there will be pairs of lists between which there is no observed overlap. This may be for several reasons: there may be a genuine structural reason why two particular lists cannot overlap; there may be negative correlation between lists; or it may simply be that for a relatively small sample size and two lists with small capture probabilities there do not happen to be any cases which are on both lists. In this area, there is as yet limited understanding of data and of mechanisms, and furthermore data are often highly anonymised for reasons of confidentiality and security, to the point that we may not know anything about a list other than an anonymised label. Hence, there may be no further information available as to why no cases are observed in common between two lists.
We approach inference via Poisson regression modeling applied to counts of individuals that are observed on each possible combination of the lists, with model parameters that correspond to each combination of lists. We use this approach since it is a wellknown technique that allows one to model list interactions, and since within the multiple systems estimation context inference is naturally based on categorical data for which a loglinear approach is ideal. The standard loglinear approach is set out by Cormack (1989), Rivest and Daigle (2004) and Bird and King (2018), among others, and implemented in Baillargeon and Rivest (2007) and Baillargeon and Rivest (2012). However, as Fienberg and Rinaldo (2012a)
discuss in a much more general context, contingency tables with zero entries may lead to cases where the maximum likelihood estimate of the model parameters does not exist or is not identifiable. In the context of multiple systems estimation, empty overlaps between lists require careful treatment for these reasons. The primary objective of this paper is to introduce inferential procedures and computational implementations that explicitly handle this case.
We first of all develop a method which fits a model stably to data of this kind, taking proper account of existence and identifiability issues that can arise if the data are sparse. We then consider a model selection procedure to choose the most suitable set of parameters on which to base inference. The assumptions that underlie informationtheoretic approaches do not hold, and so a stepwise approach is used, motivated and illustrated by data sets based on human trafficking victims in the New Orleans area (Bales et al., 2018) and the Western site of a research study in the USA (Farrell et al., 2019). We conduct our analyses in the R programming language (R Core Team, 2016), and have developed an accompanying R software package SparseMSE (Chan et al., 2019). The package allows readers to implement the methodology on their own data as well as to reproduce the results presented in the paper.
The paper is organized as follows. Section 2 outlines the model and gives the notation and likelihood setup, and also details specific issues concerning identifiability. Section 3 presents the modelselection routine and corresponding inference procedure. Section 4 presents the results from the two empirical applications. A comment on the R package and some concluding remarks are made in Section 5.
2 The Model
In this section, we define notation and then define and explore the model. The discussion leads to an algorithmic approach which allows the correct and stable calculations to be carried out.
2.1 Notation and Definitions
2.1.1 Capture Histories
Suppose we have capture occasions, or lists, on which members of the population can occur. An individual’s capture history can be expressed as a
, where if the individual is recorded on the th list, and 0 if not. The order of a capture history is defined to be , the number of captures that are recorded for individuals with that history. For each capture history define to be the number of individuals in our population with exactly that capture history.An alternative notation for specifying is by giving the set of indices for which the individual is recorded. When is a suffix the braces are usually omitted and the indices simply given as suffices. Thus, for example, if the capture history is that specified by the vector , and has order 2, and , usually written if , is the number of individuals which are observed on both lists 1 and 3 but not on lists 2 or 4.
A particular capture history, with order 0, is the null capture history or (all ) where the individual does not appear on any list. The quantity (or ) is the socalled dark figure of individuals which are not captured on any list. Including the null capture history there are possible capture histories, and all except the dark figure can be observed. The data we will have are the observed values , which we will also write as . We will usually collapse to and to . But for the dark figure we will never collapse to , because the letter is reserved for the total population size including both the observed cases and the dark figure.
It is characteristic of data collected in the modern slavery context that there will be some capture histories for which the observed count is zero. Typically each list only records a relatively small proportion of the total population, because of the “hidden” nature of modern slavery as a crime, and the numbers of cases recorded on any particular pattern of overlaps between lists can easily be considerably smaller.
2.1.2 Further Notation
Given two capture histories and , we say that dominates , written , if for all . That is the same as saying that the capture history represented by includes all the lists in that represented by , and possibly others. The notation indicates that one of the inequalities is strict, so that .
For any capture history define
(1) 
Thus is the number of observed cases that appear on all the lists for which , regardless of whether they do or do not appear on other lists. For example, is the total number of individuals on list 1, while is the number of individuals that are on list 1 but none of the other lists. To move to pairs of lists, is the total overlap between lists 1 and 2, while is the number of individuals that are on lists 1 and 2 but not on lists 3, 4, … . Finally note that because of the restriction to in the defining sum, the quantity does not include the dark figure but is the sum of all the observed values , the total number of individuals actually captured at some point in the study.
We will call a nonoverlapping pair of lists if , so that no individual appears on both lists. The main objective of this paper is to deal properly with data that contains such nonoverlapping pairs.
2.2 The Poisson Loglinear Model
A standard model for the analysis is the Poisson loglinear model as set out by Cormack (1989). This assumes that, independently for each ,
where
for certain parameters indexed by the possible capture histories. It should be noted that capture histories are used in two different ways, firstly to index the observed data, and secondly to index the parameters. Usually, but not invariably, the letter will be used when observations are indexed and for parameters . The index will be used in either case, as required.
The parameter indexed by the null capture history will be written more simply as . Thus, for example, the dark figure has expected value , while the expected value of is .
Altogether, there are parameters , corresponding to the capture histories including the null capture history. But the number of cases for the null capture history is not observed, and so there are only data points from which to estimate the parameters; without placing constraints on the parameters, the model is not identifiable.
As Cormack (1989) sets out, the natural approach is to set some of the to zero, and then to estimate the remainder by maximum likelihood; for example, one may set all interaction coefficients indexed by third or higherorder histories to zero, and we will do this throughout. Even if all the secondorder coefficients are included, the number of parameters to be estimated is provided . The question of model choice then reduces to deciding which of the secondorder interactions to include, and will be discussed further in Section 3 below. For any particular set of secondorder interactions, the parameter estimation can be put into a standard generalized linear model formulation.
A consequence of the definition is that it is also the case that, for each ,
Unlike the , the are not independent of one another. For example, if capture histories and share one or more lists, then the variables and will be dependent.
2.3 The Log Likelihood Function
Before considering the treatment of nonoverlapping pairs of lists, it will be helpful to derive some properties of the log likelihood function. Let be the collection of indices of parameters included in the model, and the set of parameters to be estimated. Note that will always contain .
Up to an additive constant depending only on the data, the log likelihood is given by
(2) 
Consider the first term, substituting the definition of the model, reversing the order of summation, and then substituting the definition (1), to obtain
(3)  
Turning to the other term in the log likelihood,
(4) 
say. Regarded as a function of the , each is an increasing function of each of its arguments, and hence is a decreasing function of each of its arguments . Furthermore, is a concave function because it is a sum of concave functions of linear combinations of its arguments, so is the sum of a linear and a concave function, and hence is a concave function. However, as Fienberg and Rinaldo (2012a) point out and explore in a much more general and abstract context, and as we shall see below, it is not necessarily the case that there is a unique maximum likelihood estimate of nor that such an estimate exists at all.
These expressions for the components of the log likelihood function demonstrate the following, which will be useful when we come to discuss model choice below:

The statistics are jointly sufficient for the parameters .

Given any in , is sufficient for if all the other parameters are kept fixed.
2.4 Dealing with NonOverlapping Pairs
Suppose that is a nonoverlapping pair, so that , and that is one of the parameters in the model being fitted, so that . In the terminology of Fienberg and Rinaldo (2012a) we allow an extended maximum likelihood estimate; this section gives an elementary recapitulation of some of their results cast into our specific framework. It follows from (3) that the first term of the log likelihood does not depend on , because the coefficient of is zero. So the maximum likelihood estimate of will be obtained by maximizing the function . Because, as already noted, is a decreasing function of each of its arguments, whatever the value of the other parameters the likelihood will be maximized as . The maximum likelihood estimate of may therefore be regarded as . This explains why the existing software packages yield errors or warnings if there are nonoverlapping pairs in the data and the corresponding parameters are in the model. Because the linear model is expressed in terms of the logarithm of the Poisson parameter, the value for yields legitimate values for the actual Poisson parameters, giving the value zero for for all
, provided we regard a Poisson distribution with parameter zero to be the degenerate distribution with value zero.
Substituting these zeroes for back into the expression for the log likelihood yields, writing for the vector of parameters with excluded,
(5) 
This is exactly the Poisson log likelihood based on all the observations except those for the capture histories which include both and . If there is more than one nonoverlapping pair in , the same calculations can be carried out for each pair, leading to the following algorithm:

For each in for which , record that the maximum likelihood estimator of is and remove from the list of parameters to be estimated. Let be the set of indices of parameters in that have not been removed.

For each such remove from consideration all for which , regarding them as structural zeroes. Let be the set of capture histories whose counts remain (not including the null capture history).

Use the standard generalized model approach to estimate the parameters with indices in from the observed counts of the capture histories in .
In the next section, we will see that the final step should also involve an explicit check for the existence and identifiability of the parameter estimates.
2.5 Estimability of the Population Size
There are two separate estimability issues that may arise when applying multiple systems estimation to sparse data, both of which will mean that the model will not give a welldefined finite estimate of the population size.
One possibility is that the maximum likelihood estimate does not exist, even if we take account of nonoverlapping pairs in the way set out above. Fienberg and Rinaldo (2012b)
show that existence of the estimate can be checked by solving a linear programming problem. Let
be the incidence matrix that maps the parameters in to the logarithm of the expected values of the counts of capture histories in , so that if and 0 otherwise. Let be the vector of sufficient statistics for . Then set up the linear programming problem of maximizing over all scalars and all real vectors satisfying the constraints(6)  
A necessary and sufficient condition for a maximum likelihood estimate to exist is that the maximizing value of is strictly greater than 0.
The other possibility is that although the likelihood can be maximized, the parameters are unidentifiable, in the sense that any parameter set in a particular nontrivial affine subspace of the parameter space will maximize the likelihood. In particular the parameter which estimates the dark figure will not be estimable, and hence nor will the population size. The model will be identifiable if and only if is of full column rank. It should be stressed that existence and identifiability only depend on the incidence of zeroes and nonzeroes among the various possible capture histories. Beyond being zero or nonzero, the count of any particular capture history is not relevant.
Setting for all and will yield a feasible solution to the constraints (6). Hence the maximizing value of will be at least the minimum of the observed over . In the nonsparse case, where every combination of capture histories is observed at least once, this minimum will be strictly positive and hence the maximum likelihood estimator will always exist. Also, in the nonsparse case, as long as there are at least three lists, will not be rank deficient.
Fienberg and Rinaldo (2012a) point out that most or all standard generalized linear modeling packages fail to check for existence of estimates. Nor do programs necessarily report unidenfiability directly, more often arbitrarily removing one or more of the parameters. Unless every possible capture history is actually represented in the observed data, therefore, it is important to check that a potential model gives a strictly positive value for the linear programming problem and a matrix that is of full column rank. If it fails on either count it should be ruled out. These checks incur only a small computational overhead.
A  B  C  count 

40  
30  
20  
6 
The simple example given in Table 1 shows that there is not necessarily any clear hierarchical relationship between models that fail on one or the other of the criteria. Within our framework there are eight possible choices of the pairwise effects to include in the model. The results obtained from the checks are as follows. If all three interactions are included, then the linear program yields a strictly positive value but the matrix is not of full column rank. If the model contains AB either alone or in conjunction with one of AC and BC, then the linear program result is zero, so the MLE does not exist. If only main effects are considered, or if either or both of AC and BC, but not AB are included, then the model passes both tests. So the effect of adding or removing a particular parameter to a model is not immediately predictable.
3 Inference and Model Choice
3.1 Calculating Significance
Given any model defined by parameter set , for any define
(7) 
where the are the maximum likelihood estimates of the . Further, define
(8) 
First, consider how to deal with nonoverlapping pairs within the data (note that we only consider a model with up to twofactor interactions). Suppose the model being fitted corresponds to a set of parameters indexed by , and that for some that . Should we actually include
in the model? To answer the question we test the null hypothesis that
, which is equivalent to saying that is not in the model. The natural statistic on which to base a test is , because of the results on sufficient statistics in Section 2.3.Now proceed as follows:

Fit the model leaving out the parameter , in other words using just the parameter set .

Because the probability of observing zero in a Poisson distribution with parameter is , the estimated parameter has value . This is the probability of observing in the model defined by .
In fact, this approach can be generalised to construct a value for any interaction whether or not . Again calculating from the maximum likelihood estimate over , the value is the minimum of and . Here is the lower tail probability that a random variable satisfies , while is the probability that .
3.2 Model Fitting
The modelfitting procedure is detailed stepwise, as follows:

Step 1: Set a threshold value for the value.

Step 2: Fit the model with the main effects parameters only.

Step 3: Consider in turn each interaction not already added to the model, and check that adding it to the model would not lead to a nonexistent or unidentifiable estimate.

Step 4: Among those interactions that pass the checks, find the one with the smallest pvalue.

Step 5: If that value is less than or equal to the given threshold, add the interaction to the model, and go back to Step 3. If the value is greater than the threshold, finish.
Notice that the method is akin to forward stepwise regression since parameters are added stepwise to the model. It remains to choose the threshold value, and there are two considerations to bear in mind. Firstly, because we are considering multiple candidates to add to the model, it is appropriate to apply a Bonferroni correction to any standard value we might use. So, for example, if there are 8 lists and hence 28 possible twolist interactions, even if we started with the conventional , the appropriate threshold would be 0.05/28 = 0.0018. The second consideration is that we might wish to take a relatively conservative approach aiming at parsimonious models unless the data convincingly argue otherwise. For both these reasons, we suggest using a value of but to explore the sensitivity of the result to adjusting the parameter.
One of the standard approaches in the literature is to use approaches based on information criteria for model choice. The justification of information criteria is based on the asymptotic theory of maximum likelihood estimates (Akaike, 1974) which breaks down when dealing with Poisson parameters close to zero.
A simulation example which demonstrates this in the multiple systems estimation context is presented in Figure 1. The simulation is carried out using three lists, with specified capture probabilities. Captures on the various lists are assumed to be independent. Two models are used, one with capture probabilities 0.01, 0.04 and 0.2 and the other with capture probabilities 0.3 for each list. The first model is chosen to be somewhat more typical of the sparse capture case, of the kind which often occurs in the human trafficking context, while the second is reminiscent of a more classical markrecapture study.
The probability of an individual having each possible capture history , is first evaluated. Then these probabilities are multiplied by 1000 and, for each simulation replicate, Poisson random values with expectations equal to these values are generated to give a full set of observed capture histories; together with the null capture history the expected number of counts (population size) is equal to 1000. The correct model for these data includes main effects only, so inference was carried out both for that model, and for the model with the addition of an interaction effect between the first two lists. The reduction in deviance between the two models was determined. Ten thousand simulations were carried out.
In line with the standard asymptotic theory, the QQplots show that the distribution fits the observed deviances well for the “classic” model. However the fit for the sparse model is not good. This demonstrates why information criteria cannot be relied on for fitting models in the sparse context.
4 Empirical Applications
In this section, our methods are applied to two data sets relating to victims of modern slavery and human trafficking in the USA. Both data sets display the sparseness of overlapping entries typical of data collected in this field.
4.1 The New Orleans Data
Bales et al. (2018) discuss a data set collated from a number of sources in New Orleans, given in Table 2.
A  B  C  D  E  F  G  H  count 
25  
5  
70  
33  
6  
6  
6  
21  
1  
2  
1  
1  
1  
1  
1  
2  
1  
1  
1 
Altogether there are 8 lists, and so the full incidence table including those combinations for which the observed number is zero has 255 rows. There are 28 possible pairs of lists, and of these there are 17 nonoverlapping pairs. The results are relatively insensitive to the choice of threshold value; even a threshold as large as
does not detect any significant interactions and fits a model based on main effects only, and with 28 possible interactions it is clearly inappropriate to use any larger threshold. The resulting model yields a 95% confidence interval of (644, 1618) with a point estimate of 997. Because some of the list counts are so small, the effect of combining the four smallest lists into one, to give a five list version of the data, was also investigated. If this is done, then the confidence interval is (658, 1709) with a point estimate of 1034, and so the results are essentially the same. For the five list data, none of the interactions was significant even at the 5% level.
4.2 The Western Site Data
One of two data sets considered by (Farrell et al., 2019) is collated from a number of sources in the Western site of a research study in the USA. The data are given in Table 3.
A  B  C  D  E  count 

52  
90  
114  
45  
21  
4  
2  
5  
6  
1  
3  
1  
1 
Altogether there are 5 lists, and so the full incidence table including those combinations for which the observed number is zero has 31 rows. There are 10 possible pairs of lists, and of these there are 2 nonoverlapping pairs. Applying the Bonferroni correction to the conventional 0.05 would yield a value of for the threshold value. In fact, using a larger threshold of would give the same result as that based on the Bonferroni correction; only a pairwise interaction of lists and is included to fit a model. The resulting model yields a 95% confidence interval of with a point estimate of 2484.
5 Concluding Remarks
The R software package SparseMSE (Chan et al., 2019) includes implementations of all the methodology described in this paper. In particular, it contains programs to check whether a particular model leads to either of the estimability issues set out in Section 2.5 above, and it incorporates these checks within a routine to fit any particular model, or to make the model choice using the stepwise procedure described in Section 3.2. Full details are given in the package documentation.
To conclude, in this paper we have investigated inference for multiple systems estimation using Poisson loglinear models, taking proper account of the possibility that the underlying data tables contain nonoverlapping lists, as commonly arises when the data are collected in the context of studies on modern slavery and human trafficking. We have also set out an approach to model choice and demonstrated the utility and practicality of our approach on real data sets. This area is especially challenging for methodological development because there is no “ground truth” against which methods can be assessed, and frequently there are no details of the data available beyond anonymised list data of the form presented in the tables above. Nevertheless, reliable and stable methods are important for applications in public policy, even if they are conditional on assumptions that it may not be possible to verify.
For simplicity and clarity, the procedure has been discussed and detailed in full for models which consider up to twoway interaction terms. In principle, the model checking and inference aspects can easily be extended to consider models based on higherorder interaction terms, but it seems unlikely that any data sets collected in the contexts of our current interest would merit this.
References
 Akaike (1974) Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control 19, 716–723.
 Baillargeon and Rivest (2007) Baillargeon, S. and Rivest, L.P. (2007). Rcapture: loglinear models for capturerecapture in R. Journal of Statistical Software 19, 1–31.
 Baillargeon and Rivest (2012) Baillargeon, S. and Rivest, L.P. (2012). Rcapture: Loglinear Models for CaptureRecapture Experiments. R package version 1.31.
 Bales et al. (2015) Bales, K., Hesketh, O., and Silverman, B. W. (2015). Modern slavery in the UK: How many victims? Significance 12, 16–21.
 Bales et al. (2018) Bales, K., Murphy, L., and Silverman, B. W. (2018). How many trafficked people are there in New Orleans? Lessons in measurement. Preprint, available from https://tinyurl.com/ybfb9tg6.
 Bird and King (2018) Bird, S. M. and King, R. (2018). Multiple systems estimation (or capturerecapture estimation) to inform public policy. Annual Review of Statistics and Its Application 5, 95–118.
 Chan et al. (2019) Chan, L., Silverman, B. W., and Vincent, K. (2019). SparseMSE: Multiple systems estimation for sparse capture data. R package, available from https://github.com/SparseMSE/sparsemse.
 Cormack (1989) Cormack, R. M. (1989). Loglinear models for capturerecapture. Biometrics 45, 395–413.
 Cruyff et al. (2017) Cruyff, M., van Dijk, J., and van der Heijden, P. G. M. (2017). The challenge of counting victims of human trafficking: Not on the record: A multiple systems estimation of the numbers of human trafficking victims in the Netherlands in 2010–2015 by year, age, gender, and type of exploitation. CHANCE 30, 41–49.
 Farrell et al. (2019) Farrell, A., Dank, M., Kfafian, M., Lockwood, S., Pfeffer, R., Hughes, A., and Vincent, K. (2019). Capturing human trafficking victimization through crime reporting. Technical Report 2015VFGX0105, National Institute of Justice.
 Fienberg and Rinaldo (2012a) Fienberg, S. E. and Rinaldo, A. (2012a). Maximum likelihood estimation in loglinear models. Ann. Statist. 40, 996–1023.
 Fienberg and Rinaldo (2012b) Fienberg, S. E. and Rinaldo, A. (2012b). Maximum likelihood estimation in loglinear models: supplementary material. Technical report, Carnegie Mellon Univ.
 Home Office (2014) Home Office (2014). Modern Slavery Strategy. https://www.gov.uk/government/publications/modernslaverystrategy.
 King et al. (2013) King, R., Bird, S. M., Overstall, A. M., Hay, G., and Hutchinson, S. J. (2013). Injecting drug users in Scotland, 2006: Number, demography, and opiaterelated deathrates. Addiction Research and Theory 21, 235–246.
 Petersen (1896) Petersen, C. (1896). The yearly immigration of young plaice into the Limfjord from the German Sea. Report of the Danish Biological Station 6, 5–84.
 R Core Team (2016) R Core Team (2016). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
 Rivest and Daigle (2004) Rivest, L.P. and Daigle, G. (2004). Loglinear models for the robust design in markrecapture experiments. Biometrics 60, 100–107.
 Schwarz and Seber (1999) Schwarz, C. J. and Seber, G. A. F. (1999). Estimating animal abundance: Review III. Statistical Science 14, 427–456.
 Williams et al. (2002) Williams, B. K., Nichols, J. D., and Conroy, M. (2002). The Analysis and Management of Animal Populations. Academic Press, San Diego, CA.
Comments
There are no comments yet.