 # On a statistical approach to mate choices in reproduction

We provide a probabilistic approach to modeling the movements of subjects through multiple stages, with "stays" or survival at each stage for a random length of time, and ending at a desired final stage. We use conditional Markov chains with exponential survival times to model the movement of each subject. This is motivated by a study to learn about of the choices that different types of female turkeys make in choosing a male turkey, and in particular, the differences in male choices between groups of females. In this paper, we propose a model for the subjects' movements toward the final stage, and provide maximum likelihood estimation of the model parameters. We also provide results relating to certain questions of interest, such as the distribution of the number of subjects reaching a stage and the probability that a subject reaches the final stage, and develop methods for estimating these quantities and testing statistical hypotheses of interest.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

### 1.1 Motivation and Problem Setup

This work is motivated by a study of female turkeys’ preferences in mate choice, with the goal of learning about the differences in male choices between groups of females turkeys. In the literature, researchers have mostly considered modeling a function for preferences. Some references to modeling preferences can be found in Kirkpatrick and al (2006), who have proposed a study for mate choice in Tungara frogs. They showed that most of the assumptions are not backed by statistical evidence and suggested that a deeper analysis of the most basic properties of choice rules is necessary. Dechaume and al. (2013) have shown that female mate choice in convict cichlids is transitive and consistent with a self-referent directional preference. For a better understanding of the topic, one can read Heisler and al. (1987), Navarick and Fantino (1972), Phelps and al. (2006), Bradley and Terry (1952).

In this paper, we take a probabilistic approach and propose a model for data collected in a study on male choice of female turkeys. The study involves a design that monitors movement of female birds through several stages and the time spent at each stage, until they reach a male bird (the final stage). The goal is to represent the pattern of behavior of female birds by a parameter that can allow a better representation of the differences between female groups. Assume we have stages (these are the stages each female bird may pass through in a transitional horizontal move). There is a finite number of states for each Stage () denoted denoted (these could be transitional vertical moves of the females through the grid). The individual process moves from a state in Stage to any state in Stage with a random waiting time at Stage . The process terminates when a bird reaches the stage or after a time at the same position, when the individual is removed from the study.

The time is the maximal time the process can spend in one given state without move. Moves are checked every units of time. The underlying assumption is that the process of moves between stages is a conditional Markov chain (conditional on time). This means that for a given set of time stamps, the process of moves works like a Markov chain ignoring the past moves. We will also assume that the distribution of the time conditioned on the set of moves is exponential.

### 1.2 Structure of the paper and results

In Section 2 we provide the likelihood function under the assumptions provided above. The derivations are technical due to the lack of information about the exact time of the moves.

Section 3 presents maximum likelihood estimators (MLE) of various quantities of interest and their distributions. For instance, the probabilities to reach the final stage or a specific state of the final stage is provided. The number of records at a given stage, i.e., the number of times a bird is seen at a given stage is of interest. We derive its probability distribution, and it’s mean and variance. A Central Limit Theorem is also provided for the number of entries that reach the final stage.

A Central Limit Theorem for the maximum likelihood estimator(MLE) of the vector of the probabilities to reach the different states of the final stage is also derived. This provides a tool for comparison of the different groups of birds. An alternative estimator of these probabilities based on the multinomial distribution of moves from stage

is provided along with the corresponding Central Limit Theorem in formula (38). For this last case, a limit theorem is proposed for testing inequalities for classification of the states of the final stage within each group. At the end of the section, we provide an alternative way to compare groups through their most probable paths. Here, we propose to use the MLE to estimate the probabilities and compare them.

## 2 Distribution and likelihood functions

A single observation from this process consists of a set of positions (sates within stages) and times spent at each state. Additionally, we assume that each of the exponential distributions representing the probability distribution of time spent at a state within a stage depends only on the states and and not on? the stages. This means, for example, that the conditional probability distribution function of the time till move to state

of stage from any state of stage for a path from group is . We are assuming that the average waiting time does not depend on the state the path is moving to and the stage it is moving from, because the impact of the state is incorporated into the transition probabilities. We also assume that the sum of the transition probabilities of the moves to the next stage is equal to . Our goal is to study the differences between the groups based on their patterns of moves.

We assume that there are groups of birds. Let represent the vector of states visited by the path from group , where and , where is the number of groups and is the number of paths from group . We use be the matrix (of size ) of the numbers of moves from stage for observations from group , and for the probability of move from state u of stage to state of stage for . Let - be the matrix of probabilities of moves from Stage for paths from group . We will denote

 P(n)iki,k=si∏j=1si−1∏u=1pnku,i,ju,i,j,k,

which is the contribution to the likelihood of the moves

When a path from group starts, the process is checked every units of time and the moves are recorded until it terminates. It is reasonable to assume that these moves are interval censored, given the setup that we have. We don’t know the exact times of the moves. We only know that exactly one move happens in an interval of length . Therefore, the contribution to the likelihood of a move from Stage to state of Stage within our assumptions is found by computing the probability that there is a move from within time of the last record and there is no move from Stage thereafter before time units.

If we let be the time till move from stage and be the time till move from stage , then the contribution is

 P(Z1≤b,Z2>b−Z1)=∫b0∫∞b−uλkiλki+1e−λkiue−λki+1tdtdu=e−bλki+1∫b0λkie−(λki−λki−1)udu=λki(e−bλki+1−e−bλki)λki−λki+1. (1)

Notice that if the consecutive stages have the same rates parameter for , then the above formula by considering the derivative gives the following:

 P(Z1≤b,Z2>b−Z1)=λke−bλk. (2)

The above formula represents the contribution to the likelihood function of the observed interval for which a given path was seen in stage for the first time. Every stage that was visited by this path has such a contribution except for Stage 0 (nothing comes into stage 0) and Stage (nothing goes out of Stage ). The stages in which the path was stopped for staying too long also have this contribution (to the likelihood), and the contribution of censoring that we will provide later (nothing goes out). Considering all paths of group and the entire sample, it is clear that arriving to stage and moving after the being recorded gives the likelihood term

 Mi(λ)=K∏k=1(λki(e−bλki+1−e−bλki)λki−λki+1)vki, (3)

where is the number of paths from group that visit state . When stages have equal mean waiting time , this becomes

 Mei(λ)=e−b∑Kk=1λkvkiK∏k=1(λk)vki. (4)

Therefore, the contribution of all moves to the conditional likelihood for the data set is

 M(λ)=m−2∏i=1(K∏k=1(λki(e−bλki+1−e−bλki)λki−λki+1)vki), (5)

or for equal average waiting time

 Me(λ)=e−b∑Kk=1λk∑m−2i=1vkiK∏k=1(λk)∑m−2i=1vki. (6)

Now, we consider the contribution of stays at the stages, excluding the time intervals in which the the moves took place. For the state of stage and for every path from group , we define the stay time as

 Ti,j,k,ℓ=wki,j,ℓb,

where is the number of time intervals the given path was seen at the position after the first record there. We will refer to this by the phrase that the path survives the move for the time , which is the time the bird has been seen in that state. Let represent the number of intervals all the paths from group stayed in State of Stage , and

 wki=si∑j=1wki,j, (7)

be the number of intervals all paths from group that stayed in stage . Thus, the conditional survival function is

 S(Ti,j,k,ℓ)=1−F(Ti,j,k,ℓ)=e−λki+1Ti,j,k,ℓ=e−bλki+1wki,j,ℓ. (8)

Given that for the path from group be seen at State of stage for the time , the contribution of this portion of the path to the conditional likelihood function is the survival function given by formula (8). Considering all the paths in the groups the contribution of the stays to the likelihood is:

 S(λ)=m−2∏i=0si∏j=1K∏k=1e−bλki+1wki,j=K∏k=1m−2∏i=0e−bλki+1wki, (9)

or with equal average waiting times , it is

 Se(λ)=e−b∑Kk=1λk∑m−2i=0wki, (10)

Now, we consider the contribution to the likelihood for the paths that reach the end of the study, i.e., Stage . This move is different from all others because no move is expected henceforth. So its contribution will simply be the probability of moving before units of time, as given below:

 Lm−1(λ)=K∏k=1sm−1∏u=1(1−e−bλkm−1)vkm−1,u=K∏k=1(1−e−bλkm−1)vkm−1, (11)

where is the number of visits from group to state of stage and .

It is clear that if we let denote the time spent at a given state of Stage by a path from Group , then a censored observation at this point has the likelihood

 P(Zki>nb)=1−P(Zki≤nb)=1−si∑u=1pj,i,u,kP(Zki≤nb|move tou)=1−si∑u=1pj,i,u,k(1−e−nbλki)=e−nbλki. (12)

This shows that the effect of censoring at any stage is the same as the effect of staying times before moving, if there was no censoring. Thus, there is no reason to consider separately the censored observations. Therefore, the conditional likelihood function is obtained as in the following formula (the conditioning is on the observed moves, denoted by ):

 L(λ|y)=K∏k=1m−2∏i=0e−bλki+1wkim−2∏i=1(K∏k=1(λki(e−bλki+1−e−bλki)λki−λki+1)vki)K∏k=1(1−e−bλkm−1)vkm−1, (13)

or for equal average waiting times

 Le(λ|y)=(e−b∑Kk=1λk∑m−2i=0wki)(e−b∑Kk=1λk∑m−2i=1vkiK∏k=1(λk)∑m−2i=1vki)K∏k=1(1−e−bλk)vkm−1. (14)

The above derivations are obtained by multiplying over all paths of group using independence of paths. To obtain the Likelihood function for each group , we need to multiply over all possible states and stages, taking into consideration the lack of memory property of the exponential distribution. We need now to multiply by the transition probabilities of the states to obtain the full likelihood of the data:

 L(λ|y)=K∏k=1m−2∏i=0e−bλki+1wkim−2∏i=1(K∏k=1(λki(e−bλki+1−e−bλki)λki−λki+1)vki)K∏k=1(1−e−bλkm−1)vkm−1m−1∏i=1P(n)iki,k, (15)

giving for the case of equal average waiting times

 Le(λ|y)=(e−b∑Kk=1λk∑m−2i=0wki)(e−b∑Kk=1λk∑m−2i=1vkiK∏k=1(λk)∑m−2i=1vki)K∏k=1(1−e−bλk)vkm−1P(n)ikik. (16)

## 3 Maximum likelihood estimation

Considering the above likelihood function, it is clear that the maximum likelihood estimator (MLE) of the parameters exists and is unique. For the parameter , there is no closed form for the MLE but it can be estimated numerically provided that the data is given. Using as number of moves from state of stage for paths from group , the transition probabilities have estimators

 ^pku,i,j=nku,i,jnku,i. (17)

### 3.1 Distributional technicalities

In this section, we consider the probability distributions of some quantities of interest, such as the number of times a path is seen at a given location. We shall find the probability to reach a given state or find a match.

Notice that the distribution of (time spent by the path of group in Stage ) is not exponential. It is a more complex distribution because the move is not recorded directly after time . Given that we are already at stage , this time is , where have independent exponential distributions with common parameter ,

is a discrete random variable representing the number of time intervals the path was seen at this stage with values between 0 and n, and

is the indicator of removal of the individual from the study in Stage (equals 1 when no move is made to the next stage, and 0 otherwise). Let be the number of records at State of Stage for path from group , and thus given that the path has reached state of Stage . Note that the conditional distributions of these random variables do not depend on , since we have the same average waiting time across stages. It is clear that , where is the indicator of the path reaching state of Stage ( if the path reaches state of State , equals otherwise). Thus, the conditional distribution of this random variable is obtained in the following way (conditional on the path reaching Stage ).

 P(wki,j,ℓ=v|γij=1)=e−vbλk(1−e−bλk)(1−δℓ)I{0,...,n}(v)+δℓe−nbλk. (18)

Since the distribution of the quantities and do not depend on the specific or , we will henceforth simplify their notation and write and .

Let be the probability that a given path reaches stage and not move within the last units of time given that the path is seen for the first time at Stage . Since the probability of spending less than segments of units of time at Stage is (from formula (18) ), and the probability of getting to Stage and not moving within units of time is , by formula (2), we obtain

 Pk(i′+1|i′)=λk(1−e−bnλk)e−bλk. (19)

Thus, if we denote the probability that a given path from group is seen at stage one time, we have for

 Pk(i)=i−1∏i′=0Pk(i′+1|i′)=(λke−bλk(1−e−nbλk))i. (20)

If we denote - the Bernouili variable that indicates the path reaching stage , then the probability that the path reaches stage (i.e., the female picks a male) is

 P(Nkm−1=1)=Pk(m−1)=(λke−bλk(1−e−nbλk))m−2(1−ebnλk). (21)

Note that is not the same as at , and that the latter is defined only for . Reaching the final stage is different from other stages because we don’t need to consider time after reaching the final stage. The process stops when the the male is chosen. So, the obtained probability is the product of the probabilities to reach and not move within units of time and the probability of reaching the final stage after being seen on the previous stage.

Artificially setting (see (7), for the case when the path doesn’t reach stage , the probability that the path will be stopped prior to reaching stage (meaning not get a match in male choices) is

 P(wkm−1=−1)=1−(λke−bλk(1−e−nbλk))m−2(1−ebnλk). (22)

Combining the above equations for , we obtain,

 P(wki=v)=⎧⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪⎩e−vbλk(1−e−bλk)(λke−bλk(1−e−nbλk))i,forv=0,1,⋯n−1,e−nbλk(λke−bλk(1−e−nbλk))i,forv=n,1−(λke−bλk(1−e−nbλk))iforv=−10,otherwise. (23)

### 3.2 Mean and variance of Nki

#### 3.2.1 Conditional mean and variance of Nki

Given that a path has reached stage , we need to find the conditional mean and variance ( and respectively) of the total number of records at stage , for . Using the distribution obtained in formula (23) and the fact that on the condition that the path is in Stage , , we get,

 Ei(Nki)= n−1∑u=0(u+1)e−ubλk(1−e−bλk)+(n+1)e−nbλk = n−1∑u=0ue−ubλk(1−e−bλk)+n−1∑u=0e−ubλk(1−e−bλk)+(n+1)e−nbλk = n−1∑u=0ue−ubλk(1−e−bλk)+1+ne−nbλk.

Notice that, if we set , then

 n−1∑u=0ue−ubλk(1−e−bλk)=−1bh′(λk)(1−e−bλk)=−(1−e−bλk)b∂∂λk(1−e−bnλk1−e−bλk).

Therefore, we obtain Working out the last equality leads to

 Ei(Nki)=1−e−b(n+1)λk1−e−bλk. (24)

To obtain the conditional variance, we need to compute . Using the same arguments that we have applied for , we get

 Ei((Nki)2)= n−1∑u=0(u+1)2e−ubλk(1−e−bλk)+(n+1)2e−nbλk = n−1∑u=0(u+1)e−ubλk(1−e−bλk)+(n+1)e−nbλk+ +n−1∑u=0u(u+1)e−ubλk(1−e−bλk)+n(n+1)e−nbλk.

It turns out that

 Ei((Nki)2)=Ei(Nki)−(1−e−bλk)b∂∂λk(Ei(Nki)−(n+1)e−nbλk1−e−bλk)+n(n+1)e−nbλk. (25)

 Ei((Nki)2)=1+e−bλk−(2n+3)e−(n+1)bλk+(2n+1)e−(n+2)bλk(1−e−bλk)2.

Therefore, the conditional variance is

 vari(Nki)=e−bλk(1−(2n+1)(e−nbλk−e−(n+1)bλk)−e−(2n+1)bλk)(1−e−bλk)2. (26)

One can check using L’Hopital’s rule that for , , and . This is the case when there is actually no possible move once the path gets to . The path in this case will certainly be recorded times at stage and the process will stop. On the other hand, as ,

converges in distribution to the geometric distribution with probability of success

.

#### 3.2.2 Mean and variance of Nki

Computing the mean by conditioning first on reaching stage and not moving within the first units of time, we obtain

 E(Nki)=Ei(Nki)⋅Pk(i)=1−e−b(n+1)λk1−e−bλk⋅(λke−bλk(1−e−nbλk))i, (27) E((Nki)2)=1+e−bλk−(2n+3)e−(n+1)bλk+(2n+1)e−(n+2)bλk(1−e−bλk)2⋅(λke−bλk(1−e−nbλk))i. (28)

The variance can therefore be obtained as .

These formulas can be used to make statistical inference about the number of stays at a given stage. It is important to notice that the mean depends on and . Thus, in a setup where inference on is known, one can choose the appropriate for a given target result.

### 3.3 Inference on the probability of reaching the final stage from a given position

As in (21), the probability to reach stage for a path from group is

 pk(m−1)=(λke−bλk(1−e−nbλk))m−2(1−ebnλk). (29)

Similarly, it is derived that the probability to reach the final stage from any given stage is

 pki(m−1)=(λke−bλk(1−e−nbλk))m−2−i(1−ebnλk).

Recall that the MLE of was discussed earlier using formula (16). The regularity conditions being satisfied, it is clear that the CLT holds for (MLE of ) in the form

 √Nk(^λk−λk)→N(0,1I(λk)), (30)

where is the Fisher information of . Taking into account the fact that (either or ) is a function of , it can be concluded using the Delta method that

 √Nk(pk(^λk)−pk)→N(0,(p′k(λk))2I(λk)), (31)

where is the MLE of that is obtained by plugging the MLE of into the formula of and is the derivative of with respect to . Similar result holds for the common for the case of equal across

groups. These results can be used to obtain approximate confidence intervals for the probability

’s.

For observations from group , let be the number of paths that reach the final stage. The random variable is the sum of

independent and identically distributed Bernoulli random variables, and has a binomial distribution with parameters

. From here on, we will use and for and when there is no possible ambiguity. From the above fact, it is clear that the total number of paths that reach the final stage is the sum of independent binomial random variables. As , the central limit theorem (CLT) holds in the form

 (√Nk(Nk(m)Nk−pk)√pk(1−pk),k=1,⋯K)→N(0–K,IK), (32)

where is the zero vector of size , is the identity matrix and

is the standard K-dimensional normal distribution.

This general form of the CLT can not be used to find the limiting distribution of . If we consider a balanced design, in which the same number of paths is selected from each of the groups (), then it follows that

 √N(N(m)N−K∑k=1pk)→N(0,K∑k=1pk(1−pk)). (33)

The central limit theorem (33) can be used to obtain inference, such as prediction intervals, for the total number of paths that reach the final stage, using the MLE of the parameters that are involved. To obtain results for reaching the final stage from a given position , it is enough to replace by in the formula and use the appropriate MLE.

### 3.4 Inference on reaching a particular final state u

It is clear that the probability of reaching the final state is a product of the probability of reaching the final stage and the probability of reaching the specific state from stage . The latest is the sum of the probabilities of reaching this state from each of the states of stage . Thus, if we denote the probability of reaching final state from a given state , we obtain

 pki(u)=sm−2∑j=1pkj,m−1,u(λke−bλk(1−e−nbλk))m−2−i(1−ebnλk). (34)

Maximum likelihood estimators of these probabilities are obtained by using the estimates from the above MLE of model parameters. These probabilities can be used to define test statistics for differences between the groups on one hand and to have inference on preferences or classification of the final states (males) for each of the groups (females). Using this method of comparison, we can conclude that the ranking of states (males) for group is that of the quantities

 Kuk=sm−2∑j=1pkj,m−1,u,u=1,⋯,sm−1. (35)

Recall that the likelihood estimators of are . Notice that these estimates are based on counts from multinomial distributions conditioned on the number of observations that have reached the states . Thus, the conditional CLT holds for these variables in the form

 (nkj,m−2,2,⋯,nj,m−2,sm−1)√nkm−2,j⟶N(Pk,Σk),Pk=(pkj,m−1,2,⋯,pkj,m−1,sm−1),andΣk=diagPk−PkTPk. (36)

#### 3.4.1 MLE inference for classification of males within each group of females

Considering The MLE of , we have

 ^Kuk=sm−2∑j=1nkj,m−2,unkm−2,j.

Using the fact that paths from different groups are independent and the sum of independent normal distributions is a normal distribution, we obtain

 ^Kuk−Kuk=sm−2∑j=1(nkj,m−2,unkm−2,j−Kuk)→sm−2∑j=11√nkm−2,jN(0,pkj,m−1,u(1−pkj,m−1,u))

Therefore, and

If we assume that with (this is a plausible assumption because the mean numbers the reach states are proportional to the number of observations in stage ), the it follows that

 √nkm−2(^Kuk−Kuk,u=1,⋯,sm−1)→N(0,sm−2∑j=11bkjΣk). (37)

#### 3.4.2 Substitute inference for classification of males within each group of females

Considering the estimators of as and setting , arguments similar to the above lead to the central limit theorem in the form

 √nkm−2(~Kuk,u=1,⋯,sm−1)⟶N(Kk