# Testing Preferential Domains Using Sampling

A preferential domain is a collection of sets of preferences which are linear orders over a set of alternatives. These domains have been studied extensively in social choice theory due to both their practical importance and theoretical elegance. Extensively studied preferential domains include the single peaked, single crossing, and Euclidean domains. In this paper, we study the sample complexity of testing whether a given preference profile is close to some specific domain. We consider two notions of closeness: (a) closeness via preferences, and (b) closeness via alternatives. We further explore how assuming that the outlier preferences/alternatives are random (instead of arbitrary) affects the sample complexity of the testing problem. In most cases, we show that the above testing problem can be solved with high probability for all commonly used domains by observing only a small number of samples (independent of the number of preferences, n, and often the number of alternatives, m). In the remaining few cases, we prove either impossibility results or an Ω(n) lower bound on the sample complexity. We complement our theoretical findings with extensive simulations to estimate the actual constant factors of our asymptotic sample complexity bounds.


## 1. Introduction

Learning users’ preferences is useful in the contexts of social choice, recommender systems, product development, and many more applications. It is often observed that preferences are never completely arbitrary; rather, they possess correlated structures Gaertner (2001). For example, preferences of citizens for a facility location have a single peaked structure (Filos-Ratsikas et al., 2017, Section 1), i.e., a citizen has the highest preference for the facility at her location, and her preference decreases monotonically with the distance from her. Such preferences are also prevalent in political opinions based on the voters’ bias towards conservative or liberal views Hinich and Munger (1997). Intuitively, in a single peaked preference profile, we assume that there exists a societal axis along which the alternatives are ordered and every preference “respects” that ordering in the following sense. Every preference has an implicit most preferred point on the societal axis, and if an alternative lies between that point and another alternative, then the former is preferred over the latter. The advantage of preferences with such structures is that they can efficiently bypass the classic impossibility results of social choice theory Arrow (1950); Gibbard (1973); Satterthwaite (1975).

Similarly, in the design of recommender systems, it has often been observed that users’ preferences (and hence their recommendations) have patterns that are (a) demography-based, (b) knowledge-based, (c) feature-based, or (d) content-based Pennock et al. (2000). While designing a product, an enterprise may wish to look for structures in the end users’ preferences, and design their product such that a collectively ‘efficient’ choice is made to cater to a large number of users.

While it is difficult to predict the users’ preferences a priori, data on the preferences, obtained through users’ purchase and browsing patterns or through surveys, are plentiful and can be classified by demography, knowledge, or affinity towards a feature or content. It remains to discover whether the preferences come from a specific class that we call preferential domains or simply, domains.

A domain is a collection of sets of preferences over a set of alternatives. A preference profile, i.e., the tuple of preferences of all the agents/users, is said to belong to a domain if, for some set in the domain, every preference in the profile belongs to that set.

###### Example (Single peaked domain)

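Consider three alternatives $a, b, c$. The single peaked domain with these alternatives is denoted by $\{\mathcal{D}_1, \mathcal{D}_2, \mathcal{D}_3\}$, where $\mathcal{D}_1 = \{a \succ b \succ c,\; b \succ a \succ c,\; b \succ c \succ a,\; c \succ b \succ a\}$ when the societal order over the alternatives is $a \lhd b \lhd c$, and similarly, $\mathcal{D}_2, \mathcal{D}_3$ are the sets of preferences over the same alternatives for the different societal orders of $b \lhd a \lhd c$, and $a \lhd c \lhd b$.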
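The example can be checked mechanically. The sketch below is an illustration, not code from the paper: it assumes the three alternatives are named `a`, `b`, `c`, represents a preference as a tuple listed from most to least preferred, and uses the standard characterization that a preference is single peaked with respect to an axis exactly when its top-k alternatives form a contiguous interval of the axis for every k. The function names are hypothetical.

```python
from itertools import permutations


def is_single_peaked(preference, axis):
    """Return True if `preference` (best first) is single peaked w.r.t. `axis`."""
    position = {alt: i for i, alt in enumerate(axis)}
    # Walking from most to least preferred, the alternatives seen so far
    # must always form a contiguous interval [lo, hi] of the axis.
    lo = hi = position[preference[0]]
    for alt in preference[1:]:
        p = position[alt]
        if p == lo - 1:
            lo = p
        elif p == hi + 1:
            hi = p
        else:
            return False
    return True


def single_peaked_set(axis):
    """All linear orders over the alternatives that are single peaked w.r.t. `axis`."""
    return {p for p in permutations(axis) if is_single_peaked(p, axis)}


# One member of the domain for each societal order (up to reversal of the axis).
domain = [single_peaked_set(axis) for axis in (("a", "b", "c"), ("b", "a", "c"), ("a", "c", "b"))]
```

Each of the three sets contains four of the six possible linear orders, so a profile that uses all six orders cannot belong to the domain.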

Some prominent examples of domains are single peaked, single crossing, and Euclidean Gaertner (2001). The benefit of discovering such domains (even for a part of the population) is that a much more refined plan or protocol, satisfying several desirable axioms, can be designed for them. For example, the median voting rule in the single peaked domain ensures that no voter can gain by misreporting her preference Moulin (1991). Another reason to study various domains concerns computational considerations. Indeed, some of the most fundamental problems in computational social choice, for example, computing winners for many important voting rules such as Kemeny, Dodgson, and Young, are computationally intractable Brandt et al. (2016). It turns out that most of these problems become efficiently solvable in many domains, single peaked for example Brandt et al. (2015).

Our work in this paper contributes to uncovering whether a given preference profile is “close” to some domain, through sampling a small number of preferences and/or alternatives. The guarantees we provide are probabilistic, and the success probability converges to one as more preferences/alternatives are investigated. The cost of such an investigation is often proportional to the number of samples drawn, known as the sample complexity. Hence our goal is to minimize the sample complexity of our algorithms. For example, our algorithms could be used to predict whether at least, say, 95% of the preferences in a profile are single peaked. If we know the societal order of the single peaked preferences (which constitute at least 95% of the profile), using the median voting rule on the single peaked sub-profile would yield all the desirable properties of the median voting rule, e.g., truthfulness for those 95% of the population. This kind of truthfulness of a fraction of voters is referred to as “approximate truthfulness.” In many applications like public good provisioning, it is highly beneficial to uncover truthful opinions from the vast majority of the population.

To put our work in perspective, we revisit a question that is often asked in computational social choice for any domain. This is about the existence of an efficient recognition algorithm: given a profile, does there exist a polynomial time algorithm to decide whether it belongs to the domain? There exist efficient recognition algorithms for many popular domains, for example, single peaked Bartholdi III and Trick (1986), single crossing Doignon and Falmagne (1994), etc. Knoblauch (2010); Elkind and Faliszewski (2014); Elkind et al. (2015); Magiera and Faliszewski (2017). One notable exception is the Euclidean domain of dimension two, where the recognition problem is NP-hard Peters (2017).

There are two main limitations of the recognition problem. First, the problem formulation is “exact.” Real world profiles are almost never perfect and thus they can at most be “close” to some domain. More specifically, there may be a few preferences or alternatives (treated as outliers) that we need to ignore to obtain the required structure. Unfortunately, accounting for outliers often makes the related recognition problem intractable (e.g., voter deletion for the single peaked domain Erdélyi et al. (2017)). Second, the recognition problem needs access to the entire preference profile. In many situations, e.g., pre-election polls, surveys, etc., we only have access to samples. In other cases, the number of preferences may be too large and, depending on the application at hand, a sub-linear time (possibly approximation) algorithm may be more useful. We address both these issues by defining a related testing problem. As a concrete use case, a social planner could use our testing algorithms to learn, by observing a small number of samples, whether it is possible to remove, say, 5% of the preferences to obtain a single peaked structure.

A corresponding computational problem is: by drawing a small number of samples, can we decide whether a profile of preferences over alternatives belongs to some domain after deleting, say at most preferences (or alternatives)? However, any algorithm for this problem would need to observe samples, which defeats the main purpose of testing (except when is empty or contains all possible profiles). To see this, let us consider the specific case where is the single peaked domain and the set of alternatives is . Let be a profile consisting of (say is an even integer) copies of , copies of , and one . We observe that is not single peaked after observing the last preference . However, deletion of that preference makes it single peaked. Let us now consider another profile consisting of copies of , copies of , and two copies of . Again, is not single peaked, but deletion of the two copies of makes it single peaked. We now observe that the KL-divergence Kullback and Leibler (1951) between the two distributions of samples for and is and thus distinguishing from (which any testing algorithm has to do) requires samples to succeed with any constant nonzero probability Bar-Yossef (2003). To overcome this lower bound, we introduce (as is ubiquitous in the testing literature Ron (2001); Goldreich (1999)) a “gap” between the two possible inputs. In all our testing problems, we are given a profile as input which is guaranteed to be one of two possible types, and we need to find which one it is. The two possibilities for the input cover all the cases except a few, and thus there is a “gap.”

### 1.1. Our Contribution

Our specific contributions in this paper are as follows. The error probability of any algorithm below is at most $\delta$.

1. We present a sampling based algorithm to distinguish any profile for which there exists a set of at most preferences (or alternatives) whose deletion makes the resulting profile belong to from any random profile (refer to the first three rows in Table 1). We observe that the sample complexity depends on whether we assume to be arbitrary or random. We remark that, in the testing literature Goldreich et al. (1998); Andrews et al. (1998); Shao (2011), it is popular to assume the noise to be random which is equivalent to assuming the preferences in to be random in our context.

2. For any , we present a sampling based algorithm to distinguish any profile for which there exist at most preferences whose deletion makes the resulting profile belong to from any profile where one has to delete at least preferences to make it belong to (refer to the fourth row in Table 1).

3. In the case of alternatives, we prove that any algorithm for distinguishing any profile for which there exist at most alternatives whose deletion makes the resulting profile belong to from any profile where one has to delete at least alternatives to make it belong to has sample complexity of for every even when (refer to the fifth row in Table 1). This shows that detecting arbitrary outlier alternatives is much harder than detecting arbitrary outlier preferences from a sample complexity viewpoint.

We remark that all our results in Table 1 for the single peaked domain actually extend to any domain as described in Section 3. From a technical point of view, to tackle preferences which are outliers, we define and exploit a notion called the content of a domain which, informally, is the maximum number of distinct preferences that any profile in the domain can contain, as a function of the number of alternatives. To handle alternatives which are outliers, we combine this notion with ideas from the classical coupon collector problem. To develop an algorithm for the case when the outliers can be arbitrary, we prove a key structural result (in Section 3.1) for arbitrary domains, which may be of independent interest.

### 1.2. Related Work

The computational problem of recognizing whether a given profile belongs to a domain has been studied extensively in computational social choice. Bartholdi III and Trick (1986) show that the recognition problem is polynomial time solvable for single peaked profiles. Escoffier et al. (2008) improve the efficiency of the recognition algorithm for single peaked profiles. Elkind et al. (2012) present a polynomial time algorithm for recognizing single crossing profiles. Barberà and Moreno (2011) discover a property called top monotonicity which simultaneously generalizes both single peakedness and single crossingness. Magiera and Faliszewski (2017) present a polynomial time recognition algorithm for top monotonic profiles. Doignon and Falmagne (1994) show that the recognition problem for the one dimensional Euclidean domain is polynomial time solvable. Knoblauch (2010) and Elkind and Faliszewski (2014) present alternative algorithms for recognizing one dimensional Euclidean profiles. Peters (2017) shows that recognizing Euclidean profiles of dimension at least two is NP-hard.

Lackner (2014) shows that the computational problem of deciding whether a given incomplete profile can be extended to a single peaked profile is NP-complete. However, if we restrict ourselves to only weak orders, then the computational problem of recognizing incomplete single peaked profiles is polynomial time solvable Fitzsimmons (2015). The above problem is polynomial time solvable for single crossing profiles too Elkind et al. (2015). Erdélyi et al. (2017) study the complexity of the computational problem of deciding whether a given profile can be “made” single peaked by deleting a few preferences or alternatives; Bredereck et al. (2016) study the complexity of this problem for single peaked, single-caved, single-crossing, etc. profiles. Ballester and Haeringer (2011) present a characterization of single peaked profiles through succinct forbidden configurations. Bredereck et al. (2013) show forbidden configurations for single crossing profiles. Elkind et al. (2014) present forbidden configurations for profiles which are simultaneously single peaked and single crossing. A related literature studies the likelihood of a random profile being single peaked Lackner and Lackner (2017); Chen and Finnendahl (2018); Chatterji et al. (2016).

## 2. Preliminaries and Problem Formulation

For any two positive integers and with , we denote the set by and the set by . For a set , we denote its power set by . Let be a finite set of alternatives of cardinality . Preferences are linear orders over . We denote the set of all linear orders over by . For any positive integer , a tuple of preferences is called a profile. If not mentioned otherwise, we use and to denote the number of alternatives, the number of preferences in a profile, and the set of alternatives, respectively. For a subset and a preference , we denote the restriction of to by . A preferential domain or simply domain is a collection of subsets of . We call a domain nontrivial if and . Given a domain and a profile over , we say (with slight abuse of notation) that if there exists a such that . We call a domain neutral if whenever , we have for every permutation of ; if is defined as , then is defined as . We call a domain normal if whenever , we have for every . In this work, we consider only neutral and normal domains. We remark that many popular domains including single peaked, single caved, single crossing, top restricted, bottom restricted, etc. satisfy these two properties (the only notable exception is the domain of top monotonic Barberà and Moreno (2011) profiles).

Let be any domain and be a profile. If it satisfies the following conditions:

1. there exists a subset such that and , and

2. for every subset such that , we have ,

then we say that the preference-distance of from is , and we call the preferences which need to be deleted to bring the profile back to to be preference outliers. Similarly, we can define the notion of alternative-distance (where only alternatives need to be deleted) and alternative outliers.

Our first problem is to distinguish a profile which is, informally speaking, alternatives and random preferences away from some domain vs a random profile. We call this problem (, , , ) – Random Outliers vs Random Profile Test which is formally defined as follows.

###### Problem 1 ((εv, εa, δ, D) – Random Outliers vs Random Profile Test).
Let be a profile over a set of alternatives which is of one of the following two kinds:

(i) There exists and with and such that the profile belongs to the domain and is distributed uniformly in for every .

(ii) The preference is distributed uniformly randomly in for every .

Output if the input profile is of the first kind and if it is of the second kind; the probability of error can be at most .

Problem 1 assumes that the preference outliers are distributed uniformly at random, which can be a strong assumption depending on the application at hand. The $(\varepsilon_v, \varepsilon_a, \delta, \mathcal{D})$ – Arbitrary Outliers vs Random Profile Test problem in Problem 2 removes this assumption.

###### Problem 2 ((εv, εa, δ, D) – Arbitrary Outliers vs Random Profile Test).
Let be a profile over a set of alternatives which is of one of the following two kinds:

(i) There exists and with and such that the profile belongs to the domain .

(ii) The preference is distributed uniformly randomly in for every .

Output if the input profile is of the first kind and if it is of the second kind; the probability of error can be at most .

Problem 2 still retains the assumption from Problem 1 that the second possibility for the input profile is random. The $(\varepsilon_v, \varepsilon_a, \varepsilon'_v, \varepsilon'_a, \delta, \mathcal{D})$ – Arbitrary Outliers vs Arbitrary Profile Test problem in Problem 3 is the most general problem in our paper; it removes all the structural assumptions of Problems 1 and 2.

###### Problem 3 ((εv, εa, ε′v, ε′a, δ, D) – Arbitrary Outliers vs Arbitrary Profile Test).
Let be a profile over a set of alternatives which is of one of the following two kinds, where and :

(i) There exists and with and such that the profile belongs to the domain .

(ii) For every and with and , the profile does not belong to the domain .

Output if the input profile is of the first kind and if it is of the second kind; the probability of error can be at most .

In Problems 1, 2 and 3, the error probability is taken over the randomness used in generating the instances in (ii) and the randomness used by the algorithm.

### 2.1. Content and Residue of Domain

We now define the content and residue of any domain, which will make many of our results simpler to state. Let be any domain. We define the content of as a function such that any profile with alternatives in can have at most distinct preferences; we call the function defined as the residue of a domain. For example, ,  Dey and Misra (2016). For technical reasons, let us assume that for every . We observe that, for normal domains, the function is non-increasing (and thus is a non-decreasing function). Whenever the domain is immediate from the context, we omit it from the subscript of con and res.
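To make the notion of content concrete for the single peaked domain, the following sketch enumerates all preferences that are single peaked with respect to one fixed societal axis. It relies on a standard fact rather than on anything stated in this section: the least preferred alternative of a single peaked preference must be an endpoint of the axis, which gives $2^{m-1}$ such preferences for $m$ alternatives. Because every profile in the domain draws its preferences from one such set, this count is the largest number of distinct preferences a single peaked profile can have. The function name is hypothetical.

```python
def single_peaked_orders(axis):
    """List every preference (best first) that is single peaked w.r.t. `axis`."""
    axis = tuple(axis)
    if len(axis) == 1:
        return [axis]
    # The least preferred alternative is either the leftmost or the rightmost
    # point of the axis; the rest is single peaked on the remaining sub-axis.
    leftmost_is_worst = [order + (axis[0],) for order in single_peaked_orders(axis[1:])]
    rightmost_is_worst = [order + (axis[-1],) for order in single_peaked_orders(axis[:-1])]
    return leftmost_is_worst + rightmost_is_worst


# Number of distinct single peaked preferences for m = 1, ..., 6 alternatives.
counts = [len(single_peaked_orders(range(m))) for m in range(1, 7)]
```

The two branches of the recursion never produce the same order because their last entries differ, so the list contains no duplicates.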

### 2.2. Sampling Model and Sample Complexity

In our model, there is an oracle which, when queried, returns an agent picked uniformly at random with replacement from the set of all agents. The algorithm can then ask the agent an arbitrary number of comparison queries. In a comparison query, two alternatives are presented to the agent, and it replies which of the two it prefers. The sample complexity of an algorithm is defined to be the total number of comparison queries it makes during its execution. We remark that defining sample complexity as the number of comparison queries (instead of the number of agents sampled) enables a more fine-grained analysis of the complexity of our problems.
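A minimal sketch of this access model, under the assumption that the hidden profile is a list of tuples listed from most to least preferred. The class name `ComparisonOracle` and its methods are hypothetical; the only point is that the algorithm sees the profile through agent sampling and pairwise comparisons, and that every comparison is counted.

```python
import random


class ComparisonOracle:
    """Hypothetical wrapper that hides a profile behind sampling and comparison queries."""

    def __init__(self, profile, seed=None):
        self._profile = profile
        self._rng = random.Random(seed)
        self.queries = 0  # total number of comparison queries, i.e. the sample complexity

    def sample_agent(self):
        # Uniformly random agent, drawn with replacement.
        return self._rng.randrange(len(self._profile))

    def prefers(self, agent, x, y):
        """One comparison query: does `agent` prefer x over y?"""
        self.queries += 1
        preference = self._profile[agent]
        return preference.index(x) < preference.index(y)

    def restricted_order(self, agent, alternatives):
        """Recover the agent's order on `alternatives` by insertion with comparison queries."""
        ordered = []
        for alt in alternatives:
            i = len(ordered)
            while i > 0 and self.prefers(agent, alt, ordered[i - 1]):
                i -= 1
            ordered.insert(i, alt)
        return tuple(ordered)
```

With this insertion scheme, learning how an agent orders three alternatives costs at most three comparison queries.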

### 2.3. Chernoff Bound

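We repeatedly use the following concentration inequality: Let $X_1, \ldots, X_\ell$ be a sequence of $\ell$ independent random variables in $[0,1]$ (not necessarily identical). Let $S = \sum_{i=1}^{\ell} X_i$ and let $\mu = \mathbb{E}[S]$. Then, for any $0 \leq \delta \leq 1$:

$$\Pr[|S-\mu| \geq \delta\ell] < 2\exp(-2\ell\delta^2), \tag{1}$$

and

$$\Pr[|S-\mu| \geq \delta\mu] < 2\exp(-\delta^2\mu/3). \tag{2}$$

Equations 1 and 2 are called the additive and multiplicative versions of the bound, respectively.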
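As a small worked illustration of how the additive version (the bound with $\delta\ell$ inside the probability) turns into a sample size, the helper below solves $2\exp(-2\ell\delta^2) \leq \eta$ for $\ell$. The names `deviation` (the $\delta$ of the bound) and `failure_prob` (the target $\eta$) are introduced here for illustration only.

```python
import math


def additive_sample_size(deviation, failure_prob):
    """Smallest ell with 2 * exp(-2 * ell * deviation**2) <= failure_prob."""
    return math.ceil(math.log(2 / failure_prob) / (2 * deviation ** 2))


def additive_tail_bound(ell, deviation):
    """Right-hand side of the additive bound for ell samples."""
    return 2 * math.exp(-2 * ell * deviation ** 2)


# Samples needed so that the empirical mean is within 0.1 of its expectation
# except with probability at most 0.05.
ell = additive_sample_size(0.1, 0.05)
```

The required number of samples grows as $1/\delta^2$ in the deviation but only logarithmically in $1/\eta$, which is why the testers in this paper can afford small error probabilities.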

## 3. Results

We now present our main results. Our general approach is to explain our algorithms for the special case of the single peaked domain first and then generalize to arbitrary domains; we make an exception for a few cases where presenting the general case directly better reveals the key idea. In the interest of space, we omit some of our proofs, which can be found in the supplemental material. For the same reason, and for ease of exposition, we defer to the supplemental material our more involved algorithms for the cases when both preferences and alternatives could simultaneously be outliers.

### 3.1. Only Preferences as Outliers

In this subsection, we focus on the case when only preferences are considered as outliers. We begin by presenting our (, , , ) – Random Outliers vs Random Profile Tester for the single peaked domain. Our algorithm first fixes any three alternatives. Then it samples a few preferences restricted to these three alternatives only. If all six possible permutations of these three alternatives appear nearly the same number of times, then the algorithm predicts the profile to be a random profile; otherwise it predicts it to be close to single peaked. We now formally present our algorithm in Section 3.1.

For at least alternatives, there exists a (, , , single peak) – Random Outliers vs Random Profile Tester with sample complexity for every and . If there are only alternatives, then there does not exist any such tester.

###### Proof.

For , the result follows from the observation that a profile where every preference is distributed uniformly in the set of all possible preferences is single peaked and thus the two cases are statistically indistinguishable. So let us assume and and be any three alternatives. We pick preferences uniformly at random with replacement and query oracle to know how and are ordered in these preferences. Let , be all possible permutations of and be the random variable denoting the number of sampled preferences where the permutation appears for . We output if and output otherwise. We observe that the sample complexity of our algorithm is . We now turn to the correctness of our algorithm. For that we show that irrespective of the input profile, the probability of making an error is at most .


• Case I - the input profile is single peaked after deleting at most preferences which are distributed uniformly: Let be the input profile and be a sub-profile of which is single peaked and contains at least preferences. Hence, there exists an such that the preference does not appear in . Since the preferences in , we have . Using Chernoff bound (additive form), we now have the following:

$$\Pr[\text{error}] \leq \Pr\left[X_j \geq \frac{\ell}{12}(1+\varepsilon_v)\right] \leq \exp\left\{-\frac{\ell(1-\varepsilon_v)^2}{72}\right\} \leq \delta$$

• Case II - the input profile is distributed uniformly: Since every preference in profile is uniformly distributed, for every , we have . Using Chernoff bound (multiplicative form) followed by union bound, we have the following:

$$\Pr[\text{error}] = \Pr\left[\exists i \in [6],\; X_i \leq \frac{\ell}{12}(1+\varepsilon_v)\right] \leq 6\exp\left\{-\frac{(1-\varepsilon_v)^2\,\ell}{48}\right\} \leq \delta$$ ∎
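The tester analyzed in this proof can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the threshold $\frac{\ell}{12}(1+\varepsilon_v)$ is read off the two error bounds above, while the sample size $\ell = \lceil 72\ln(6/\delta)/(1-\varepsilon_v)^2 \rceil$ is simply one choice that makes both displayed bounds at most $\delta$ (the paper's own constant is not recoverable from this text). The function names and the returned labels are hypothetical, and the sketch reads sampled preferences directly instead of going through comparison queries.

```python
import math
import random
from collections import Counter
from itertools import permutations


def sample_size(eps_v, delta):
    # Assumption: large enough for both exp(-ell(1-eps)^2/72) <= delta
    # and 6*exp(-ell(1-eps)^2/48) <= delta.
    return math.ceil(72 * math.log(6 / delta) / (1 - eps_v) ** 2)


def random_outliers_vs_random_test(profile, triple, eps_v, delta, rng=None):
    """Sketch of the three-alternative tester; `triple` holds three fixed alternatives."""
    rng = rng or random.Random()
    ell = sample_size(eps_v, delta)
    members = set(triple)
    counts = Counter()
    for _ in range(ell):
        preference = rng.choice(profile)  # uniform sample with replacement
        # Restriction of the sampled preference to the three fixed alternatives.
        counts[tuple(x for x in preference if x in members)] += 1
    threshold = ell * (1 + eps_v) / 12
    # A rare (or missing) permutation is evidence of single peaked structure.
    if any(counts[order] <= threshold for order in permutations(triple)):
        return "close to single peaked"
    return "random"
```

Each sampled preference needs at most three comparison queries to be restricted to the triple, so the number of comparison queries is at most $3\ell$.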

The main idea in Section 3.1 can be easily extended to arbitrary domains.

###### Corollary

Let be any normal and neutral domain and . For at least alternatives, there exists a (, , , ) – Random Outliers vs Random Profile Tester with sample complexity for every and . If the number of alternatives is at most , then there does not exist any such tester.

We now turn our attention to the (, , , ) – Arbitrary Outliers vs Random Profile Test problem; that is when the outliers can be arbitrary (need not be randomly generated). We begin with presenting a general impossibility result in this case. Its proof follows from the observation that, in this case, one can carefully construct the set of outliers so that the distribution of samples in both the possibilities are statistically indistinguishable.

###### Proposition

For every domain , there does not exist any (, , , ) – Arbitrary Outliers vs Random Profile Tester for any where is the number of alternatives in the input profile.

We now present our (, , , single peak) – Arbitrary Outliers vs Random Profile Tester for in Section 3.1. We defer our general (, , , ) – Arbitrary Outliers vs Random Profile Tester till Section 3.1 which not only handles every but also takes care of arbitrary domain (but the sample complexity will be worse than that of Section 3.1). The main idea of the algorithm in Section 3.1 is exactly the same as the algorithm in Section 3.1 – it samples some preferences restricted to any alternatives and outputs that the profile is random if all the possible permutations appear nearly equal number of times; otherwise it says that the profile is close to single peaked.

There exists a (, , , single peak) – Arbitrary Outliers vs Random Profile Tester with sample complexity for every .

###### Proof.

As in Section 3.1, we choose any alternatives and , pick preferences uniformly at random with replacement, and query oracle to know how and are ordered in these preferences. We output if and output otherwise (with notation as defined in the proof of Section 3.1). The proof of correctness and the analysis of the sample complexity of our algorithm is similar to Section 3.1 using the observation that, when the input profile can be made single peaked by deleting at most preferences, there exists an such that since . ∎

From the proof of Section 3.1, the following generalization to arbitrary domain is immediate.

###### Corollary

Let and the number of alternatives is at least . Then there exists a (, , , ) – Arbitrary Outliers vs Random Profile Tester with sample complexity for every with (the notation in the sample complexity hides constant which depends on ).

We now present our (, , , ) – Arbitrary Outliers vs Random Profile Tester for any generalizing Section 3.1. Of course we need the number of alternatives to be at least where due to Section 3.1.

Given a domain , any with with , there exists a (, , , ) – Arbitrary Outliers vs Random Profile Tester with sample complexity where .

###### Proof.

Let be any subset of alternatives with . We pick preferences uniformly at random and elicit these preferences restricted to . For , let be the random variable denoting the number of sampled preferences which are the same as . We output if and output otherwise. The sample complexity of the algorithm is . The proof of correctness of our algorithm is similar to that of Section 3.1 using the observation that, when the input profile can be made single peaked by deleting at most preferences, there exists an such that (follows from the definition of ). ∎

We now present our result for the (, , , , , ) – Arbitrary Outliers vs Arbitrary Profile Test problem. The following structural result provides the key building block of our algorithm. Intuitively, the lemma proves that, given a profile , if we sample preferences from uniformly at random with replacement to construct another profile (of certain size), then the “relative” distance of from any domain is approximately the same as the relative distance of from .

###### Lemma

Let be any normal and neutral domain and be a profile with preference-distance being from . Let , , and be a profile where has been picked uniformly at random with replacement from the preferences of . Then the preference-distance of from is at least and at most with probability at least for every .

We now present our (, , , , , ) – Arbitrary Outliers vs Arbitrary Profile Tester. The high level idea is to sample some number of preferences, compute the distance of the resulting profile from the single peaked domain, and output the distance of the original profile to be if and only if is closer to than .

For every domain , there exists a (, , , , , ) – Arbitrary Outliers vs Arbitrary Profile Tester with sample complexity for every and .
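The high level idea can be sketched for the single peaked domain as follows. Several choices here are assumptions made for illustration: the number of samples is left as the parameter `num_samples` because the bound in the statement above is not recoverable from this text, "closer to" is read as comparing the sampled relative distance with the midpoint of $\varepsilon_v$ and $\varepsilon'_v$, and the distance is computed by brute force over all societal axes, which takes time exponential in the number of alternatives. All function names are hypothetical.

```python
import random
from itertools import permutations


def is_single_peaked(preference, axis):
    """Top-k alternatives must form a contiguous interval of the axis for every k."""
    position = {alt: i for i, alt in enumerate(axis)}
    lo = hi = position[preference[0]]
    for alt in preference[1:]:
        p = position[alt]
        if p == lo - 1:
            lo = p
        elif p == hi + 1:
            hi = p
        else:
            return False
    return True


def preference_distance(profile, alternatives):
    """Minimum number of preferences to delete so that the rest is single peaked."""
    best_kept = 0
    for axis in permutations(alternatives):
        if axis[0] > axis[-1]:
            continue  # an axis and its reverse define the same single peaked set
        kept = sum(is_single_peaked(p, axis) for p in profile)
        best_kept = max(best_kept, kept)
    return len(profile) - best_kept


def arbitrary_vs_arbitrary_test(profile, alternatives, eps_v, eps_v_prime, num_samples, rng=None):
    """Sample preferences, measure the sample's relative distance, compare with the midpoint."""
    rng = rng or random.Random()
    sample = [rng.choice(profile) for _ in range(num_samples)]
    relative = preference_distance(sample, alternatives) / num_samples
    return "near" if relative <= (eps_v + eps_v_prime) / 2 else "far"
```

The preceding lemma is what justifies using the relative distance of the sample as a stand-in for the relative distance of the whole profile.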

We observe that  (Escoffier et al., 2008, Lemma 2) and . Hence, from Section 3.1, we obtain the following result for the single peaked and single crossing domains.

###### Corollary

There exists a (, , , , , ) – Arbitrary Outliers vs Arbitrary Profile Tester with sample complexity for the single peaked domain and with sample complexity for the single crossing domain for every and .

### 3.2. Only Alternatives as Outliers

In this subsection, we now focus on the case when only alternatives are considered as outliers. We observe that when only alternatives act as outliers, the (, , , ) – Random Outliers vs Random Profile Test and (, , , ) – Arbitrary Outliers vs Random Profile Test are the same problem. We begin with presenting our (, , , single peak) – Random Outliers vs Random Profile Tester in Section 3.2 below. On a high level, our algorithm in Section 3.2 samples some number of preferences restricted to some number of alternatives. If for every alternatives among those alternatives, all the possible permutations appear in the sampled preferences, then the algorithm outputs the profile to be random, otherwise it says that the profile is close to being single peaked.

There exists a (, , , single peak) – Random Outliers vs Random Profile Tester with sample complexity . Hence, there also exists a (, , , single peak) – Arbitrary Outliers vs Random Profile Tester with the same sample complexity for every and such that .

###### Proof.

We sample alternatives uniformly at random without replacement. Let be the set of sampled alternatives. We now sample preferences uniformly at random with replacement restricted to . Let be the set of sampled preferences. We output if there exist alternatives such that at least one permutation in is not present in and output otherwise. We observe that the sample complexity of our algorithm is . We now turn to the correctness of our algorithm. For that we show that irrespective of the input profile, the probability of making an error is at most .


• Case I - the input profile is single peaked after deleting at most alternatives: Let be the set of alternatives and with such that the input profile restricted to is single peaked. Then we have the following for the chosen value of :

 Pr[error] ⩽Pr[|B∩W|⩾ℓ−2] =εℓa+(ℓ1)(1−εa)εℓ−1a+(ℓ2)(1−εa)2εℓ−2a ⩽εℓa+ℓεℓ−1a+ℓ2εℓ−2a⩽δ
• Case II - the input profile has been generated uniformly at random: For any alternatives , we define a random variable to be if all possible permutations in are present in and otherwise. Using folklore tail bound for the coupon collector problem (for example, see (Motwani and Raghavan, 2010, Chap 3.6)), we obtain the following for the chosen value of .

$$\Pr[X_{\{a,b,c\}}=0] \le 6^{-t/(3\ln 6)} \le e^{-t/6}$$

Now using union bound, we obtain the following for the chosen values of and .

$$\Pr[\text{error}] \le \Pr[\exists \{a,b,c\} \subset B,\ X_{\{a,b,c\}}=0] \le \binom{\ell}{3} e^{-t/6} \le \delta \qquad \blacksquare$$
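The sampling procedure used in the proof above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`single_peaked_preference`, `has_missing_triple_order`, `classify`), the output labels, and the representation of a preference as a tuple of alternatives from most to least preferred are all hypothetical choices made here. The sample sizes `ell` and `t` are left to the caller, who would choose them according to the bounds in the theorem.

```python
import itertools
import random


def single_peaked_preference(m, rng):
    """Illustrative helper: draw a preference that is single peaked on the
    axis 0 < 1 < ... < m-1 (not uniform over all such preferences)."""
    peak = rng.randrange(m)
    pref, left, right = [peak], peak - 1, peak + 1
    while left >= 0 or right < m:
        # Extend towards the left or the right neighbour of the ranked interval.
        go_left = right >= m or (left >= 0 and rng.random() < 0.5)
        if go_left:
            pref.append(left)
            left -= 1
        else:
            pref.append(right)
            right += 1
    return tuple(pref)


def has_missing_triple_order(B, S):
    """Return True if some triple of alternatives in B does not exhibit
    all 6 possible orderings among the sampled preferences S."""
    for triple in itertools.combinations(B, 3):
        members = set(triple)
        seen = {tuple(a for a in pref if a in members) for pref in S}
        if len(seen) < 6:
            return True
    return False


def classify(profile, alternatives, ell, t, rng=None):
    """Sample ell alternatives without replacement and t preferences with
    replacement (restricted to the sampled alternatives), then decide."""
    rng = rng or random.Random()
    B = rng.sample(alternatives, ell)
    keep = set(B)
    S = [tuple(a for a in rng.choice(profile) if a in keep) for _ in range(t)]
    if has_missing_triple_order(B, S):
        return "close to single peaked"
    return "random"
```

On a profile that is exactly single peaked, this sketch always answers "close to single peaked": among any three alternatives, the one lying in the middle on the axis is never ranked last, so at most four of the six orderings can occur.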

The following corollary follows from the proof of the theorem above.

###### Corollary

For every domain , there exists a (, , , ) – Arbitrary Outliers vs Random Profile Tester with sample complexity for every and such that .

We show below that the condition in Sections 3.2 and 3.2 is necessary. We prove Section 3.2 by carefully constructing a set of outliers such that the sample distributions under both possibilities are statistically indistinguishable.

###### Proposition

For every domain , there does not exist any (, , , ) – Arbitrary Outliers vs Random Profile Tester if .

We now turn to the (, , , , , single peak) – Arbitrary Outliers vs Arbitrary Profile Test problem. The following results show that the sample complexity of this problem is even for the single peaked and single crossing domains.

Any (, , , , , single peak) – Arbitrary Outliers vs Arbitrary Profile Tester has sample complexity for every and such that .

Any (, , , , , ) – Arbitrary Outliers vs Arbitrary Profile Tester has sample complexity for single crossing domain for every and such that .

## 4. Empirical evaluation

The algorithms presented in Section 3 provide upper bounds on the sample complexities of the problems of outlier detection. These algorithms distinguish between two possibilities of profile generation with a probability of correctness of at least . It is interesting to find out the optimal multiplying factors of the sampling complexities inside in these algorithms. This is why an empirical evaluation is called for.

In this section, we empirically find the factors for the results of Sections 3.2, 3.1 and 3.1, which provide constant time algorithms for the testing problem. The other two cases shown in Table 1 either involve an exponential time algorithm (Section 3.1) or provide a lower bound (Section 3.2), and are therefore unsuitable for an empirical study.

### 4.1. Approach for Sections 3.1 and 3.1:

We generate preferences with alternatives uniformly at random to form a preference profile. The sampling algorithm of Section 3.1 (given by Algorithm 1) picks an for a given . In this experiment, we choose a sampling size that is smaller than , and apply the same algorithm using preferences sampled with replacement from the population of . We generate the preference profile times and for every profile, sample preferences times. We consider the fraction of correct classifications given by this modified sampling algorithm and plot it with increasing . We fix for these evaluations. We show the plot of the fraction of correct classification (denoted by ) for Section 3.1 with in Figure 1.

The plot shows the growth of the empirical probability of correctness (and therefore does not need any error bars). The x-axis shows the normalized sample size (that is ). Notice that the curves for different s almost overlap, and reach nearly at . This empirically shows that when other parameters are held fixed at the chosen values, the hidden constant in the upper bound of the sample complexity in the context of random outliers can be reduced by almost 50%, and is independent of .

We perform a similar exercise with different sampling sizes for the algorithm in the proof of Section 3.1 (given by Algorithm 2) with in Figure 2. Here too, the proportionality factor is independent of the s, and the hidden constant factor in this case can be reduced by 60%.

#### Why is the error of classifying a random/arbitrary outliers profile as a random profile not considered?

We argue that such an error is not very likely in the algorithms of these theorems, which is also manifested in our simulations. Therefore, we omit these results here. For Section 3.1, since the focus is only on the three alternatives , and , the number of random outliers will be close to for large enough . If preferences are drawn uniformly at random with replacement from this profile, it is very likely that will be at most close to for reasonably sized . The algorithm classifies the profile as a random outlier profile if and since , it is unlikely that a random outlier profile will be classified as a random profile under this algorithm. A similar observation holds for Section 3.1.

### 4.2. Approach for Section 3.2:

Here we consider the alternatives as outliers. The algorithm in the proof of this theorem (given by Algorithm 3) samples alternatives uniformly at random and samples preferences restricted to the sampled alternatives uniformly at random. In this case, we pick the values of and as before. We fix , and pick as given in the proof of Section 3.2, and vary the value of , which is the sampling size of the preferences restricted to the chosen alternatives. The alternatives of size are sampled 100 times. Figure 3 shows the plot of the fraction of correct classification () under this setting. It empirically shows that when other parameters are held fixed at the chosen values, the hidden constant of the upper bound on the sample complexity in the case of random alternative outliers can be reduced by almost 75%, and is independent of .

As in the previous paragraph, we can argue that this algorithm is also biased towards classifying a profile as a random alternative outlier profile, which is also manifested empirically. Hence, we omit these results here.
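The shape of this experiment can be sketched in Python as below. It is a hedged illustration only: the function names, the seed handling, and every numeric setting in the usage note (profile size, number of profiles, number of repetitions) are hypothetical stand-ins rather than the values used in the paper, and the tester is re-implemented here from the description in the proof of the theorem rather than taken from the authors' Algorithm 3.

```python
import itertools
import random


def random_profile(n, m, rng):
    """Generate n preferences over m alternatives uniformly at random."""
    alts = list(range(m))
    return [tuple(rng.sample(alts, m)) for _ in range(n)]


def says_random(profile, m, ell, t, rng):
    """One run of the tester: True if every triple of the ell sampled
    alternatives shows all 6 orderings among the t sampled preferences."""
    B = rng.sample(range(m), ell)
    keep = set(B)
    S = [tuple(a for a in rng.choice(profile) if a in keep) for _ in range(t)]
    for triple in itertools.combinations(B, 3):
        members = set(triple)
        if len({tuple(a for a in p if a in members) for p in S}) < 6:
            return False
    return True


def fraction_correct(n, m, ell, t, profiles, repeats, seed=0):
    """Fraction of runs in which a uniformly random profile is
    (correctly) classified as random, for a given sample size t."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(profiles):
        profile = random_profile(n, m, rng)
        correct += sum(says_random(profile, m, ell, t, rng) for _ in range(repeats))
    return correct / (profiles * repeats)
```

Calling `fraction_correct(n=200, m=6, ell=4, t=t, profiles=3, repeats=5)` for an increasing sequence of `t` values gives the kind of curve described above. With fewer than six sampled preferences the fraction is necessarily zero, since six distinct orderings cannot appear.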

## 5. Discussion

In this paper, we have developed sampling based algorithms for testing whether a profile is close to some specific domain. In most cases, these testing problems can be solved quite accurately by observing a small number of samples, and the number of samples is often independent of the number of preferences or alternatives. In other cases, we have proved impossibility results. Our extensive empirical study further improves the constants of the asymptotic theoretical upper bounds on the sample complexity by 50% to 75%, depending on the problem. As future work, it will be interesting to extend our results to more sophisticated, fine grained notions of distance, such as swap distance, footrule distance, and maximum displacement distance.

## References

• Andrews et al. (1998) Donald WK Andrews, Xuemei Liu, and Werner Ploberger. 1998. Tests for white noise against alternatives with both seasonal and nonseasonal serial correlation. Biometrika 85, 3 (1998), 727–740.
• Arrow (1950) Kenneth J Arrow. 1950. A difficulty in the concept of social welfare. J. Polit. Econ. (1950), 328–346.
• Ballester and Haeringer (2011) Miguel A Ballester and Guillaume Haeringer. 2011. A characterization of the single-peaked domain. Soc. Choice Welf. 36, 2 (2011), 305–322.
• Bar-Yossef (2003) Ziv Bar-Yossef. 2003. Sampling lower bounds via information theory. In Proc. 35th Annual ACM Symposium on Theory of Computing (STOC). 335–344.
• Barberà and Moreno (2011) Salvador Barberà and Bernardo Moreno. 2011. Top monotonicity: A common root for single peakedness, single crossing and the median voter result. Games Econ. Behav. 73, 2 (2011), 345–359.
• Bartholdi III and Trick (1986) John Bartholdi III and Michael A Trick. 1986. Stable matching with preferences derived from a psychological model. Oper. Res. Lett. 5, 4 (1986), 165–169.
• Brandt et al. (2015) Felix Brandt, Markus Brill, Edith Hemaspaandra, and Lane A Hemaspaandra. 2015. Bypassing combinatorial protections: Polynomial-time algorithms for single-peaked electorates. J. Artif. Intell. Res. (2015), 439–496.
• Brandt et al. (2016) Felix Brandt, Vincent Conitzer, Ulle Endriss, Jérôme Lang, and Ariel D Procaccia. 2016. Handbook of Computational Social Choice. Cambridge University Press.
• Bredereck et al. (2013) Robert Bredereck, Jiehua Chen, and Gerhard J Woeginger. 2013. A characterization of the single-crossing domain. Soc. Choice Welf. 41, 4 (2013), 989–998.
• Bredereck et al. (2016) Robert Bredereck, Jiehua Chen, and Gerhard J. Woeginger. 2016. Are there any nicely structured preference profiles nearby? Math. Soc. Sci. 79 (2016), 61–73.
• Chatterji et al. (2016) Shurojit Chatterji, Arunava Sen, and Huaxia Zeng. 2016. A characterization of single-peaked preferences via random social choice functions. Theor. Econ. 11, 2 (2016), 711–733.
• Chen and Finnendahl (2018) Jiehua Chen and Ugo Paavo Finnendahl. 2018. On the number of single-peaked narcissistic or single-crossing narcissistic preference profiles. Discrete Math. 341, 5 (2018), 1225–1236.
• Dey and Misra (2016) Palash Dey and Neeldhara Misra. 2016. Preference Elicitation for Single Crossing Domain. In Proc. 25th International Joint Conference on Artificial Intelligence (IJCAI). 222–228.
• Doignon and Falmagne (1994) Jean-Paul Doignon and Jean-Claude Falmagne. 1994. A Polynomial Time Algorithm for Unidimensional Unfolding Representations. J. Algorithms 16, 2 (1994), 218–233.
• Elkind and Faliszewski (2014) Edith Elkind and Piotr Faliszewski. 2014. Recognizing 1-Euclidean Preferences: An Alternative Approach. In Proc. 7th International Symposium on Algorithmic Game Theory (SAGT). 146–157.
• Elkind et al. (2015) Edith Elkind, Piotr Faliszewski, Martin Lackner, and Svetlana Obraztsova. 2015. The Complexity of Recognizing Incomplete Single-Crossing Preferences. In Proc. 29th AAAI Conference on Artificial Intelligence (AAAI). 865–871.
• Elkind et al. (2014) Edith Elkind, Piotr Faliszewski, and Piotr Skowron. 2014. A Characterization of the Single-Peaked Single-Crossing Domain. In Proc. 28th AAAI Conference on Artificial Intelligence (AAAI). 654–660.
• Elkind et al. (2012) Edith Elkind, Piotr Faliszewski, and Arkadii M. Slinko. 2012. Clone structures in voters’ preferences. In Proc. 13th ACM Conference on Electronic Commerce (EC). 496–513.
• Erdélyi et al. (2017) Gábor Erdélyi, Martin Lackner, and Andreas Pfandler. 2017. Computational Aspects of Nearly Single-Peaked Electorates. J. Artif. Intell. Res. 58 (2017), 297–337.
• Escoffier et al. (2008) Bruno Escoffier, Jérôme Lang, and Meltem Öztürk. 2008. Single-peaked consistency and its complexity. In Proc. 18th European Conference on Artificial Intelligence (ECAI). 366–370.
• Filos-Ratsikas et al. (2017) Aris Filos-Ratsikas, Minming Li, Jie Zhang, and Qiang Zhang. 2017. Facility Location with Double-peaked Preferences. Auton. Agents Multi Agent Syst 31, 6 (2017), 1209–1235.
• Fitzsimmons (2015) Zack Fitzsimmons. 2015. Single-Peaked Consistency for Weak Orders Is Easy. In Proc. 15th Conference on Theoretical Aspects of Rationality and Knowledge (TARK). 127–140.
• Gaertner (2001) Wulf Gaertner. 2001. Domain Conditions in Social Choice Theory. Cambridge University Press.
• Gibbard (1973) A. Gibbard. 1973. Manipulation of Voting Schemes: a General Result. Econometrica (1973), 587–601.
• Goldreich (1999) Oded Goldreich. 1999. Combinatorial property testing (a survey). Randomization Methods in Algorithm Design 43 (1999), 45–59.
• Goldreich et al. (1998) Oded Goldreich, Shafi Goldwasser, and Dana Ron. 1998. Property Testing and its Connection to Learning and Approximation. J. ACM 45, 4 (1998), 653–750.
• Hinich and Munger (1997) Melvin J Hinich and Michael C Munger. 1997. Analytical politics. Cambridge University Press.
• Knoblauch (2010) Vicki Knoblauch. 2010. Recognizing one-dimensional Euclidean preference profiles. J. Math. Econ. 46, 1 (2010), 1–5.
• Kullback and Leibler (1951) Solomon Kullback and Richard A Leibler. 1951. On information and sufficiency. The annals of mathematical statistics 22, 1 (1951), 79–86.
• Lackner (2014) Martin Lackner. 2014. Incomplete Preferences in Single-Peaked Electorates. In Proc. 28th AAAI Conference on Artificial Intelligence (AAAI). 742–748.
• Lackner and Lackner (2017) Marie-Louise Lackner and Martin Lackner. 2017. On the likelihood of single-peaked preferences. Soc. Choice Welf. 48, 4 (2017), 717–745.
• Magiera and Faliszewski (2017) Krzysztof Magiera and Piotr Faliszewski. 2017. Recognizing Top-Monotonic Preference Profiles in Polynomial Time. In Proc. 26th International Joint Conference on Artificial Intelligence (IJCAI). 324–330.
• Motwani and Raghavan (2010) Rajeev Motwani and Prabhakar Raghavan. 2010. Randomized algorithms. Chapman & Hall/CRC.
• Moulin (1991) Hervé Moulin. 1991. Axioms of cooperative decision making. Number 15. Cambridge University Press.
• Pennock et al. (2000) David M. Pennock, Eric Horvitz, and C. Lee Giles. 2000. Social Choice Theory and Recommender Systems: Analysis of the Axiomatic Foundations of Collaborative Filtering. In Proc. 17th National Conference on Artificial Intelligence and 12th Conference on Innovative Applications of Artificial Intelligence. 729–734.
• Peters (2017) Dominik Peters. 2017. Recognising Multidimensional Euclidean Preferences. In Proc. 31st AAAI Conference on Artificial Intelligence (AAAI). 642–648.
• Ron (2001) Dana Ron. 2001. Property testing. Comb. Opt. 9, 2 (2001), 597–643.
• Satterthwaite (1975) M.A. Satterthwaite. 1975. Strategy-proofness and Arrow’s Conditions: Existence and Correspondence Theorems for Voting Procedures and Social Welfare Functions. J. Econ. Theory 10, 2 (1975), 187–217.
• Shao (2011) Xiaofeng Shao. 2011. Testing for white noise under unknown dependence and its applications to diagnostic checking for time series models. Econ. Theory 27, 2 (2011), 312–343.