# Forming Probably Stable Communities with Limited Interactions

A community needs to be partitioned into disjoint groups; each community member has an underlying preference over the groups that they would want to be a member of. We are interested in finding a stable community structure: one where no subset of members S wants to deviate from the current structure. We model this setting as a hedonic game, where players are connected by an underlying interaction network, and can only consider joining groups that are connected subgraphs of the underlying graph. We analyze the relation between network structure, and one's capability to infer statistically stable (also known as PAC stable) player partitions from data. We show that when the interaction network is a forest, one can efficiently infer PAC stable coalition structures. Furthermore, when the underlying interaction graph is not a forest, efficient PAC stabilizability is no longer achievable. Thus, our results completely characterize when one can leverage the underlying graph structure in order to compute PAC stable outcomes for hedonic games. Finally, given an unknown underlying interaction network, we show that it is NP-hard to decide whether there exists a forest consistent with data samples from the network.


## 1 Introduction

A professor wants her students to complete a group programming project. In order to do so, students should divide into project groups with a few students in each; naturally, some groups will be objectively better than others. However, students seldom try to find a group that’s objectively optimal for them; they would rather join groups that have at least one or two of their friends. This type of scenario falls into the realm of constrained coalition formation; in other words, how should we partition a group of people given that (a) they have preferences over the groups they are assigned to and (b) they have limited interactions with one another? Other scenarios fitting this description include:

1. Seating arrangements at a wedding (or at conference banquets): some guests should absolutely not be seated together, while others would probably enjoy one another’s company. However, it should always be the case that every guest has at least one acquaintance seated at their table.

2. Group formation on social media: given a social media network (e.g. Facebook), people prefer being affiliated with certain groups; however, they are limited to joining groups that already contain their friends.

Constrained coalition formation problems are often modeled as hedonic games. Hedonic games formally capture a simple, yet compelling, paradigm: how does one partition players into groups, while factoring in individual players’ preferences? The literature on hedonic games is primarily focused on finding “good” coalition structures, i.e. partitions of players into disjoint groups. A set of coalition structures satisfying certain desiderata is called a solution concept. A central hedonic solution concept is coalitional stability: given a coalition structure $\pi$, we say that a set of players $S$ (also known as a coalition) can deviate from $\pi$ if every $i \in S$ prefers $S$ to its assigned group under $\pi$; a coalition structure is stable if no coalition can deviate. In other words, every coalition $S$ contains at least one player $i$ who weakly prefers its current coalition (denoted $\pi(i)$) to $S$. The set of stable coalition structures, also known as the core of the hedonic game, may be empty; what’s worse, even when it is known to be non-empty, finding a stable coalition structure may be computationally intractable. Moreover, efficient algorithms for finding stable coalition structures often assume full knowledge of the underlying hedonic game; that is, in order to work, the algorithm needs to have either oracle access to player preferences (i.e. queries of the form ‘does player $i$ prefer coalition $S$ to coalition $T$?’), or structural knowledge of the underlying preference structure (e.g. some concise representation of player preferences that one can leverage in order to obtain a poly-time algorithm).

Neither assumption is realistic in practice: eliciting user preferences is notoriously difficult, especially over combinatorially complex domains such as subsets of players. If one forgoes preference elicitation and opts for mathematically modeling preferences (e.g. assuming that users have additive preferences over coalition members), it is not entirely obvious what mathematical model of user preferences is valid. This leads us to the following natural question: can we find a stable coalition structure when player preferences are unknown? Recent works [Balcan, Procaccia, and Zick 2015; Balkanski, Syed, and Vassilvitskii 2017; Sliwinski and Zick 2017] propose a statistical approach to stability in collaborative environments. In this framework, one assumes the existence of user preference data over some coalitions, which is then used to construct probably approximately stable outcomes (the notion is referred to as PAC stability). In this paper, we explore the relation between structural assumptions on player preferences, and computability of PAC stable outcomes.

Our contribution We assume that there exists some underlying interaction network governing player preferences; that is, players are nodes on a graph, and only connected coalitions are feasible. Within this framework, we show that if player preferences are restricted by a forest, one can compute a PAC stable outcome using only a polynomial number of samples. Surprisingly, even if the underlying forest structure is not known to the learner, PAC stabilizability still holds, despite the fact that it may be computationally intractable to find an approximate forest structure that is likely consistent with the true interaction graph. In contrast, we show that it is impossible to find a PAC stable outcome even if the graph contains a single cycle. The latter result is constructive: we show that whenever the underlying interaction graph does contain a cycle, one can construct a sample distribution for which it would be impossible to elicit a PAC stable outcome.

Our positive result for forests is interesting in several respects. First, while one can find PAC stable outcomes in polynomial time, computing stable outcomes for hedonic games on forests is computationally intractable [Igarashi and Elkind 2016]; second, unlike [Sliwinski and Zick 2017], we do not require that player preferences are provided in the form of numerical utilities over coalitions. This not only makes our results more general, but also more faithful to the problem we model, which assumes ordinal information about player preferences, rather than cardinal utilities. Finally, in Section 5, we prove a non-trivial technical result on learning forest structures that is of independent interest. Briefly, we study the following problem: we are given samples of subsets of graph vertices, each labeled either ‘connected’ or ‘disconnected’; we need to decide whether there exists some forest $F$ that is consistent with the sample, i.e. all sets of vertices labeled ‘connected’ are connected under $F$ and all sets labeled ‘disconnected’ are not. We show that when all of our vertex samples are connected (i.e. we do not observe any disconnected components), it is possible to efficiently learn an underlying forest structure (if one exists); on the other hand, if one assumes that both connected and disconnected sets are presented to the learner, it is computationally intractable to decide whether there exists a forest, or even a path, that is consistent with the samples.

Related work There exists a rich body of literature studying hedonic games from an economic perspective (e.g. (Banerjee, Konishi, and Sönmez 2001; Bogomolnaia and Jackson 2002)). More recently, the AI community has begun studying both computational and analytical properties of hedonic games (see e.g. (Aziz and Brandl 2012; Deineko and Woeginger 2013; Gairing and Savani 2010; Peters and Elkind 2015), and (Aziz and Savani 2016; Woeginger 2013) for an overview). Interaction networks in cooperative games were first introduced by Myerson (1977). The relation between graph structure and stability in the classic cooperative game setting is also relatively well-understood. Demange (2004) shows that if the underlying interaction network is a forest, then the core is not empty; further studies (Bousquet, Li, and Vetta 2015; Meir et al. 2013) establish relations between approximate stability and the underlying graph structure, while Chalkiadakis, Greco, and Markakis (2016) study the computational complexity of finding core outcomes in graph restricted environments. Igarashi and Elkind (2016) establish both the existence of stable coalition structures in hedonic games over forests, as well as the computational intractability of finding stable coalition structures; Peters (2016) studies the relation between hedonic solution concepts and the treewidth of the underlying interaction graph.

Several works study learning based game-theoretic solution concepts. Sliwinski and Zick (2017) introduce PAC stability in hedonic games, and analyze several common classes of hedonic games. Other works on learning and game theory include learning in cooperative games (Balcan, Procaccia, and Zick 2015; Balkanski, Syed, and Vassilvitskii 2017), rankings (Balcan, Vitercik, and White 2016), auctions (Balcan et al. 2012; Balcan, Sandholm, and Vitercik 2018; Morgenstern and Roughgarden 2016) and noncooperative games (Fearnley et al. 2013; Sinha, Kar, and Tambe 2016).

## 2 Preliminaries

Throughout this paper, vectors are denoted by $\vec{x}$, and sets are denoted by uppercase letters; given a value $s \in \mathbb{N}$, we set $[s] = \{1, \dots, s\}$. A hedonic game is given by a pair $\langle N, \succeq \rangle$, where $N = [n]$ is a finite set of players, and $\succeq = (\succeq_1, \dots, \succeq_n)$ is a list of preferences players in $N$ have over subsets of $N$ (also referred to as coalitions); in more detail, for every $i \in N$, we write $\mathcal{N}_i = \{S \subseteq N : i \in S\}$; $\succeq_i$ describes a complete and transitive preference relation over $\mathcal{N}_i$. For each $i \in N$, let $\succ_i$ denote the strict preference derived from $\succeq_i$, i.e., $S \succ_i T$ if $S \succeq_i T$, but $T \not\succeq_i S$. An outcome of a hedonic game is a coalition structure $\pi$, i.e., a partition of $N$ into disjoint coalitions; we denote by $\pi(i)$ the coalition containing $i$. A solution concept is a mapping whose input is a hedonic game $\langle N, \succeq \rangle$, and whose output is a (possibly empty) set of coalition structures. The core is the most fundamental solution concept in hedonic games. First, we say that a coalition $S$ strongly blocks a coalition structure $\pi$ if every player $i \in S$ strictly prefers $S$ to its current coalition $\pi(i)$, i.e. $S \succ_i \pi(i)$. A coalition structure $\pi$ is said to be core stable if no coalition $S \subseteq N$ strongly blocks $\pi$.

### 2.1 Interaction Networks

Given an undirected graph $G = (N, E)$ whose nodes are the player set, we restrict the space of feasible coalitions to be the set of connected subsets of $G$; we denote by $\mathcal{F}_E$ the set of feasible coalitions. Intuitively, we restrict our attention to coalition structures where all group members form a social subnetwork of the underlying interaction graph. Note that when $G$ is a clique, all coalitions are feasible, and the result is a standard (unrestricted) hedonic game. From now on, we define a hedonic graph game as the tuple $\langle N, \succeq, E \rangle$; here, $N$ is the set of players, $\succeq$ their preference relations, and $E$ the edges of the underlying interaction network. We focus our attention only on core stable coalition structures that consist of feasible coalitions.
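To make the feasibility restriction concrete, the following minimal sketch enumerates the connected subsets of a small interaction graph by brute force. The names `is_connected` and `feasible_coalitions` are illustrative and do not come from the paper; the enumeration is exponential in the number of players and is only intended for toy instances.

```python
from itertools import combinations


def is_connected(coalition, edges):
    """Return True if `coalition` induces a connected subgraph of the graph `edges`."""
    members = set(coalition)
    if not members:
        return False
    start = next(iter(members))
    seen, stack = {start}, [start]
    while stack:
        a = stack.pop()
        for x, y in edges:
            # Follow each undirected edge in both directions, staying inside the coalition.
            for u, w in ((x, y), (y, x)):
                if u == a and w in members and w not in seen:
                    seen.add(w)
                    stack.append(w)
    return seen == members


def feasible_coalitions(players, edges):
    """Enumerate all non-empty connected subsets of the interaction graph."""
    return [
        frozenset(c)
        for k in range(1, len(players) + 1)
        for c in combinations(players, k)
        if is_connected(c, edges)
    ]


path = [(1, 2), (2, 3)]            # interaction graph 1 - 2 - 3
clique = [(1, 2), (2, 3), (1, 3)]  # every coalition is feasible
print(len(feasible_coalitions([1, 2, 3], path)))
print(len(feasible_coalitions([1, 2, 3], clique)))
```

On the path, the coalition consisting of players 1 and 3 is excluded because they are not adjacent; on the clique, all seven non-empty subsets are feasible, matching the remark that a clique yields an unrestricted hedonic game.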

In what follows, it is useful to express player preferences in terms of cardinal utilities. In other words, player $i$ assigns a value $v_i(S) \in \mathbb{R}$ to every coalition $S \in \mathcal{N}_i$; we write a hedonic game as $\langle N, v \rangle$ where $v$ is a collection of functions $v_i : \mathcal{N}_i \to \mathbb{R}$ for each $i \in N$. This representation allows us to seamlessly integrate ideas from PAC learning into the hedonic games model, and is indeed quite common in other works studying hedonic games. However, as we later show, our main result (Theorem 4.1) still holds when we transition from a utility-based cardinal model, to a preference-based ordinal model.
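The definitions of strong blocking and core stability translate directly into a check over cardinal utilities. The sketch below assumes utilities are stored as a dictionary `v[i][S]` keyed by player and by coalition (a `frozenset`); this storage format and the function names are assumptions made for illustration only.

```python
def strongly_blocks(coalition, partition, v):
    """True if every member of `coalition` strictly prefers it to its current coalition."""
    current = {i: block for block in partition for i in block}
    return all(v[i][coalition] > v[i][current[i]] for i in coalition)


def is_core_stable(partition, candidate_coalitions, v):
    """True if no candidate coalition strongly blocks `partition`."""
    return not any(strongly_blocks(S, partition, v) for S in candidate_coalitions)


# Two players who both prefer being together to being alone.
one, two, both = frozenset({1}), frozenset({2}), frozenset({1, 2})
v = {
    1: {one: 0.0, both: 1.0},
    2: {two: 0.0, both: 1.0},
}
print(is_core_stable([one, two], [one, two, both], v))  # singletons are blocked by {1, 2}
print(is_core_stable([both], [one, two, both], v))      # the grand coalition is stable
```

Only strict comparisons matter here, so any utility function that induces the same ranking gives the same answer.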

### 2.2 PAC Learning

We provide a brief introduction to PAC learning. (What we show here is but one of many variants of the theory of PAC learning. There are many excellent sources on this classic theory; we refer our reader to [Anthony and Bartlett 1999; Kearns and Vazirani 1994; Shashua 2009].) The basic idea is as follows: we are given an unknown function $v : 2^N \to \mathbb{R}$ (a target concept in the language of PAC learning) that assigns values to subsets of players. In addition, we are given a set of samples $(S_1, v(S_1)), \dots, (S_m, v(S_m))$ where $S_j \subseteq N$ and $v(S_j)$ is the valuation of $v$ over $S_j$; we wish to estimate $v$ on subsets we did not observe. We assume that $v$ belongs to a hypothesis class $\mathcal{H}$ (say, we know that $v$ is an additive valuation). Our goal is to output a hypothesis $v^* \in \mathcal{H}$ (e.g. if $v$ is additive, $v^*$ should be as well) that is likely to match the outputs of $v$ on future observations drawn from some distribution $D$. More formally, a hypothesis $v^*$ is $\epsilon$-approximately correct w.r.t. a probability distribution $D$ over $2^N$ and an unknown function $v$ if

$$\Pr_{S \sim D}[v^*(S) \neq v(S)] < \epsilon.$$

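A learning algorithm $\mathcal{A}$ takes as input $m$ samples

$$(S_1, v(S_1)), (S_2, v(S_2)), \dots, (S_m, v(S_m))$$

drawn i.i.d. from a distribution $D$ over $2^N$, and two parameters $\epsilon, \delta > 0$.

A class of functions $\mathcal{H}$ is $(\epsilon, \delta)$ PAC (probably approximately correctly) learnable if there exists an algorithm $\mathcal{A}$ that for any $v \in \mathcal{H}$ and probability distribution $D$ over $2^N$, with probability of at least $1 - \delta$, outputs a hypothesis $v^*$ that is $\epsilon$-approximately correct with respect to $D$ and $v$. If this holds for any $\epsilon, \delta > 0$, $\mathcal{H}$ is said to be PAC learnable; moreover, if the running time of $\mathcal{A}$, and the number of samples $m$ are polynomial in $\frac{1}{\epsilon}$ and $\log \frac{1}{\delta}$, $\mathcal{H}$ is said to be efficiently PAC learnable.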

The value $\delta$ is the confidence parameter: intuitively, it is the probability that the random samples drawn from $D$ do not accurately portray the true sample distribution; for example, if $D$ is the uniform distribution, then it is possible (though unlikely) that we draw the same subset in every one of our $m$ samples. The value $\epsilon$ is called the error parameter: it is the likelihood that our hypothesis $v^*$ does not agree with the target concept $v$. Not all hypothesis classes are efficiently PAC learnable; learnability is inherently related to the complexity of the hypothesis class. The complexity of real-valued functions is commonly measured using the notion of pseudo dimension (see e.g. [Anthony and Bartlett 1999]). Given a list of sets $S_1, \dots, S_m$, and corresponding values $r_1, \dots, r_m \in \mathbb{R}$, we say that a class of functions $\mathcal{H}$ can pseudo-shatter $(S_j, r_j)_{j=1}^m$ if for any labeling $\ell_1, \dots, \ell_m \in \{0, 1\}$, there is some $v \in \mathcal{H}$ such that $v(S_j) > r_j$ iff $\ell_j = 1$. The pseudo-dimension of $\mathcal{H}$, denoted $Pdim(\mathcal{H})$, is

$$\max\{m \mid \exists (S_j, r_j)_{j=1}^m \text{ that can be pseudo-shattered by } \mathcal{H}\}.$$
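For a finite class of functions, pseudo-shattering can be checked by brute force: collect the above/below-threshold pattern each function induces on the points and test whether all $2^m$ patterns appear. The sketch below assumes each function is stored as a dictionary from coalition labels to values; the representation and the name `pseudo_shatters` are illustrative assumptions.

```python
from itertools import product


def pseudo_shatters(function_class, points):
    """Check whether `function_class` realizes every threshold pattern on `points`.

    `points` is a list of (coalition, threshold) pairs; a function h realizes the
    label 1 on a pair (S, r) exactly when h[S] > r.
    """
    realized = {tuple(h[S] > r for S, r in points) for h in function_class}
    return len(realized) == 2 ** len(points)


# Toy class: all 0/1-valued functions on two coalitions labeled "A" and "B".
full_class = [{"A": a, "B": b} for a, b in product((0, 1), repeat=2)]
points = [("A", 0.5), ("B", 0.5)]

print(pseudo_shatters(full_class, points))       # all four patterns appear
print(pseudo_shatters(full_class[:3], points))   # one pattern is missing
```

When utilities on different coalitions are unconstrained, as in the full class above, every pattern is realizable; this is the mechanism behind the negative learnability results later in the paper.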

The following well-known theorem relates the pseudo-dimension and PAC learnability.

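###### Theorem 2.1 ([Anthony and Bartlett 1999]).

A class of functions $\mathcal{H}$ is efficiently PAC learnable using $m$ samples, with $m$ polynomial in $Pdim(\mathcal{H})$, $\frac{1}{\epsilon}$ and $\log \frac{1}{\delta}$, if there exists an algorithm $\mathcal{A}$ such that given $m$ samples drawn i.i.d. from a distribution $D$, it outputs $v^* \in \mathcal{H}$ consistent with the sample, i.e. $v^*(S_j) = v(S_j)$ for all sampled $S_j$, and runs in time polynomial in $\frac{1}{\epsilon}$ and $\log \frac{1}{\delta}$. Furthermore, if $Pdim(\mathcal{H})$ is superpolynomial in $n$, $\mathcal{H}$ is not PAC learnable.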

In other words, in order to establish the PAC learnability of some hypothesis class, it suffices to show that its pseudo dimension is low, and that there exists some efficient algorithm that is able to output a hypothesis which matches the outputs of $v$ on all samples. We note that even if an efficient consistent algorithm does not exist (e.g. if the problem of matching a hypothesis to the samples is computationally intractable), a low pseudo dimension is still desirable: it implies that the number of samples needed in order to find a good hypothesis is polynomial.

### 2.3 PAC Stabilizability

When studying hedonic games, one is not necessarily interested in eliciting approximately accurate user preferences over coalitions using data; in our case, we are interested in identifying core stable coalition structures. Intuitively, it seems that the following idea might work: first, infer player utilities from data and obtain a PAC approximation of the original hedonic game; next, find a coalition structure that stabilizes the approximate hedonic game. This approach, however, may be overcomplicated: first, it may be impossible to PAC learn player preferences from data (this depends on the hypothesis class); moreover, computing a core coalition structure for the learned game may be computationally intractable. Sliwinski and Zick (2017) propose learning a stable outcome directly from data. They introduce a statistical notion of core stability for hedonic games, which they term PAC stability (this term was first used by Balcan, Procaccia, and Zick (2015) for cooperative transferable utility games).

We say that a partition $\pi$ is $\epsilon$-PAC stable w.r.t. a probability distribution $D$ over $2^N$ if

$$\Pr_{S \sim D}[S \text{ strongly blocks } \pi] < \epsilon.$$
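When the distribution has small finite support, the blocking probability in this definition can be computed exactly. The following sketch assumes the distribution is given explicitly as a dictionary from coalitions to probabilities and that utilities are available for every coalition in the support; both are simplifying assumptions for illustration, since in the learning setting the distribution is unknown.

```python
def strongly_blocks(coalition, partition, v):
    """True if every member of `coalition` strictly prefers it to its current coalition."""
    current = {i: block for block in partition for i in block}
    return all(v[i][coalition] > v[i][current[i]] for i in coalition)


def blocking_probability(partition, v, distribution):
    """Probability mass of the coalitions in `distribution` that strongly block `partition`."""
    return sum(p for S, p in distribution.items() if strongly_blocks(S, partition, v))


def is_pac_stable(partition, v, distribution, epsilon):
    """Check the epsilon-PAC stability condition for an explicitly given distribution."""
    return blocking_probability(partition, v, distribution) < epsilon


one, two, both = frozenset({1}), frozenset({2}), frozenset({1, 2})
v = {1: {one: 0.0, both: 1.0}, 2: {two: 0.0, both: 1.0}}
D = {one: 0.9, both: 0.1}  # the blocking coalition {1, 2} is rarely sampled

print(blocking_probability([one, two], v, D))
print(is_pac_stable([one, two], v, D, epsilon=0.2))
print(is_pac_stable([one, two], v, D, epsilon=0.05))
```

The singleton partition is not core stable, yet it is $\epsilon$-PAC stable for $\epsilon = 0.2$ because the only blocking coalition carries little probability mass.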

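The inputs to our learning algorithms will be samples

$$(S_1, \vec{v}(S_1)), (S_2, \vec{v}(S_2)), \dots, (S_m, \vec{v}(S_m)),$$

where $S_j \subseteq N$, and $\vec{v}(S_j)$ is a vector describing players’ utilities over $S_j$; that is, $\vec{v}(S_j) = (v_i(S_j))_{i \in S_j}$.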

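Given an unknown hedonic game $\langle N, v \rangle$ belonging to some hypothesis class $\mathcal{H}$, a PAC stabilizing algorithm $\mathcal{A}$ takes as input $m$ sets $S_1, \dots, S_m$ sampled i.i.d. from a distribution $D$, and players’ preferences over the sampled sets; in addition, it receives two parameters $\epsilon, \delta > 0$. The algorithm $\mathcal{A}$ PAC stabilizes $\mathcal{H}$ if, for any hedonic game $\langle N, v \rangle \in \mathcal{H}$, distribution $D$ over $2^N$, and parameters $\epsilon, \delta > 0$, with probability at least $1 - \delta$, $\mathcal{A}$ outputs an $\epsilon$-PAC stable coalition structure if it exists; again, if the running time of the algorithm and the number of samples, $m$, are bounded by a polynomial in $n$, $\frac{1}{\epsilon}$ and $\log \frac{1}{\delta}$, then we say that $\mathcal{A}$ efficiently PAC stabilizes $\mathcal{H}$. Similarly, we say that $\mathcal{H}$ is (efficiently) PAC stabilizable if there is some algorithm $\mathcal{A}$ that (efficiently) PAC stabilizes $\mathcal{H}$.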

## 3 Learning Hedonic Graph Games

In what follows we consider the following hypothesis class.

###### Definition 3.1.

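For an undirected graph $G = (N, E)$, let $\mathcal{H}_G$ be the class of all hedonic games $\langle N, v \rangle$ where each player $i \in N$ strictly prefers its singleton to any disconnected coalition $S$, i.e., $v_i(\{i\}) > v_i(S)$ for all $S \in \mathcal{N}_i \setminus \mathcal{F}_E$.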

We first present a baseline negative result: fixing a forest $G$, the hypothesis class $\mathcal{H}_G$ is not efficiently PAC learnable. When referring to the PAC learnability of any class of hedonic games, we mean inferring some utility function $v_i^*$ for all $i \in N$ that PAC approximates the true utilities of players in $N$. This approximation guarantee can be interpreted in both an ordinal and cardinal manner. If we are given player $i$’s ordinal preferences, this simply means that $v_i^*$ is consistent with the ordinal preferences; if we are given player $i$’s cardinal utility function $v_i$, $v_i^*$ should be a PAC approximation of $v_i$. As Theorem 3.2 shows, even when we are given additional information about the underlying interaction network, players’ preferences are not PAC learnable.

###### Theorem 3.2.

For any graph $G$ with exponentially many connected coalitions, the class $\mathcal{H}_G$ is not efficiently PAC learnable.

###### Proof.

Recall that $\mathcal{F}_E$ is the set of all feasible coalitions over $G$; by assumption, $|\mathcal{F}_E|$ is exponential in $n$. Let $\mathcal{V}_i$ be the set of all possible utility functions $v_i$ satisfying $v_i(\{i\}) > v_i(S)$ for every disconnected coalition $S$. The utility player $i$ derives from feasible coalitions in $\mathcal{F}_E$ is unrestricted; in particular, one cannot deduce anything about the utility of some feasible coalition $S$ based on other feasible coalitions’ utilities. This immediately implies that the set of feasible coalitions containing $i$ can be pseudo-shattered by $\mathcal{V}_i$. Hence $Pdim(\mathcal{H}_G)$ is at least exponential, and by Theorem 2.1, $\mathcal{H}_G$ is not efficiently PAC learnable. ∎

As an immediate corollary, forest interaction structures do not admit PAC learnable preference structures in general; this is true even if $G$ is a star graph over $n$ players, since the number of feasible coalitions is exponential in $n$.

###### Corollary 3.3.

Let $G$ be a star graph over $n$ players; then $\mathcal{H}_G$ is not PAC learnable.

###### Proof.

For a star with $n$ nodes, any coalition containing the center of the star is feasible, hence it has at least $2^{n-1}$ feasible coalitions. By Theorem 3.2, $\mathcal{H}_G$ is not PAC learnable. ∎
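The counting argument for the star can be sanity-checked by brute force on a small instance. In the sketch below, node `0` is taken to be the center (an arbitrary labeling chosen for illustration); every subset containing the center is connected, and the only other connected subsets are the singleton leaves, giving $2^{n-1} + (n-1)$ feasible coalitions in total.

```python
from itertools import combinations


def is_connected(coalition, edges):
    """Return True if `coalition` induces a connected subgraph of the graph `edges`."""
    members = set(coalition)
    start = next(iter(members))
    seen, stack = {start}, [start]
    while stack:
        a = stack.pop()
        for x, y in edges:
            for u, w in ((x, y), (y, x)):
                if u == a and w in members and w not in seen:
                    seen.add(w)
                    stack.append(w)
    return seen == members


def count_feasible_in_star(n):
    """Count connected non-empty subsets of a star with center 0 and leaves 1..n-1."""
    edges = [(0, leaf) for leaf in range(1, n)]
    return sum(
        1
        for k in range(1, n + 1)
        for c in combinations(range(n), k)
        if is_connected(c, edges)
    )


for n in range(2, 8):
    print(n, count_feasible_in_star(n), 2 ** (n - 1) + n - 1)
```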

The reason that hedonic games with forest interaction structures are not PAC learnable is that they may have exponentially many feasible coalitions; this is also the reason that finding a core stable coalition structure for hedonic games with forest interaction structures is computationally intractable [Igarashi and Elkind 2016]. However, we now show how one can still exploit the structural properties of forests to efficiently compute PAC stable outcomes.

## 4 PAC Stabilizability of Hedonic Graph Games

Having established that hedonic games with a forest interaction structure are not, generally speaking, PAC learnable, we turn our attention to their PAC stabilizability. We divide our analysis into two parts. We begin by assuming that the underlying interaction graph structure is known to us; in other words, we know that our game belongs to the hypothesis class $\mathcal{H}_G$. In Section 5, we show how one can forgo this assumption.

###### Theorem 4.1.

If $G$ is a forest, $\mathcal{H}_G$ is efficiently PAC stabilizable.

###### Proof.

We claim that Algorithm 1 PAC stabilizes $\mathcal{H}_G$. It is related to the algorithm introduced by Demange (2004) to find core stable outcomes for forest-restricted hedonic games in the full information setting. (Demange (2004) presents the algorithm for non-transferable utility cooperative games on trees where each coalition has a choice of action. A hedonic game is a special case of a non-transferable utility game where each coalition has a unique action.) Intuitively, instead of identifying the guaranteed coalition for each player precisely, Algorithm 1 approximates it. If the input graph $G$ is a forest, we can process each of its connected components separately, so we can assume that $G$ is a tree.

We first provide an informal description of our algorithm, followed by pseudocode. The algorithm first transforms $G$ into a rooted tree with root $r$ by orienting the edges in $E$ towards the leaves. For every player $i$, starting from the bottom to the top, the algorithm identifies $B_i$: a coalition containing $i$, the best for $i$ observed in the samples that is entirely contained in $i$’s subtree, such that others in $B_i$ prefer it to their own best guaranteed coalition; in other words, $v_j(B_i) \ge v_j(B_j)$ for all $j \in B_i$. Having identified $B_i$ for every $i \in N$, players are partitioned according to the $B_i$’s from top-down. The main concern is to ensure that $B_i$ is a good approximation of its full-information counterpart; this is guaranteed by taking a sufficiently large sample size $m$.
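The informal description above can be sketched in Python as follows. This is an illustration written from the prose description only, not a transcription of Algorithm 1: the function name, the sample format (a list of pairs of a coalition and a dictionary of its members' utilities), and the `singleton_utility` fallback used when no sampled coalition qualifies for a player are all assumptions.

```python
from collections import deque


def pac_stabilize_tree(n, edges, samples, root=0, singleton_utility=0.0):
    """Sketch of the bottom-up / top-down procedure on a tree with nodes 0..n-1."""
    adj = {i: set() for i in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    # Orient the tree away from the root; BFS lists parents before children.
    parent, order, queue = {root: None}, [], deque([root])
    while queue:
        i = queue.popleft()
        order.append(i)
        for j in adj[i]:
            if j not in parent:
                parent[j] = i
                queue.append(j)

    # desc[i]: i together with every node below it.
    desc = {i: {i} for i in range(n)}
    for i in reversed(order):
        if parent[i] is not None:
            desc[parent[i]] |= desc[i]

    best, best_val = {}, {}
    for i in reversed(order):  # children are processed before their parents
        # Assumed fallback: the singleton, with an assumed known utility.
        best[i], best_val[i] = frozenset([i]), singleton_utility
        for S, util in samples:
            if i not in S or not S <= desc[i]:
                continue
            # Inside i's subtree, S is connected iff each other member's parent is in S.
            if any(parent[j] not in S for j in S if j != i):
                continue
            # Every other member must weakly prefer S to its own B_j.
            if any(util[j] < best_val[j] for j in S if j != i):
                continue
            if util[i] > best_val[i]:
                best[i], best_val[i] = S, util[i]

    # Top-down: take B_root, then B_j for every child j of a chosen coalition.
    partition, frontier = [], [root]
    while frontier:
        B = best[frontier.pop()]
        partition.append(B)
        frontier.extend(j for b in B for j in adj[b] if j not in B and parent[j] == b)
    return partition


samples = [
    (frozenset({1, 2}), {1: 3.0, 2: 2.0}),
    (frozenset({0, 1}), {0: 5.0, 1: 1.0}),
    (frozenset({0}), {0: 1.0}),
]
print(pac_stabilize_tree(3, [(0, 1), (1, 2)], samples))
```

On the path 0 - 1 - 2 rooted at 0, player 1 secures the coalition with player 2; player 0 cannot use the sampled coalition with player 1 because player 1 would be worse off than in its own $B_1$, so player 0 remains alone. Only comparisons between utilities are used, which is consistent with the ordinal reading discussed in Remark 4.2.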

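In what follows, we assume an orientation of the trees in $G$, with arbitrary root nodes. Fixing the orientation, we let $\mathrm{desc}(i)$ be the set of descendants of $i$ (we assume that $i \in \mathrm{desc}(i)$). For each coalition $S$, we denote by $\mathrm{child}(S)$ the set of children of $S$, namely,

$$\mathrm{child}(S) = \{ i \in N \setminus S \mid i\text{'s parent belongs to } S \}.$$

The height of a node $i$ is defined inductively as follows: $\mathrm{height}(i) := 0$ if $i$ is a leaf, i.e., $\mathrm{desc}(i) = \{i\}$, and

$$\mathrm{height}(i) := 1 + \max\{ \mathrm{height}(j) \mid j \in \mathrm{desc}(i) \setminus \{i\} \},$$

otherwise.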
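These three notions are straightforward to compute once the orientation is stored as a parent map. The sketch below assumes such a map (`parent[root]` is `None`); the function names mirror the notation in the text but are otherwise illustrative.

```python
def desc(i, parent):
    """All descendants of i in the rooted tree, including i itself."""
    result = {i}
    changed = True
    while changed:
        changed = False
        for node, p in parent.items():
            if p in result and node not in result:
                result.add(node)
                changed = True
    return result


def child(coalition, parent):
    """Nodes outside `coalition` whose parent belongs to `coalition`."""
    return {node for node, p in parent.items() if node not in coalition and p in coalition}


def height(i, parent):
    """0 for a leaf, otherwise 1 + the maximum height over proper descendants."""
    below = desc(i, parent) - {i}
    if not below:
        return 0
    return 1 + max(height(j, parent) for j in below)


# Rooted tree: 0 is the root with children 1 and 2; node 1 has a single child 3.
parent = {0: None, 1: 0, 2: 0, 3: 1}
print(desc(1, parent), child({0, 1}, parent), height(0, parent))
```

The recursion in `height` follows the inductive definition literally and is not optimized; for larger trees one would compute heights in a single bottom-up pass.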

Given player , let be the collection of coalitions for every descendant of , i.e., . For each and each coalition , we let mean that , is connected, and every other player in weakly prefers to . Now, we define a modified preference order for player , , that devalues any coalition for which does not hold.

• If and , then

• If but , then

• If , then

Given $\mathcal{B}_i$ and a distribution $D$, we say that a coalition $X$ is top-$\frac{\epsilon}{n}$ for player $i$, if

$$\Pr_{S \sim D}[S \succ_{\mathcal{B}_i} X] \le \frac{\epsilon}{n}.$$

Trivially, for every the probability of sampling a top- coalition for player from is at least ; moreover, if , then any coalition is top-.

Intuitively, $B_i$ approximates the best coalition $i$ can form with members of the subtree rooted at $i$. Algorithm 1’s objective is to ensure that sampling a coalition $S$ from $D$ such that $S \succ_{\mathcal{B}_i} B_i$ is unlikely; namely, the probability of seeing $S$ from $D$ such that $S$ is better for the highest node $i$ in $S$ than $B_i$, and every other player $j$ in $S$ prefers it to their $B_j$, is smaller than $\frac{\epsilon}{n}$. This is done by examining enough coalitions so as to see some top-$\frac{\epsilon}{n}$ coalition for every player.

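Examine what happens if $\mathcal{B}_i$, containing the $B_j$’s for $i$’s descendants, is fixed upfront, i.e. not dependent on the sample. Let us bound the probability that for $i$, none of the $m$ sampled coalitions are top-$\frac{\epsilon}{n}$:

$$\left(1 - \frac{\epsilon}{n}\right)^{m} = \left(1 - \frac{\epsilon}{n}\right)^{\left\lceil \frac{n}{\epsilon} \log \frac{n}{\delta} \right\rceil} \le \left(\left(1 - \frac{\epsilon}{n}\right)^{\frac{n}{\epsilon}}\right)^{\log \frac{n}{\delta}} < \left(\frac{1}{e}\right)^{\log \frac{n}{\delta}} < \frac{\delta}{n} \tag{1}$$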

Note that Inequality (1) is true irrespective of what $\mathcal{B}_i$ is. Taking a union bound, the probability that there is some player $i$ such that there is no top-$\frac{\epsilon}{n}$ coalition for $i$ in the sample is at most $\delta$. Note that the set of candidate coalitions for $i$ can end up not containing any coalition (line 7). But then with high confidence, as a special case of the above consideration, every coalition is top-$\frac{\epsilon}{n}$, and the algorithm can pick $\{i\}$.
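The sample size implied by Inequality (1) is easy to evaluate numerically. The sketch below assumes the logarithm is natural (the base is not stated explicitly in the text) and uses illustrative function names; it computes $m = \lceil \frac{n}{\epsilon} \log \frac{n}{\delta} \rceil$ and checks that the per-player failure probability $(1 - \frac{\epsilon}{n})^m$ falls below $\frac{\delta}{n}$, so that the union bound over $n$ players stays below $\delta$.

```python
import math


def sample_size(n, epsilon, delta):
    """Number of samples m = ceil((n / epsilon) * ln(n / delta)), assuming natural log."""
    return math.ceil((n / epsilon) * math.log(n / delta))


def per_player_failure(n, epsilon, m):
    """Probability that none of m i.i.d. draws hits an event of probability epsilon / n."""
    return (1 - epsilon / n) ** m


n, epsilon, delta = 10, 0.1, 0.05
m = sample_size(n, epsilon, delta)
failure = per_player_failure(n, epsilon, m)
print(m, failure, delta / n)
print(n * failure < delta)  # union bound over all players
```

For these parameters the bound asks for a few hundred samples; the dependence is linear in $n$ and $\frac{1}{\epsilon}$ up to logarithmic factors.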

Recall that in an actual run of the algorithm the sample is drawn, and for every descendant $j$ of $i$, $B_j$ is computed based on the sample, and then $B_i$ is computed based on the same sample. One can ask whether some dependence between the computation of $B_i$ and the $B_j$’s invalidates Inequality (1). This potential problem can be easily solved by taking a larger number of samples: if we take $nm$ samples, we can just use $m$ fresh samples to compute each $B_i$ and maintain complete independence in the samples.

In order to see that the smaller sample size used in Algorithm 1 provides the same guarantee, consider an equivalent reordering of the computation of $B_i$ and the $B_j$’s: first, for every $i$, determine the number of connected coalitions $S$ in the sample such that $i$ will be the highest node in $S$. Then, draw the other coalitions and compute the $B_j$’s for every descendant $j$ of $i$; finally, based on this, determine the family $\mathcal{B}_i$. Note that regardless of what $\mathcal{B}_i$ is, each of the undetermined, independently drawn coalitions has probability of at least $\frac{\epsilon}{n}$ to be top-$\frac{\epsilon}{n}$ for $i$. Hence, Inequality (1) holds even if $B_i$ and $\mathcal{B}_i$ are computed based on the same sample of coalitions.

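We are now ready to prove that the coalition structure $\pi^{(r)}$ output by Algorithm 1 is PAC stable. We observe that any coalition included in the returned $\pi^{(r)}$ is a $B_i$ for some $i \in N$. Note that for every $j \in N$, we have that $v_j(\pi^{(r)}(j)) \ge v_j(B_j)$ (line 7). Now, consider any coalition $X$ that strongly blocks $\pi^{(r)}$; let $i$ be the highest node in $X$. Since $X$ strongly blocks $\pi^{(r)}$,

$$v_j(X) > v_j(\pi^{(r)}(j)) \ge v_j(B_j)$$

for all players $j \in X$. In particular, $v_i(X) > v_i(B_i)$. By construction of $B_i$ and Inequality (1), $B_i$ is top-$\frac{\epsilon}{n}$ for $i$; that is,

$$\frac{\epsilon}{n} > \Pr_{S \sim D}[v_i(S) > v_i(B_i)] \ge \Pr_{S \sim D}[S = X];$$

thus the probability of drawing a coalition such as $X$ from $D$, i.e. strongly blocking $\pi^{(r)}$ and having $i$ as its highest node, is less than $\frac{\epsilon}{n}$. Taking a union bound over all players,

$$\Pr_{X \sim D}[X \text{ strongly blocks } \pi^{(r)}] < \epsilon;$$

this guarantee holds with confidence $1 - \delta$. ∎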

We conjecture that a similar argument can imply a stronger statement. That is, we can replace ‘strongly block’ in the definition of PAC stabilizability with ‘weakly block’ and still obtain PAC stabilizability on trees. (A coalition $S$ weakly blocks a coalition structure $\pi$ if every player $i \in S$ weakly prefers $S$ to their current coalition $\pi(i)$ and at least one player in $S$ has a strict preference.) We note that in the full information setting, a strict core outcome does not necessarily exist on trees [Igarashi and Elkind 2016].

###### Remark 4.2 (From Cardinal to Ordinal Preferences).

Note that step 8 of Algorithm 1 is the only step that refers to the numerical representation of agent preferences $v$. The algorithm chooses a coalition with maximal utility value out of some set of possible coalitions; in particular, the only thing required for the successful implementation of Algorithm 1 is players’ ranking of coalitions in the sample. In other words, the particular numerical representation of player preferences plays no role. This is a significant departure from the algorithms devised by Sliwinski and Zick (2017), where the type of utility representation functions used was crucial for PAC stability.

Next, we show that Theorem 4.1 is ‘tight’ in the sense that if the graph $G$ contains a cycle, $\mathcal{H}_G$ is not PAC stabilizable.

###### Theorem 4.3.

Given a non-forest graph $G$, the class $\mathcal{H}_G$ is not PAC stabilizable.

###### Proof.

Since $G$ is not a forest, there is a cycle in $G$. Without loss of generality, let be a cycle with for all , and . Let , , . Suppose $D$ is the uniform distribution on $\{S_1, S_2, S_3\}$ and that the following holds:

$$S_1 \succ_1 S_3, \quad S_2 \succ_2 S_1, \quad S_3 \succ_3 S_2. \tag{2}$$

In this case, nothing beyond (2) can be deduced about the game by examining samples from $D$. Consider the following games satisfying (2):

• A game where every player strictly prefers to any other coalition, and any non-singleton coalition is less preferred than and , namely, for any . Here we set . Every player strictly prefers to any other coalition.

• A game where every player in strictly prefers to any other coalition, and every player strictly prefers to any coalition other than . Every player strictly prefers to any other coalition other than .

Suppose towards a contradiction that there is an algorithm that returns a -PAC stable partition . We will show that for to be resistant against deviations supported by , has to include or or for the first game, and for the second game, which implies that it is impossible to achieve with any confidence .

• Consider the first game . Suppose for a contradiction that no player forms a singleton. We will show that at least one of , , and would strongly block with probability . The claim is clear when no player belongs to ; thus suppose at least one of is formed. Then we have the following three cases.

• If , , and , then players in strictly prefer to their own coalitions.

• If , , and , then players in strictly prefer to their own coalitions.

• If for all , and , then players in strictly prefer to their own coalitions.

In each case, $\pi$ is strongly blocked with probability at least $\frac{1}{3}$, a contradiction.

• Consider the second game . Suppose for a contradiction that the coalition is not formed, i.e., . Again, at least one of is formed as otherwise would not be resistant against deviations supported by . Now we have the following three cases.

• If , , then players in strictly prefer to their own coalitions.

• If , , then players in strictly prefer to their own coalitions.

• If for all , then players in strictly prefer to their own coalitions.

In each case, $\pi$ is strongly blocked with probability at least $\frac{1}{3}$, a contradiction. ∎

## 5 Inferring Tree Interaction Networks from Data

Until now, we assumed that the underlying interaction network was given to us as input; this is, naturally, an assumption that we would like to forgo. Suppose the underlying graph is a forest $G$, and consider the question of whether it is possible to infer a forest that agrees with the original graph with high probability. Let $\mathcal{T}_n$ be the set of all possible trees over $n$ vertices, and let $\mathcal{F}_n$ be the set of all possible forests; $\mathcal{F}_n$ is our hypothesis class for guessing an approximate forest. More formally, $\mathcal{F}_n$ consists of functions that, given an $n$ vertex forest, output $1$ if a set of vertices is connected, and $0$ otherwise. By Cayley’s formula:

$$|\mathcal{T}_n| = n^{n-2} \tag{3}$$

Any forest can be obtained by choosing a tree, and then choosing a subset of its edges, hence:

$$|\mathcal{F}_n| \le |\mathcal{T}_n| \, 2^{n-1} = n^{n-2} \, 2^{n-1} \tag{4}$$
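Both counts can be verified by exhaustive enumeration for small $n$. The sketch below iterates over all edge subsets of the complete graph on $n$ labeled vertices and uses a union-find structure to detect cycles; the function name is illustrative, and the enumeration is only practical for very small $n$.

```python
from itertools import combinations


def count_forests_and_trees(n):
    """Count labeled forests and labeled spanning trees on n vertices by brute force."""
    all_edges = list(combinations(range(n), 2))
    forests = trees = 0
    for mask in range(1 << len(all_edges)):
        root = list(range(n))  # union-find parent pointers

        def find(x):
            while root[x] != x:
                root[x] = root[root[x]]
                x = root[x]
            return x

        acyclic, used = True, 0
        for k, (a, b) in enumerate(all_edges):
            if (mask >> k) & 1:
                ra, rb = find(a), find(b)
                if ra == rb:  # the edge would close a cycle
                    acyclic = False
                    break
                root[ra] = rb
                used += 1
        if acyclic:
            forests += 1
            if used == n - 1:  # an acyclic graph with n - 1 edges is a spanning tree
                trees += 1
    return forests, trees


for n in range(2, 6):
    forests, trees = count_forests_and_trees(n)
    print(n, trees, n ** (n - 2), forests, n ** (n - 2) * 2 ** (n - 1))
```

The tree counts agree with Cayley’s formula (3), and the forest counts stay below the bound in (4); in particular $\log |\mathcal{F}_n|$ grows as $O(n \log n)$.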

We observe the following variant of Theorem 2.1 for finite hypothesis classes.

###### Theorem 5.1 (anthony1999learning).

Let be a finite hypothesis class where is polynomial in . If there exists a polynomial time algorithm that for any , and samples

$$\langle S_1, v(S_1)\rangle, \dots, \langle S_m, v(S_m)\rangle$$

finds a function consistent with the samples, i.e., for each , then is efficiently PAC learnable.
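To make the role of the class size concrete, here is a rough sketch that combines (4) with the textbook sample bound for a consistent learner over a finite class, $m \ge \frac{1}{\varepsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)$. This particular form of the bound is an assumption on our part; the exact constants in the cited theorem may differ. The function names are hypothetical.

```python
import math


def log_forest_class_bound(n):
    """Natural log of the bound in (4): ln(n^(n-2) * 2^(n-1))."""
    return (n - 2) * math.log(n) + (n - 1) * math.log(2)


def sample_size(n, eps, delta):
    """Samples sufficient for a consistent learner over a finite class.

    Assumes the standard bound m >= (ln|H| + ln(1/delta)) / eps,
    with ln|H| replaced by the upper bound from (4).
    """
    return math.ceil((log_forest_class_bound(n) + math.log(1 / delta)) / eps)
```

Since the logarithm of the bound in (4) grows as $O(n \log n)$, the resulting sample size is polynomial in $n$, $1/\varepsilon$, and $\log(1/\delta)$.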

Since , all we need is to establish the existence of an efficient algorithm to compute a forest consistent with a given sample. More formally, let be an unknown forest; we are given a set of subsets of vertices labeled 'connected' or 'disconnected' according to . Can we find a forest that is consistent with the labeling? First, we consider an easier question and assume all subsets are connected. The answer to this question is affirmative, and appears in conitzer2004graphs.

###### Theorem 5.2 (conitzer2004graphs).

Let be a tree. Given a list of connected vertex subsets in , there exists a polynomial-time algorithm that outputs a tree where every subset is connected in .

Theorem 5.2 pertains to trees, but immediately generalizes to forests by noting that if is a forest, any tree whose edge set is a superset of is a valid solution as well; hence the same algorithm solves the problem.
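The superset argument can be illustrated with a small sketch: extend a forest to a tree by linking its components, then check that subsets connected in the forest remain connected in the tree. This is not the algorithm of Theorem 5.2, only a demonstration of why adding edges never breaks a positive example; the function names and the choice to chain component representatives are our own.

```python
def induced_connected(edges, subset):
    """Return True if `subset` induces a connected subgraph under `edges`."""
    subset = set(subset)
    if not subset:
        return True
    adj = {v: set() for v in subset}
    for u, v in edges:
        if u in subset and v in subset:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(subset))
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen == subset


def extend_forest_to_tree(n, edges):
    """Add edges between components of a forest on vertices 0..n-1."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    # One representative per component; chain them to join all components.
    roots = sorted({find(v) for v in range(n)})
    return list(edges) + list(zip(roots, roots[1:]))
```

Any subset connected in the input forest stays connected in the output, because the output only adds edges. The converse fails, which is exactly why negative examples need separate treatment.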

In other words, if one only observes subsets of feasible coalitions and players’ preferences over them, it is possible to find a forest structure consistent with the samples.

###### Corollary 5.3.

If the probability distribution supports only connected subgraphs, is efficiently PAC learnable over .

Corollary 5.3 follows immediately from (4) and Theorems 5.1 and 5.2. Theorem 4.1 assumes that the underlying interaction graph is known to us. Leveraging Corollary 5.3, we now show that this assumption can be forgone; that is, it is possible to PAC stabilize a hedonic game whose underlying interaction graph is a forest, even if the forest structure is unknown to us. Note that we established that the forest structure can be PAC learned efficiently only if the sample contains exclusively connected coalitions, yet we do not have this requirement for PAC stabilizability.

###### Theorem 5.4.

Let be the class of all hedonic games whose interaction graph is a forest; then is efficiently PAC stabilizable.

###### Proof.

Suppose are given, and there is an unknown forest , hedonic game and a probability distribution over coalitions. Let be a distribution obtained from by substituting any disconnected coalitions with . supports only connected coalitions, so by Corollary 5.3, can be efficiently PAC learned with respect to to obtain s.t. with confidence . Let be a distribution obtained from by substituting any coalitions s.t. with . Since for any supported by , by Theorem 4.1, given , can be PAC stabilized with respect to to obtain a partitioning of the agents such that with confidence . For ease of notation, we write whenever strongly blocks .

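$$
\begin{aligned}
\Pr_{S\sim D}[dev(S,\pi)] &= \Pr_{S\sim D}[dev(S,\pi) \wedge S \text{ is connected in } G] \\
&= \Pr_{S\sim D'}[dev(S,\pi)] \\
&= \Pr_{S\sim D'}[dev(S,\pi) \wedge f_{G'}(S) = f_G(S)] + \Pr_{S\sim D'}[dev(S,\pi) \wedge f_{G'}(S) \neq f_G(S)] \\
&\le \Pr_{S\sim D'}[dev(S,\pi) \wedge f_{G'}(S) = f_G(S)] + \frac{\varepsilon}{2} && (5) \\
&= \Pr_{S\sim D''}[dev(S,\pi)] + \frac{\varepsilon}{2} \le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon && (6)
\end{aligned}
$$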

By construction of and , lines (5) and (6) hold with confidence each. We conclude that with confidence at least . Since the constructions of and both require a polynomial number of samples from , is efficiently PAC stabilizable. ∎

Theorem 5.2, while interesting in its own right, provides us with only a partial understanding of the problem: if all one is given is positive examples, it is possible to find a tree structure that is consistent with all connected coalitions. In what follows, we study a more general question of whether we can find a forest consistent with both positive (connected coalitions) and negative (disconnected coalitions) examples. As we show in Theorem 5.5, introducing the possibility of negative examples makes the problem computationally intractable, even if we restrict ourselves to the hypothesis class of paths. Hence, forests cannot be PAC learned efficiently. It is interesting to note that Theorem 5.4 could be achieved despite this negative result.

### 5.1 The Complexity of Constructing Consistent Trees

We now argue that deciding whether there exists a forest consistent with both positive and negative examples is computationally intractable; in fact, this claim holds even when the desired forest is a path. This result stands in sharp contrast to known computational results in the literature; indeed, there are several efficient algorithms for such restricted networks when only connected coalitions are taken into account [Booth and Lueker 1976; Korte and Möhring 1987; Corneil, Olariu, and Stewart 1998; Fulkerson and Gross 1965; Habib et al. 2000; Kratsch et al. 2006; Hsu and Ma 1999]. (The problem of deciding the existence of a path consistent with connected coalitions is equivalent to the problem of determining whether the intersection graph of a hypergraph is an interval graph, which is also closely related to testing the consecutive ones property of a matrix; see, e.g., the survey by dom2009consecutive for more details.)

Specifically, we are given samples of node subsets ; each subset is labeled by a function such that

$$
\ell_G(S_j) =
\begin{cases}
1 & \text{if } S_j \text{ is connected in } G \\
0 & \text{otherwise.}
\end{cases}
\tag{7}
$$

We say that a graph is consistent with over the samples if and only if for all . Our objective is to find a forest such that for all . Theorem 5.5 states that it is NP-hard to determine whether such a graph exists.
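For intuition, the consistency check in the path case can be sketched as follows. A path on $n$ vertices corresponds to an ordering of the vertices, and a subset is connected in the path exactly when its members occupy consecutive positions. The search below tries every ordering, so it takes factorial time and is only meant for tiny instances; the function names and the sample encoding as `(subset, label)` pairs are assumptions made for this illustration.

```python
from itertools import permutations


def connected_in_path(position, subset):
    """A non-empty subset is connected in a path iff its positions are contiguous."""
    positions = [position[v] for v in subset]
    return max(positions) - min(positions) == len(positions) - 1


def find_consistent_path(n, samples):
    """Search all vertex orderings for a path consistent with labeled samples.

    `samples` is a list of (subset, label) pairs with label 1 for
    'connected' and 0 for 'disconnected'. Returns an ordering or None.
    """
    for order in permutations(range(n)):
        if order[0] > order[-1]:
            continue  # a path and its reversal induce the same labeling
        position = {v: i for i, v in enumerate(order)}
        if all(connected_in_path(position, s) == bool(label)
               for s, label in samples):
            return list(order)
    return None
```

With only positive labels this reduces to a consecutive-ones style question, which admits efficient algorithms; the exhaustive search here makes no use of that structure.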

###### Theorem 5.5.

Given a family of subsets such that each set in is of size at most , and a mapping , it is NP-hard to decide whether there exists a path such that for each . The result also holds when is a forest.

###### Proof.

We will first show that it is NP-hard to decide whether there exists a path consistent with both positive and negative samples; later we will show how the reduction can be extended to forests.

Our reduction is from a restricted version of 3SAT. Specifically, we consider (3,B2)-SAT. Recall that in this version of 3SAT, each clause contains at most literals, and each variable occurs exactly twice positively and twice negatively; this problem is known to be NP-complete [Berman, Karpiński, and Scott 2004].

Idea: consider a formula with a variable set and clause set , where for each variable we write and for the two positive occurrences of , and and for the two negative occurrences of . We will have one clause gadget for each clause and one variable gadget for each variable . Most player arrangements will be inconsistent with the pair unless the following holds:

• For each clause , a literal player contained in a clause connects the players from a clause gadget.

• For each variable , either the pair of positive literal players or the pair of negative literal players connects the players from a variable gadget.

Hence, one can think of the variable gadgets as forcing a path to make a choice between setting true and setting false; each clause gadget ensures that the resulting assignment is satisfying.

Construction details: For each variable , we introduce two variable players and , and four literal players

$$x_i(1),\; x_i(2),\; \bar{x}_i(1),\; \bar{x}_i(2),$$

which correspond to the four occurrences of . For each clause , we introduce two clause players and . Let . We introduce garbage collectors , and two leaf players and . Intuitively, garbage collectors will be used to connect the literal players that do not appear in any clause or variable gadget.

Our set of samples consists of three subfamilies , , and : the sets in correspond to the connectivity constraints, the sets in correspond to disconnected coalitions of size , and the sets in correspond to disconnected coalitions of size .

First, we construct the set , which consists of

• the four pairs , , , ;

• the consecutive pairs for ; and

• the consecutive pairs for .

We next construct the negative samples and as follows. The family is the set of all player pairs, except for the following:

• the pairs in .

• the pairs of a variable player and its corresponding literal player, i.e., the pairs of the form or .

• the pairs of a clause player and a literal player contained in it, i.e., the pairs of the form where is a literal player in a clause .

• the pairs of positive literal players or negative literal players of each variable, i.e., the pairs of the form or .

• the pairs of a literal player and a garbage collector, i.e., the pairs of the form or .

In a path consistent with the samples, each variable player can share an edge with its literal player; and each clause player can share an edge with a literal player contained in it.

The family consists of the following player triples:

• triples of the form where and , and the triples of the form where and ; and

• the triples of the form where and , and the triples of the form where and .

Here and . The above constraints mean that if a variable player and its positive literal player (respectively, its negative literal player ) are adjacent, then the player can be only adjacent to the other positive literal player (respectively, the other negative literal player ), which can then be only adjacent to the other variable player .

Finally, for each we set if and only if . Note that the number of players in the instance is bounded by and the number of sets in is bounded by .

Correctness: We will now show that is satisfiable if and only if there exists a path consistent with .

: Suppose that there exists a truth assignment that satisfies . First, since is a satisfying assignment for , for each clause gadget , we can select exactly one literal that satisfies the clause ; we connect the literal player with each of the clause players and by an edge. We combine all the clause gadgets by constructing an edge for each . Now, we consider an assignment that gives the opposite values to , and connect each variable gadget using the literals corresponding to this assignment. Specifically, for each variable gadget , if is set to false, we select its positive literal players and construct a path that consists of three edges , , and ; similarly, if is set to true, we select its negative literal players and construct a path that consists of three edges , , and . We then create an edge for each , and merge all the variable gadgets together.

Finally, we construct a path over the remaining players by arranging the garbage collectors in increasing order of their index, and placing one of the remaining literal players between each consecutive pair of garbage collectors arbitrarily. We then merge all the paths by creating the four edges , , , and ; see Figure 1 for an illustration. It is easy to verify that the resulting graph is a path consistent with the samples.

: Conversely, suppose that there is a path consistent with , i.e., for each , is connected in if and only if . Since every pair in should be connected, the four pairs , , , must each form an edge in . Similarly, we have for each ; also, for each . Observe that both players and must be the leaves of the constructed path since these players are only allowed to have one neighbor; thus, every other player has degree . Combining these observations, the definition of ensures that our path specifies a truth assignment for .

For each <