Collective Schedules: Scheduling Meets Computational Social Choice

03/20/2018 ∙ by Fanny Pascual, et al. ∙ Laboratoire d'Informatique de Paris 6, University of Warsaw

When scheduling public works or events in a shared facility, one needs to accommodate the preferences of a population. We formalize this problem by introducing the notion of a collective schedule. We show how to extend fundamental tools from social choice theory (positional scoring rules, the Kemeny rule and the Condorcet principle) to collective scheduling. We study the computational complexity of finding collective schedules. We also experimentally demonstrate that optimal collective schedules can be found for instances with realistic sizes.


1 Introduction

Major public infrastructure projects, such as extending the city subway system, are often phased. As workforce, machines and yearly budgets are limited, phases have to be developed one by one. Some phases inherently last longer than others. Moreover, individual citizens have different preferred orders of phases. Should the construction start with a long phase that has strong support, or rather with a less popular phase that, however, will be finished faster? If the long phase starts first, the citizens supporting the short phase would have to wait significantly longer. Consider another example: planning events in a single lecture theater for a large, varied audience. The theater needs to be shared among different groups. Some events last just a few hours, while others last multiple days. What is the optimal schedule? We formalize these and similar questions by introducing the notion of a collective schedule, a plan that takes into account both jobs’ durations and their societal support. The central idea stems from the observation that the problem of finding a socially optimal collective schedule is closely related to the problem of aggregating agents’ preferences, one of the central problems studied in social choice theory [3]. However, differences in jobs’ lengths have to be explicitly considered. Let us illustrate these similarities through the following example.

Consider a collection of jobs all having the same duration. The jobs have to be processed sequentially (one by one). Different agents might have different preferred schedules of processing these jobs. Since each agent would like all the jobs to be executed as soon as possible, the preferred schedule of each agent does not contain “gaps” (idle times), and so, such a preferred schedule can be viewed as an order over the set of jobs, and can be interpreted as a preference relation. Similarly, the resulting collective schedule can be viewed as an aggregated preference relation. From this perspective, it is natural to apply tools from social choice theory to find a socially desired collective schedule.

Yet, the tools of social choice cannot always be applied directly. The scheduling model is typically much richer and contains additional elements. In particular, when jobs’ durations vastly differ, these differences must be taken into account when constructing a collective schedule. For instance, imagine that we are dealing with two jobs: a very short one, $J_s$, and a very long one, $J_l$. Further, imagine that 55% of the population prefers the long job to be executed first and that the remaining 45% has exactly opposite preferences. If we disregard the jobs’ durations, then perhaps every decision maker would schedule $J_l$ before $J_s$. However, starting with $J_s$ affects 55% of the population only slightly (as $J_l$ is just slightly delayed compared to their preferred schedules). In contrast, starting with $J_l$ affects 45% of the population significantly (as $J_s$ is severely delayed).
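The asymmetry can be made concrete with a few lines of Python. The sketch below measures an agent's dissatisfaction as the total tardiness defined in Section 2.2 (how much later each job completes than in the agent's preferred schedule); the lengths 1 and 100 and the population of 100 agents are illustrative assumptions, not values from the text.

```python
def completion_times(order, p):
    """Completion time of each job when jobs run back to back in `order`."""
    t, done = 0, {}
    for job in order:
        t += p[job]
        done[job] = t
    return done


def tardiness(schedule, preferred, p):
    """Sum over jobs of how much later they finish than in `preferred`."""
    c = completion_times(schedule, p)
    due = completion_times(preferred, p)
    return sum(max(0, c[j] - due[j]) for j in schedule)


# Illustrative instance (assumed numbers): 55 agents want the long job first.
p = {"short": 1, "long": 100}
profile = [("long", "short")] * 55 + [("short", "long")] * 45

for schedule in [("short", "long"), ("long", "short")]:
    total = sum(tardiness(schedule, pref, p) for pref in profile)
    print(schedule, total)
```

Running it prints a total of 55 for starting with the short job and 4500 for starting with the long one: each of the 55 agents loses one time unit in the first case, while each of the 45 agents loses 100 time units in the second.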

1.1 Overview of Our Contributions

We explore the following question: How can we meaningfully apply the classic tools from social choice theory to find a collective schedule? The key idea behind this work is to use fundamental concepts from both fields to highlight the new perspectives.

Scheduling offers an impressive collection of models, tools and algorithms which can be applied to a broad class of problems. It is impossible to cover all of them in a single work. We use perhaps the most fundamental (although still non-trivial) scheduling model: a single processor executing a set of independent jobs. This model is already rich enough to describe significant real-world problems (such as the public works or the lecture theater introduced earlier). At the same time, such a fundamental, well-studied model, stripped of orthogonal issues, enables us to highlight the new elements brought by social choice.

Similarly, we focus on three well-known and extensively studied tools from social choice theory: positional scoring rules, the Kemeny rule and the Condorcet principle. Under a positional scoring rule the score that an object receives from an agent is derived only from the position of this object in the agent’s preference ranking; the objects are then ranked in descending order of the total scores they receive from all the agents. The Kemeny rule uses the concept of distances between rankings. It selects a ranking which minimizes the sum of the swap distances to the preference rankings of all the agents. The Condorcet principle states that if there exists an object that is preferred to every other object by a majority of agents, then this object should be put at the top of the aggregated ranking. The Condorcet principle can be generalized to the remaining ranking positions. Assume that the graph of the majority preferences is acyclic, i.e., there exists no sequence of objects $o_1, o_2, \ldots, o_\ell$ such that $o_1$ is preferred by a majority of agents to $o_2$, $o_2$ to $o_3$, $\ldots$, $o_{\ell-1}$ to $o_\ell$, and $o_\ell$ to $o_1$. Then, whenever an object $o$ is preferred by a majority of agents to another object $o'$, $o$ should be put before $o'$ in the aggregated ranking.
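As an illustration of the generalized Condorcet principle, the following Python sketch builds the strict-majority graph of a profile of rankings and, when that graph is acyclic, returns an ordering consistent with it (a topological order computed with the standard-library `graphlib` module, Python 3.9+). The function name `majority_ranking` and the toy profiles are hypothetical.

```python
from itertools import combinations
from graphlib import TopologicalSorter, CycleError


def majority_ranking(profile):
    """Order objects so that every strict-majority preference is respected.

    `profile` is a list of rankings (tuples, best first). Returns None when
    the strict-majority graph contains a cycle.
    """
    objects = list(profile[0])
    n = len(profile)
    positions = [{o: i for i, o in enumerate(r)} for r in profile]
    # before[o] = objects that a strict majority ranks above o
    before = {o: set() for o in objects}
    for x, y in combinations(objects, 2):
        x_first = sum(1 for pos in positions if pos[x] < pos[y])
        if 2 * x_first > n:
            before[y].add(x)
        elif 2 * x_first < n:
            before[x].add(y)
    try:
        return list(TopologicalSorter(before).static_order())
    except CycleError:
        return None


print(majority_ranking([("a", "b", "c"), ("b", "a", "c"), ("a", "c", "b")]))
print(majority_ranking([("a", "b", "c"), ("b", "c", "a"), ("c", "a", "b")]))
```

The first call yields `['a', 'b', 'c']`; the second profile is the classic Condorcet cycle, so the function returns `None`.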

Naturally, these three notions can be directly applied to find a collective schedule. Yet, as we argued in our example with a long and a short job, this can lead to intuitively suboptimal schedules, because these notions do not take into account significantly different processing times. We propose extensions of these tools that take into account the lengths of the jobs. We also analyze their computational complexity.

1.2 Related Work

Scheduling:

The two most related scheduling models apply concepts from game theory and multiagent optimization. The selfish job model [18, 27] assumes that each job has a single owner trying to minimize its completion time and that the jobs compete for processors. The multi-organizational model [11] assumes that a single organization owns and cares about multiple jobs. Our work complements these with a third perspective: not only does each job have multiple “owners”, but the owners also care about all jobs (albeit to different degrees).

In multiagent scheduling [2], agents have different optimization goals (e.g., different functions or weights). The system’s objective is to find all Pareto-optimal schedules, or a single Pareto-optimal schedule (optimizing one agent’s goal with constraints on admissible values for other goals). In contrast, our aim is to propose rules for constructing a single, compromise schedule. This compromise stems from social choice methods and tools. Moreover, our setting is motivated by problems in which the number of agents is large. To the best of our knowledge, the existing literature on multiagent scheduling focuses on cases with a few (e.g., two) agents.

Computational social choice: For an overview of tools and methods for aggregating agents’ preferences, see the book of Arrow et al. [3]. Fischer et al. [15] overview the computational complexity of finding Kemeny rankings. Caragiannis et al. [7] discuss the computational complexity of finding winners according to a number of Condorcet-consistent methods.

Typically in social choice, an aggregated ranking is created to establish the collective preference relation and, eventually, to select a single best alternative (sometimes with a few runners-up). Thus, the agents usually do not care about the order of the candidates in the later part of the collective ranking. In our model the agents are interested in the whole output ranking. We can thus implement fairness: the agents who are dissatisfied with the order at the beginning of a collective schedule might be compensated in the later part of the schedule. Thus, our approach is closer to the recent works of Skowron et al. [26] and Celis et al. [8] analyzing fairness of collective rankings.

In participatory budgeting [6, 16, 24, 13, 4] agents express preferences over projects which have different costs. The goal is to choose a socially-optimal set of items with a total cost not exceeding the budget. Thus, in a way, participatory budgeting extends the knapsack problem similarly to how we extend scheduling.

2 The Collective Scheduling Model

We use standard scheduling notations and definitions from the book of Brucker [5], unless otherwise stated. For each integer $k$, by $[k]$ we denote the set $\{1, \ldots, k\}$. Let $N = [n]$ be the set of agents (voters) and let $\mathcal{J} = \{J_1, \ldots, J_m\}$ be the set of jobs (note that in scheduling $m$ is typically used to denote the number of machines; we deliberately abuse this notation as our results are for a single machine). For a job $J$, by $p_J$ we denote its processing time (also called duration or size), i.e., the number of time units $J$ requires to be completed. We consider an off-line problem, i.e., jobs are known in advance. Jobs are ready to be processed (there are no release dates). For each job its processing time is known in advance (clairvoyance, a standard assumption in scheduling theory). Once started, a job cannot be interrupted until it completes (we do not allow preemption of the jobs).

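There is a single machine that executes all the jobs. A schedule $\sigma$ is a function that assigns to each job $J$ its start time $\sigma(J)$, such that no two jobs execute simultaneously. Thus, for any two distinct jobs $J$ and $J'$, either $\sigma(J) \ge \sigma(J') + p_{J'}$ or $\sigma(J') \ge \sigma(J) + p_J$. By $C_J(\sigma)$ we denote the completion time of job $J$: $C_J(\sigma) = \sigma(J) + p_J$. We assume that a schedule has no gaps: for each job $J$, except the job that completes last, there exists a job $J'$ such that $\sigma(J') = C_J(\sigma)$. Let $\mathcal{S}(\mathcal{J})$ denote the set of all possible schedules for the set of jobs $\mathcal{J}$.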

Each agent wants all jobs to be completed as soon as possible, yet agents differ in their views on the relative importance of the jobs. We assume that each agent $a$ has a certain preferred schedule $\sigma_a$, and when building $\sigma_a$, the agent is aware of the processing times of the jobs. In particular, $\sigma_a$ does not have to directly correspond to the relative importance of jobs. For instance, if in $\sigma_a$ a short job $J_s$ precedes a long job $J_l$, then this does not necessarily mean that $a$ considers $J_s$ more important than $J_l$. Agent $a$ might consider $J_l$ more important, but she might prefer the marginally less important job $J_s$ to be completed sooner, as this would delay $J_l$ only a bit.

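A schedule $\sigma$ can be encoded as a (transitive, asymmetric) binary relation: $J \succ_\sigma J' \iff \sigma(J) < \sigma(J')$. E.g., $J_1 \succ_{\sigma_a} J_2 \succ_{\sigma_a} \cdots \succ_{\sigma_a} J_m$ means that agent $a$ wants $J_1$ to be processed first, $J_2$ second, and so on. We will denote such a schedule as $\sigma_a = (J_1, J_2, \ldots, J_m)$.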
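Because schedules have no gaps, a schedule is fully determined by the order of its jobs. A minimal Python sketch of this correspondence follows; the function names are hypothetical, and start times are assumed to begin at time 0.

```python
def start_times(order, p):
    """Gap-free schedule: each job starts when its predecessor completes."""
    start, t = {}, 0
    for job in order:
        start[job] = t
        t += p[job]
    return start


def to_order(start):
    """Recover the order of jobs (earlier start = preferred) from start times."""
    return tuple(sorted(start, key=start.get))


def is_gap_free(start, p):
    """No overlaps and no idle time between consecutive jobs."""
    order = to_order(start)
    return all(start[b] == start[a] + p[a] for a, b in zip(order, order[1:]))


p = {"J1": 3, "J2": 1, "J3": 2}
sigma = start_times(("J2", "J1", "J3"), p)
print(sigma)  # {'J2': 0, 'J1': 1, 'J3': 4}
print(to_order(sigma), is_gap_free(sigma, p))  # ('J2', 'J1', 'J3') True
```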

We call a vector $P = (\sigma_1, \ldots, \sigma_n)$ of preferred schedules, one for each agent, a preference profile. By $\mathcal{P}$ we denote the set of all preference profiles of the agents. A scheduling rule $\mathcal{R}\colon \mathcal{P} \to \mathcal{S}(\mathcal{J})$ is a function which takes a preference profile as an input and returns a collective schedule.

In the remaining part of this section we propose different methods in which the preference profile is used to evaluate a proposed collective schedule $\tau$ (and thus to construct a scheduling rule $\mathcal{R}$). All the proposed methods extrapolate information from $\sigma_a$ (a preferred schedule) to evaluate $\tau$. Such an extrapolation is common in social choice: in participatory budgeting it is typical to ask each agent to provide a single set of items [6, 16, 24, 4] (instead of preferences over sets of items); similarly, in multiwinner elections, each agent provides separable preferences over candidates [25, 14]. Alternatively, we could ask an agent to express her preferences over all possible schedules. This approach is also common in other areas of social choice (e.g., in the model of voting in combinatorial domains [19]), yet it requires eliciting exponential information from the agents. There also exist middle-ground approaches that use specifically designed languages, such as CP-nets, for expressing preferences.

2.1 Scheduling by Positional Scoring Rules

In classic social choice, positional scoring rules are perhaps the most straightforward tools to aggregate agents’ preferences, and the most commonly used in practice. Informally, under a positional scoring rule each agent $a$ assigns a score to each candidate $c$ (a job, in our case), which depends only on the position of $c$ in $a$’s preference ranking. For each candidate, the scores that she receives from all the agents are summed up, and the candidates are ranked in descending order of their total scores.

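There is a natural way to adapt this concept. For an increasing function $f$ and a job $J$, we define the $f$-score of $J$, $\mathrm{sc}_f(J)$, by applying $f$ to the total duration of the jobs scheduled after $J$ in each preferred schedule and summing over all agents:

$$\mathrm{sc}_f(J) = \sum_{a \in N} f\Big(\sum_{J' \colon J \succ_{\sigma_a} J'} p_{J'}\Big).$$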

The $f$-psf-rule (psf for positional scoring function) schedules the jobs in descending order of their $f$-scores. If jobs are unit-size ($p_J = 1$ for each $J \in \mathcal{J}$), then $\mathrm{sc}_f(J)$ is simply the score that $J$ would get from the classic positional scoring rule induced by $f$. For the identity function $f(x) = x$, the $f$-psf-rule corresponds to the Borda voting method adapted to collective scheduling.
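The $f$-psf-rule can be sketched in a few lines of Python. The code assumes that $f$ is applied, for each agent, to the total length of the jobs scheduled after $J$, and that the results are summed over agents; $f$ defaults to the identity (the Borda-like variant). Ties are broken by the order of jobs in the first preferred schedule, an arbitrary choice of this sketch.

```python
def psf_scores(profile, p, f=lambda x: x):
    """f-score of each job: sum over agents of f(total length scheduled after it)."""
    scores = {job: 0 for job in profile[0]}
    for preferred in profile:
        after = sum(p[job] for job in preferred)
        for job in preferred:
            after -= p[job]          # now: total length of jobs after `job`
            scores[job] += f(after)
    return scores


def psf_rule(profile, p, f=lambda x: x):
    """Schedule jobs by descending f-score (stable sort keeps ties in input order)."""
    scores = psf_scores(profile, p, f)
    return sorted(scores, key=scores.get, reverse=True)


unit = {"a": 1, "b": 1, "c": 1}
profile = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]
print(psf_scores(profile, unit))  # {'a': 5, 'b': 3, 'c': 1}
print(psf_rule(profile, unit))    # ['a', 'b', 'c']
```

With unit jobs and the identity $f$, the computed scores are exactly the Borda scores of the profile.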

The so-defined scheduling methods differ from traditional positional scoring rules by taking into account the processing times of the jobs:

  1. The score that a job $J$ receives from an agent $a$ depends on the processing times of the other jobs rather than on the number of jobs that precede $J$ in schedule $\sigma_a$.

  2. When scoring a job $J$ we sum the durations of the jobs scheduled after $J$, rather than before it. This implicitly favors jobs with lower processing times. Indeed, consider two preferred schedules, $\sigma_a$ and $\sigma_b$, identical until time $t$, at which a long job $J_l$ is scheduled in $\sigma_a$ and a short job $J_s$ is scheduled in $\sigma_b$. Since $J_s$ is shorter, the total size of the jobs succeeding $J_s$ in $\sigma_b$ is larger than the total size of the jobs succeeding $J_l$ in $\sigma_a$. Consequently, as $f$ is increasing, $J_s$ gets a higher score from $b$ than $J_l$ gets from $a$.

However, this implicit preference for short jobs seems insufficient, as illustrated by the following example.

Example 1.

Consider three jobs, , with the processing times , , and , respectively. Assume that , and consider the following preferred schedules of agents:

By -psf-rule, and are scheduled before . However, starting with would delay and by only one time unit, while starting with and delays by , an arbitrarily large value. Moreover, is put first by roughly of agents, a significant fraction.

Example 1 demonstrates that pure social choice theory does not offer tools appropriate for collective scheduling (we will provide more arguments to support this statement throughout the text). To address such issues we propose an approach that builds upon both social choice and scheduling theory.

2.2 Scheduling Based on Cost Functions

A cost function quantifies how a given schedule differs from an agent’s preferred schedule $\sigma_a$. In this section, we adapt to our model classic costs used in scheduling and in social choice. We then show how to aggregate these costs among agents in order to produce a single measure of the quality of a schedule. This approach allows us to construct a family of scheduling methods that, in some sense, extend the classic Kemeny rule.

Formally, a cost function $f$ maps a pair of schedules, $\tau$ and $\sigma$, to a real value, which is non-negative for all the costs below except the lateness. We analyze the following cost functions. Below, $\tau$ denotes a collective schedule whose quality we want to assess, while $\sigma$ denotes the preferred schedule of a single agent.

2.2.1 Swap Costs.

These functions take into account only the orders of jobs in the two schedules (ignoring the processing times), and thus directly correspond to costs from social choice.

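  1. The Kendall [17] tau (or swap) distance (K) measures the number of swaps of adjacent jobs needed to turn one schedule into the other. We use an equivalent definition that counts all pairs of jobs executed in a non-preferred order:
     $$K(\tau, \sigma) = \big|\{(J, J') \in \mathcal{J}^2 : J \succ_\sigma J' \text{ and } J' \succ_\tau J\}\big|.$$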

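  2. Spearman distance (S). Let $\mathrm{pos}_\tau(J)$ denote the position of job $J$ in a schedule $\tau$, i.e., the number of jobs scheduled before $J$ in $\tau$. The Spearman distance is defined as:
     $$S(\tau, \sigma) = \sum_{J \in \mathcal{J}} \left|\mathrm{pos}_\tau(J) - \mathrm{pos}_\sigma(J)\right|.$$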
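Both swap costs depend only on the two orders. A small Python sketch, assuming schedules are given as tuples of job names, computes $K$ by checking every pair of jobs and $S$ as the sum of positional displacements (also known as the Spearman footrule).

```python
from itertools import combinations


def positions(order):
    """Number of jobs scheduled before each job."""
    return {job: i for i, job in enumerate(order)}


def kendall_tau(tau, sigma):
    """Pairs of jobs that tau orders differently than sigma."""
    pt, ps = positions(tau), positions(sigma)
    return sum(1 for x, y in combinations(sigma, 2)
               if (pt[x] - pt[y]) * (ps[x] - ps[y]) < 0)


def spearman(tau, sigma):
    """Sum of absolute position differences."""
    pt, ps = positions(tau), positions(sigma)
    return sum(abs(pt[j] - ps[j]) for j in sigma)


print(kendall_tau(("a", "b", "c"), ("c", "b", "a")))  # 3
print(spearman(("a", "b", "c"), ("c", "b", "a")))     # 4
```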

2.2.2 Delay Costs.

These functions use the completion times of jobs in the preferred schedule $\sigma$ (and thus, indirectly, the jobs’ lengths). The completion times form the jobs’ due dates, $d_J = C_J(\sigma)$. A delay cost then quantifies how far the proposed completion times $C_J(\tau)$ are from their due dates, using one of the six classic criteria defined in Brucker [5]:

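Tardiness (T)

$T(J, \tau, \sigma) = \max\left(0, C_J(\tau) - d_J\right)$.

Unit penalties (U)

how many jobs are late: $U(J, \tau, \sigma) = 1$ if $C_J(\tau) > d_J$, and $U(J, \tau, \sigma) = 0$ otherwise.

Lateness (L)

is similar to tardiness, but includes a bonus for being early: $L(J, \tau, \sigma) = C_J(\tau) - d_J$.

Earliness (E)

$E(J, \tau, \sigma) = \max\left(0, d_J - C_J(\tau)\right)$.

Absolute deviation (D)

$D(J, \tau, \sigma) = \left|C_J(\tau) - d_J\right|$.

Squared deviation (SD)

$SD(J, \tau, \sigma) = \left(C_J(\tau) - d_J\right)^2$.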

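Each such criterion $f \in \{T, U, L, E, D, SD\}$ naturally induces the corresponding delay cost of an agent, $f(\tau, \sigma)$:

$$f(\tau, \sigma) = \sum_{J \in \mathcal{J}} f(J, \tau, \sigma).$$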

In this work, we mostly focus on the tardiness $T$, which is both easy to interpret for our motivating examples and the most extensively studied in scheduling. However, the remaining functions are of interest as well. $U$ and $L$ are similar to $T$: the sooner a task is completed, the better. The remaining three measures ($E$, $D$ and $SD$) penalize the jobs which are executed before their “preferred times”. The rationale is that each job executed earlier makes other jobs execute later (e.g., after their due dates). Thus, these penalties quantify the unnecessary (wasted) promotion of jobs executed too early (causing other jobs to be executed too late).¹

¹ The considered metrics also have natural interpretations in other, more specific settings. E.g., the earliness $E$ is useful if each task represents a (collective) work to be done by the agents (workers) and when agents do not want to work before their preferred start times. Similarly, $D$ and $SD$ can be used when an agent wants each task to be executed exactly at the preferred time.
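The six criteria can be sketched directly in Python. In the snippet below (the function and dictionary names are hypothetical), the due dates are the completion times in the agent's preferred schedule, and each per-job criterion is summed over all jobs to obtain the agent's delay cost.

```python
def completion_times(order, p):
    """Completion times of a gap-free schedule given as an order of jobs."""
    t, done = 0, {}
    for job in order:
        t += p[job]
        done[job] = t
    return done


# Per-job criteria: c = completion time in tau, d = due date taken from sigma.
CRITERIA = {
    "T": lambda c, d: max(0, c - d),       # tardiness
    "U": lambda c, d: 1 if c > d else 0,   # unit penalty
    "L": lambda c, d: c - d,               # lateness (may be negative)
    "E": lambda c, d: max(0, d - c),       # earliness
    "D": lambda c, d: abs(c - d),          # absolute deviation
    "SD": lambda c, d: (c - d) ** 2,       # squared deviation
}


def delay_cost(name, tau, sigma, p):
    """Delay cost of schedule tau for an agent whose preferred schedule is sigma."""
    c = completion_times(tau, p)
    due = completion_times(sigma, p)
    return sum(CRITERIA[name](c[j], due[j]) for j in tau)


p = {"a": 2, "b": 1}
print({name: delay_cost(name, ("a", "b"), ("b", "a"), p) for name in CRITERIA})
# {'T': 2, 'U': 1, 'L': 1, 'E': 1, 'D': 3, 'SD': 5}
```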

By restricting the instances to unit-size jobs, we can relate delay and swap costs. For unit-size jobs, the Spearman distance has the same value as the absolute deviation (directly from the definitions), and twice that of the tardiness $T$:

Proposition 1.

For unit-size jobs it holds that $S(\tau, \sigma) = D(\tau, \sigma) = 2T(\tau, \sigma)$, for all schedules $\tau, \sigma \in \mathcal{S}(\mathcal{J})$.

Proof.

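Observe that for unit-size jobs $C_J(\tau) = \mathrm{pos}_\tau(J) + 1$ and $d_J = \mathrm{pos}_\sigma(J) + 1$, so the tardiness measure can be expressed as:

$$T(\tau, \sigma) = \sum_{J \in \mathcal{J}} \max\left(0, \mathrm{pos}_\tau(J) - \mathrm{pos}_\sigma(J)\right).$$

Since $\sum_{J \in \mathcal{J}} \mathrm{pos}_\tau(J) = \sum_{J \in \mathcal{J}} \mathrm{pos}_\sigma(J) = 0 + 1 + \cdots + (m-1)$, we get that:

$$\sum_{J \in \mathcal{J}} \max\left(0, \mathrm{pos}_\tau(J) - \mathrm{pos}_\sigma(J)\right) = \sum_{J \in \mathcal{J}} \max\left(0, \mathrm{pos}_\sigma(J) - \mathrm{pos}_\tau(J)\right).$$

Thus:

$$S(\tau, \sigma) = \sum_{J \in \mathcal{J}} \max\left(0, \mathrm{pos}_\tau(J) - \mathrm{pos}_\sigma(J)\right) + \sum_{J \in \mathcal{J}} \max\left(0, \mathrm{pos}_\sigma(J) - \mathrm{pos}_\tau(J)\right).$$

And, consequently: $S(\tau, \sigma) = 2T(\tau, \sigma)$.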

This completes the proof. ∎
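Proposition 1 is easy to sanity-check exhaustively on small instances. The Python sketch below (an illustration, not part of the proof) enumerates all pairs of orders of $m$ unit-size jobs, where a job's completion time is its position plus one, and confirms $S = D = 2T$.

```python
from itertools import permutations


def check_proposition_1(m=4):
    """Exhaustively compare S, D and 2T over all pairs of orders of m unit jobs."""
    jobs = range(m)
    for tau in permutations(jobs):
        for sigma in permutations(jobs):
            pt = {j: i for i, j in enumerate(tau)}
            ps = {j: i for i, j in enumerate(sigma)}
            # Unit jobs: completion time = position + 1, due date = position in sigma + 1.
            t = sum(max(0, (pt[j] + 1) - (ps[j] + 1)) for j in jobs)
            d = sum(abs((pt[j] + 1) - (ps[j] + 1)) for j in jobs)
            s = sum(abs(pt[j] - ps[j]) for j in jobs)
            if not s == d == 2 * t:
                return False
    return True


print(check_proposition_1())  # True
```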

Since different agents can have different preferred schedules, in order to score a proposed schedule we need to aggregate the costs across all agents. We will consider three classic aggregations:

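The sum ($\Sigma$):

$\sum_{a \in N} f(\tau, \sigma_a)$, a utilitarian aggregation.

The max:

$\max_{a \in N} f(\tau, \sigma_a)$, an egalitarian aggregation.

The $L_p$ norm ($L_p$):

$\left(\sum_{a \in N} f(\tau, \sigma_a)^p\right)^{1/p}$, with a parameter $p \ge 1$. The $L_p$ norms form a spectrum of aggregations between the sum ($p = 1$) and the max ($p \to \infty$).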
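The three aggregations differ only in how per-agent costs are combined. A minimal Python sketch follows, assuming non-negative costs; `aggregate` and its parameter names are hypothetical, and `norm_p` plays the role of the parameter $p$ (renamed to avoid a clash with processing times).

```python
def aggregate(costs, how="sum", norm_p=2.0):
    """Combine per-agent costs: utilitarian sum, egalitarian max, or L_p norm."""
    costs = list(costs)
    if how == "sum":
        return sum(costs)
    if how == "max":
        return max(costs)
    if how == "lp":
        return sum(c ** norm_p for c in costs) ** (1.0 / norm_p)
    raise ValueError(f"unknown aggregation: {how}")


costs = [3, 4]
print(aggregate(costs, "sum"), aggregate(costs, "max"), aggregate(costs, "lp"))
# 7 4 5.0
print(round(aggregate(costs, "lp", norm_p=50), 6))  # 4.0, approaching the max
```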

For a cost function $f$ and an aggregation $\alpha \in \{\Sigma, \max, L_p\}$, by $\alpha$-$f$ we denote a scheduling rule returning a schedule that minimizes the $\alpha$-aggregation of the $f$-costs of the agents. In particular, for unit-size jobs the $\Sigma$-$S$ rule is equivalent to $\Sigma$-$D$ and to $\Sigma$-$T$ (by Proposition 1), and $\Sigma$-$K$ is simply the Kemeny rule.
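For tiny instances an $\alpha$-$f$ rule can be evaluated by brute force over all $m!$ orders. The Python sketch below is exponential and meant only for illustration; the names `cost_rule` and `tardiness` are hypothetical, and ties are broken by enumeration order.

```python
from itertools import permutations


def completion_times(order, p):
    t, done = 0, {}
    for job in order:
        t += p[job]
        done[job] = t
    return done


def tardiness(tau, sigma, p):
    """T-cost of schedule tau for an agent preferring sigma."""
    c, due = completion_times(tau, p), completion_times(sigma, p)
    return sum(max(0, c[j] - due[j]) for j in tau)


def cost_rule(profile, p, cost=tardiness, aggregate=sum):
    """alpha-f rule: an order minimizing aggregate(cost(tau, sigma_a) over agents)."""
    return min(permutations(profile[0]),
               key=lambda tau: aggregate(cost(tau, s, p) for s in profile))


unit = {"a": 1, "b": 1, "c": 1}
profile = [("a", "b", "c"), ("a", "b", "c"), ("c", "b", "a")]
print(cost_rule(profile, unit))  # ('a', 'b', 'c'), the Sigma-T choice
```

Passing `aggregate=max` instead of the default `sum` turns the same search into the egalitarian rule.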

Scheduling based on cost functions avoids the problems exposed by Example 1 (indeed, for that instance, e.g., the $\Sigma$-$T$ rule starts with the short job). Additionally, these methods satisfy some naturally appealing axiomatic properties, such as reinforcement, which is a particularly natural requirement in our case.

Definition 1 (Reinforcement).

A scheduling rule $\mathcal{R}$ satisfies reinforcement iff for any two groups of agents $N_1$ and $N_2$, whenever a schedule $\tau$ is selected both for $N_1$ and for $N_2$, it is also selected for the joint instance $N_1 \cup N_2$.

Proposition 2.

All - scheduling rules satisfy reinforcement.

2.3 Beyond Positional Scoring Rules and Cost Functions: the Condorcet Principle

In the previous section we introduced several scheduling rules, all based on the notion of a distance between schedules. Thus, these scheduling rules are closely related to the Kemeny voting system. We now take a different approach. We start with desired properties of a collective schedule and design scheduling rules satisfying them.

Pareto efficiency is one of the most accepted axioms in social choice theory. Below we use a formulation analogous to the one used in voting theory (based on swaps in preferred schedules).

Definition 2 (Pareto efficiency).

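A scheduling rule $\mathcal{R}$ satisfies Pareto efficiency iff for each pair of jobs, $J$ and $J'$, and for each preference profile $P = (\sigma_1, \ldots, \sigma_n) \in \mathcal{P}$ such that $J \succ_{\sigma_a} J'$ for each $a \in N$, it holds that $J \succ_{\mathcal{R}(P)} J'$.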

In other words, if all agents prefer $J$ to be scheduled before $J'$, then in the collective schedule $J$ should be before $J'$. Curiously, the total tardiness rule $\Sigma$-$T$ does not satisfy Pareto efficiency:

Example 2.

Consider an instance with 3 jobs, $J_1$, $J_2$ and $J_3$, with lengths 20, 5, and 1, respectively, and with two agents having preferred schedules $\sigma_1 = (J_1, J_3, J_2)$ and $\sigma_2 = (J_2, J_1, J_3)$. Both agents prefer $J_1$ to be scheduled before $J_3$. If our scheduling rule satisfied Pareto efficiency, then it would pick one of the following three schedules: $(J_1, J_3, J_2)$, $(J_1, J_2, J_3)$, or $(J_2, J_1, J_3)$. The total tardinesses of these schedules are equal to 21, 25, and 10, respectively. Yet, the total tardiness of the schedule $(J_2, J_3, J_1)$ is equal to 7.
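The numbers in Example 2 can be reproduced by enumerating all six schedules. The Python sketch below assumes the preferred schedules $(J_1, J_3, J_2)$ and $(J_2, J_1, J_3)$, which yield exactly the totals stated in the example; it computes the total tardiness of each schedule and flags those that respect the pair on which both agents agree.

```python
from itertools import permutations

p = {"J1": 20, "J2": 5, "J3": 1}
profile = [("J1", "J3", "J2"), ("J2", "J1", "J3")]  # assumed preferred schedules


def completion_times(order):
    t, done = 0, {}
    for job in order:
        t += p[job]
        done[job] = t
    return done


def total_tardiness(tau):
    c = completion_times(tau)
    return sum(max(0, c[j] - completion_times(s)[j]) for s in profile for j in tau)


def is_pareto(tau):
    """tau keeps every pair of jobs that all agents order the same way."""
    pos = {j: i for i, j in enumerate(tau)}
    return all(pos[x] < pos[y]
               for x, y in permutations(p, 2)
               if all(s.index(x) < s.index(y) for s in profile))


for tau in permutations(p):
    print(tau, total_tardiness(tau), "Pareto" if is_pareto(tau) else "")
```

The smallest total among the Pareto-efficient schedules is 10, whereas $(J_2, J_3, J_1)$, which places $J_3$ before $J_1$ against both agents' wishes, reaches 7.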

This example can be generalized to inapproximability:

Proposition 3.

For any , there is no scheduling rule that satisfies Pareto efficiency and is -approximate for - or -.

Proof.

Let us assume, towards a contradiction, that there exists a scheduling rule that satisfies Pareto efficiency and is -approximate for minimizing - (the proof for - is analogous). Let . Consider an instance with jobs: one job of length , one job of length , and jobs of length 1. Let us consider two agents with preferred schedules and . For each , both agents prefer job to be scheduled before job . Let be the schedule returned by . Since satisfies Pareto efficiency, for each , is scheduled before job in . Thus is either , or a schedule where is scheduled first, followed by jobs of length 1 (), followed by , followed by the remaining jobs of length 1. Let be such a schedule. In , the tardiness of job is (this job is in first position in ), and the tardiness of the jobs of length 1 is (the last jobs in are scheduled before in ). Thus the total tardiness of is . The total tardiness of schedule is (each of the jobs in finishes time units later than in ). Thus, the total tardiness of is at least . Let us now consider schedule , which does not satisfy Pareto efficiency, and which is as follows: job is scheduled first, followed by the jobs of length 1, followed by job . The total tardiness of this schedule is (the only job which is delayed compared to and is job ). This schedule is optimal for -. Thus the approximation ratio of is at least . Therefore, is not -approximate for -, a contradiction. ∎

Proposition 4.

If all jobs are unit-size, the scheduling rule $\Sigma$-$T$ is Pareto efficient.

Proof.

Let us assume that there exist two jobs which are not in a Pareto order in the schedule $\tau$ optimizing $\Sigma$-$T$. We can swap these jobs in $\tau$, and it is apparent that such a swap does not increase the total tardiness of the schedule. We can perform such swaps until we reach a schedule which does not violate Pareto efficiency. ∎
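Proposition 4 can also be probed empirically. The following Python sketch (random small profiles of unit-size jobs; the trial count, sizes and seed are arbitrary assumptions) checks that among the schedules minimizing total tardiness there is always one that respects every unanimously agreed pair.

```python
import random
from itertools import permutations


def total_tardiness_unit(tau, profile):
    """Unit jobs: a job is late by (position in tau) - (position in sigma), if positive."""
    pos = {j: i for i, j in enumerate(tau)}
    return sum(max(0, pos[j] - s.index(j)) for s in profile for j in tau)


def unanimous_pairs(profile):
    jobs = profile[0]
    return [(x, y) for x in jobs for y in jobs
            if x != y and all(s.index(x) < s.index(y) for s in profile)]


def respects(tau, pairs):
    pos = {j: i for i, j in enumerate(tau)}
    return all(pos[x] < pos[y] for x, y in pairs)


def check_proposition_4(trials=200, m=4, n=3, seed=0):
    rng = random.Random(seed)
    jobs = list(range(m))
    for _ in range(trials):
        profile = [tuple(rng.sample(jobs, m)) for _ in range(n)]
        costs = {tau: total_tardiness_unit(tau, profile) for tau in permutations(jobs)}
        best = min(costs.values())
        pairs = unanimous_pairs(profile)
        if not any(respects(tau, pairs) for tau, c in costs.items() if c == best):
            return False
    return True


print(check_proposition_4())  # True
```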

Pareto efficiency is one of the most fundamental properties in social choice. However, sometimes (especially in our setting) there exist reasons for violating it. For instance, even if all the agents agree that $J$ should be scheduled before $J'$, the preferences of the agents with respect to other jobs might differ. Breaking Pareto efficiency can help to achieve a compromise with respect to these other jobs.

Nevertheless, Proposition 3 motivated us to formulate alternative scheduling rules based on axiomatic properties. We choose the Condorcet principle, a classic social choice property that is stronger than Pareto efficiency. We adapt it to consider the durations of jobs.

Definition 3 (Processing Time Aware (PTA) Condorcet principle).

A schedule $\tau$ is PTA Condorcet consistent with a preference profile $P$ if for each two jobs, $J$ and $J'$, it holds that $J \succ_\tau J'$ whenever at least $\frac{p_J}{p_J + p_{J'}} \cdot n$ agents put $J$ before $J'$ in their preferred schedules. A scheduling rule satisfies the PTA Condorcet principle if for each preference profile it returns a PTA Condorcet consistent schedule, whenever such a schedule exists.

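Let us explain our motivation for the ratio $\frac{p_J}{p_J + p_{J'}}$. Consider a schedule $\tau$ and two jobs, $J$ and $J'$, scheduled consecutively in $\tau$ (with $J$ right before $J'$). By $N_{J \succ J'}$ we denote the set of agents who rank $J$ before $J'$ in their preferred schedules, and let us assume that $|N_{J \succ J'}| \ge \frac{p_J}{p_J + p_{J'}} \cdot n$; we set $N_{J' \succ J} = N \setminus N_{J \succ J'}$. Observe that if we swapped $J$ and $J'$ in $\tau$, then each agent from $N_{J \succ J'}$ would be disappointed. Since such a swap makes $J$ scheduled $p_{J'}$ time units later than in $\tau$, the level of dissatisfaction of each agent from $N_{J \succ J'}$ could be quantified by $p_{J'}$. Thus, their total (utilitarian) dissatisfaction could be quantified by $p_{J'} \cdot |N_{J \succ J'}| \ge \frac{p_J p_{J'}}{p_J + p_{J'}} \cdot n$. By an analogous argument, if we started with a schedule where $J'$ is put right before $J$, and swapped these jobs, then the total dissatisfaction of agents from $N_{J' \succ J}$ could be quantified by:

$$p_J \cdot |N_{J' \succ J}| = p_J \left(n - |N_{J \succ J'}|\right) \le p_J \left(n - \frac{p_J}{p_J + p_{J'}} \cdot n\right) = \frac{p_J p_{J'}}{p_J + p_{J'}} \cdot n.$$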

Thus, the total dissatisfaction of all agents from scheduling $J$ before $J'$ is at most that from scheduling $J'$ before $J$. Definition 3 requires that in such a case $J$ should indeed be scheduled before $J'$.
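Checking whether a given schedule is PTA Condorcet consistent only requires pairwise counts. A Python sketch follows, using exact rational arithmetic (`fractions.Fraction`) for the threshold $\frac{p_J}{p_J + p_{J'}} \cdot n$; the function names and the short/long instance (the introduction's 55%/45% example, with assumed lengths 1 and 100) are illustrative.

```python
from fractions import Fraction


def support(profile, x, y):
    """Number of agents whose preferred schedule puts x before y."""
    return sum(1 for s in profile if s.index(x) < s.index(y))


def must_precede(profile, p, x, y):
    """PTA condition: at least p_x / (p_x + p_y) * n agents put x before y."""
    return support(profile, x, y) >= Fraction(p[x], p[x] + p[y]) * len(profile)


def is_pta_condorcet_consistent(tau, profile, p):
    pos = {j: i for i, j in enumerate(tau)}
    return all(pos[x] < pos[y]
               for x in tau for y in tau
               if x != y and must_precede(profile, p, x, y))


p = {"short": 1, "long": 100}
profile = [("long", "short")] * 55 + [("short", "long")] * 45
print(is_pta_condorcet_consistent(("short", "long"), profile, p))  # True
print(is_pta_condorcet_consistent(("long", "short"), profile, p))  # False
```

Here the long job would need the support of at least 10000/101 (about 99) of the 100 agents to be forced first, whereas the threshold for the short job is only 100/101, so any single supporter suffices.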

Proposition 5 below highlights the difference between scheduling based on the tardiness and on the PTA Condorcet principle.

Proposition 5.

Even if all jobs are unit-size, the $\Sigma$-$T$ rule does not satisfy the PTA Condorcet principle.

Proof.

Consider an instance with three jobs and three agents with the following preferred schedules:

The only PTA Condorcet consistent schedule is with the total tardiness of 6. At the same time, the schedule has the total tardiness equal to 5. ∎

To construct a PTA Condorcet consistent schedule, we propose to extend Condorcet consistent [9, 20] election rules to jobs with varying lengths. For example, we obtain:

PTA Copeland’s method.

For each job $J$ we define the score of $J$ as the number of jobs $J'$ such that at least $\frac{p_J}{p_J + p_{J'}} \cdot n$ agents put $J$ before $J'$ in their preferred schedules. The jobs are scheduled in descending order of their scores.
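PTA Copeland's method needs only the pairwise supports, which is why it runs in polynomial time. A minimal Python sketch, with hypothetical names and ties broken by the order of jobs in the first preferred schedule:

```python
from fractions import Fraction


def pta_beats(profile, p, x, y):
    """At least p_x / (p_x + p_y) * n agents put x before y."""
    k = sum(1 for s in profile if s.index(x) < s.index(y))
    return k >= Fraction(p[x], p[x] + p[y]) * len(profile)


def pta_copeland(profile, p):
    jobs = list(profile[0])
    score = {x: sum(1 for y in jobs if y != x and pta_beats(profile, p, x, y))
             for x in jobs}
    # Descending score; Python's sort is stable, so ties keep the input order.
    return sorted(jobs, key=lambda x: -score[x])


unit = {"a": 1, "b": 1, "c": 1}
print(pta_copeland([("a", "b", "c"), ("b", "a", "c"), ("a", "c", "b")], unit))
# ['a', 'b', 'c']
lengths = {"short": 1, "long": 100}
print(pta_copeland([("long", "short")] * 55 + [("short", "long")] * 45, lengths))
# ['short', 'long']
```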

Iterative PTA Minimax.

For each pair of jobs, $J$ and $J'$, we define the defeat score of $J$ against $J'$ as $\max\left(0, \frac{p_J}{p_J + p_{J'}} \cdot n - n_{J \succ J'}\right)$, where $n_{J \succ J'}$ is the number of agents who put $J$ before $J'$ in their preferred schedules. We define the defeat score of $J$ as the highest defeat score of $J$ against any other job. The job with the lowest defeat score is scheduled first. Next, we remove this job from the preferences of the agents, and repeat (until there are no jobs left).
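The iterative PTA minimax rule can be sketched as below. The sketch assumes that the defeat score of $J$ against $J'$ is the shortfall $\max\left(0, \frac{p_J}{p_J + p_{J'}} \cdot n - n_{J \succ J'}\right)$ with respect to the PTA threshold; the function names are hypothetical and ties are broken by the order in the first preferred schedule. Removing a job from the preferences does not change the pairwise counts among the remaining jobs, so the sketch simply restricts comparisons to the remaining jobs.

```python
from fractions import Fraction


def defeat(profile, p, x, y):
    """Assumed defeat score of x against y: missing support w.r.t. the PTA threshold."""
    k = sum(1 for s in profile if s.index(x) < s.index(y))
    return max(Fraction(0), Fraction(p[x], p[x] + p[y]) * len(profile) - k)


def worst_defeat(profile, p, x, remaining):
    """Defeat score of x: its highest defeat score against another remaining job."""
    return max((defeat(profile, p, x, y) for y in remaining if y != x),
               default=Fraction(0))


def iterative_pta_minimax(profile, p):
    remaining = list(profile[0])
    schedule = []
    while remaining:
        best = min(remaining, key=lambda x: worst_defeat(profile, p, x, remaining))
        schedule.append(best)
        remaining.remove(best)
    return schedule


unit = {"a": 1, "b": 1, "c": 1}
print(iterative_pta_minimax([("a", "b", "c"), ("b", "a", "c"), ("a", "c", "b")], unit))
# ['a', 'b', 'c']
```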

Other Condorcet consistent election rules, such as Dodgson’s rule or Tideman’s ranked pairs method, can be adapted similarly. It is apparent that they satisfy the PTA Condorcet principle.

PTA Condorcet consistency comes at a cost: e.g., the two scheduling rules above violate reinforcement, even if the jobs are unit-size. Indeed, by the classic result of Young and Levenglick [28] one can infer that any rule that satisfies the PTA Condorcet principle, neutrality, and reinforcement must be a generalization of the Kemeny rule (i.e., must be equivalent to the Kemeny rule if the processing times of the jobs are equal). We conjecture that rules satisfying neutrality and reinforcement fail the PTA Condorcet principle; it is an interesting open question whether such an impossibility theorem holds.

3 Computational Results

In this section we study the computational complexity of finding collective schedules according to the previously defined rules. We start with a simple observation about the two PTA Condorcet consistent rules that we defined in the previous section.

Proposition 6.

The PTA Copeland’s method and the iterative PTA minimax rule are computable in polynomial time.

We further observe that the computational complexity of the rules which ignore the lengths of the jobs (rules based on swap costs) can be directly inferred from known results in computational social choice. For instance, the $\Sigma$-$K$ rule is simply the well-known and extensively studied Kemeny rule. Thus, in the remaining part of this section we focus on the rules based on delay costs.

3.1 Sum of Delay Costs

First, observe that the problem of finding a collective schedule is computationally easy for the total lateness ($\Sigma$-$L$). In fact, $\Sigma$-$L$ ignores the preferred schedules of the agents and arranges the jobs from the shortest to the longest one.

Proposition 7.

The rule $\Sigma$-$L$ schedules the jobs in ascending order of their lengths.

Proof.

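Consider the total cost of the agents:

$$\sum_{a \in N} L(\tau, \sigma_a) = \sum_{a \in N} \sum_{J \in \mathcal{J}} \left(C_J(\tau) - C_J(\sigma_a)\right) = n \sum_{J \in \mathcal{J}} C_J(\tau) - \sum_{a \in N} \sum_{J \in \mathcal{J}} C_J(\sigma_a).$$

The second term does not depend on $\tau$. Thus, the total cost of the agents is minimized when $\sum_{J \in \mathcal{J}} C_J(\tau)$ is minimal. This value is minimal when the jobs are scheduled from the shortest to the longest one (the classic shortest-processing-time-first order). ∎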
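Proposition 7 can be checked by brute force on a small instance. In the Python sketch below (the lengths and preferred schedules are arbitrary assumptions), the order minimizing total lateness coincides with the shortest-processing-time (SPT) order, whatever the agents' preferences.

```python
from itertools import permutations


def completion_times(order, p):
    t, done = 0, {}
    for job in order:
        t += p[job]
        done[job] = t
    return done


def total_lateness(tau, profile, p):
    """Sum over agents and jobs of C_J(tau) - C_J(sigma_a); may be negative."""
    c = completion_times(tau, p)
    return sum(c[j] - completion_times(s, p)[j] for s in profile for j in tau)


p = {"a": 3, "b": 1, "c": 2}
profile = [("a", "b", "c"), ("c", "a", "b")]
spt = tuple(sorted(p, key=p.get))
best = min(permutations(p), key=lambda t: total_lateness(t, profile, p))
print(spt, best, total_lateness(spt, profile, p))
# ('b', 'c', 'a') ('b', 'c', 'a') -6
```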

On the other hand, minimizing the total tardiness ($\Sigma$-$T$) is NP-hard even with the unary representation of the durations of jobs. Du and Leung [10] show that minimizing total tardiness with arbitrary due dates on a single processor ($1||\sum T_j$) is weakly NP-hard. We cannot use this result directly, as the due dates in our problem $\Sigma$-$T$ are structured and depend, among other things, on the jobs’ durations.

Theorem 8.

The problem of finding a collective schedule minimizing the total tardiness ($\Sigma$-$T$) is strongly NP-hard.

Proof.

We reduce from the strongly NP-hard 3-Partition problem. Let be an instance of 3-Partition. In we are given a multiset of integers . We denote . We ask if can be partitioned into triples that all have the same sum, . Without loss of generality, we can assume that and that for each , (otherwise, we can add a large constant to each integer from , which does not change the optimal solution of the instance, but which ensures that in the new instance). We also assume that the integers from are represented in unary encoding.

From we construct an instance of the problem of finding a collective schedule that minimizes the total tardiness in the following way. For each number we introduce jobs: and . We set the processing time of to . Further, for each we set the processing time of to , and of the remaining jobs to . We denote the set of all such jobs as and . Additionally, we introduce jobs, , each having a unit processing time.

There are agents. For each integer we introduce agents. The -th agent corresponding to number , denoted by , has the following preferred schedule (in the notation below a set, e.g., denotes that its elements are scheduled in a fixed arbitrary order):

Figure 1: The preferred schedule of agent (top) and the optimal schedule (bottom).

We claim that the answer to the initial instance is “yes” if and only if the schedule optimizing the total tardiness is the following one: , where for each , is a set consisting of jobs from with lengths summing up to (see Figure 1). If such a schedule exists, then the answer to is “yes”. Below we will prove the other implication.

Observe that any job from should be scheduled before each job from . Indeed, for each pair and only a single agent ranks before ; at the same time there exists another agent who ranks first. As is shorter than , gains more from scheduled before , than gains from scheduled before . Thus, if were scheduled before , we could swap these two jobs and improve the schedule (such a swap could only improve the completion times of other jobs since is shorter than ).

By a similar argument, any job from should be scheduled before each job from . Indeed, if this were not the case, then there would exist jobs and such that is scheduled right before (this follows from the reasoning given in the previous paragraph: a job from cannot be scheduled after a job from ). Also, since all the jobs from are scheduled before , the completion time of would be at least . For each agent, the completion time of in their preferred schedule is at most equal to . Thus, if we swap and the improvement of the tardiness due to scheduling earlier would be at least equal to . Such a swap increases the completion time of only by one, so the increase of the tardiness due to scheduling later would be at most equal to . Consequently, a swap would decrease the total tardiness, and so could not have been scheduled after in .

We further investigate the structure of an optimal schedule . We know that and that , but we do not yet know the optimal order of jobs from . Before proceeding further, we introduce one useful class of schedules, , that execute jobs in the order . Observe that can be constructed starting from some schedule and performing a sequence of swaps, each swap involving a job and a job . The tardiness of is equal to the tardiness of the initial adjusted by the changes due to the swaps. Below, we further analyze . First, any ordering of in results in the same tardiness. Indeed, consider two jobs and such that is scheduled right after . If we swap and , then the total tardiness of agents increases by and the total tardiness of agents decreases by . In effect, the total tardiness of all agents remains unchanged. Second, there exists an optimal schedule where the relative order of the jobs from is . Thus, w.l.o.g., we constrain to schedules in which are put in exactly this order.

Since we have shown that all always have the same tardiness, no matter how we arrange the jobs from , the tardiness of depends only on the change in tardiness due to the swaps. Consider the job , and consider what happens if we swap with a number of jobs from so that eventually is scheduled at time (its start time in all preferred schedules). In such a case, moving forward decreases the tardiness of each of agents by . However, moving forward to requires delaying some jobs from . Assume that the jobs from with the processing times are delayed. Each such job needs to be scheduled one time unit later. Thus, the total tardiness of agents increases by 1 (the agents who had this job as the first in their preferred schedule), of other agents increases by 1, and so on. Since , the total tardiness of all agents increases by . Thus, in total, executing at decreases the total tardiness by , a positive number. Also, observe that this value does not depend on how the jobs from were initially arranged, provided that can be put so that it starts at .

Starting earlier than does not improve the tardiness of , yet it increases the tardiness of some other jobs, so it is suboptimal. By repeating the same reasoning for , we infer that the tardiness decreases the most when is scheduled at time , at time , etc., and there are no gaps between the jobs. However, such a schedule can be obtained if and only if the answer to the initial instance of 3-Partition is “yes”. ∎

A similar strategy, with a more complex construction, can be used to prove the NP-hardness of -.

Theorem 9.

The problem of finding a collective schedule minimizing the total number of late jobs (-) is strongly NP-hard.

Proof.

We give a reduction from the strongly NP-hard 3-Partition problem. Let be an instance of 3-Partition. In we are given a multiset of integers . As in the proof of Theorem 8, we set . In we ask whether can be partitioned into triples that all have the same sum, . We assume that for each , , that , and that the integers from are represented in unary encoding.
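For readers who want to experiment with the source problem of the reduction, here is a brute-force decision procedure for 3-Partition: given 3m integers, decide whether they split into m triples that each sum to B = (total sum)/m. It is a minimal, exponential-time sketch; the function name is hypothetical. It does not enforce the usual promise B/4 < x < B/2, which only guarantees that every block of sum B has exactly three elements.

```python
from typing import List, Optional, Tuple

Triple = Tuple[int, int, int]


def three_partition(numbers: List[int]) -> Optional[List[Triple]]:
    """Brute-force 3-Partition: split 3m integers into m triples with equal sum B.

    Returns the triples if a partition exists, otherwise None.
    Exponential time; meant only for tiny sanity checks.
    """
    if not numbers:
        return []
    if len(numbers) % 3 != 0:
        return None
    m = len(numbers) // 3
    total = sum(numbers)
    if total % m != 0:
        return None
    target = total // m  # this is B

    def solve(remaining: List[int]) -> Optional[List[Triple]]:
        if not remaining:
            return []
        # Every element belongs to exactly one triple, so branch on the first one.
        first, rest = remaining[0], remaining[1:]
        tried = set()  # equal value pairs leave identical subproblems; try each once
        for i in range(len(rest)):
            for j in range(i + 1, len(rest)):
                pair = (rest[i], rest[j])
                if first + rest[i] + rest[j] != target or pair in tried:
                    continue
                tried.add(pair)
                others = rest[:i] + rest[i + 1:j] + rest[j + 1:]
                sub = solve(others)
                if sub is not None:
                    return [(first, rest[i], rest[j])] + sub
        return None

    return solve(sorted(numbers, reverse=True))


print(three_partition([6, 7, 7, 6, 6, 8]))  # B = 20
print(three_partition([9, 7, 6, 6, 6, 6]))  # B = 20, but 9 fits in no triple
```

The first call finds the triples `(8, 6, 6)` and `(7, 7, 6)`; the second returns `None`, since 9 plus any two of the remaining values exceeds 20.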

From we construct an instance of the problem of finding a collective schedule that minimizes the total number of late jobs in the following way. For each number we introduce the following jobs:

  • a job of length ;

  • jobs of length ; we denote this set as:

  • jobs of length ; we denote this set as:

Let be the set of all the jobs. Further, we set:

Additionally, we introduce jobs, , each having a unit length, and a job of length (thus, the length of is larger than the length of all the jobs of ).

There are agents in total. For each number we introduce agents. Let be the set of these agents. We partition into sets of agents: . Figure 2 represents the preferred schedule of the -th agent from (). For all the agents, job () starts at time , and job starts at time . Further, for all the agents of , job starts at time (i.e., for these agents, is scheduled just before job ). Moreover, in this schedule job is put just before job : at time , and job () is scheduled at time if , and at time if . All the other jobs are scheduled after job , i.e., at the earliest at time . Let us arbitrarily label the agents from to . The jobs of Agent which are not already scheduled before are scheduled in an arbitrary order after , except that the last jobs of the schedule are the jobs of which are scheduled before in the preferred schedule of Agent , followed by the jobs of which are scheduled before in the preferred schedule of Agent . This ensures that each job of appears only twice among the last jobs of the agents (since, for each job of , only one agent schedules it before ).

Figure 2: Preferred schedule of the -th agent of .

We will now show that the answer to the 3-partition problem on instance is “yes” if and only if the optimal schedule for - on starts as follows: , where each set consists of jobs from with lengths summing up to .

If the schedule for - on starts as follows: , where each set () consists of jobs from with lengths summing up to , then the answer to 3-Partition is “yes”, since each job of has the length of a number of . Let us now assume that the answer to 3-Partition on instance is “yes”. We will show that the optimal solution of - on instance indeed starts with: .

Let us consider an optimal schedule for . First, we will show that in , each job () is scheduled at the latest at time .

Indeed, in the preferred schedules of all the agents, is completed at time . If, in , were not completed by time , then another job would be scheduled before . In this case, swapping and would not increase the number of late jobs: scheduling before decreases the number of late jobs by (the number of agents), and since is shorter than , the swap of and does not delay any job other than .
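The exchange step above can be illustrated on a toy instance. This is a minimal sketch, assuming that a job counts as late for an agent when it completes strictly later than in that agent's preferred schedule; all job names, helpers, and the instance are hypothetical. Moving a short job that every agent wants first in front of a longer job removes one late job per agent, and the job scheduled after the swapped pair keeps its completion time.

```python
from typing import Dict, List, Sequence


def completion_times(order: Sequence[str], p: Dict[str, int]) -> Dict[str, int]:
    t, done = 0, {}
    for job in order:
        t += p[job]
        done[job] = t
    return done


def late_jobs(order: Sequence[str], preferred: List[Sequence[str]],
              p: Dict[str, int]) -> int:
    """Count (agent, job) pairs where the job finishes later than in the agent's preferred schedule."""
    c = completion_times(order, p)
    count = 0
    for pref in preferred:
        due = completion_times(pref, p)
        count += sum(1 for j in p if c[j] > due[j])
    return count


# Hypothetical instance: three agents, all of whom want the short job "x" first.
p = {"x": 1, "y": 3, "z": 2}
preferred = [["x", "y", "z"]] * 3  # identical preferences (lists are never mutated)

bad = ["y", "x", "z"]   # a longer job placed before "x"
good = ["x", "y", "z"]  # the first two jobs swapped
print(late_jobs(bad, preferred, p), late_jobs(good, preferred, p))
print(completion_times(bad, p)["z"], completion_times(good, p)["z"])
```

The first line prints `3 0`: the swap removes exactly one late job per agent. The second prints `6 6`, showing that the swap leaves the later job untouched.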

Second, we will show that in , job is scheduled in the last position. For this we consider two cases:

  1. Let us first consider what happens if this job is not late, i.e., if it is scheduled in at the latest at time . Let us now look at the last jobs of . Each of these jobs is late for at least agents (all the agents except two), since for each job of , only two agents have it in one of the last positions of their preferred schedule (and all these jobs are of length ). Thus the total number of late jobs is at least .

  2. Let us now consider the case where job is late in : it is scheduled after time . In this case, it is late for all the agents, so we can assume that it is scheduled in the last position of . Thus, all the jobs of are scheduled before in (this is true since the length of is larger than the total length of the jobs of ). Since each job of appears only once before in the preferred schedules of the agents, each job of is late for at most one agent: the number of jobs of which are late is thus at most the number of jobs of : . The number of jobs of which are scheduled before in the preferred schedules of the agents is (indeed, for each , job appears for agents just before job , with ). Thus, the number of jobs from which are late in is at most . Job is late for all the agents, and we have already seen that the jobs of are not late in . Therefore, the total number of late jobs in is at most . This is smaller than the lower bound on the number of late jobs if is not late in