Sublinear Dynamic Interval Scheduling (on one or multiple machines)

03/27/2022
by Paweł Gawrychowski, et al.

We revisit the complexity of the classical Interval Scheduling in the dynamic setting. In this problem, the goal is to maintain a set of intervals under insertions and deletions and report the size of the maximum size subset of pairwise disjoint intervals after each update. Nontrivial approximation algorithms are known for this problem, for both the unweighted and weighted versions [Henzinger, Neumann, Wiese, SoCG 2020]. Surprisingly, it was not known if the general exact version admits an exact solution working in sublinear time, that is, without recomputing the answer after each update. Our first contribution is a structure for Dynamic Interval Scheduling with amortized 𝒪̃(n^1/3) update time. Then, building on the ideas used for the case of one machine, we design a sublinear solution for any constant number of machines: we describe a structure for Dynamic Interval Scheduling on m≥ 2 machines with amortized 𝒪̃(n^1 - 1/m) update time. We complement the above results by considering Dynamic Weighted Interval Scheduling on one machine, that is maintaining (the weight of) the maximum weight subset of pairwise disjoint intervals. We show an almost linear lower bound (conditioned on the hardness of Minimum Weight k-Clique) for the update/query time of any structure for this problem. Hence, in the weighted case one should indeed seek approximate solutions.


1 Introduction

The Interval Scheduling (IS) problem is often used as one of the very first examples of problems that can be solved with a greedy approach. In this problem, we have a set of jobs, each represented by an interval on the real line. Given n such intervals, we want to find a maximum size subset of pairwise disjoint intervals. In this context, disjoint intervals are usually called compatible. This admits a natural interpretation as a scheduling problem, where each request corresponds to a job that cannot be interrupted and requires exclusive access to a machine. Then, the goal is to schedule as many jobs as possible using a single machine. The folklore greedy algorithm solves this problem in 𝒪(n) time, assuming that the intervals are sorted by their right endpoints [19]. While it may appear to be just a puzzle, interval scheduling admits multiple applications in areas such as logistics, telecommunication, manufacturing, or personnel scheduling. For more applications and a detailed summary of different variants of interval scheduling, we refer to [21].
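
For concreteness, the following short Python sketch implements this folklore rule on intervals given as (start, end) pairs; the representation and the function name are ours, chosen only for illustration.

def max_compatible(intervals):
    # Repeatedly take the earliest-ending interval compatible with everything
    # chosen so far; intervals are (start, end) pairs with distinct endpoints.
    chosen = 0
    last_end = float("-inf")
    for start, end in sorted(intervals, key=lambda iv: iv[1]):  # by end time
        if start > last_end:          # compatible with the last chosen interval
            chosen += 1
            last_end = end
    return chosen

For example, max_compatible([(1, 3), (2, 5), (4, 7), (6, 9)]) returns 2, corresponding to the greedy choice of (1, 3) and (4, 7).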

In many real-world applications, there is a need for maintaining the input under certain updates (for example, insertions and deletions of items), so that we can report the optimal solution (or its cost) after each operation. The goal is to avoid the possibly very expensive recalculation of the answer (which surely takes at least linear time in the size of the input) by maintaining some kind of additional structure. The first step in this line of research is to design a structure with sublinear update/query time. Then, the next goal is to bring down the time complexities to polylogarithmic (in the size of the input). Examples of problems in which this has been successfully accomplished include dynamic graph connectivity [23, 16, 17], dynamic longest increasing subsequence [20, 13], dynamic suffix array [2, 18], dynamic graph clustering [10], and many others. For some dynamic problems no such solutions are known, and we have tools for proving (conditional) polynomial hardness for dynamic algorithms [14].

This suggests the following Dynamic Interval Scheduling (DIS) problem, in which we want to maintain a set of intervals subject to insert and delete operations. After each update, we should report the size of the maximum size subset of pairwise compatible intervals. Note that reporting the subset itself might not be feasible, as it might contain a linear number of intervals. Similarly, neither is explicitly maintaining this subset, as a single update might trigger a linear number of changes even in a unique optimal subset. Thus, the challenge is to maintain an implicit representation of the current solution that avoids recomputing the answer after each update, that is, supports each update in sublinear time. Besides being a natural extension of a very classical problem, we see this question as possibly relevant in practical applications in which we need to cope with a dynamically changing set of jobs.

1.1 Previous work

Surprisingly, to the best of our knowledge, the complexity of general exact DIS was not considered in the literature. However, Gavruskin et al. [12] considered its restricted version, in which there is an extra constraint on the maintained set of intervals. Namely, it should be monotonic at all times: no interval may properly contain another. Under such an assumption, there is a structure with amortized time per update and amortized time per query. Alternatively, the update time can be decreased if the query only asks whether a given interval belongs to the optimal solution.

For the general version of DIS, Henzinger, Neumann and Wiese [15] designed an efficient approximation algorithm that maintains a (1+ε)-approximate solution in polylogarithmic time. The dependency on ε has been very recently improved from exponential to polynomial by Compton, Mitrović and Rubinfeld [7]. In fact, both solutions work for the weighted version of the problem, called Dynamic Weighted Interval Scheduling (DWIS). In this problem, each interval has an associated weight, and the goal is to maintain a subset of pairwise compatible intervals with the largest total weight. Note that the static version of this problem, called Weighted Interval Scheduling (WIS), can be solved by a straightforward dynamic programming algorithm [19] (but the greedy strategy no longer works now that we have weights). This brings the challenge of determining if the unweighted (and weighted) version of the problem admits an efficient exact dynamic solution.

A natural generalization of interval scheduling is to consider multiple machines. In such a problem, there is a shared set of jobs to process, and each job can be either discarded or scheduled on one of the available machines. Jobs scheduled on the same machine must be pairwise compatible. The goal is to maximize the number (or the total weight) of scheduled intervals. IS on multiple machines (IS+) can be solved by extending the greedy algorithm, considering intervals in order of increasing end time. For each considered interval, if no machine is free at the respective time, the interval is discarded. If there are some free machines, the interval is assigned to the available machine that was busy until the latest time. A direct implementation of this approach incurs an extra factor depending on the number of machines in the running time, but this can be avoided [11, 6]. The weighted version of the problem (WIS+) can be formulated and solved as a min-cost flow problem [3, 5]. For the dynamic version, Compton, Mitrović and Rubinfeld [7] extend their methods for maintaining an approximate answer to multiple machines; however, their bounds are mostly relevant for the unweighted case. A related (but not directly connected) question is to maintain the smallest number of machines necessary to schedule all jobs in the current set [12].

1.2 Our contribution

In this paper, we consider dynamic interval scheduling on one and multiple machines. We show that the unweighted version of the problem admits a sublinear dynamic solution, and furthermore, we make non-trivial progress on decreasing the exponent in the time complexity of the solution.

The starting point is a simple structure for the general DIS problem with amortized 𝒪̃(√n) update/query time. This is then improved to amortized 𝒪̃(n^{1/3}) update/query time. For multiple machines, we begin with m = 2, and show how to solve the corresponding problem, denoted DIS2, in amortized 𝒪̃(√n) time per update. Next, we use this solution to solve the general DIS+ problem in amortized 𝒪̃(n^{1-1/m}) time per update. While designing some sublinear solution is not very difficult, our improved time bounds require structural insights that might be of independent interest.

Theorem 1.

There is a data structure for Dynamic Interval Scheduling on m ≥ 2 machines that supports any update in amortized 𝒪̃(n^{1-1/m}) time.

We complement the above result by a (conditional) lower bound for the weighted version of the problem, even on a single machine. We show that, for every ε > 0, under the Minimum Weight k-Clique Hypothesis, it is not possible to maintain a structure that solves DWIS in 𝒪(n^{1-ε}) time per operation. This shows an interesting difference between the static and dynamic complexities of the unweighted and weighted versions: despite both IS and WIS admitting simple efficient algorithms, DIS admits a sublinear solution while DWIS (probably) does not.

1.3 Techniques and ideas

A natural approach to DIS is to efficiently simulate the execution of the greedy algorithm.

Definition 2.

For an interval I, the leftmost compatible interval of I is the compatible interval (one starting after I ends) with the smallest right endpoint, or a null value if there is no such interval.

Note that if the greedy algorithm includes an interval in the solution, then it also includes its leftmost compatible interval. Thus, it is easy to prove that if we start from the interval with the smallest right endpoint, then the (optimal) solution generated by the greedy algorithm is obtained by repeatedly following leftmost compatible intervals.

One can consider a forest in which each interval is represented as a node whose parent is its leftmost compatible interval. By creating an artificial root and connecting all the forest roots to it, we make this representation a tree. We call it the greedy tree (of the current set of intervals). The answer to the DIS query is the length of the longest path from any node to the root in this tree, and by the analysis of the greedy algorithm this is exactly the path starting from the earliest-ending interval.

Figure 1: An input instance for DIS with the optimal solution generated by the greedy algorithm marked using bold lines and the corresponding greedy tree.
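
The following Python sketch builds the parent pointers of a greedy tree (computing the leftmost compatible interval of every interval via a suffix minimum over the intervals sorted by start time) and returns the DIS answer as the depth of the earliest-ending interval. It is a static illustration only; all names are ours and it does not reflect the dynamic maintenance discussed below.

import bisect

def greedy_tree_answer(intervals):
    # parent[i] is the leftmost compatible interval of intervals[i]
    # (None plays the role of the artificial root).
    n = len(intervals)
    if n == 0:
        return 0
    by_start = sorted(range(n), key=lambda i: intervals[i][0])
    starts = [intervals[i][0] for i in by_start]
    # suffix_best[k]: index of the earliest-ending interval among by_start[k:].
    suffix_best = [None] * (n + 1)
    for k in range(n - 1, -1, -1):
        cand, prev = by_start[k], suffix_best[k + 1]
        suffix_best[k] = cand if prev is None or intervals[cand][1] < intervals[prev][1] else prev
    parent = [None] * n
    for i in range(n):
        k = bisect.bisect_right(starts, intervals[i][1])   # first start after end(i)
        parent[i] = suffix_best[k]
    depth = {None: 0}                                      # depth of the artificial root
    def resolve(i):
        # Iterative memoised depth computation along parent pointers.
        path = []
        while i not in depth:
            path.append(i)
            i = parent[i]
        val = depth[i]
        for node in reversed(path):
            val += 1
            depth[node] = val
        return val
    first = min(range(n), key=lambda i: intervals[i][1])   # earliest-ending interval
    return resolve(first)

For instance, greedy_tree_answer([(1, 3), (2, 5), (4, 7), (6, 9)]) returns 2, the same value as the greedy algorithm.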

A standard approach used in dynamic problems is splitting the current input into several smaller pieces and recomputing some information only in the piece containing the updated item. Then, the answer is obtained by combining the information precomputed for every piece. An attempt to use such an approach for DIS could be as follows. We partition the intervals into parts, either by the start or the end times, and in every part we precompute the result of running the greedy algorithm from every possible state. The goal is to accelerate running the algorithm by being able to jump over the parts. For one machine, we can simply maintain the greedy tree, as it allows us to simulate running the greedy algorithm not only from the interval with the smallest end time but in fact from an arbitrary interval; we call this resuming the greedy algorithm from that interval. This allows us to jump over the whole part efficiently, and by appropriately balancing the size of each part we obtain a data structure with 𝒪̃(√n) time per update. This is described in detail in Appendix A. A similar approach works for m ≥ 2 machines, except that instead of the greedy tree we need to preprocess the answer for every m-tuple of intervals, resulting in a larger, though still sublinear, time per update.

We improve on this basic idea for both one and multiple machines. For one machine, we design a way to solve the decremental variant of DIS in only (amortized) polylogarithmic time per update, and couple this with maintaining a buffer of the most recent insertions. For multiple machines, the greedy tree is no longer sufficient to capture all possible states of the greedy algorithm. However, by a careful inspection, we prove that for a piece consisting of some number of intervals, instead of precomputing the answers for all possible states, it is enough to consider only a small family of carefully selected states. For m ≥ 3 machines, we further extend this insight by identifying a restricted family of states, called compressible. Interestingly, using these states to simulate the greedy algorithm starting from an arbitrary state requires a separate precomputation, hence we need to consider the case m = 2 separately.

2 Interval scheduling on one machine

For our structures, it is sufficient that the endpoints characterizing the intervals are pairwise comparable, but to simplify the presentation, we assume that they are integers. One can also use an order maintenance structure [9, 4] to achieve worst-case constant time comparisons between endpoints even if we only assume that, when inserting an interval, we know the endpoints of existing intervals that are the nearest predecessors of its two endpoints. We make the endpoints of all intervals pairwise distinct with the standard perturbation. We assume that each insert operation returns a handle to the interval, which can later be used to delete it.

Our structures work in epochs. At the beginning of each epoch, we set n to be the current number of intervals. When the number of intervals leaves a constant-factor range around n, a new epoch begins. At the beginning of an epoch, we construct an additional data structure over all current intervals by a sequence of inserts, in any order. These reconstructions have no impact on the amortized update time complexity, as a linear number of actual operations must occur within an epoch and the reconstruction itself is just a sequence of insertions and deletions. This structure is then maintained during the epoch.

We maintain a global successor structure storing all intervals sorted by their end time that enables efficient computation of the leftmost compatible interval. There are separators that split the universe of coordinates into parts of similar size. Intervals are assigned to parts by their start time. Some intervals are internal (if they fully fit in the part) and others are external (otherwise). The per-part structures described below are able to efficiently find the internal result for an interval in a part, that is, how many intervals the greedy algorithm can choose from it until reaching the exit (the last selected) interval of the part. A DIS query is then answered by iterating over the parts and applying the exit of one part as the input to the next one.
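
The query procedure can be summarized by the following sketch, in which jump and next_choice are hypothetical interfaces standing for the per-part structures and the global successor structure, respectively; neither name comes from the paper.

def dis_query(first_interval, jump, next_choice):
    # first_interval: the earliest-ending interval of the current set (None if empty).
    # jump(I): per-part oracle; starting from interval I it follows the greedy
    #   chain through the internal intervals of I's part and returns
    #   (number of intervals chosen, exit interval); for an external interval it
    #   may simply return (1, I).
    # next_choice(I): the leftmost compatible interval of I in the whole set,
    #   or None if there is none (answered by the global successor structure).
    total, cur = 0, first_interval
    while cur is not None:              # each iteration jumps over (at least) one part
        cnt, exit_interval = jump(cur)
        total += cnt
        cur = next_choice(exit_interval)
    return total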

We recommend reading Appendix A, where we introduce the above idea by showing a simpler but slower algorithm. Here we extend this approach and present a data structure showing the following.

Theorem 3.

There is a data structure for DIS that supports any sequence of insert/delete/query operations on n intervals in amortized 𝒪̃(n^{1/3}) time per operation.

The separators are chosen such that each part has size 𝒪(n^{2/3}) and, for any two consecutive parts, at least one has size Ω(n^{2/3}). Thus, there are always 𝒪(n^{1/3}) parts. More details on how to maintain this partition are provided in Appendix A.

Since our goal is to achieve 𝒪̃(n^{1/3}) update time and the parts are larger than that, we cannot afford to recompute a whole part from scratch for every update in it (as we did in Appendix A). Instead, we keep the internal intervals of a part in two structures: a decremental structure and a buffer. External intervals are only kept in the global balanced binary search tree containing all the intervals. We first sketch the idea and describe the details in the following subsections.

The decremental structure of each part contains 𝒪(n^{2/3}) intervals, has no information about buffer intervals, can be built in 𝒪̃(n^{2/3}) time, and allows deletions in amortized polylogarithmic time. The buffer contains only the at most 𝒪(n^{1/3}) most recently inserted internal intervals of the part. Each operation in a part leads to the recomputation of the information associated with the buffer in 𝒪̃(n^{1/3}) time. When the buffer overflows, we rebuild the decremental structure from scratch using all internal intervals from the part and clear the buffer. Such a rebuild happens at most once every Θ(n^{1/3}) updates inside a part. This way, the update time of our solution stays within the claimed bound.

As the optimal solution may use intervals from the decremental collection and the buffer interchangeably, we need to combine the information stored for these sets. For buffer intervals, we can afford to precompute the whole internal result and the exit of the part while being fully aware of the content of the decremental collection. However, we also need to "notify" the intervals of the decremental collection about potential better solutions that can be obtained by switching to buffer intervals. For this, we store an additional structure whose total size is proportional to the buffer size (up to logarithmic factors), recomputed after every update in the part, specifying for which intervals of the decremental collection there exists an "interesting" buffer interval.
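
The following sketch shows only the buffering pattern of a single part (rebuild the decremental structure once the buffer overflows); the container types, the buffer capacity and all names are ours, and the real structures of course store much more information.

class PartSketch:
    # Internal intervals of one part: a decremental structure plus a small
    # buffer of recent insertions; the buffer is flushed into a full rebuild
    # once it exceeds its capacity (e.g. roughly n^(1/3), an assumption that is
    # consistent with the claimed amortized bound).
    def __init__(self, internal_intervals, buffer_cap):
        self.buffer_cap = buffer_cap
        self.buffer = []
        self._rebuild(list(internal_intervals))

    def _rebuild(self, intervals):
        # Stand-in for building the decremental structure from scratch
        # (the real structure also rebuilds the greedy tree, top tree, etc.).
        self.decremental = list(intervals)
        self.buffer = []

    def insert(self, interval):
        self.buffer.append(interval)
        if len(self.buffer) > self.buffer_cap:              # buffer overflow
            self._rebuild(self.decremental + self.buffer)   # amortised over many updates
        # the real structure also recomputes the buffer information here

    def delete(self, interval):
        if interval in self.buffer:
            self.buffer.remove(interval)
        else:
            self.decremental.remove(interval)               # decremental deletion
        # the real structure also recomputes the buffer information here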

2.1 Active and inactive intervals

Definition 4.

An interval in a collection of intervals is active if there is no other interval of the collection properly contained in it. Otherwise, it is inactive.

Lemma 5.

For any set of intervals and any interval of this set, the greedy algorithm for IS resumed from this interval chooses (after it) only active intervals of the set.

Proof.

Assume that one interval of the set properly contains another. The inner interval ends earlier, so it is considered first by the greedy algorithm. If it is scheduled, the outer interval can no longer be scheduled, as the two overlap. If it is not scheduled, the outer interval cannot be scheduled either, as the set of intervals compatible with the outer interval is a subset of the intervals compatible with the inner one. ∎

A collection of only active intervals is monotonic by definition. This provides a well-defined, natural order on the active intervals in the collection: sorting them by left endpoints coincides with sorting them by right endpoints. Because of this additional structure, we focus on describing how to maintain the subset of active intervals inside a collection and only look for the solution of (D)IS in this subset.

The decremental structure in each part only allows rebuilding and deletions. We maintain the set of active intervals in the decremental collection. When an interval of the decremental collection is deleted, the structure should report the newly active intervals. We stress that the decremental structure is not aware of any buffer intervals of the part, and in order to determine whether a particular interval is active in the decremental collection, we do not take any buffer intervals into account.

Lemma 6.

There is a structure that allows maintaining the subset of active intervals in a delete-only or insert-only collection of intervals in amortized polylogarithmic time per insertion/deletion, and it can be built in near-linear time.

Proof.

Each interval is translated into a point in the plane, with one coordinate per endpoint. One point dominates another if the interval it represents is properly contained in the other one; a point is dominated if some point dominates it. An interval is active in the collection if and only if the point representing it is not dominated. The set of non-dominated points forms a staircase, so it is linearly ordered by either coordinate. We store this front of non-dominated points in a predecessor/successor structure. Additionally, we maintain a range search tree indexed by one of the coordinates, storing in each node the points with coordinates in the appropriate range together with the extreme point among them.

We start by describing the insert-only structure. When a point is inserted, we search for its predecessor and its successor in the front of non-dominated points. This way, we can determine whether the new point is dominated, or which points of the front it dominates. We then update the front and the range search tree appropriately.

To build the delete-only structure, we insert the points one by one, in any order, as described above. When a point is deleted, we search for its predecessor and its successor in the front and find the points in this range that become non-dominated, that is, the new extreme points of the nodes of the range search tree after the removal of the deleted point from the appropriate nodes. These new non-dominated points are added to the front; each interval of the decremental structure is activated at most once, so the time charged to each interval in the collection is polylogarithmic. ∎
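
As an illustration of the insert-only case, the following sketch maintains the front of active intervals, with a plain sorted list standing in for the predecessor/successor structure (so a single insertion costs linear time here, unlike in the real structure); the class and method names are ours.

import bisect

class ActiveFront:
    def __init__(self):
        self.starts = []   # starts of active intervals, increasing
        self.ends = []     # matching ends; also increasing, since active intervals are monotonic

    def insert(self, start, end):
        # Returns True if the new interval is active after the insertion.
        pos = bisect.bisect_left(self.starts, start)
        # If the new interval contains an already-active interval, it is inactive.
        if pos < len(self.starts) and self.ends[pos] < end:
            return False
        # Active intervals that contain the new one become inactive; they form a
        # contiguous run of predecessors whose end exceeds `end`.
        while pos > 0 and self.ends[pos - 1] > end:
            self.starts.pop(pos - 1)
            self.ends.pop(pos - 1)
            pos -= 1
        self.starts.insert(pos, start)
        self.ends.insert(pos, end)
        return True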

2.2 Decremental structure

For an interval of the decremental collection, we define its next greedy choice within the decremental collection, that is, its leftmost compatible interval when only intervals of the decremental collection are considered.

Proposition 7.

The set of greedy predecessors of an interval (the active intervals whose next greedy choice it is) forms a contiguous range of active intervals in the above order.

Proof.

The next greedy choice is monotone with respect to the order on active intervals, so if two active intervals share the same next greedy choice, then so does every active interval between them. ∎

The intervals of the decremental collection form a forest, where the node representing an interval is the parent of another node exactly when it is that node's next greedy choice. As in the previous section, we add an auxiliary root interval to make this representation a tree; we call it the greedy tree of the part. The greedy predecessors of an interval are the children of its node in the greedy tree. We stress that the greedy tree is built only for the intervals of the decremental collection.

We internally represent the greedy tree as an augmented top tree [1]. This allows maintaining the underlying fully dynamic forest (updates are insertions/deletions of edges and changes of node/edge weights). Because deletions and activations of intervals in the decremental structure may change the next greedy choice of many nodes, we slightly alter the structure as described in Section 2.3. This is also one of the reasons why one cannot apply the techniques described in [12] to solve even the decremental variant of DIS, despite being able to efficiently maintain the (monotonic) set of active intervals.

When an interval is to be deleted, its children have to be connected to other nodes of the greedy tree. Consider its children just before the deletion and the intervals that become active after removing it. Nodes other than these children do not change their parent.

We first observe that the only possible new parents for these children are the newly activated intervals together with a single previously active interval, and use Proposition 7 to see that some (possibly empty) prefix of the children sequence has to be connected to the first of these candidates, the next range to the second one, and so on, until finally some suffix of the children sequence has to be connected to the last one. We use binary search on the children sequence to find the boundaries of these ranges. We update the parents of the nodes in the found ranges in the greedy tree as described in Section 2.3, which takes polylogarithmic time per activated interval.

Using the appropriate query to the top tree, we can resume the execution of the greedy algorithm, restricted to the decremental collection, from any of its intervals in polylogarithmic time.

2.3 Top tree

The underlying information maintained in the top tree is chosen so that we can compute the following:

  • weighted level ancestors,

  • nearest marked ancestors,

  • the total path weight from a node to the root (the sum of weights).

The discussion on how to maintain information that allows efficient computation of the above in a top tree can be found in [1].

The top tree represents an underlying modified greedy tree: we binarize the tree by reorganizing the children of each non-leaf node and adding auxiliary nodes, as presented in Fig. 2. A node of the modified greedy tree that represents an actual interval has weight 1, and all auxiliary nodes have weight 0. The weight of a path between two nodes is the sum of the weights of the nodes on the path (including the endpoints). This way, the weight of the path from a node representing an interval to the root of the modified greedy tree is exactly the number of intervals chosen by the greedy algorithm resumed from that interval.

Figure 2: A part of a greedy tree is shown on the left and the modified greedy tree represented by the top tree is shown on the right. One can retrieve the i-th child of a node by querying for the appropriate level ancestor (ignoring the weights of nodes).

The collection of active intervals is always monotonic, so we keep the children of every node ordered accordingly. This way, we can update the parent of a whole range of children of a node in polylogarithmic time by the appropriate splits and joins in the top tree. Apart from the auxiliary nodes, the pre-order traversals of the greedy tree and of the modified greedy tree are equal.

The modified greedy tree and the top tree are only internal representations of the greedy tree that enable efficient implementation of the necessary operations. Any updates of the greedy tree are naturally translated into updates of these representations, as described above. We proceed with further details.

Definition 8.

For an interval, we define its depth as the depth of its node in the greedy tree. The set of intervals of the same depth is called a layer.

Note that if we traverse the greedy tree in BFS order (visiting children left-to-right), we obtain exactly the order on active intervals. Thus, when comparing two intervals of the same layer, we can just check which one is earlier in the pre-order traversal of the greedy tree. This way, we can treat layers as sorted collections of intervals (in fact, contiguous ranges of the order).

Remark 9.

We already have all the ingredients for an algorithm that solves the delete-only variant of DIS in amortized polylogarithmic time per operation. In this case, we do not partition the intervals nor use a buffer. Instead, we only use the top tree representing the greedy tree of all the active intervals in the whole decremental collection of intervals.

Similarly, we recall that the structure for maintaining the subset of active intervals can also be maintained for the insert-only variant of DIS (Lemma 6). Now we also observe that we can maintain the greedy tree when intervals are only inserted. A new interval may only improve the next greedy choice for some contiguous range of intervals, and we can binary search for the endpoints of this range. To account for the cost of reconnecting these nodes, which may have many different parents, we observe that for any insertion, there is at most one interval that loses a child in the greedy tree and is not deactivated. We charge the time of reconnecting the range of its children to the insertion of the new interval. We charge the time needed to reconnect the other nodes to the insertion of their (deactivated, thus effectively deleted) parent. This establishes an amortized polylogarithmic time complexity for the insert-only variant of DIS as well.

2.4 Buffer

Definition 10.

For intervals and , we say that directly wants to switch to if and only if all the following conditions hold:

  • ends earlier than ,

  • and are compatible,

  • .

The aim of the above definition is to capture that the true next greedy choice may differ from the one computed within the decremental collection. Note that if an interval directly wants to switch to a buffer interval, this does not necessarily mean that the buffer interval is its true next greedy choice. It just means that the buffer interval is a better next greedy choice than it appears from the computation in the decremental collection. Note that it also means that the greedy algorithm resumed from any node in the subtree of this interval in the greedy tree will not choose the next greedy choice computed within the decremental collection. Thus we define the following.

Definition 11.

For an interval of the decremental collection and a buffer interval, we say that the former wants to switch to the latter if and only if some ancestor of the former in the greedy tree (possibly the interval itself) directly wants to switch to it.

Figure 3: An instance of DIS, an example part. Intervals of the decremental collection are shown above the dotted line and buffer intervals below it. Dashed arrows connect intervals with their respective next greedy choices. Here intervals 1, 2, 3 and 4 want to switch to B1 (3 and 4 directly), and interval 3 wants to (directly) switch to B2.
Proposition 12.

For a buffer interval, there exists a depth d such that the set of intervals of the decremental collection that directly want to switch to it is either:

  • a contiguous range of the layer at depth d,

  • a suffix of the layer at depth d and a prefix of the layer at depth d+1.

Proof.

Let and assume that and want to switch to . Then, also wants to switch to : ends earlier than because wants to switch and can switch to because can. This shows that the nodes that want to switch to form a continuous range in . Active intervals that directly want to switch to any particular are pairwise overlapping. Indeed, with of any two compatible intervals , we would have so ends earlier than any buffer interval compatible with to the right of . This also proves that a node and its parent in the greedy tree cannot both directly want to switch to the same buffer interval thus completing the proof. ∎

Proposition 12 shows that the information needed to notify the intervals of the decremental collection that directly want to switch to a particular buffer interval is small: for each buffer interval, it is enough to remember the endpoints of at most two ranges.

We also want to efficiently handle indirect switching. The intervals that want to switch to a buffer interval are the nodes in the subtrees of the nodes in the ranges from Proposition 12. For a range of a layer that directly wants to switch to a buffer interval, every node of a deeper layer whose ancestor at that depth falls into the range wants to switch to it, see Fig. 4. We use a 2D range search tree indexed by depth and by the position of the interval in the order. The structure allows us to store a collection of three-sided rectangles, so that given a query point we can check whether it is contained in at least one of the rectangles. To mark nodes as in Fig. 4, we add the corresponding three-sided rectangle to the tree.

Figure 4: Indirect possibility of switching to a buffer interval. Nodes of one layer in a contiguous range directly want to switch to it. The gray area contains the nodes that want to switch to it.

An interval may want to switch to multiple buffer intervals, but its actual switching point is the earliest (the deepest in the greedy tree) interval that directly wants to switch to a buffer interval on the path from it to the root. We can find this earliest switching point from any interval in polylogarithmic time by using a binary search on the depth, each time querying the 2D range search tree whether a point is covered by at least one rectangle. The result for the prefix of the path until reaching the buffer can be obtained from the top tree. We recreate the whole range search tree after every update in the part.

For every buffer interval, we store the total length of its path to the root (this is its internal result in the part) and the latest actual interval of the part just before reaching the root (this is its exit). This information is recomputed for all buffer intervals using dynamic programming, iterating over the buffer intervals by decreasing end times as follows. For a buffer interval, we compute its next greedy choice; if it is itself a buffer interval, we use its exit and its internal result increased by one as the information for the current interval (and, by the order of the computation, we already know these). Otherwise, we query the decremental collection for the next buffer interval selected by the greedy algorithm, as described above, and combine its result with the prefix of the path traversed in the decremental collection. This is computed in time nearly linear in the buffer size.

3 Interval scheduling on multiple machines

We stress that we assume a constant number of machines m, and thus we ignore factors depending only on m in the time complexities. The difference between a naive application of standard techniques and our algorithms is negligible when m is large.

As the main idea of our algorithm is to efficiently simulate the folklore greedy algorithm for IS+ (described in [11, 6]), we now recall it. The intervals are considered one by one, in order of increasing end time. For each considered interval, if there is no available machine at the time, the job is rejected. Otherwise, it is accepted and assigned to the available machine that was busy until the latest time. The proof of correctness is a standard exchange argument.
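
A direct Python sketch of this greedy rule is given below; the sorted list of "busy until" times is updated in time linear in m per interval, which is exactly the kind of overhead the cited works show how to avoid. The representation and names are ours.

import bisect

def max_scheduled(intervals, m):
    busy_until = [float("-inf")] * m            # kept sorted in increasing order
    scheduled = 0
    for start, end in sorted(intervals, key=lambda iv: iv[1]):   # by end time
        # Free machines are those busy strictly before `start`;
        # take the one that was busy until the latest time.
        i = bisect.bisect_left(busy_until, start) - 1
        if i >= 0:                              # some machine is free
            scheduled += 1
            del busy_until[i]
            bisect.insort(busy_until, end)
        # otherwise every machine is busy at `start` and the interval is discarded
    return scheduled

For example, max_scheduled([(1, 3), (2, 5), (4, 7), (6, 9)], 2) returns 4, since all four intervals fit on two machines.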

The state of a partial execution (up to some time t) of the greedy algorithm can be fully described by a sequence of length m, whose i-th entry is the interval last scheduled on the i-th machine before or at time t. Some of these entries may not belong to the considered range of intervals if some machine has not accepted anything there. At the same time, we want to preprocess information only for tuples of intervals from the current set, thus we need the following additional notation.

Definition 13.

The greedy state (at time t) is a (multi)set of m input intervals. Each element indicates that at time t there is a machine that was busy exactly until the end of this element. We use artificial elements to indicate that there is a machine that was busy up to a given time.

An artificial element indicates that the particular machine is blocked for all intervals that start too early; it can be simulated by an artificial interval ending at the required time. Thus, even though each interval can be selected only once, we may want to mark that several machines are busy up to the same time; for this reason, we decided to use multisets for greedy states.

The greedy algorithm only considers values of t that are end times of input intervals. We slightly abuse the notation, index greedy states by these times, and assume that the intervals are ordered according to the order in which the IS+ algorithm considers them. To avoid degenerate cases at the beginning of the execution, we add m pairwise overlapping intervals, all ending earlier than the beginning of any actual input interval.

If the next considered interval is accepted, exactly one element of the current greedy state needs to be updated to obtain the next one: the element ending latest among the elements compatible with the accepted interval. One can see the same from a slightly different perspective: consider, for every element of the state, the candidate obtained by replacing this element with its leftmost compatible interval; the next accepted interval is the earliest-ending among these candidates, and the corresponding replacement yields the next greedy state. We call it the next greedy state. Because the leftmost compatible interval can be computed in logarithmic time using the appropriate structure, as described in Appendix A, we iterate through all candidates in the greedy state and thus have the following.

Corollary 14.

The next greedy state can be computed in 𝒪(m log n) time for any greedy state.

We use the insights from Section 2 and Appendix A and split the intervals into balanced parts, but now the part to which an interval belongs is determined by its end. Other details, like epochs and splitting and merging the parts, remain the same. We restrict the leftmost compatible interval computation to only consider intervals in the same part as its argument (it can return a null value). We build an additional structure for the internal intervals of each part and rebuild it after every update in the part. As in the case of interval scheduling on one machine, our goal is to be able to efficiently handle a query for the internal result (the number of accepted intervals) and the exit greedy state of the part for a given entry greedy state – we call this the part query from the given greedy state.

Notice that during the execution of the greedy algorithm, it may happen that some machine does not accept any new interval within a part, so the exit greedy state of a part may also contain intervals from earlier parts. Let us now describe how to translate such an exit greedy state into an entry greedy state of the next part, so that we can later only consider the content of one part. We observe that the decisions of the greedy algorithm only depend on the relative order of the endpoints of the considered intervals. If a machine was busy up to some time and no interval starts between this time and a later time t, we can safely assume that the machine is busy up to time t without changing the execution of the greedy algorithm. Thus, we round up the end of each interval in the greedy state to the earliest start of an interval in the next part. See Fig. 5. We stress that the result of the rounding is not necessarily part of the solution generated by our algorithm; it just indicates the times up to which the machines are busy. After computing the exit greedy state of the part, we inspect whether there are machines that have not accepted any interval from it and revert the rounding for these.

Figure 5: Translation of an exit greedy state between consecutive parts. Each interval of the state is rounded to the earliest-starting interval of the next part that starts after its end (denoted by a dashed directed edge); the rounded intervals form the entry greedy state of the next part.
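
The rounding itself is a single successor search per machine, as in the following sketch (names are ours; we additionally assume the convention that a machine busy up to time t can still accept an interval starting exactly at t, which is what makes rounding up to a start time safe).

import bisect

def round_exit_state(exit_ends, next_part_starts):
    # exit_ends: "busy until" times of the m machines when leaving a part.
    # next_part_starts: sorted start times of the intervals of the next part.
    entry = []
    for t in exit_ends:
        k = bisect.bisect_right(next_part_starts, t)   # earliest start after t
        entry.append(next_part_starts[k] if k < len(next_part_starts) else t)
    return entry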

We stick to Definition 4, but we cannot make direct use of Lemma 5, because in the case of multiple machines it may happen that inactive intervals are part of the optimal solution. As these intervals may not form a monotonic collection, we redefine the order on intervals accordingly. We still maintain the greedy tree and the top tree (we could also use simpler structures, as we only need a subset of the operations provided by the top tree and we can afford to rebuild the structure from scratch after every update), as described for one machine. We identify intervals with the nodes representing them in the greedy tree.

Lemma 15.

If an inactive interval belongs to a greedy state, then the latest (in the order) interval contained inside it belongs to the same greedy state.

Proof.

Let the inactive interval be b and the latest interval contained inside it be a. Any interval compatible with b is also compatible with a, and a ends earlier than b; this means that if b is accepted, then a is accepted as well (already at the time a is considered). From the end of a up to the end of b, the machine that accepted a cannot accept any other interval: such an interval would have to start after a ends and end before b ends, thus violating our assumption that a is the latest interval contained inside b. This implies that a is still the last interval on its machine when b is accepted, so a belongs to the same greedy state. ∎

Lemma 16.

Let M be a greedy state in which all elements are active intervals. Let x be the earliest (in the order) interval that is a common ancestor, in the greedy tree, of some pair of elements of M. For every element of M, consider the prefix of its path towards the root consisting of the intervals preceding x.

Then the intervals of these prefixes are exactly the intervals scheduled by the greedy algorithm for IS+ resumed from M before reaching the time of x. Additionally, just before that time, the greedy state of the algorithm consists of the last intervals of these prefixes.

Proof.

First, we make a technical note that, thanks to the artificial root added to form the greedy tree, such a common ancestor always exists.

The candidates for the next greedy state are obtained from the current one by replacing exactly one of its elements with its next greedy choice, assigned to the same machine. Thus, using this reasoning inductively, we observe that when moving forward along the path from any element of M towards the root of the greedy tree, at least until reaching the common ancestor, all the traversed intervals are scheduled on the same machine as the starting element. Additionally, for different elements of M, the parts of their paths that precede the common ancestor do not share any nodes (by the definition of the common ancestor). This way, the intervals of the prefixes are all and the only intervals included in some greedy state after M before the common ancestor is considered, and just before it is considered, the last intervals of all the prefixes are in the greedy state. See Fig. 6. ∎

Figure 6: Illustration of Lemma 16. Elements of the greedy state are filled dots.

If the elements of a greedy state are all active, we can naively compute the common ancestor from Lemma 16 by checking the LCAs of all pairs of intervals of the state in the greedy tree, and then proceed independently from each node to the last interval before it, obtaining the last greedy state before reaching it, as in Fig. 6. Thus we have the following.

Corollary 17.

Let M be a greedy state with only active intervals and consider the common ancestor defined as in Lemma 16. It is possible to compute both the first time at which this common ancestor enters the greedy state and the greedy state at that time in 𝒪̃(m²) time.

3.1 An 𝒪̃(√n)-time algorithm for two machines

In this section, we focus on describing an efficient algorithm for dynamic interval scheduling on two machines and prove the following.

Theorem 18.

There is a data structure for DIS2 that supports any sequence of insert/delete/query operations on n intervals in amortized 𝒪̃(√n) time per operation.

Lemma 19.

There are only three possible forms of a greedy state for two machines:

  • (a) the two intervals are compatible and the later-ending one is active,

  • (b) the two intervals overlap and the later-ending one is active,

  • (c) an active interval is fully contained inside an inactive one.

Proof.

First, we assume without loss of generality that the greedy algorithm has already considered at least two intervals and resumes from a greedy state of the form (a). One can easily prepend any instance of DIS+ with a few intervals to achieve this.

Assume that the greedy algorithm accepts a next interval and consider the resulting greedy state. There are three cases (as in Fig. 7), depending on where the accepted interval starts relative to the later element of the current state:

  • it starts after the later element ends – then it is an active interval and the new state is of the form (a),

  • it starts inside the later element – then it is an active interval and the new state is of the form (b),

  • it starts before the later element (and thus contains it) – then it is an inactive interval and the new state is of the form (c).

We now proceed to similar analysis of what are the forms of next greedy states that can be reached from states of the form (b) and (c).

If the current state is of the form (b), then the accepted interval is either compatible with both elements (case (ba)), and the new state is of the form (a), or it overlaps only the later element (case (bb)), and the new state is of the form (b). Note that an interval overlapping the earlier element would also overlap the later one and would thus be rejected.

Similarly, if the current state is of the form (c), then the accepted interval is either compatible with the outer (inactive) element (case (ca)), and the new state is of the form (a), or it overlaps it (case (cb)), and the new state is of the form (b).

No forms other than (a), (b) or (c) are reachable from (a), which concludes the proof. ∎

Figure 7: The three possible forms of a greedy state and the cases from Lemma 19. Active intervals are marked with bold lines and the potential cases for the next accepted interval are marked with dotted lines. Note that in forms (a) and (b) the earlier-ending interval may be either active or inactive.

We now describe our algorithm for DIS2. For each part, it maintains the following:

  • for every active interval – the result of the part query from the greedy state of the form (b) involving the interval and its direct successor (in the order),

  • for every inactive interval – the result of the part query from the greedy state of the form (c) involving the interval and the latest (in the order) active interval fully inside it.

When a part is updated, both structures are rebuilt from scratch. Computing an entry of either structure is nothing else than answering a part query for the appropriate greedy state. We ask these queries in decreasing order of the sum of the indices (in the order) of the two intervals of the greedy state. This way, during the recomputation, whenever the algorithm needs some other entry of either structure, it has already been computed, as its sum of indices is larger. See the details below.

Additionally, for each active interval we precompute the earliest (in ) interval not on the path from to the root of . We do this using dynamic programming, inspecting all the intervals in decreasing order of and it takes time. Similarly, we precompute the number of intervals on the path from to the latest interval ending earlier than .

We now describe how to answer the part query from a given greedy state, following the proof of Lemma 19 and considering all forms of the state.

If is of the form (a) we focus on finding the greedy state for which is the smallest such that . If is replaced in by an active interval , it has to be the earliest (in ) interval overlapping with an interval on the path from to the root in (it can also be itself). We know which one and what is the contribution to the internal result as we precomputed it. Moreover, we observe that and its part result is stored in so we just read the result from there. If is replaced in by an inactive interval it has to be the earliest (in ) interval compatible with . Then where is the latest (in order) active interval fully inside . Thus, we read the part result for from .

If the state is of the form (b), then the next state is either of the form (a), for which we proceed as described above, or of the form (b) but with both intervals of the state active (case (bb) of the proof of Lemma 19), for which we use Corollary 17 to reach a greedy state of the form (a) and then proceed as described above.

If the state is of the form (c), then the next state is of the form (a) or (b), and we proceed as described above.

3.2 An 𝒪̃(n^{1-1/m})-time algorithm for m machines

Surprisingly, before we start describing the final algorithm for m machines, we need an additional building block for the two-machine case.

Definition 20.

For a collection of intervals and an ordered pair of intervals from it, we define the first machine replacement to be the interval that replaces the first interval of the pair in the greedy state when the greedy execution on two machines is resumed from this pair. In other words, it is the earliest-ending accepted interval that will be scheduled on the same machine as the first interval of the pair by the greedy algorithm for IS+.

Figure 8: Both dotted intervals are accepted by the machine that accepted and the dashed interval overlaps with so is rejected. The left dotted interval in the example is , the solution of the subproblem from the computation of .

Within the desired time bounds, for m ≥ 3, we can afford to recompute the first machine replacements in a part from scratch for every pair of intervals in the updated part, as long as this recomputation takes time almost quadratic in the size of the part. We could not do the same for m = 2.

Lemma 21.

The values of the first machine replacement for all pairs of intervals in a collection can be computed in time almost quadratic in the size of the collection.

Proof.

Assuming that intervals in part are ordered by and given names in line with this order, we compute in decreasing order of the sum of indices. To compute , for and we first find the earliest ending interval that ends later than and is compatible with . To solve this subproblem we take a geometric view: each interval is converted into a point in 2D plane, the goal is to find the point with smallest -coordinate above and to the right of . This is solved by a 2D range search tree indexed by -coordinates storing the appropriate result. Thus, the subproblem is solved. We proceed with the computation of . We have two cases: either is overlapping with and then or is compatible with and then which is already known by the order of the computation. We can also compute the number of intervals chosen by the greedy algorithm when resumed from state until reaching (just or the number chosen from plus depending on the above cases). ∎

As it turns out, the first machine replacements play an important role in the algorithm for m ≥ 3 machines. We want to preprocess possible entry greedy states of a part so as to be able to efficiently answer part queries. The problem is that the number of possible greedy states within a part is too large for the time complexity we aim at. Thus, we cannot precompute part queries for all possible greedy states. Instead, we carefully select specific compressible greedy states, for which the part query results are actually stored, and design an algorithm that can push the simulation forward to the next compressible state or to the exit state of the part.

Definition 22.

Let M be a greedy state with its elements ordered by end times. We say that M is compressible if at least one of the following conditions holds:

  • the latest element of M is inactive,

  • the latest element of M is active and M contains an active interval whose next greedy choice it is,

  • M contains two equal elements.

Lemma 23.

In a part of k intervals for DIS+ on m machines, there are only 𝒪(k^{m-1}) compressible greedy states.

Proof.

We consider all the forms of the compressible greedy state as in Definition 22.

  • if the latest element is inactive, then from Lemma 15 we know that the greedy state also contains the latest interval fully inside it, and thus we can forget that contained interval (it is determined by the inactive element), so there are 𝒪(k^{m-1}) such states,

  • if the state contains an interval together with one of its children, this is an edge of the greedy tree of the part, so we can store the (m-2)-tuple of the other intervals and the identifier of the appropriate edge, giving 𝒪(k^{m-1}) such states,

  • if two elements are equal, we forget one of them, so there are 𝒪(k^{m-1}) such states. ∎

Note that we can decompress the representations from Lemma 23 in time to obtain a full greedy state of size . Also, by taking into account the sizes of the parts, we obtain that there are only compressible greedy states for intervals in .

For an update in a part, we recompute the part query results for all compressible greedy states of that part. As in Section 3.1, we do this using dynamic programming, in decreasing order of the sum of indices of the uncompressed state. The problem of computing the results for the states stored in the dynamic programming table is once again translated into a general query that takes an m-tuple as input and has to push the simulation forward either to the next part or at least to a compressible greedy state, from which we read the already preprocessed result and combine it with the traversed prefix of the path. We proceed by describing how to solve this general query.

We distinguish three forms of the greedy state for :

  • is inactive,

  • there is such that all intervals are active,

  • all are active.

For the (*) case, we compute . Either the latest accepted interval in is inactive and then is compressible of type (a) or it is active, thus is either of the (**) or (***) form and we proceed with it as described below.

For the (***) case, we use Corollary 17 to find the earliest greedy state for which . We observe that such is compressible of type (b), as both and at least one of its children are elements of