Home, SafeHome: Smart Home Reliability with Visibility and Atomicity

07/24/2020 ∙ by Shegufta Bakht Ahsan, et al. ∙ Microsoft, University of Illinois at Urbana-Champaign

Smart environments (homes, factories, hospitals, buildings) contain an increasing number of IoT devices, making them complex to manage. Today, in smart homes where users or triggers initiate routines (i.e., a sequence of commands), concurrent routines and device failures can cause incongruent outcomes. We describe SafeHome, a system that provides notions of atomicity and serial equivalence for smart homes. Due to the human-facing nature of smart homes, SafeHome offers a spectrum of visibility models which trade off responsiveness against incongruence of the smart home state. We implemented SafeHome and performed workload-driven experiments. We find that a weak visibility model, called eventual visibility, is almost as fast as today's status quo (up to 23% slower) and yet guarantees serially-equivalent end states.




1 Introduction

The disruptive smart home market is projected to grow from $27B to $150B by 2024 [60, 61]. There is a wide diversity of devices: roughly 1,500 IoT vendors exist today [44], and the average home is expected to contain over 50 smart devices by 2023 [25]. Smart devices cover all aspects of the home, from safety (fire alarms, sensors, cameras) to doors+windows (e.g., automated shades), home+kitchen gadgets, HVAC+thermostats, lighting, garden sprinkler systems, home security, and others. As the devices in the home increase in number and complexity, the chances of interactions leading to undesirable outcomes grow. This diversity and scale are even greater in other smart environments such as smart buildings, smart factories (e.g., Industry 4.0 [59]), and smart hospitals [62].

Past computing eras (1970s’ mainframes, 1990s’ clusters, and 2000s’ clouds) were successful because of good management systems [17]. What is desperately needed are systems that allow a group of users to manage their smart home as a single entity rather than as a collection of individual devices [23]. Today, most users (whether in a smart home or a smart factory) control a device using commands, e.g., turn ON a light. Further, major smart home controllers have started to let users create routines. A routine is a sequence of commands [64, 6, 54, 76]. Routines serve two purposes: a) convenience, e.g., turn ON a group of Living Room lights, then switch on the entertainment system, and b) correct operation, e.g., CLOSE window, then turn ON AC.

Motivating Examples: Today’s best-effort way of executing routines can lead to incongruent states in the smart home, and has been documented as the cause of many smart home incidents [38, 67, 31, 26, 54]. (Footnote: While security issues also abound, we believe such correctness violations are very common and under-reported as a pain point.) First, consider a routine involving the AC and a smart window [27, 73]: “cooling” = {CLOSE window; switch ON AC}. During the execution of this routine, if either the window or the AC fails, the end state of the smart home will not be what the user desired: either the window is left open with the AC on (wasting energy), or the window is closed with the AC off (overheating the home). Another example is a shipping warehouse wherein a robot’s routine needs to retrieve an item, package it, and attach an address label; all these actions are essential to ship the item correctly. In all these cases, lack of atomicity in the routine’s execution violates the expected outcome.

Our next example deals with concurrent routines. Consider a timed trash-can routine that executes every Monday night at 11 pm and takes several minutes to run: {OPEN garage; MOVE trash can out to driveway (a robotic trash can like SmartCan [58]); CLOSE garage}. One day the user goes to bed around 11 pm and initiates a bedtime routine: {switch OFF all outside lights; LOCK outside doors; CLOSE garage}. Today’s state of the art has no isolation between the two routines, so the bedtime routine could shut the garage (its last command) while the trash-can routine is executing either its first command (open garage) or its second command (moving the trash can outside). In both cases, the trash-can routine’s execution is incorrect, and equipment may be damaged (garage or trash can). Concurrency even among short routines can cause such incongruences, as the experiment in Figure 1 shows: two routines simultaneously touching only a few devices produce incongruent outcomes if they start close to each other. In all these cases, isolation semantics among concurrent routines were neither specified cleanly nor enforced.

Figure 1: Concurrency causes Incongruent End-state in a real smart home deployment. Two routines R1 (turn ON all lights) and R2 (turn OFF all lights) executed on a varying number of devices (x axis), with routine R2 starting a little after R1 (different lines). Y axis shows the fraction of end states that are not serialized (a serialized end state has all lights ON or all lights OFF). Experiments with TP-Link smart devices [71].

Challenges: This discussion points to the need for a smart home to autonomically provide two critical properties: i) Atomicity and ii) Isolation/Serializability. Atomicity ensures that either all the commands in a routine have an effect on the environment, or none of them do (e.g., if the window is not closed, the AC should not be turned on). Serializability says that the effect of a concurrent set of routines is equivalent to executing them one by one, in some sequential order; e.g., when the trash-can routine and the bedtime routine complete successfully, the doors are locked, the garage is closed, the lights are off, the trash can is in the driveway, and no equipment is damaged.

Specifying and satisfying these two properties in smart homes requires us to tackle certain unique challenges. The first challenge comes from the human-facing nature of the environment. Every action of a routine may be immediately visible to one or more human users; we use the word “visible” to capture any action that could be sensed by any human user anywhere in the smart home. This requires us to clearly specify and reason about visibility models for concurrent routines. Visibility models provide notions of serial equivalence (i.e., serializability) of routines in a smart home.

Second, a smart home needs to optimize user-facing metrics—latency to start the routine, and also latency to execute it. This motivates us to explore a new spectrum of visibility models which trade off the amount of incongruence the user sees during execution vs. the user-perceived latency, all while guaranteeing serial-equivalence of the overall execution. Our visibility models are a counterpart to the rich legacy of weak consistency models that have been explored in mobile systems like Coda [40], databases like Bayou [66] and NoSQL [74], and shared memory multiprocessors [1].

Third, in a smart home, device crashes and restarts are the norm: any device can fail at any time, and possibly recover later. These failure/recovery events may occur during a command, before a command starts, or after a command has completed. Thus, reasoning about device failure/restart events while ensuring atomicity and the visibility models is a new challenge. Today’s failure handling is either silent or places the burden of resolution on the user.

Fourth, long-running (or just long) routines are common in smart homes. A long routine is one that contains at least one long command. A long command needs exclusive control of a device for an extended period, without interruption. Examples include a command to preheat an oven to a target temperature, or to run the north garden sprinklers for 15 minutes. Long commands cannot be treated merely as two short commands, as this would still allow a concurrent routine to interrupt the device in the interim, violating isolation. Long commands need to be treated as first-class commands.

Prior Work: These challenges have been addressed only piecemeal in literature. Some systems [24, 51] use priority-based approaches to address concurrent device access. Others [8] propose mechanisms to handle failures. A few systems [9, 46, 42] formally verify procedures. Transactuation [54] and APEX [80] discuss atomicity and isolation, but their concrete techniques deal with routine dependencies and do not consider users’ experience—nevertheless, their mechanisms can be used orthogonally with SafeHome. None of the above address atomicity, failures, and visibility together.

The reader may also notice parallels between our work and the ACID properties (Atomicity, Consistency, Isolation, and Durability) provided by transactional databases [49]. While other systems like TinyDB [43] have drawn parallels between networks of sensors and databases (DBs), the techniques for providing ACID in databases do not translate easily to smart homes. The primary reasons are: i) our need to optimize latency (DBs optimize throughput); ii) device failure (DB objects are replicated, but devices are not, by default); and iii) the presence of long-running routines.

Contributions: We present SafeHome, a management system that provides atomicity and isolation among concurrent routines in a smart environment. For concreteness, we focus the design of SafeHome on smart homes (however, our evaluations look at broader scenarios). SafeHome is intended to run at an edge device in the smart home, e.g., a home hub or an enhanced access point. SafeHome does not require additional logic on devices; instead, it works directly with the APIs which devices naturally provide (commands are API calls). SafeHome can thus work in a smart home containing devices from multiple vendors.

The primary contributions of this paper are:

  1. A new spectrum of Visibility Models trading off responsiveness vs. temporary congruence of smart home state.

  2. Design and implementation of the SafeHome system.

  3. A new way to reason about failures by serializing failure events and restart events into the serially-equivalent order of routines.

  4. New lock leasing techniques to increase concurrency among routines, while guaranteeing isolation.

  5. Workload-driven experiments to evaluate new visibility models and characterize tradeoffs.

SafeHome is best seen as the first step towards a grand challenge. A true OS for smart homes requires tackling myriad problems well beyond what SafeHome currently does. These include support for [2]: users to inject signals/interrupts/exceptions, safety property specification and satisfaction, leveraging programming language and verification techniques, and in general full ACID-like properties. SafeHome is an important building block over which (we believe) these other important problems can then be addressed.

2 Visibility and Atomicity

We first define SafeHome’s two key properties–Visibility and Atomicity–and then expand on each.

  • SafeHome-Visibility/Serializability: For simplicity, in this initial part of the discussion we ignore failures, i.e., we assume devices are always up and responsive. SafeHome-Visibility/Serializability means that the effect of the concurrent execution of a set of routines is identical to that of an equivalent world where the same routines all executed serially, in some order. The interpretation of effect determines different flavors of visibility, e.g., identicality at every point of time, in the end state (after all routines complete), or at critical points in the execution. These choices determine the spectrum of visibility/serializability models that we discuss next.

  • SafeHome-Atomicity: After a routine has started, either all its commands have the desired effect on the smart home (i.e., routine completes), or the system aborts the routine, resulting in a rollback of its commands, and gives the user feedback.

2.1 New Visibility Models in SafeHome

SafeHome presents to the user family a choice in how the effects of concurrent routines are visible. We use the term “visibility” to capture all senses via which a human user, anywhere in the environment, may experience immediate activity of a device, i.e., sight, sound, smell, touch, and taste. Visibility models that are more strict run routines sequentially, and thus may suffer from longer end-to-end latencies between initiating a routine and its completion (henceforth we refer to this simply as latency). Models with weaker visibility offer shorter latencies, but need careful design to ensure the end state of the smart home is congruent (correct).

Today’s default approach is to execute routines’ commands as they arrive, as quickly as possible, without paying attention to serialization or visibility. We call this status quo the Weak Visibility (WV) model; its incongruent end states worsen quickly with scale and concurrency (see Fig. 1). We introduce three new visibility models.

1. Global Strict Visibility (GSV): In this strong visibility model, the smart home executes at most one routine at any time. In our SafeHome-Visibility definition (Sec. 2), the effect for GSV is “at every point of time”, i.e., every individual action on every device. Consider a 2-family home where one user starts a dishwasher routine {dishwasher:ON; /*run dishwasher for 40 mins*/ dishwasher:OFF;}, and another user simultaneously starts a dryer routine {dryer:ON; /*run dryer for 20 mins*/ dryer:OFF;}. If the home has low amperage, switching on both the dishwasher and the dryer simultaneously may cause an outage, even though these two routines touch disjoint devices. If the home chooses GSV, then the executions of the dishwasher and dryer routines are serialized, allowing at most one to execute at any point of time. Because routines must wait until the smart home is “free”, GSV results in very long latencies to start routines. In GSV, a long-running routine also starves other routines.

2. Partitioned Strict Visibility (PSV): PSV is a weakened version of GSV that allows concurrent execution of non-conflicting routines, but limits conflicting routines to execute serially. For instance, in our earlier (GSV) example of the dishwasher and dryer routines started simultaneously, if the home has no amperage restrictions, the users should choose PSV. This allows the two routines to run concurrently, and the end state of the home is (serially-) equivalent to the end state if the routines had instead been run sequentially (i.e., dishes washed, clothes dried). However, if the two routines touched conflicting devices, PSV would execute them serially.

3. Eventual Visibility (EV): This is our most relaxed visibility model which specifies that only when all the routines have finished (completed/aborted), the end state of the smart home is identical to that obtained if all routines were to have been serially executed in some sequential (total) order. In the definition of SafeHome-Visibility, the effect for EV is the end-state of the smart home after all the routines are finished.

EV is intended for the relatively-common scenarios where the desired final outcome (of routines) is more important to the users than the ephemerally-visible intermediate states. Unlike GSV, the EV model allows conflicting routines (touching conflicting devices) to execute concurrently–and thus reduces the latencies of both starting and running routines.

Consider two users in a home simultaneously initiating the same “breakfast” routine = {coffee:ON; /*make coffee for 4 mins*/ coffee:OFF; pancake:ON; /*make pancakes for 5 mins*/ pancake:OFF;}.

Both GSV and PSV would serially execute these routines because of the conflicting devices. EV would be able to pipeline them, overlapping the pancake command of one routine with the coffee command of the other routine. EV only cares that at the end both users have their respective coffees and pancakes.
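The EV definition above can be checked mechanically for small examples. The sketch below is a brute-force illustration under simplifying assumptions of ours, not SafeHome's: every command is a plain "set device to state" write, routines have no conditionals, and only completed routines are passed in (aborted ones do not appear in the serial order). The helper names `apply_serially` and `is_eventually_serializable` are hypothetical. Because it tries every serial order, it is only practical for a handful of routines.

```python
from itertools import permutations
from typing import Dict, List, Tuple

# Assumed model: a routine is an ordered list of (device, target_state) writes.
Routine = List[Tuple[str, str]]
State = Dict[str, str]


def apply_serially(initial: State, routines: List[Routine]) -> State:
    """Home state after running the routines one at a time, in the given order."""
    state = dict(initial)
    for routine in routines:
        for device, target in routine:
            state[device] = target
    return state


def is_eventually_serializable(initial: State, completed: List[Routine],
                               observed_end: State) -> bool:
    """EV check: does some serial order of the completed routines yield observed_end?"""
    return any(apply_serially(initial, list(order)) == observed_end
               for order in permutations(completed))


# Fig. 1-style scenario: R1 turns all lights ON, R2 turns all lights OFF.
lights = ["light1", "light2", "light3"]
r1 = [(d, "ON") for d in lights]
r2 = [(d, "OFF") for d in lights]
start = {d: "OFF" for d in lights}

print(is_eventually_serializable(start, [r1, r2], {d: "ON" for d in lights}))  # True
mixed = {"light1": "ON", "light2": "OFF", "light3": "ON"}
print(is_eventually_serializable(start, [r1, r2], mixed))  # False
```

A mixed end state, like those counted in Fig. 1, matches no serial order and is rejected, while all-ON and all-OFF both pass.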

Common Example – 3 Visibility Models: Fig. 2 shows an example with 5 concurrent routines executed for our three visibility models. This is the outcome of a real run of SafeHome running on a Raspberry Pi, over 5 devices connected via TP-Link HS-105 smart-plugs [69]. The routines are:

R1: makeCoffee(Espresso); makePancake(Vanilla);
R2: makeCoffee(Americano); makePancake(Strawberry);
R3: makePancake(Regular);
R4: startRoomba(Living room); startMopping(Living room);
R5: startMopping(Kitchen);

Figure 2: Example routine execution in different visibility models: a) GSV, b) PSV, c) EV. Each box represents one command of a routine. In EV, red boxes show a pair of incongruent commands and the blue box shows the total number of temporary incongruences.

GSV takes the longest execution time, 8 time units, as it serializes execution. PSV reduces execution time to 5 time units by parallelizing unrelated commands, e.g., a coffee command and the Roomba command run in the same time slot. EV is the fastest, finishing all routines within 3 time units. Average latencies (wait to start, wait to finish) are also lowest in EV, then PSV, then GSV. The figure shows that EV exhibits “temporary incongruence”: routines whose intermediate state is not serially equivalent. EV guarantees that temporary incongruence drops to zero when the last routine finishes.
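The GSV and PSV completion times quoted above can be reproduced with a small simulation. This is a sketch under our own simplifying assumptions: every command takes one time unit, each routine's commands run back to back, all five routines (labeled R1 to R5 in the order listed) arrive at time 0, both mopping commands use one shared mop device, and two routines conflict when they touch a common device. EV's command-level pipelining is not modeled here.

```python
from typing import Dict, List, Set

# Devices touched by each routine's commands, in order (one time unit each).
ROUTINES: Dict[str, List[str]] = {
    "R1": ["coffee", "pancake"],   # makeCoffee(Espresso); makePancake(Vanilla)
    "R2": ["coffee", "pancake"],   # makeCoffee(Americano); makePancake(Strawberry)
    "R3": ["pancake"],             # makePancake(Regular)
    "R4": ["roomba", "mop"],       # startRoomba; startMopping (Living room)
    "R5": ["mop"],                 # startMopping(Kitchen), assumed same mop device
}


def gsv_makespan(routines: Dict[str, List[str]]) -> int:
    # GSV runs at most one routine at a time, so routine durations add up.
    return sum(len(cmds) for cmds in routines.values())


def psv_makespan(routines: Dict[str, List[str]]) -> int:
    # PSV: a waiting routine starts once it shares no device with a running one.
    waiting = list(routines)              # arrival order
    running: Dict[str, int] = {}          # routine -> finish time
    t = 0
    while waiting or running:
        for name in [n for n, end in running.items() if end <= t]:
            del running[name]             # retire finished routines
        busy: Set[str] = {d for n in running for d in routines[n]}
        for name in list(waiting):
            devices = set(routines[name])
            if not devices & busy:        # no conflict with running routines
                running[name] = t + len(routines[name])
                busy |= devices
                waiting.remove(name)
        if running:
            t = min(running.values())     # advance to the next finish event
    return t


print(gsv_makespan(ROUTINES), psv_makespan(ROUTINES))  # 8 5
```

Under PSV, R1, R2, and R3 form a conflict chain through the pancake maker (5 units), which bounds the makespan even though R4 and R5 overlap with it.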

Table 1 contrasts the properties of the four visibility models. Table 2 summarizes the examples discussed so far.

| | GSV | PSV | EV | WV (status quo) |
|---|---|---|---|---|
| Concurrency | At most one routine | Non-conflicting routines concurrent | Any serializable routines concurrent | Any routines concurrent |
| End state | Serializable | Serializable | Serializable | Arbitrary |
| Wait time (time to start routine) | High | High for conflicting routines, low for non-conflicting routines | Low for all routines (modulo conflicts) | Low for all routines |
| Congruence | Congruent at all times | Congruent at end, and at start/complete points of routines | Congruent at end | May be incongruent at any time or at end (Fig. 1) |

Table 1: Spectrum of Visibility Models in SafeHome.
| Example routines | Scenario and possible behavior | SafeHome feature |
|---|---|---|
| “cooling” = {window:CLOSE; AC:ON;} | If executed partially, can leave the window open and the AC on (wasting energy), or the window closed and the AC off (overheating the home). | Atomicity |
| “make coffee” = {coffee:ON; /*make coffee for 4 mins*/ coffee:OFF;} | The coffee maker should not be interrupted by another routine, e.g., user-1 invokes make coffee and, in the middle, user-2 independently invokes make coffee. | Long-running routines & mutually exclusive access |
| {dishwasher:ON; (dishwasher runs for 40 mins); dishwasher:OFF;} and {dryer:ON; (dryer runs for 20 mins); dryer:OFF;} | If the home has low amperage, simultaneously running two power-hungry devices may cause an outage (GSV). | Global Strict Visibility (GSV) |
| {coffee:ON; /*make coffee for 4 mins*/ coffee:OFF;} and {lights:ON, fan:ON} | Two routines touching disjoint devices should not block each other (PSV). | Partitioned Strict Visibility (PSV), closest to [54] |
| “breakfast” = {coffee:ON; /*make coffee for 4 mins*/ coffee:OFF; pancake:ON; /*make pancakes for 5 mins*/ pancake:OFF;} | Two users can invoke this same routine simultaneously. The two routines can be pipelined, allowing some concurrency without affecting correctness (EV). (Both GSV and PSV would have serialized them.) | Eventual Visibility (EV) |
| “leave home” = {lights:OFF (Best-Effort); door:LOCK;} | Requiring all commands to finish is too stringent, so only the second command is Must (required). If the lights are unresponsive, the door must still lock; if the door fails to lock, the routine aborts. | Must and Best-Effort commands |
| “manufacturing pipeline” with k stages (routines {…}) | If any stage fails, the entire pipeline must stop immediately. | Failure serialization: Strong GSV (S-GSV) |
| “cooling” = {window:CLOSE; AC:ON;} | If the AC or the window fails at any time during the routine (from start to finish), the routine is aborted. | Failure serialization: Loose GSV (GSV) |
| “cooling” = {window:CLOSE; AC:ON;} | If the window fails after its command and remains failed at the routine's finish point, the routine is aborted. | Failure serialization: PSV |
| “cooling” = {window:CLOSE; AC:ON;} | If the window fails after it is closed (but before the AC is accessed), the routine completes successfully: the window failure can be serialized after the routine. | Failure serialization: EV |

Table 2: Example scenarios in a smart home, and SafeHome’s corresponding features.

2.2 SafeHome-Atomicity

SafeHome-Atomicity states that after a routine has started, either all its commands have the desired effect on the smart home (i.e., routine completes), or the system aborts the routine, resulting in a rollback of its commands, and gives the user feedback. Due to the physical effects of smart home routines, we discuss three deviations from traditional notions of atomicity.

First, we allow the user to tag some commands as best-effort, i.e., optional. A routine is allowed to complete successfully even if any best-effort commands fail. Other commands, tagged as must, are required for routine completion—if any must command fails, the routine must abort. This tagging acknowledges the fact that users may not consider all commands in a routine to be equally important. For instance, a “leave-home-for-work” routine may contain commands which lock the door (must commands) and turn off lights (best-effort commands)—even if the lights are unresponsive, the doors must still lock. The user receives feedback about the failed best-effort commands, and she is free to either ignore or re-execute them.

Second, aborting a routine requires undoing previously executed commands. Many commands can be rolled back cleanly, e.g., the command turn Light-3 ON can be undone by SafeHome issuing a command setting Light-3 to OFF. A small fraction of commands are impossible to physically undo, e.g., run north sprinklers for 15 mins, or blare a test alarm. For such commands, we undo by restoring the device to its state before the aborted routine (e.g., set the sprinkler/alarm state to OFF). Alternatively, a user-specified undo-handler can be used.

Finally, we note that when a routine aborts, SafeHome provides feedback to the user (including logs), and she is free to either re-initiate the routine or ignore the failed routine.
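The atomicity rules above (must vs. best-effort commands, undo by restoring each touched device's earlier state, and user feedback) can be sketched as follows. This is a minimal illustration, not SafeHome's implementation: `Command`, `run_routine`, and the `send(device, target)` callback standing in for a vendor device API are all hypothetical, locking is ignored, and a failed undo call is simply not retried.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Command:
    device: str
    target: str
    must: bool = True  # False marks a best-effort (optional) command


def run_routine(commands: List[Command], state: Dict[str, str],
                send: Callable[[str, str], bool]) -> Tuple[bool, List[str]]:
    """Execute commands in order; abort and undo if a must command fails.

    `state` holds the current state of every device; `send(device, target)`
    is a hypothetical device-API call that returns True on success.
    Returns (completed?, feedback messages for the user).
    """
    undo_log: List[Tuple[str, str]] = []  # (device, state before this command)
    feedback: List[str] = []
    for cmd in commands:
        previous = state[cmd.device]
        if send(cmd.device, cmd.target):
            undo_log.append((cmd.device, previous))
            state[cmd.device] = cmd.target
        elif cmd.must:
            # Abort: restore touched devices to their earlier states, newest first.
            for device, old in reversed(undo_log):
                send(device, old)
                state[device] = old
            feedback.append(f"aborted: {cmd.device}:{cmd.target} (must) failed")
            return False, feedback
        else:
            feedback.append(f"best-effort {cmd.device}:{cmd.target} failed")
    return True, feedback


# "leave home" with unresponsive lights: ok is True, the door locks, lights stay ON.
home = {"lights": "ON", "door": "UNLOCKED"}
leave_home = [Command("lights", "OFF", must=False), Command("door", "LOCK")]
ok, msgs = run_routine(leave_home, home, lambda d, t: d != "lights")

# "cooling" with a failed AC: ok2 is False and the window is reopened.
room = {"window": "OPEN", "AC": "OFF"}
ok2, msgs2 = run_routine([Command("window", "CLOSE"), Command("AC", "ON")],
                         room, lambda d, t: d != "AC")
```

Undoing in reverse order means that if a routine touches the same device twice, the device ends at its pre-routine state.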

3 Failure Handling and Visibility Models

Smart home devices could fail or become unresponsive, and then later restart. SafeHome needs to reason cleanly about failures or restarts that occur during the execution of concurrent routines. We only consider fail-stop and fail-recovery models of failures in the smart home (Byzantine failures are beyond our scope).

Because device failure events and restart events are visible to human users, our visibility models need to be amended. Consider a device D which a routine R touches via one or more commands. D might fail during a command from R, after R’s last command to D, before R’s first command to D, or in between two commands from R. A naive approach would abort R in all these cases. However, for relaxed visibility models like Eventual Visibility, if the failure event occurred any time after D completed its last command from R, then the event could be serialized to occur after R in the serially-equivalent order (likewise, a failure/restart before R’s first command to D can be serialized to occur before R).

Thus a key realization in SafeHome is that we need to serialize failure events and restart events alongside routines themselves. We can now restate the SafeHome-Visibility property from Sec. 2, to account for failures and restarts:

SafeHome-Visibility/Serializability (with Failures and Restarts): The effect of the concurrent execution of a set of routines, occurring along with concurrent device failure events and device restart events, is identical to an equivalent world where the same routines, device failure events, and device restart events all occur sequentially, in some order. (Footnote: This idea has analogues to distributed systems abstractions such as view/virtual synchrony, wherein failures and multicasts are totally ordered [15]. We do not execute multicasts in the smart home.)

First, we define the failure/restart event to be the event when the edge device (running SafeHome) detects the failure/restart (this may differ from the actual time of failure/restart). Second, failure events and restart events must appear in the final serialized order. In contrast, routines may appear in the final serialized order (if they complete), or not appear (if they abort). We next reason explicitly about failure serialization for each of our visibility models from Sec. 2.1.

1. Failure Serialization in Weak Visibility: Today’s Weak Visibility has no failure serialization. Routines affected by failures/restarts complete and cause incongruent end-states.

2. Failure Serialization in Global Strict Visibility: Because GSV intends to present the picture of a single serialized home to the user, if any device failure event or restart event occurs while a routine is executing (between its start and finish), the routine must be aborted. There are two sub-flavors: (A) Basic GSV or Loose GSV (GSV): a routine aborts only if it contains at least one command that touches the failed/restarted device; (B) Strong GSV (S-GSV): a routine aborts even if it has no command that touches the failed/restarted device. For example, if the master bathroom shades fail while a routine R on the living room shades is executing, R can complete in GSV but not in S-GSV. In S-GSV, the final serialization order contains the failure/restart event but not the aborted routine R. In GSV, the final serialization order contains both R (which completes) and the failure/restart event, in arbitrary order.

3. Failure Serialization in Eventual Visibility: For a given set of routines (and concurrent failure events and restart events), the eventual (final) state of the actual execution is equivalent to the end state of a world wherein the successful routines, device failure events, and device restart events all occurred in some serial order.

Consider a routine R, and the failure event (and potential restart event) of one device D. Four cases arise:

  1. If D is not touched by R, then D’s failure event and/or restart event can be arbitrarily ordered w.r.t. R.

  2. If D’s failure and restart events both occur before R first touches D, then the failure and restart events are serialized before R.

  3. If D’s failure event occurs after the last touch of D by R, then D’s failure event (and eventual restart event) are serialized after R.

  4. In all other cases, routine R aborts due to D’s failure, and R does not appear in the final serialized order.

These rules apply to each concurrent routine accessing D.

4. Failure Serialization in Partitioned Strict Visibility: This is a modified version of the EV rules above, in which we replace condition 3 (of conditions 1-4) with the following:

3*. If the failure event of device D occurs after the last touch of D by routine R, and D has recovered by the time R reaches its finish point, then D’s failure event and restart event are serialized right after R.
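Cases 1-4 for EV, and the PSV variant with condition 3*, can be written as one decision function. This is a sketch with hypothetical names and an encoding of our own: each event is reduced to the time at which the edge device detects it, "any" means the failure/restart may be ordered arbitrarily relative to R, and under PSV an "after" result means the events go right after R.

```python
from typing import Optional


def place_failure(model: str, touches: bool, fail: float, restart: Optional[float],
                  first_touch: float, last_touch: float, finish: float) -> str:
    """Where device D's failure (and restart) goes relative to routine R.

    All times are when the edge device observes the event. first_touch is when
    R starts its first command on D, last_touch is when R's last command on D
    completes, finish is R's finish point, and restart is None if D has not
    restarted. Returns "any", "before", "after", or "abort".
    """
    if model not in ("EV", "PSV"):
        raise ValueError(f"unsupported model: {model}")
    if not touches:                                   # case 1: R never touches D
        return "any"
    if restart is not None and restart < first_touch:
        return "before"                               # case 2: fail < restart < first touch
    if fail > last_touch:
        if model == "EV":
            return "after"                            # case 3
        if restart is not None and restart <= finish:
            return "after"                            # case 3*: D recovered by R's finish
    return "abort"                                    # case 4 (or 3* not met under PSV)


# "cooling": the window is touched from t=0 to t=1, and the routine finishes at t=2.
ev_late = place_failure("EV", True, 1.5, None, 0.0, 1.0, 2.0)           # "after"
psv_late = place_failure("PSV", True, 1.5, None, 0.0, 1.0, 2.0)         # "abort"
psv_recovered = place_failure("PSV", True, 1.5, 1.8, 0.0, 1.0, 2.0)     # "after"
during = place_failure("EV", True, 0.5, None, 0.0, 1.0, 2.0)            # "abort"
```

The only difference between the two models is the extra recovery-by-finish check in the PSV branch.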

Example – Effect of Failure on Three Visibility Models: Consider the routine from Section 1, “cooling” := {CLOSE window; switch ON AC;}. Suppose the window device fails concurrently with the routine (between its start and finish times). GSV always aborts the routine, regardless of when the window failed. PSV aborts only if the window remains failed at the routine’s finish point. EV does not need to abort if the window fails any time after the routine’s first command has completed successfully, even if the window remains failed at the routine’s finish time: EV places the window failure event after the routine in the serialization order, and the smart home’s end state is equivalent. If the window fails and restarts before the routine’s first command, EV serializes the failure and restart before the routine, which then executes correctly. Thus EV has the lowest chance of aborting a routine due to a failure.
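For comparison, the GSV rules (Loose GSV and S-GSV) depend only on whether a failure or restart is detected while the routine executes. The function below is a hypothetical sketch of that check; treating the start and finish points as inclusive is our assumption.

```python
from typing import Iterable, Set, Tuple


def gsv_must_abort(routine_devices: Set[str], start: float, finish: float,
                   events: Iterable[Tuple[float, str]], strong: bool = False) -> bool:
    """GSV failure rule: abort if a failure/restart is detected while R executes.

    events holds (detection_time, device) pairs for failure and restart events.
    Loose GSV (strong=False) only counts devices R touches; S-GSV counts any device.
    """
    for when, device in events:
        if start <= when <= finish and (strong or device in routine_devices):
            return True
    return False


cooling = {"window", "AC"}
window_fails = [(1.5, "window")]
# The window fails mid-routine: GSV aborts "cooling" whenever this happens.
cooling_aborts = gsv_must_abort(cooling, 0.0, 2.0, window_fails)

# Bathroom shades fail while a living-room-shades routine runs.
shades = {"living_room_shades"}
bath_fails = [(1.0, "bathroom_shades")]
loose = gsv_must_abort(shades, 0.0, 2.0, bath_fails)                # completes
strong = gsv_must_abort(shades, 0.0, 2.0, bath_fails, strong=True)  # aborts
```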

Table 2 summarizes all our examples so far and Fig. 3 summarizes our failure handling rules.

Figure 3: Failure Serialization: 6 cases, and their handling in Visibility Models. ✓ - execute routine, X - abort routine. At F[A]/Re[A], the edge device detects the failure/restart (resp.) of device A.

4 Eventual Visibility: SafeHome Design

In order to maintain correctness for Eventual Visibility (i.e., serial-equivalence), SafeHome requires routines to lock devices before accessing them. Because long routines can hold locks and block short routines, we introduce lock leasing across routines (Sec. 4.1). This information is stored in the Locking Data-structure (Sec. 4.2). The lineage table ensures invariants required to guarantee Eventual Visibility (Sec. 4.3).

4.1 Locks and Leasing

SafeHome prefers Pessimistic Concurrency Control (PCC): SafeHome adopts pessimistic concurrency control among routines, via (virtual) locking of devices. Aborting and undoing routines is disruptive to the human experience, causing (at routine commit point) rollbacks of device states across the smart home. Our goal is to limit abort/undo to situations with device failures, and to avoid aborts caused by routines touching conflicting devices. Hence we eschew optimistic concurrency control approaches and use locking. (Footnote: For the limited scenarios where routines are known to be conflict-free, optimistic approaches may be worth exploring in future work.)

SafeHome uses virtual locking wherein each device has a virtual lock (maintained at the edge device running SafeHome), which must be acquired by a routine before it can execute any command on that device. A routine’s lock acquisition and release do not require device access, and are not blocked by device failure/restart.

In order to prevent a routine from aborting midway because it is unable to acquire a lock, SafeHome uses early lock acquisition—a routine acquires, at its start point, the locks of all the devices it wishes to touch. If any of these acquisitions fails, the routine releases all its locks immediately and retries lock acquisition. Otherwise, acquired locks are released (by default) only when the routine finishes.
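Early lock acquisition can be sketched as an all-or-nothing grab over virtual locks held at the edge device. The class and function names, the retry count, and the linear backoff below are hypothetical choices of ours; leasing is not modeled.

```python
import threading
import time
from typing import Dict, Iterable, List, Optional


class VirtualLockTable:
    """Per-device virtual locks held at the edge device, never on the devices."""

    def __init__(self) -> None:
        self._owner: Dict[str, str] = {}      # device -> routine holding its lock
        self._mutex = threading.Lock()        # guards the table itself

    def owner(self, device: str) -> Optional[str]:
        with self._mutex:
            return self._owner.get(device)

    def try_acquire_all(self, routine: str, devices: Iterable[str]) -> bool:
        """Early lock acquisition: take every needed lock, or release and fail."""
        taken: List[str] = []
        with self._mutex:
            for device in sorted(set(devices)):
                holder = self._owner.get(device)
                if holder is None:
                    self._owner[device] = routine
                    taken.append(device)
                elif holder != routine:
                    for d in taken:           # give back locks taken in this attempt
                        del self._owner[d]
                    return False
            return True

    def release_all(self, routine: str) -> None:
        """Default release point: when the routine finishes."""
        with self._mutex:
            for device in [d for d, r in self._owner.items() if r == routine]:
                del self._owner[device]


def acquire_with_retry(table: VirtualLockTable, routine: str,
                       devices: Iterable[str], attempts: int = 5,
                       backoff_s: float = 0.01) -> bool:
    wanted = list(devices)
    for attempt in range(attempts):
        if table.try_acquire_all(routine, wanted):
            return True
        time.sleep(backoff_s * (attempt + 1))   # back off before retrying
    return False


locks = VirtualLockTable()
r1_ok = locks.try_acquire_all("R1", ["coffee", "pancake"])   # True
r2_first = locks.try_acquire_all("R2", ["pancake", "mop"])   # False; mop not kept
mop_owner = locks.owner("mop")                               # None
locks.release_all("R1")
r2_retry = acquire_with_retry(locks, "R2", ["pancake", "mop"])  # True
```

Because locks are virtual, acquiring and releasing them never contacts a device, which matches the text's point that lock operations are not blocked by device failures.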

Leasing of Locks: To minimize the chances of a routine being unable to start because of locks held by other routines, SafeHome allows routines to lease locks to each other. Two cases arise: 1) a routine R1 holds the lock of device D for an extended period before R1’s first access of D, and 2) R1 holds the lock of D for an extended period after R1’s last access of D. Both cases prevent a concurrent routine R2, which also wishes to access D, from starting.

SafeHome allows a routine R1 holding a lock (on device D) to lease the lock to another routine R2. When R2 is done with its last command on D, the lock is returned to R1, which can then use it normally and release it. We support two types of lock leasing:

  • Pre-Lease: R1 has started but has not yet accessed D. A lease at this point to R2 is called a pre-lease, and places R2 ahead of R1 in the serialization order. After R2’s last access of D, it returns the lock to R1. If R1 reaches its first access of D before the lock is returned to it, R1 waits. After the lease ends, R1 can use the lock normally.

  • Post-Lease: R1 is done accessing device D, but R1 itself has not finished yet. A lease at this point to R2 is called a post-lease, and places R2 after R1 in the serialization order. If R1 finishes before R2, the lock ownership is permanently transferred to R2. Otherwise, R2 returns the lock when it finishes.

A prospective pre/post-lease is disallowed if a previous action (e.g., another lease) has already determined a serialization order between R1 and R2 that would be contradicted by this prospective lease. In such cases, R2 needs to wait until R1’s normal lock release. Further, a post-lease is not allowed if at least one device is written by R1 and then read by R2. This prevents SafeHome from suffering dirty reads from aborted routines, e.g., a scenario where R1 switches on a light, R2 has a conditional clause based on that light’s status, but R1 subsequently aborts. Cascading aborts are handled in [54], whose techniques can be used orthogonally with ours.
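The lease admissibility rules can be sketched by keeping the tentative serialization order as a directed graph and refusing any lease that would add an edge opposite to an ordering already implied. This is our illustration, not SafeHome's data structure: the names are hypothetical, we interpret "already determined" to include transitive orderings, and lock hand-back and timeouts are not modeled.

```python
from typing import Dict, FrozenSet, Set


class SerializationOrder:
    """Tentative 'comes before' relation among routines, kept as a DAG."""

    def __init__(self) -> None:
        self._succ: Dict[str, Set[str]] = {}

    def precedes(self, u: str, v: str) -> bool:
        """True if u is already ordered before v, directly or transitively."""
        stack, seen = [u], set()
        while stack:
            x = stack.pop()
            for y in self._succ.get(x, ()):
                if y == v:
                    return True
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return False

    def add(self, first: str, second: str) -> bool:
        """Record first -> second unless an earlier decision says the opposite."""
        if self.precedes(second, first):
            return False
        self._succ.setdefault(first, set()).add(second)
        return True


def try_lease(order: SerializationOrder, holder: str, lessee: str, kind: str,
              written_by_holder: FrozenSet[str] = frozenset(),
              read_by_lessee: FrozenSet[str] = frozenset()) -> bool:
    if kind == "pre":
        # Pre-lease: the lessee uses the device first, so it precedes the holder.
        return order.add(lessee, holder)
    if kind == "post":
        # Post-lease: refuse if the lessee would read a value the holder wrote
        # (a dirty read if the holder later aborts).
        if written_by_holder & read_by_lessee:
            return False
        return order.add(holder, lessee)
    raise ValueError(f"unknown lease kind: {kind}")


order = SerializationOrder()
a = try_lease(order, "R1", "R2", "pre")    # True: R2 before R1
b = try_lease(order, "R1", "R2", "post")   # False: would put R1 before R2
c = try_lease(order, "R1", "R3", "post")   # True: R1 before R3
d = try_lease(order, "R3", "R2", "post")   # False: R2 -> R1 -> R3 already
dirty = try_lease(SerializationOrder(), "R1", "R2", "post",
                  frozenset({"light"}), frozenset({"light"}))  # False
```

A refused lease leaves the order unchanged, so the lessee simply waits for the holder's normal release, as described above.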

To prevent starvation, i.e., a lock holder R1 waiting indefinitely for a lock it has leased to R2 to be returned, leased locks are revoked after a timeout. The timeout is calculated based on the estimated time between R2’s first and last actions on the leased device D, multiplied by a leniency factor. Lock revocation before R2’s last access of D causes R2 to abort.
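The timeout rule reduces to one multiplication plus a status check. In the sketch below, the `Lease` class and helper are hypothetical, and the leniency value 1.5 is a placeholder chosen for illustration only; it is not the factor used by SafeHome.

```python
from dataclasses import dataclass


def lease_timeout(est_first_access_s: float, est_last_access_s: float,
                  leniency: float) -> float:
    """Timeout = estimated span of the lessee's use of the device x leniency."""
    return (est_last_access_s - est_first_access_s) * leniency


@dataclass
class Lease:
    holder: str
    lessee: str
    granted_at_s: float
    timeout_s: float

    def status(self, now_s: float, lessee_done_with_device: bool) -> str:
        if lessee_done_with_device:
            return "returned"      # the lock goes back to the holder
        if now_s - self.granted_at_s <= self.timeout_s:
            return "active"
        return "revoked"           # lessee not done with the device: it aborts


LENIENCY = 1.5  # hypothetical placeholder value
timeout = lease_timeout(10.0, 250.0, LENIENCY)   # 240 s estimated span -> 360 s
lease = Lease("R1", "R2", granted_at_s=0.0, timeout_s=timeout)
```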

4.2 Locking Data-structure

Figure 4: SafeHome’s Architecture for Eventual Visibility.

SafeHome adopts a state machine approach [57] to track current device states, future planned actions by routines, and a serialization order. SafeHome maintains, at the edge device (e.g., Home Hub or smart access point), a virtual locking table data-structure (Fig. 4). This contains:

  • Wait Queue: Queue of routines initiated but not started. When a routine is added, it is assigned an incremented routine ID.

  • Serialization Order: Maintains the current serialization order of routines, failure events, and restart events. For completed routines (shaded green), the order is finalized. All other orders are tentative and may change, e.g., based on lock leases. Failure and restart events may be moved flexibly among unfinished routines.

  • Lineage Table: Detailed in Section 4.3, this maintains, for each device, a lineage: the planned transition order of that device’s lock.

  • Scheduler: Decides when routines from Wait Queue are started, acquires locks, and maintains serialization order.

  • Committed States: For each device, keeps its last committed state, i.e., the effect of the last successfully completed routine. This may differ from the device's actual state, and is needed to ensure serialization and to roll back aborted routines.
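The components listed above can be pictured as one in-memory structure on the hub. This is a hypothetical Python layout, not SafeHome's Java implementation. The field names are ours, and the Scheduler is left out because it is logic rather than state.

```python
from collections import deque
from dataclasses import dataclass, field
from itertools import count
from typing import Deque, Dict, List, Tuple

Command = Tuple[str, str]  # (device, desired state), e.g. ("coffee_maker", "ON")


@dataclass
class LockingTable:
    """Hypothetical layout of the virtual locking table kept on the hub."""
    # Routines initiated but not started, with their assigned routine IDs.
    wait_queue: Deque[Tuple[int, List[Command]]] = field(default_factory=deque)
    # Tentative serialization order of routine IDs (and failure/restart events).
    serialization_order: List[object] = field(default_factory=list)
    # Device -> planned lock-access entries (the lineage, Sec. 4.3).
    lineage: Dict[str, list] = field(default_factory=dict)
    # Device -> last committed state.
    committed: Dict[str, str] = field(default_factory=dict)
    _next_id: count = field(default_factory=lambda: count(1), repr=False)

    def submit(self, commands: List[Command]) -> int:
        """Add a routine to the wait queue with an incremented routine ID."""
        rid = next(self._next_id)
        self.wait_queue.append((rid, commands))
        return rid
```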

4.3 Lineage Table

Figure 5: Sample Lineage Table, with 6 routines. Some fields are omitted for simplicity.

The lineage of a device is a temporal plan of when the device will be acquired by the routines that need it. The lineage of a device starts with its latest committed state, followed by a sequence of lock-access entries (Fig. 5). These entries are “stretched” horizontally: the width of a lock-access entry represents how long that routine will hold that lock. Each lock-access entry for a device consists of: i) a routine ID, ii) a lock status (Released, Acquired, or Scheduled), iii) the device state desired by the command (e.g., ON/OFF), and iv) times: a start time ($t_s$) and a duration ($t_d$) of the lock-access.

In the example of Fig. 5, a Scheduled [S] status indicates that the routine is scheduled to access the lock. An Acquired [A] status shows it is holding and using the lock. A Released [R] status means the routine has released the lock.

The duration field, $t_d$, is set either from the known time to run a long command (e.g., run the sprinkler for 15 mins), or from an estimate of the command execution time. Our implementation uses a fixed $t_d$ of 100 ms for all short commands, based on our experience. $t_d$ is also used to determine the revocation timeout for leased locks, along with a multiplicative leniency factor (1.1 in our implementation).
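A single lock-access entry, with the fields described above, might look like the following. This is a hypothetical Python layout that uses seconds as the time unit. The 100 ms default for short commands comes from the text.

```python
from dataclasses import dataclass
from enum import Enum


class LockStatus(Enum):
    RELEASED = "R"
    ACQUIRED = "A"
    SCHEDULED = "S"


SHORT_COMMAND_T_D = 0.1  # seconds; fixed t_d for short commands (100 ms)


@dataclass
class LockAccess:
    """One lock-access entry in a device's lineage (hypothetical layout)."""
    routine_id: int
    status: LockStatus
    desired_state: str                   # e.g. "ON" or "OFF"
    t_s: float                           # start time, seconds
    t_d: float = SHORT_COMMAND_T_D       # duration, seconds

    @property
    def t_end(self) -> float:
        """Planned time at which this routine releases the lock."""
        return self.t_s + self.t_d
```

A long command such as running the sprinkler for 15 minutes would instead set `t_d=15 * 60`.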

To maintain serializability, four key invariants are assured:

Invariant 1 (Future Mutual Exclusion: Lock-accesses in a device’s lineage list do not overlap in time):

No device is planned to be locked by multiple routines at the same time. Gaps in its lineage list indicate times the device is free.

Invariant 2 (Present Mutual Exclusion: At most one Acquired lock-access exists in each lineage list):

No device is locked currently by multiple routines.

Invariant 3 (Lock-access [R][A][S]):

In each lineage list, all Released lock-access entries occur to the left of (i.e., before) any Acquired entries, which in turn appear to the left of any Scheduled entries.

Invariant 4 (Consistent “serialize-before” ordering among lineages):

Given two routines $R_i$ and $R_j$, if there is at least one device $D$ such that lock-access $R_i$ occurs to the left of lock-access $R_j$ in $D$'s lineage list, then for every other device $D'$ touched by both routines, lock-access $R_i$ also occurs to the left of lock-access $R_j$ in $D'$'s lineage list. Hence $R_i$ is serialized-before $R_j$.
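The four invariants can be checked mechanically over a lineage table. The sketch below uses a hypothetical tuple encoding `(routine, status, t_s, t_d)`, assumes each routine has at most one lock-access entry per device, and checks Invariant 4 pairwise exactly as it is stated above.

```python
from typing import Dict, List, Set, Tuple

# (routine, status, t_s, t_d); status is "R", "A" or "S"; entries left to right.
Entry = Tuple[str, str, float, float]


def check_invariants(lineages: Dict[str, List[Entry]]) -> bool:
    """Return True iff Invariants 1-4 hold for every device lineage."""
    rank = {"R": 0, "A": 1, "S": 2}
    before: Set[Tuple[str, str]] = set()
    for entries in lineages.values():
        # Invariant 1: consecutive lock-accesses must not overlap in time.
        for (_, _, s1, d1), (_, _, s2, _) in zip(entries, entries[1:]):
            if s1 + d1 > s2:
                return False
        # Invariant 2: at most one Acquired entry per lineage.
        if sum(1 for e in entries if e[1] == "A") > 1:
            return False
        # Invariant 3: Released entries, then Acquired, then Scheduled.
        ranks = [rank[e[1]] for e in entries]
        if ranks != sorted(ranks):
            return False
        # Record every left-of relation seen in this lineage.
        for i, (r1, _, _, _) in enumerate(entries):
            for r2, _, _, _ in entries[i + 1:]:
                before.add((r1, r2))
    # Invariant 4: no pair of routines is ordered differently on two devices.
    return not any((r2, r1) in before for r1, r2 in before)
```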

Transition of Lock-accesses: The status of lock-accesses changes upon certain events. First, when a routine's last access to a device ends, its Acquired lock-access transitions to Released. The next Scheduled lock-access then turns to Acquired either i) immediately, if no gap exists in the lineage, or ii) after the gap has passed. Fig. 5 contains both cases.

Second, when a new routine $R$ is scheduled (from the wait queue), a Scheduled lock-access entry is added to the lineage of every device that $R$ needs (e.g., in Fig. 5 a routine adds lock-accesses for devices B and C). Third, when a routine finishes (completes or aborts), all its lock-access entries are removed, releasing those locks. If the routine completed successfully, committed states are updated. If it aborted, device states are rolled back.
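The release, promotion, and removal transitions can be sketched as three small functions over a single device's lineage. The dict-based entries and the function names are hypothetical. Promotion after a gap is modeled by calling `promote_due` once the waiting entry's start time has been reached.

```python
from typing import Dict, List

# Each entry: {"routine": id, "status": "R" | "A" | "S", "start": t_s}
Lineage = List[dict]


def promote_due(lineage: Lineage, now: float) -> None:
    """If no entry is Acquired, the first Scheduled entry becomes Acquired
    once its planned start time has been reached (i.e., any gap has passed)."""
    if any(e["status"] == "A" for e in lineage):
        return
    for entry in lineage:
        if entry["status"] == "S":
            if entry["start"] <= now:
                entry["status"] = "A"
            return


def finish_device_access(lineage: Lineage, routine: str, now: float) -> None:
    """A routine's last access to this device ended: Acquired -> Released."""
    for entry in lineage:
        if entry["routine"] == routine and entry["status"] == "A":
            entry["status"] = "R"
            break
    promote_due(lineage, now)


def remove_routine(lineages: Dict[str, Lineage], routine: str) -> None:
    """On completion or abort, drop all of the routine's lock-access entries."""
    for device in lineages:
        lineages[device] = [e for e in lineages[device] if e["routine"] != routine]
```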

Figure 6: Lineage table with Lock Leasing. a) Lineage before leasing, with only $R_i$, b) Pre-lease to $R_j$, which only accesses device B, and c) Post-lease to $R_j$, which only accesses device A.

Leasing of Locks: Consider a pre-lease from $R_i$ to $R_j$ (Fig. 6(b)). First, a new Acquired lock-access for $R_j$ is placed before (to the left of) the lock-access of $R_i$ in the lineage table. Second, the lock-access of $R_i$ is changed to “Leased” status.

Figure 6(c) shows a post-lease: a new Acquired lock-access of $R_j$ is placed after (to the right of) the lock-access of $R_i$, and the lock-access of $R_i$ changes to Released.

Aborts and Rollbacks: For an aborted routine $R$, we roll back the states of only those devices in whose lineage $R$ appeared. For such a device $D$, there are two cases:

  • Device $D$ was last Acquired by a routine $R' \neq R$: We remove $R$'s lock-access from $D$'s lineage. This captures two possibilities: a) $R$ never executed actions on $D$ (e.g., in Fig. 5, device C when aborting a routine whose lock-access on C is still Scheduled), or b) $R$ leased $D$ to another routine $R'$, and since $R$ is aborting, $R'$'s effect will be the latest (e.g., in Fig. 5, device A when aborting the routine that leased A).

  • Device $D$ was last Acquired by $R$ (e.g., in Fig. 5, device C when aborting the routine that last acquired C): We 1) remove $R$'s lock-access from $D$'s lineage, and 2) issue a command to set $D$'s status to the state in the lock-access entry immediately to the left of $R$'s in the lineage (or to the Committed State if no such entry exists), unless the device is already in this desired state.
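The two rollback cases can be sketched as follows. This sketch makes several assumptions of ours: lineages are lists of `(routine, desired_state)` pairs, the hub knows which routine last acquired each device (`last_acquirer`), and it has a best-effort view of actual device states (`actual`). The function only returns the corrective commands and does not send them.

```python
from typing import Dict, List, Tuple

Lineages = Dict[str, List[Tuple[str, str]]]  # device -> [(routine, desired state)]


def rollback(lineages: Lineages, committed: Dict[str, str],
             last_acquirer: Dict[str, str], actual: Dict[str, str],
             aborted: str) -> List[Tuple[str, str]]:
    """Remove `aborted` from every lineage and return the (device, state)
    commands needed to undo its effects."""
    commands = []
    for device, entries in lineages.items():
        idx = next((i for i, (r, _) in enumerate(entries) if r == aborted), None)
        if idx is None:
            continue                      # the routine never touched this device
        if last_acquirer.get(device) == aborted:
            # Case 2: restore the state of the entry to the left, else committed.
            target = entries[idx - 1][1] if idx > 0 else committed[device]
            if actual[device] != target:
                commands.append((device, target))
        # Case 1 (R' != R) and case 2 alike: drop the aborted lock-access.
        del entries[idx]
    return commands
```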

Committing (Successfully Completing) a routine: When a routine $R$ reaches its finish point, it commits (completes successfully) by i) updating Committed States, and ii) removing its lock-access entries. $R$ might appear after another routine $R'$ in the serialization order but complete earlier, e.g., due to lock leasing. SafeHome lets such routines commit right away by using commit compaction: routines later in the serialization order overwrite the effects of earlier routines (on conflicting devices). This is similar to “last writer wins” in NoSQL DBs [74]. Concretely, for every device that $R$ touches, we remove both $R$'s lock-access and all lock-accesses before it (Fig. 7).
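Commit compaction can be sketched in a few lines. This assumes the same hypothetical `(routine, desired_state)` lineage encoding used for rollback, and it updates the committed states in place.

```python
from typing import Dict, List, Tuple

Lineages = Dict[str, List[Tuple[str, str]]]  # device -> [(routine, desired state)]


def commit(lineages: Lineages, committed: Dict[str, str], routine: str) -> None:
    """Commit `routine` with compaction (last writer wins)."""
    for device, entries in lineages.items():
        for i, (r, state) in enumerate(entries):
            if r == routine:
                committed[device] = state   # routine's effect becomes committed
                del entries[: i + 1]        # drop it and every earlier entry
                break
```

Dropping the earlier entries is safe because those routines are serialized before the committing routine, so its writes would overwrite theirs anyway.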

(a) Before commit
(b) After the routine commits
Figure 7: Commit with compaction.

Current Device Status: A device's current status is needed at several points, e.g., on abort. Because some routines have not yet completed, the actual status may differ from the committed state. The lineage table suffices to estimate a device's current state without querying the device. Fig. 8 shows the three cases: (a) if an Acquired lock-access entry exists, use its desired state (Fig. 8(a)); (b) otherwise, if lock-accesses with status Released exist, use the right-most such entry (Fig. 8(b)); (c) otherwise, use the Committed State entry (Fig. 8(c)).
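The three-case inference rule is a short function. The `(status, desired_state)` encoding below is a hypothetical simplification of a lineage entry.

```python
from typing import List, Tuple


def current_state(lineage: List[Tuple[str, str]], committed_state: str) -> str:
    """Infer a device's current state from its lineage without querying it.

    lineage: (status, desired_state) entries, left to right; status is
    "R" (Released), "A" (Acquired) or "S" (Scheduled).
    """
    for status, state in lineage:
        if status == "A":
            return state              # (a) an Acquired entry exists
    released = [state for status, state in lineage if status == "R"]
    if released:
        return released[-1]           # (b) right-most Released entry
    return committed_state            # (c) fall back to the committed state
```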

Figure 8: Inferring the current device status. The dashed boxes point to the current device status in three different scenarios.

5 Scheduling Policies for Eventual Visibility

When a new routine arrives, SafeHome needs to “place” it in the serialization order, adhering to invariants of Sec. 4.3. This is the scheduling problem. We present three alternatives.

First Come First Serve (FCFS) Scheduling: Routines are serialized in order of arrival. When a routine arrives, its lock-access entries are appended to the lineage table. FCFS disallows pre-leases, since a pre-lease would serialize a later-arriving routine ahead of an earlier one. Post-leases are allowed.

FCFS is attractive if a user expects routines to execute in the order they were initiated. However, FCFS can prolong the time between a routine's submission and its start.

Just-in-Time (JiT) scheduling: JiT greedily places a new routine at the earliest position (in the lineage) at which it is eligible to start. JiT triggers an eligibility test upon either (i) each routine arrival, or (ii) every lock release. The eligibility test greedily checks whether routine $R$ can now acquire all its locks, either right away or via pre-leases or post-leases. For case (ii), we run the eligibility test only on those waiting routines that need the released device. To mitigate starvation, we use a per-routine TTL (Time To Live): when a waiting routine $R$'s TTL expires, $R$ is prioritized to start next (ties broken by arrival order).
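The start decision with TTL priority can be sketched as follows. This is a deliberately simplified version under our own assumptions. It ignores pre- and post-leases, so a routine is eligible only if all its device locks are free right now. The names `WaitingRoutine` and `jit_pick` are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class WaitingRoutine:
    rid: int                  # routine ID (increases with arrival order)
    arrival: float            # submission time, seconds
    ttl: float                # hypothetical per-routine TTL, seconds
    devices: FrozenSet[str]   # devices whose locks the routine needs


def jit_pick(waiting: Iterable[WaitingRoutine], free: Set[str],
             now: float) -> List[int]:
    """Return IDs of routines that may start now (leases are ignored)."""
    waiting = sorted(waiting, key=lambda w: w.rid)
    expired = [w for w in waiting if now - w.arrival >= w.ttl]
    # TTL-expired routines are prioritized; ties broken by arrival order.
    candidates = expired if expired else waiting
    available, started = set(free), []
    for w in candidates:
        if w.devices <= available:       # eligibility: every lock is free
            started.append(w.rid)
            available -= w.devices
        elif expired:
            break                        # an expired routine goes next; others wait
    return started
```

When no TTL has expired, the loop skips ineligible routines and keeps going, which is the greedy behavior. Once a TTL expires, later routines can no longer overtake the expired one.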

Timeline (TL) Scheduling: This flexible policy uses estimates of lock-access durations, and speculatively places waiting routines into the lineage table based on these estimates. Thus no routine needs to wait (for an eligibility test) before being added to the lineage table. TL scheduling tries to place routines in the gaps of the lineage table without violating the lineage table invariants (Section 4.3). An example is shown in Figs. 9(a) and 9(b). Fig. 9(c) shows that TL may “stretch” a routine's execution time due to lock waits during execution. To mitigate this, a new routine is not started now if doing so would cause TL to stretch some running routine beyond a pre-specified threshold.

Figure 9: Timeline Scheduler (TL) example a) before scheduling b) trying a potential (but invalid) schedule, c) scheduling at the first possible gap.
1:function Schedule(R, index, startTime, preSet, postSet)
2:     devID = R[index].device
3:     duration = R[index].duration
4:     //return from recursion
5:     if |R| < index then
6:          return true
7:     end if
8:     //Find gap and pre- and post-set
9:     gap = getGap(devID, startTime, duration)
10:     curPreSet = preSet ∪ getPreSet(lineage[devID], gap.id)
11:     curPostSet = postSet ∪ getPostSet(lineage[devID], gap.id)
12:     if curPreSet ∩ curPostSet = ∅ then
13:          //Serialization is not violated
14:          canSchedule = Schedule(R, index + 1, gap.startTime + duration, curPreSet, curPostSet)
15:          if canSchedule then
16:               lineage[devID].insert(R[index], gap)
17:               return true
18:          end if
19:     end if
20:     //backtrack: try next gap
21:     return Schedule(R, index, gap.startTime + duration, preSet, postSet)
22:end function
Algorithm 1: Timeline scheduling of routine R

TL scheduling uses a backtracking search to find the earliest valid placement for a new routine in the lineage table. Algo. 1 shows the pseudocode, which we explain via an example. Fig. 9(a) depicts a lock table right before a new routine $R$ arrives; the lineage has four gaps. Starting with the first device accessed by $R$ (Lines 2-3), the Timeline scheduler finds the first gap in that device's lineage that can fit the lock-access (Line 9). This is Gap 1 in Fig. 9(a). Next, the scheduler validates that this gap choice will not violate previously decided serializations. For the lock-accesses of $R$ scheduled so far, it builds two sets: a) preSet, the union of all (executing and scheduled) routines placed before $R$'s lock-accesses, and b) postSet, the union of all (executing and scheduled) routines placed after $R$'s lock-accesses (see Fig. 9(b)). The preSet and postSet represent the routines positioned before and after $R$, respectively, in the serialization order. The gap choice is valid if and only if preSet and postSet are disjoint. If so, the scheduler moves on to the next command of the routine (Line 14). Otherwise (as in Fig. 9(b)), the scheduler backtracks and tries the next gap (Line 21). The process repeats.
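The backtracking search of Algorithm 1 can be made runnable as follows. This is a sketch under simplifying assumptions of ours:

- each lineage is a sorted list of `(begin, end, routine)` intervals;
- a routine is a list of `(device, duration)` commands executed back to back;
- each device appears at most once per routine, and durations are positive;
- the stretch threshold is omitted.

The retry at the same command index (Line 21) is written as a loop rather than a tail call. Once the open-ended gap after a device's last entry has been tried, control returns to the previous command, because later start times cannot yield new orderings.

```python
from typing import Dict, FrozenSet, List, Tuple

Interval = Tuple[float, float, str]      # (begin, end, routine) in a lineage
Lineage = Dict[str, List[Interval]]      # device -> sorted, non-overlapping


def get_gap(entries: List[Interval], start: float,
            duration: float) -> Tuple[float, int]:
    """First free slot of length `duration` at or after `start`.

    Returns (slot_start, pos): pos is where the new entry would be inserted.
    """
    t = start
    for pos, (begin, end, _) in enumerate(entries):
        if t + duration <= begin:
            return t, pos                # fits before this entry
        t = max(t, end)
    return t, len(entries)               # open-ended gap after the last entry


def schedule(rid: str, commands: List[Tuple[str, float]], lineage: Lineage,
             index: int = 0, start: float = 0.0,
             pre: FrozenSet[str] = frozenset(),
             post: FrozenSet[str] = frozenset()) -> bool:
    """Place commands[index:] of routine `rid`; True if all were placed."""
    if index == len(commands):
        return True                      # every command has a slot
    device, duration = commands[index]
    entries = lineage.setdefault(device, [])
    while True:
        slot, pos = get_gap(entries, start, duration)
        cur_pre = pre | {r for _, _, r in entries[:pos]}
        cur_post = post | {r for _, _, r in entries[pos:]}
        if not cur_pre & cur_post:       # serialization order not violated
            if schedule(rid, commands, lineage, index + 1, slot + duration,
                        cur_pre, cur_post):
                entries.insert(pos, (slot, slot + duration, rid))
                return True
        if pos == len(entries):
            return False                 # no later gap on this device can help
        start = slot + duration          # backtrack: try a later placement
```

In the test scenario, placing the new routine's first command before `R1` on device A would put `R1` both before and after it (via device B). The search therefore slides the first command past `R1`, and then both commands are serialized after `R1`.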

6 SafeHome Implementation

(a) JSON representation of SafeHome routine (part)
(b) G. Home routine [30] (c) TP-Link routine [72]
Figure 10: Defining a routine “Prepare Breakfast” with two commands: i) Turn ON Coffee Maker and ii) Turn ON Toaster.

We implemented SafeHome in 1,200 core lines of Java. SafeHome runs on an edge device, such as a Home Hub or an enhanced/smart access point. Our edge-first approach has two major advantages: 1) SafeHome can run in a smart home containing devices from a diverse set of vendors, and 2) SafeHome is autonomous and unaffected by ISP/external network outages [78, 20] or cloud outages [3, 28, 29].

SafeHome works directly with the APIs exported by devices—commands in routines are programmed as API calls directly to devices. SafeHome’s routine specification is compatible with other smart home systems (Fig. 10). Our current implementation works for TP-Link smart devices [70, 71], using the HS110Git [68] device-driver. Other devices (e.g., Wemo [75]) can be supported via their device-drivers.

Fig. 11 shows our implementation architecture. When a user submits routines, they are stored in the Routine Bank, from where they can be invoked either by the user or triggers, via the Routine Dispatcher. The Concurrency Controller runs the appropriate Visibility model’s implementation. Apart from Eventual Visibility (Sec. 5), we also implemented Global Strict Visibility (GSV), and Partitioned Strict Visibility (PSV), with failure/restart serialization. Our Weak Visibility reflects today’s laissez-faire implementation.

The Failure Detector explicitly checks devices by periodically (1 sec) sending ping messages. If a device does not respond within a timeout ( ms by default), the failure detector marks it as failed. We also leverage implicit failure detection by using the last heard SafeHome TCP message as an implicit ack from the device, reducing the rate of pings.
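Combining periodic pings with implicit acks might look like the sketch below. Several parts are hypothetical: `send_ping` is an injected callable that returns True if the device replied within the timeout, and the `timeout` default is an arbitrary placeholder, not SafeHome's value. Treating a later reply from a failed device as a restart is our own assumption. The 1 s probe period comes from the text.

```python
import time
from typing import Callable, Dict, Iterable, Set


class FailureDetector:
    """Ping-based failure detector that treats any message as an implicit ack."""

    def __init__(self, send_ping: Callable[[str, float], bool],
                 ping_period: float = 1.0, timeout: float = 0.5,
                 clock: Callable[[], float] = time.monotonic):
        self.send_ping = send_ping      # hypothetical: True if device replied in time
        self.ping_period = ping_period  # 1 s probe period, as in the text
        self.timeout = timeout          # placeholder value, not SafeHome's default
        self.clock = clock
        self.last_heard: Dict[str, float] = {}
        self.failed: Set[str] = set()

    def on_message(self, device: str) -> None:
        """Any TCP message from a device counts as an implicit ack."""
        self.last_heard[device] = self.clock()
        self.failed.discard(device)

    def tick(self, devices: Iterable[str]) -> Set[str]:
        """Run one probe round; return the set of devices currently marked failed."""
        now = self.clock()
        for device in devices:
            if now - self.last_heard.get(device, float("-inf")) < self.ping_period:
                continue                # heard recently: no explicit ping needed
            if self.send_ping(device, self.timeout):
                self.last_heard[device] = now
                self.failed.discard(device)   # a reply is treated as a restart
            else:
                self.failed.add(device)
        return set(self.failed)
```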

Figure 11: SafeHome Architecture

7 Experimental Results

(a) Latency, Temporary Incongruence, and Parallelism for Three Scenarios. To identify lines we show one label symbol for each (plot has many more data points). Some GSV lines may be cut to show separation between other models.
(b) Final Incongruence. Each scenario is run 100 times with 9 routines. Each run checks whether the final smart home state is equivalent to some serial ordering of the routines (9! possibilities). Final Incongruence is the fraction of the 100 runs whose end state was not congruent.
Figure 12: Experiment Results with Trace-Based Scenarios

We evaluate SafeHome using both workloads based on real-world deployments, and microbenchmarks. The major questions we address include:

  1. Are relaxed visibility models (like Eventual Visibility) as responsive as Weak Visibility, and as correct as Global Strict Visibility (Sec. 2.1)?

  2. What effect do failures have on correctness and user experience (Sec. 3)?

  3. Which scheduler policy (Sec. 5) is the best?

  4. What is the effect of optimizations, e.g., lock leasing, commit compaction, etc. (Sec. 4)?

7.1 Experimental Setup

We wish to evaluate SafeHome for a variety of scenarios and parameters. Hence we run our implementation over an emulation, using both real-world workloads (Sec. 7.2) and synthetic workloads (Sec. 7.3 - 7.6).


Because of the human-visible nature of SafeHome, our primary evaluation metrics are also human-visible:

End to end latency (or Latency): Time between a routine’s submission and its successful completion.

Temporary Incongruence: This metric measures how much the human user's actual experience differs from a world where all routines run serially. We take worst-case behavior: before a routine $R$ completes, if another routine $R'$ changes the state of any device $R$ modified, we say $R$ has suffered a temporary incongruence event. The Temporary Incongruence metric is the fraction of routines that suffer at least one such event.
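One way to compute this metric from an execution log is sketched below. It assumes a log of device-state changes as `(time, routine, device)` tuples. It also assumes a conflicting change counts only if it happens after $R$'s own change to that device, which is our reading of the definition. The quadratic scan is for clarity, not efficiency.

```python
from typing import Dict, List, Tuple

Write = Tuple[float, str, str]  # (time, routine, device)


def temporary_incongruence(writes: List[Write],
                           completion: Dict[str, float]) -> float:
    """Fraction of routines that suffer at least one temporary incongruence.

    Routine R is hit if, after R has changed device d and before R completes,
    some other routine changes d.
    """
    hit = set()
    for t1, r, dev in writes:
        for t2, other, dev2 in writes:
            if other != r and dev2 == dev and t1 < t2 < completion[r]:
                hit.add(r)
    return len(hit) / len(completion)
```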

Final Incongruence: Final Incongruence measures the ratio of runs that end up in an incongruent state.
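Checking congruence means comparing a run's end state against the end state of every serial order of its routines, which is 9! orders for 9 routines. The brute-force sketch below assumes each routine's effect is a fixed set of device writes with no conditionals or aborts, and all function names are hypothetical.

```python
from itertools import permutations
from typing import Dict, List

State = Dict[str, str]  # device -> state


def serial_end_state(order, routines: Dict[str, State], initial: State) -> State:
    """End state after running routines one at a time in `order`."""
    state = dict(initial)
    for rid in order:
        state.update(routines[rid])
    return state


def is_congruent(final: State, routines: Dict[str, State], initial: State) -> bool:
    """True if `final` equals the end state of some serial order (n! candidates)."""
    return any(serial_end_state(p, routines, initial) == final
               for p in permutations(routines))


def final_incongruence(final_states: List[State], routines: Dict[str, State],
                       initial: State) -> float:
    """Fraction of runs whose end state matches no serial order."""
    bad = sum(not is_congruent(f, routines, initial) for f in final_states)
    return bad / len(final_states)
```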

Parallelism level: This efficiency/utilization metric is the number of routines that SafeHome allows to execute concurrently, averaged throughout the run. To keep long periods with only 0 or 1 running routines from dominating the average, we sample the metric only when a routine starts or ends.
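Event-based sampling can be sketched as follows. This assumes each routine's execution is a `(start, end)` interval and that the count is taken just after each start/end event, so a routine ending at time $t$ is not counted at $t$. Both choices are ours.

```python
from typing import List, Tuple


def parallelism_level(intervals: List[Tuple[float, float]]) -> float:
    """Average number of running routines, sampled at start/end events."""
    events = sorted({t for s, e in intervals for t in (s, e)})
    samples = [sum(1 for s, e in intervals if s <= t < e) for t in events]
    return sum(samples) / len(samples)
```

For intervals (0, 4), (1, 3) and (2, 5), the samples at events 0 through 5 are 1, 2, 3, 2, 1, 0, giving a level of 1.5.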

7.2 Experiments with Real-World Benchmarks

We extracted traces from three real homes (20-30 devices, multi-user families) that used Google Home over 2 years. We also studied two public datasets: 1) 147 SmartThings applications [63]; and 2) IoTBench: 35 OpenHAB applications [39]. Based on these, we created three representative benchmarks (we will make these openly available):

Morning Scenario: This chaotic scenario has 4 family members in a 3-bed, 2-bath home concurrently initiating 29 routines over 25 minutes, touching 31 devices. Each user starts with a wake-up routine and ends with a leaving-home routine. In between, routines cover bedroom and bathroom use, cooking and eating breakfast, and sporadic events, e.g., milk spillage cleanup.

Party Scenario: Modeling a small party, it includes one long routine controlling the party atmosphere for the entire run, along with 11 other routines covering spontaneous events, e.g., singing time, announcements, serving food/drinks, etc.

Factory Scenario: This is an assembly line with 50 workers at 50 stages. Each stage has access to local devices, to some devices shared with immediately preceding and succeeding stages, and to 5 global devices. Each stage's routine has device access probabilities: 0.6 for local devices, 0.3 for neighbor devices, and 0.1 for global devices. Routines are generated to keep each worker occupied (no idle time).

We trigger routines at random times while obeying preset constraints capturing real-life logic, e.g., “wake-up” routine before “cook breakfast” routine. In the morning scenario, each routine occurs once per run, and for the factory scenario routines are probabilistically generated (with possible repetition). We run 1000 trials to obtain each datapoint.

Results: From Fig. 12(a) (top row), in the morning scenario: 1) EV's latency is comparable to WV's at both the median and the tail percentile, and 2) PSV has 15% worse tail-percentile latency than EV. Generally, the higher the parallelism level (last column), the lower the latency. For instance, EV has a median parallelism level 3 higher than GSV, and median latency 16 better than GSV. Parallelism also creates more temporary incongruences (middle column of the figure), which is expected for EV. Yet EV's (and GSV's) end state is serially equivalent, while WV may end incongruently, as shown in Fig. 12(b). Thus EV offers latencies similar to WV's, but better final congruence. PSV is preferable only if the user cares about temporary incongruence.

In Fig. 12(a) (middle row), the party scenario shows similar trends to the morning scenario, with one notable exception: PSV's benefit is lower, with only an 11% reduction in 90th percentile latency relative to GSV (vs. 77% in the morning scenario). This occurs because the single long routine blocks other routines. EV avoids this head-of-line blocking thanks to pre- and post-leasing.

In Fig. 12(a) (bottom row), the factory scenario shows similar trends to the morning scenario, except that (i) EV's median latency is 23.1% worse than WV's, and (ii) the parallelism level is higher in EV than in WV. This is due to the back-to-back arrival of multiple routines. WV executes them as-is, whereas EV may delay some routines (due to device conflicts). When the conflict lifts, all eligible routines run simultaneously, increasing both the parallelism level and latency.

7.3 Workload-Driven Emulation: Parameters

The rest of this section performs workload-driven experiments. Table 3 summarizes the parameters used. By default we run 100 routines, 25 devices, and an average of 3 commands per routine. Each routine has a probability of being long-running. We run 1M trials to obtain each datapoint.

  • Total number of routines (default: 100)
  • Number of concurrent routines injected
  • Average commands per routine, normally distributed (default: 3)
  • Zipfian coefficient of device popularity
  • Percentage of long running routines (%)
  • Average duration of a long running command, normally distributed (in minutes)
  • Average duration of a short running command, normally distributed (in seconds)
  • Percentage of “Must” commands of a routine (%)
  • Percentage of failed devices (%)
Table 3: Parameterized Microbenchmark: Summary of Parameters.

7.4 Atomicity Evaluation: Effect of Failures

Failures abort more routines in EV because it allows high concurrency, yet EV's intrusive effect on the user (due to aborts) is the lowest of all visibility models. Figs. 13(a) and 13(b) measure the fraction of routines aborted due to a failure. We induce fail-stop failures, where 25% of all devices are marked as failed at a random point during the run. Figs. 13(c) and 13(d) show the rollback overhead, i.e., the average fraction of commands rolled back across aborted routines. EV's rollback overhead is the smallest among all visibility models. PSV's rollback overhead is higher than EV's because it aborts more at the routine's finish point (when checking the up/down status of devices touched), whereas EV aborts affected routines earlier rather than later. GSV and S-GSV have low abort rates because of their serial execution, but higher rollback overheads than EV. Thus, even when execution is serial, the effect of failures can be more intrusive on the human. We conclude that EV is the least intrusive model.

(a) Must Vs Abort Rate
(b) Failure Vs Abort Rate
(c) Must Vs Rollback Overhead
(d) Failure Vs Rollback Overhead
Figure 13: Effect of Failures. Rollback Overhead = Intrusion on User. Parameters in Table 3.

The plateauing in Figs. 13(a) and 13(b) is due to saturation of the parallelism level. The plateauing in Figs. 13(c) and 13(d) is due to saturation at abort-points: for GSV at 50%, and for S-GSV lower at 40%, since any device failure triggers the abort.

7.5 Scheduling Policies

(a) E2E Latency
(b) Incongruence
(c) Parallelism
Figure 14: Scheduling Policies. Parameters in Table 3. (a) E2E Latency normalized with routine runtime. (b) Temporary Incongruence. (c) Parallelism Level.

Fig. 14 compares the FCFS, JiT, and Timeline (TL) scheduling policies (Sec. 5). In Fig. 14(a), TL is faster than both FCFS and JiT. The benefit of TL over FCFS is due to pre-leasing. The benefit of TL over JiT is due to its opportunistic use of leasing. TL also has a higher parallelism level (Fig. 14(c)) than both FCFS and JiT.

7.5.1 Timeline-based Eventual Visibility (TL)

(a) Normalized E2E Latency
(b) Temporary Incongruence (%)
(c) CDF of Stretch Factor
(d) Algo. 1 Insertion Time
Figure 15: TL Scheduler under EV. Parameters in Table 3.

Figs. 15(a) and 15(b) show that disabling leasing reduces temporary incongruence but significantly increases latency. Turning off both pre- and post-leasing (from Both-on to Both-off) increases latency at every concurrency level and routine size we vary. Post-leases are more effective than pre-leases: disabling post-leasing raises latency more than disabling pre-leasing. Post-leasing opportunities are more frequent than pre-leasing ones because the former does not require changing the serialization order (the latter does). These trends hold for all parameter combinations we tested.

TL might also “stretch” routines (Fig. 9(c)). Fig. 15(c) shows the stretch factor, measured as the time between a routine's actual start (not submission) and actual finish, divided by the ideal (minimum) time to run the routine. As routine size grows, the stretch factor rises at first (the share of stretched routines grows from only 5% to 25%) but then drops. Essentially, beyond a certain routine size the lock table saturates, leaving fewer gaps and forcing EV to append new routines to the end of the schedule.

We used a Raspberry Pi 3 B+ [50] as the home hub to run TL (15 devices, 30 routines). Fig. 15(d) shows that it takes only 1 ms to schedule a large routine with 10 commands. Surveys show that typical routines today contain 5 commands or fewer [63, 39], so our scheduler is fast in practice.

7.6 Parameterized Microbenchmark Experiments

(a) End to End latency
(b) Parallelism level (%)
(c) Temporary Incongruence & Order Mismatch (%)
(d) Device Popularity vs. Latency
Figure 16: Impact of routine size and device popularity. PSV and GSV are always zero and are omitted in (c).

Commands per routine: Figs. 16(a) and 16(b) show that GSV's latency rises as routines contain more commands. With smaller routines, PSV is close to EV and WV, but as routines contain more commands, PSV quickly approaches GSV. EV follows a similar trend but stays faster than GSV and PSV. Parallelism level and temporary incongruence follow the same trend. Finally, EV's peaking behavior and eventual convergence towards GSV (Fig. 16(c)) occur because pre- and post-leasing opportunities decrease beyond a certain routine size (4 commands).

Device popularity: Using a Zipf distribution for device access by routines, Fig. 16(d) shows that EV's latency stays close to WV's as the Zipfian coefficient (popularity skew) increases, while the added conflicts quickly slow PSV down to GSV.

Long running routines: As the duration of long running routines rises (Fig. 17(a)), temporary incongruences decrease: the run is now longer and routines are spread out in time, so they are less likely to conflict. Increasing the percentage of long running routines increases the chance of conflict, causing more temporary incongruence (Fig. 17(b)). Order mismatch measures how much the final serialization order differs from the submission order of routines, using swap distance. It i) rises as routines get longer (Fig. 17(a)), but ii) falls as the percentage of long running routines grows (Fig. 17(b)), because post-leases dominate. Overall, order mismatch stays low, between 3% and 10%.
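Swap distance between two orders is the number of adjacent swaps needed to turn one into the other, i.e., the number of pairs the two orders rank differently. The sketch below computes it and, as an assumption of ours, reports it as a percentage of the maximum possible distance $n(n-1)/2$. The exact normalization behind the percentages above is not stated here.

```python
from typing import Sequence


def swap_distance(submission: Sequence[str], serialization: Sequence[str]) -> int:
    """Minimum number of adjacent swaps turning one order into the other
    (the number of inverted pairs)."""
    pos = {r: i for i, r in enumerate(serialization)}
    seq = [pos[r] for r in submission]
    return sum(1 for i in range(len(seq)) for j in range(i + 1, len(seq))
               if seq[i] > seq[j])


def order_mismatch(submission: Sequence[str], serialization: Sequence[str]) -> float:
    """Swap distance as a percentage of its maximum, n(n-1)/2 (our normalization)."""
    n = len(submission)
    if n < 2:
        return 0.0
    return 100.0 * swap_distance(submission, serialization) / (n * (n - 1) / 2)
```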

Figure 17: Impact of: (a) long routine duration, and (b) percentage of long running routines.

8 Related Work

Support for Routines: Routines are supported by Alexa [7], Google Home [30], and others [56, 53, 4]. iRobot’s Imprint [76, 37] supports long-running routines, coordinating between a vacuum [52] and a mop [16]. All these systems only support best-effort execution (akin to WV).

Consistency in Smart Homes: SafeHome can be used orthogonally with either: i) transactuations [54], which provides a consistent soft-state, or ii) APEX [80], which ensures safety by automatically discovering and executing prerequisite commands. These two systems maintain strict isolation by sequentially executing conflicting routines, making them both somewhat akin to PSV.

Abstractions: IFTTT [36] represents the home as a set of simple conditional statements, while HomeOS [24] provides a PC-like abstraction for the home where devices are analogous to peripherals in a PC. Beam [55] optimizes resource utilization by partitioning applications across devices. These and other abstractions for smart homes [47, 79, 45, 12, 77] do not address failures or concurrency.

Concurrency Control: Concurrency control is well-studied in databases [14]. Smart Home OSs like HomeOS, SIFT, and others [24, 42, 51, 46] explore different concurrency control schemes. However, none of these explore visibility models. Classical task graph scheduling algorithms [19, 13, 41, 5, 10, 34, 35] do not tackle SafeHome’s specific scheduling problem.

ACID Properties applied in Other Domains: There is a rich history of leveraging transaction-like ACID properties in many domains. Examples include work in software-defined networks to guarantee update consistency [21, 22] and for robustness [18]. ACID has also been applied in transactional memory [33, 48, 32, 11] and pervasive computing [65].

9 Conclusion

SafeHome is: i) the first implementation of relaxed visibility models for smart homes running concurrent routines, and ii) the first system that reasons about failures alongside concurrent routines. We find that:
(1) Eventual Visibility (EV) provides the best of both worlds, with: a) user-facing responsiveness (latency) only up to 23% worse than today's Weak Visibility (WV), and b) end state congruence identical to that of the strongest model, Global Strict Visibility (GSV).
(2) When routines abort due to failures, EV rolls back the fewest commands among all models.
(3) Lock leasing improves latency.
(4) Compared to competing policies (FCFS and JiT), Timeline Scheduling improves both latency and parallelism.


  • [1] S. V. Adve and K. Gharachorloo (1996-12) Shared memory consistency models: a tutorial. IEEE Computer 29 (12), pp. 66–76. External Links: ISSN 0018-9162, Link, Document Cited by: §1.
  • [2] S. B. Ahsan, R. Yang, S. A. Noghabi, and I. Gupta (2019-07) Home, SafeHome: ensuring a safe and reliable home using the edge. In 2nd USENIX Workshop on Hot Topics in Edge Computing (HotEdge 19), Renton, WA. External Links: Link Cited by: §1.
  • [3] Amazon Alexa outage. Note: https://downdetector.com/status/amazon-alexa/news/235561-problems-at-alexaLast accessed April 2020 Cited by: §6.
  • [4] Alexa routines show promise and limitations. Note: https://www.timeatlas.com/create-alexa-routines/Last accessed April 2020 Cited by: §8.
  • [5] M. Amaris, G. Lucarelli, C. Mommessin, and D. Trystram (2017) Generic algorithms for scheduling applications on hybrid multi-core machines. In Euro-Par 2017: Parallel Processing, F. F. Rivera, T. F. Pena, and J. C. Cabaleiro (Eds.), Cham, pp. 220–231. Cited by: §8.
  • [6] Amazon Alexa + SmartThings routines and scenes. Note: https://support.smartthings.com/hc/en-us/articles/210204906-Amazon-Alexa-SmartThings-Routines-and-ScenesLast accessed April 2020 Cited by: §1.
  • [7] Amazon Alexa. Note: https://developer.amazon.com/alexaLast accessed April 2020 Cited by: §8.
  • [8] M. S. Ardekani, R. P. Singh, N. Agrawal, D. B. Terry, and R. O. Suminto (2017) Rivulet: a fault-tolerant platform for Smart-home Applications. In Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference, Middleware ’17, New York, NY, USA, pp. 41–54. External Links: ISBN 978-1-4503-4720-4, Link, Document Cited by: §1.
  • [9] I. Armac, M. Kirchhof, and L. Manolescu (2006-03) Modeling and analysis of functionality in eHome systems: dynamic rule-based conflict detection. In 13th Annual IEEE International Symposium and Workshop on Engineering of Computer-Based Systems (ECBS’06), Vol. , pp. 10 pp.–228. External Links: Document, ISSN Cited by: §1.
  • [10] A.R. Arunarani, D. Manjula, and V. Sugumaran (2018-09) Task scheduling techniques in cloud computing: a literature survey. Future Generation Computer Systems 91, pp. . External Links: Document Cited by: §8.
  • [11] H. Attiya, A. Gotsman, S. Hans, and N. Rinetzky (2013) A programming language perspective on transactional memory consistency. In Proceedings of the 2013 ACM symposium on Principles of distributed computing, pp. 309–318. Cited by: §8.
  • [12] Automate.io. Note: https://automate.io/Last accessed April 2020 Cited by: §8.
  • [13] D. Barthou and E. Jeannot (2014) SPAGHETtI: scheduling/placement approach for task-graphs on heterogeneous architecture. In Euro-Par 2014 Parallel Processing, F. Silva, I. Dutra, and V. Santos Costa (Eds.), Cham, pp. 174–185. External Links: ISBN 978-3-319-09873-9 Cited by: §8.
  • [14] P. A. Bernstein, V. Hadzilacos, and N. Goodman (1986) Concurrency control and recovery in database systems. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA. External Links: ISBN 0-201-10715-5 Cited by: §8.
  • [15] K. Birman and T. Joseph (1987) Exploiting virtual synchrony in distributed systems. In Proceedings of the 11th ACM Symposium on Operating Systems Principles, SOSP ’87, New York, NY, USA, pp. 123–138. External Links: ISBN 0-89791-242-X, Link, Document Cited by: footnote 2.
  • [16] Braava. Note: https://www.irobot.com/braavaLast accessed April 2020 Cited by: §8.
  • [17] M. Campbell-Kelly (2009-05) Historical reflections: the rise, fall, and resurrection of software as a service. Commun. ACM 52 (5), pp. 28–30. External Links: ISSN 0001-0782, Link, Document Cited by: §1.
  • [18] M. Canini, P. Kuznetsov, D. Levin, and S. Schmid (2015) A distributed and robust sdn control plane for transactional network updates. In 2015 IEEE conference on computer communications (INFOCOM), pp. 190–198. Cited by: §8.
  • [19] L. Canon, L. Marchal, B. Simon, and F. Vivien (2020) Online scheduling of task graphs on heterogeneous platforms. IEEE Transactions on Parallel and Distributed Systems 31 (3), pp. 721–732. Cited by: §8.
  • [20] Comcast outage. Note: https://webdownstatus.com/outages/comcast (last accessed April 2020). Cited by: §6.
  • [21] M. Curic, G. Carle, Z. Despotovic, R. Khalili, and A. Hecker (2017) SDN on ACIDs. In Proceedings of the 2nd Workshop on Cloud-Assisted Networking, pp. 19–24. Cited by: §8.
  • [22] M. Curic, Z. Despotovic, A. Hecker, and G. Carle (2018) Transactional network updates in SDN. In 2018 European Conference on Networks and Communications (EuCNC), pp. 203–208. Cited by: §8.
  • [23] S. Davidoff, M. K. Lee, C. Yiu, J. Zimmerman, and A. K. Dey (2006) Principles of smart home control. In Proceedings of the 8th International Conference on Ubiquitous Computing, UbiComp’06, Berlin, Heidelberg, pp. 19–34. External Links: ISBN 9783540396345, Link, Document Cited by: §1.
  • [24] C. Dixon, R. Mahajan, S. Agarwal, A. J. Brush, B. Lee, S. Saroiu, and P. Bahl (2012) An operating system for the home. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, NSDI’12, Berkeley, CA, USA, pp. 25–25. External Links: Link Cited by: §1, §8.
  • [25] EE: average UK Smart Home will have 50 connected devices by 2023. Note: https://www.totaltele.com/500103/EE-Average-UK-Smart-Home-will-have-50-connected-devices-by-2023 (last accessed April 2020). Cited by: §1.
  • [26] M. Ellis. 5 times smart home technology went wrong. Note: https://www.makeuseof.com/tag/smart-home-technology-went-wrong/ (last accessed November 2019). Cited by: §1.
  • [27] Fenestra: make your windows smart. Note: http://www.smartfenestra.com/home (last accessed April 2020). Cited by: §1.
  • [28] Google Home outage. Note: https://downdetector.com/status/google-home (last accessed April 2020). Cited by: §6.
  • [29] SmartThings outage. Note: https://downdetector.com/status/smartthings/news/224625-problems-at-smartthings (last accessed April 2020). Cited by: §6.
  • [30] Google Home. Note: https://store.google.com/us/product/google_home (last accessed April 2020). Cited by: 9(b), §8.
  • [31] Google’s smart home ecosystem is a complete mess. Note: https://www.cnet.com/news/googles-smart-home-ecosystem-is-a-complete-mess/ (last accessed April 2020). Cited by: §1.
  • [32] R. Guerraoui and M. Kapalka (2008) On the correctness of transactional memory. In Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, pp. 175–184. Cited by: §8.
  • [33] R. Guerraoui and M. Kapałka (2010) Principles of transactional memory. Synthesis Lectures on Distributed Computing 1 (1), pp. 1–193. Cited by: §8.
  • [34] C. Hanen and A. Munier (1995) An approximation algorithm for scheduling dependent tasks on m processors with small communication delays. In Proceedings 1995 INRIA/IEEE Symposium on Emerging Technologies and Factory Automation. ETFA’95, Vol. 1, pp. 167–189 vol.1. Cited by: §8.
  • [35] J. Hwang, Y. Chow, F. D. Anger, and C. Lee (1989-04) Scheduling precedence graphs in systems with interprocessor communication times. SIAM J. Comput. 18 (2), pp. 244–257. External Links: ISSN 0097-5397, Link, Document Cited by: §8.
  • [36] IFTTT. Note: https://ifttt.com/ (last accessed April 2020). Cited by: §8.
  • [37] Imprint™ link technology: getting started. Note: https://homesupport.irobot.com/app/answers/detail/a_id/21090/~/imprint%E2%84%A2-link-technology%3A-getting-started (last accessed April 2020). Cited by: §8.
  • [38] Internet of Shit. Note: https://twitter.com/internetofshit (last accessed April 2020). Cited by: §1.
  • [39] IoTBench test-suite. Note: https://github.com/IoTBench/IoTBench-test-suite/tree/master/openHAB (last accessed April 2020). Cited by: §7.2, §7.5.1.
  • [40] J. J. Kistler and M. Satyanarayanan (1992-02) Disconnected operation in the Coda file system. ACM Transactions on Computer Systems 10 (1), pp. 3–25. External Links: ISSN 0734-2071, Link, Document Cited by: §1.
  • [41] G. Lee (2012) Resource allocation and scheduling in heterogeneous cloud environments. Ph.D. Thesis, University of California at Berkeley, USA. External Links: ISBN 9781267611550 Cited by: §8.
  • [42] C. M. Liang, B. F. Karlsson, N. D. Lane, F. Zhao, J. Zhang, Z. Pan, Z. Li, and Y. Yu (2015) SIFT: building an internet of safe things. In Proceedings of the 14th International Conference on Information Processing in Sensor Networks, IPSN ’15, New York, NY, USA, pp. 298–309. External Links: ISBN 978-1-4503-3475-4, Link, Document Cited by: §1, §8.
  • [43] S. R. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong (2005-03) TinyDB: an acquisitional query processing system for sensor networks. ACM Trans. Database Syst. 30 (1), pp. 122–173. External Links: ISSN 0362-5915, Link, Document Cited by: §1.
  • [44] Mapping the Smart-Home Market. Note: https://www.bcg.com/publications/2018/mapping-smart-home-market.aspx (last accessed April 2020). Cited by: §1.
  • [45] Microsoft Flow. Note: https://flow.microsoft.com (last accessed April 2020). Cited by: §8.
  • [46] S. Munir and J. A. Stankovic (2014) DepSys: dependency aware integration of cyber-physical systems for Smart Homes. In ICCPS ’14: ACM/IEEE 5th International Conference on Cyber-Physical Systems (with CPS Week 2014), ICCPS ’14, Washington, DC, USA, pp. 127–138. External Links: ISBN 978-1-4799-4930-4, Link, Document Cited by: §1, §8.
  • [47] OpenHAB. Note: https://www.openhab.org/ (last accessed April 2020). Cited by: §8.
  • [48] H. E. Ramadan, C. J. Rossbach, D. E. Porter, O. S. Hofmann, A. Bhandari, and E. Witchel (2007) MetaTM/txlinux: transactional memory for an operating system. ACM SIGARCH Computer Architecture News 35 (2), pp. 92–103. Cited by: §8.
  • [49] R. Ramakrishnan and J. Gehrke (2003) Database management systems (3. ed.). McGraw-Hill. External Links: ISBN 978-0-07-115110-8 Cited by: §1.
  • [50] Raspberry Pi 3 Model B+. Note: https://www.raspberrypi.org/products/raspberry-pi-3-model-b-plus/ (last accessed April 2020). Cited by: §7.5.1.
  • [51] D. Retkowitz and S. Kulle (2009) Dependency management in smart homes. In IFIP International Conference on Distributed Applications and Interoperable Systems, pp. 143–156. Cited by: §1, §8.
  • [52] Roomba. Note: https://www.irobot.com/roomba (last accessed April 2020). Cited by: §8.
  • [53] Routines not working. Note: https://support.google.com/assistant/thread/3444653?hl=en (last accessed April 2020). Cited by: §8.
  • [54] A. Sengupta, T. Leesatapornwongsa, M. S. Ardekani, and C. A. Stuardo (2019-07) Transactuations: where transactions meet the physical world. In 2019 USENIX Annual Technical Conference (USENIX ATC 19), Renton, WA, pp. 91–106. External Links: ISBN 978-1-939133-03-8, Link Cited by: §1, Table 2, §4.1, §8.
  • [55] C. Shen, R. P. Singh, A. Phanishayee, A. Kansal, and R. Mahajan (2016) Beam: ending monolithic applications for connected devices. In 2016 USENIX Annual Technical Conference (USENIX ATC 16), Denver, CO, pp. 143–157. External Links: ISBN 978-1-931971-30-0, Link Cited by: §8.
  • [56] Scheduled routines not reliable. Note: https://support.google.com/assistant/thread/366154?hl=en (last accessed April 2020). Cited by: §8.
  • [57] F. Schneider (1990-12) Implementing fault-tolerant services using the state machine approach: a tutorial. ACM Comput. Surv. 22 (4), pp. 299–319. External Links: Link, Document Cited by: §4.2.
  • [58] SmartCan: take control of your chores. Note: https://www.rezzicompany.com/ (last accessed April 2020). Cited by: §1.
  • [59] J. Koon. Smart sensor applications in manufacturing (Enterprise IoT Insights). Note: https://enterpriseiotinsights.com/20180827/channels/fundamentals/iotsensors-smart-sensor-applications-manufacturing (last accessed April 2020). Cited by: §1.
  • [60] Smart home market worth $151.4 billion by 2024. Note: https://www.marketsandmarkets.com/PressReleases/global-smart-homes-market.asp (last accessed April 2020). Cited by: §1.
  • [61] Smart home. Note: https://www.statista.com/outlook/279/109/smart-home/united-states (last accessed April 2020). Cited by: §1.
  • [62] Future of smart hospitals (ASME). Note: https://aabme.asme.org/posts/future-of-smart-hospitals (last accessed April 2020). Cited by: §1.
  • [63] SmartThings Smart Apps. Note: https://github.com/SmartThingsCommunity/SmartThingsPublic/tree/master/smartapps (last accessed April 2020). Cited by: §7.2, §7.5.1.
  • [64] Stop shouting at your smart home so much and set up multi-step routines. Note: https://www.popsci.com/smart-home-routines-apple-google-amazon (last accessed April 2020). Cited by: §1.
  • [65] O. Storz, A. Friday, and N. Davies (2006) Supporting content scheduling on situated public displays. Computers & Graphics 30 (5), pp. 681–691. Cited by: §8.
  • [66] D. B. Terry, M. M. Theimer, K. Petersen, A. J. Demers, M. J. Spreitzer, and C. H. Hauser (1995) Managing update conflicts in Bayou, a weakly connected replicated storage system. In Proceedings of the 15th ACM Symposium on Operating Systems Principles, SOSP ’95, New York, NY, USA, pp. 172–182. External Links: ISBN 0-89791-715-4, Link, Document Cited by: §1.
  • [67] The top 5 problems with smart home tech and how to troubleshoot them. Note: https://www.nachi.org/problems-smart-home-tech.htm (last accessed April 2020). Cited by: §1.
  • [68] TP Link device driver. Note: https://github.com/intrbiz/hs110 (last accessed April 2020). Cited by: §6.
  • [69] TP Link HS105. Note: https://www.tp-link.com/us/download/HS105.html (last accessed April 2020). Cited by: §2.1.
  • [70] TP Link KASA HS100, HS220, KL130. Note: https://www.kasasmart.com/ (last accessed April 2020). Cited by: §6.
  • [71] TP Link KASA HS105, HS110, HS200. Note: https://www.kasasmart.com/ (last accessed April 2020). Cited by: Figure 1, §6.
  • [72] TP-Link KASA Android app. Note: https://www.tp-link.com/us/kasa-smart/kasa.html (last accessed April 2020). Cited by: 9(c).
  • [73] Velux: smart home, smart skylights. Note: https://whyskylights.com/ (last accessed April 2020). Cited by: §1.
  • [74] W. Vogels (2009-01) Eventually consistent. Communications of the ACM 52 (1), pp. 40–44. External Links: ISSN 0001-0782, Link, Document Cited by: §1, §4.3.
  • [75] Wemo. Note: https://www.wemo.com/ (last accessed April 2020). Cited by: §6.
  • [76] What is Imprint™ link technology? Note: https://homesupport.irobot.com/app/answers/detail/a_id/21088/~/what-is-imprint%E2%84%A2-link-technology%3F (last accessed April 2020). Cited by: §1, §8.
  • [77] Workflow. Note: https://workflow.is/ (last accessed April 2020). Cited by: §8.
  • [78] Is Xfinity having an outage right now? Note: https://outage.report/us/xfinity (last accessed April 2020). Cited by: §6.
  • [79] Zapier. Note: https://zapier.com/ (last accessed April 2020). Cited by: §8.
  • [80] Q. Zhou and F. Ye (2019) APEX: automatic precondition execution with isolation and atomicity in internet-of-things. In Proceedings of the International Conference on Internet of Things Design and Implementation, IoTDI ’19, New York, NY, USA, pp. 25–36. External Links: ISBN 978-1-4503-6283-2, Link, Document Cited by: §1, §8.