Advances in machine learning and hardware have made online voice recognition a reality. As of July 2018, 43 million Americans own a voice assistant (NPR and Edison Research, 2018), such as an Amazon Echo or Google Home. Voice as an input interface is changing the way users interact with systems and devices. Voice assistants offer users a convenient medium to access information, control other smart devices, set alarms, play games, or keep to-do lists. To serve users in real time, a voice assistant has to be always on, continuously listening for a user to utter a wake word, such as “Alexa” or “OK, Google.”
This convenience comes at a potential privacy cost to these users. A device can mistakenly “hear” a trigger word which will cause it to stream audio recordings to its cloud. Last May, because of several mishaps in Amazon’s voice recognition system, an Amazon Echo device recorded a couple’s conversation and sent it to their manager (Horcher, 2018) without their knowledge. Malicious actors can compromise voice assistant devices and turn them into convenient spying devices inside residences (Palmer, 2018). Further, several tech companies have filed patents “envisioning” systems that continuously listen and process nearby conversation to serve users with personalized advertisements (Court, 2017). Some researchers have observed an Amazon Echo device to connect more frequently to its servers than other smart home devices, even without a wake word and when muted (Hill and Mattu, 2018).
Despite these possible privacy issues, privacy does not appear to be an inhibiting factor in users’ adoption and use of voice assistants (Pew Research Center, 2017). Little is known about how individuals perceive the privacy risks associated with voice assistants and what mental models users hold of the privacy/utility trade-off in this ecosystem. With this in mind, we conduct a technology probe study (Hutchinson et al., 2003) where we deploy three privacy-preserving interventions in homes. Each intervention sits at a different point of the usability/utility/privacy spectrum. The first is the voice assistant’s built-in mute button. The second is Blackout, a smart plug that allows the user to remotely engage/disengage a voice assistant. The third is Obfuscator, shown in Fig. 1, which we designed and implemented; it uses ultrasound-based jamming to prevent a voice assistant from listening to nearby conversations.
We conducted in-home interviews with 15 households involving individuals and families (including pilot studies involving two households), using the interventions to provoke discussion and reflections about individuals’ practices with voice assistants and their privacy perceptions. These discussions offered an in-depth understanding of users’ views of the privacy/utility/usability trade-offs within the voice assistant ecosystem. We also gained insights about the important design dimensions for future privacy-preserving technologies for voice assistants: seamless operation, ease of setup, and modern aesthetics.
2. Related Work
A smart speaker, or voice assistant, is a type of Internet-of-Things (IoT) device, specifically a wireless-connected speaker with an integrated virtual assistant. A user interacts with such a device in a hands-free manner; an interaction starts after the user says a wake word. Most commercial voice assistants today utilize wireless protocols, and can be integrated with home automation control. For the purposes of our study, we consider two popular voice assistants: (i) Google Home Mini, and (ii) Amazon Echo Dot. We highlight salient features of both in Table 1.
|Feature||Home Mini||Echo Dot|
|Height||4.3 cm||3.3 cm|
|Diameter||9.9 cm||7.6 cm|
|Wake words||OK, Google||Hey, Alexa|
|Visual Cue||4 dots on the surface||LED band|
These voice assistants operate in an always-on mode; they can stealthily record any conversation in their vicinity. Depending on the nature of the conversation, various forms of (sensitive) information can be extracted. While one organization claims that such recording is a one-off act (Horcher, 2018), the other blames erroneous code (Google Home Help, 2017). However, these organizations also file patents to trigger recording on the occurrence of specific trigger words. Worse, voice assistants were observed to constantly upload information, despite the absence of user interaction (Hill and Mattu, 2018).
Security and privacy researchers have investigated these privacy violations. Feng et al. (Feng et al., 2017) propose continuous authentication as a mechanism to thwart the aforementioned problems. Combined with the works of Roy et al. (Roy et al., 2018b, a) and Zhang et al. (Zhang et al., 2017), Gao et al. (Gao et al., 2018) propose a more usable solution that uses ultrasound jamming to address stealthy recording. In this work, we wish to validate the usability claims made by the above works; consequently, we base our intervention design on them. Other solutions that involve intercepting/monitoring traffic at the network gateway are possible (Apthorpe et al., 2017; Sivaraman et al., 2015). However, their behaviour when the traffic is encrypted and non-deterministic is unclear.
The methodology of this study is most similar to Zheng et al. (Zheng et al., 2018), and Kaaz et al. (Kaaz et al., 2017). Similar surveys are carried out in (Choe et al., 2011; McCreary et al., 2016; Brush et al., 2011). The former attempt to understand the privacy perceptions of users living in homes with various IoT devices, while the latter try to identify the various challenges associated with setting up and using these devices. Our findings, observations, and conclusions are consistent with those made by Zeng et al. (Zeng et al., 2017). At a high level, however, the goal of our study is most similar to the work of Lau et al. (Lau et al., 2018); we both try to understand if there is a mismatch between privacy expectations and the device behavior. In this study, we go one step further and examine whether, if at all, this gap can be bridged.
Our objectives in this study are two-fold. First, we aim to understand how individuals perceive the privacy issues associated with voice assistants. Second, we want to assess their views about different mitigation strategies. In this study, we follow a technology probe approach to achieve both objectives. We introduce three “interventions” that prevent the voice assistants from listening to the users.
Using these interventions, and through semi-structured interviews, we explore the users’ experience, stance, practices, and understanding regarding voice assistants. We elicit user discussions and responses that help us understand what utility voice assistants provide them as well as perceived privacy risks. We investigate whether users identify a privacy/utility trade-off with voice assistants. Moreover, we incite discussions about the different dimensions involved in deploying privacy preserving technologies, if needed. Throughout this process, we uncover insight into various features that make these technologies useful in practice.
3.1. Privacy-Preserving Interventions
The three interventions that we use in this study are displayed in Fig. 4. To develop these interventions, we extensively reviewed various straw man solutions to prevent stealthy recording. We then discussed the pros and cons of each approach, and reflected upon the features that we felt were relevant to the final design of each intervention. These features include user interaction with the device, aesthetics, mode/ease of deployment, and privacy properties.
The first intervention is the mute button (Fig. 1(a)) available in the commercial voice assistants. It represents a built-in privacy control. It, however, requires the user to physically interact with the device to engage/disengage the control and to place trust in the manufacturer’s implementation of the muting feature.
Our other two interventions are bolt-on solutions, which require the user to deploy an additional device to mute the voice assistant without modifying it. Bolt-on solutions have the advantage that the user does not have to trust the built-in controls of the voice assistant. We focus on hardware-based interventions that exhibit an intuitive operation that inexperienced users can comprehend. Software-based interventions, such as intercepting network traffic associated with voice data, can be cumbersome and do not provide privacy properties that users can understand. We follow a simple methodology to determine the nature of our hardware-based interventions by considering the physical interfaces associated with recording: device power and the microphone. Our second intervention, Blackout (Fig. 1(b)), targets the power interface of the voice assistant, and our third intervention, Obfuscator (Fig. 1(c)), targets its microphone interface.
3.1.1. The Built-in “Mute” Feature
This is available as a push button on the top panel of the Amazon Echo Dot and as a sliding button on the side of the Google Home Mini. Activating the mute button stops the voice assistant from responding to the user’s voice commands. Upon activation, the ring color changes to red in the Echo Dot and the four lights on top of the Google Home Mini turn red.
Blackout is essentially a switch that controls the power to the voice assistant. As voice assistants are typically wall-powered, disabling the power flow will turn the device off, thereby preventing it from listening. We use a commercial remote-controlled plug (https://www.amazon.com/Beastron-Remote-Control-Electrical-Outlet/dp/B074CRGFPZ) as an instance of this intervention. The user deploys Blackout by connecting the voice assistant to the outlet through the smart plug (as seen in Fig. 1(b)). The user can engage/disengage Blackout through a remote control without the need to physically interact with the device. The smart plug glows red when power is flowing.
Clearly, Blackout offers an immediate privacy guarantee against the stealthy recording of users’ conversations. It comes, however, with a high usability cost; the users have to wait for a lengthy boot time whenever they wish to reuse the voice assistant.
The Obfuscator intervention targets the microphone of the voice assistant. At a high level, Obfuscator jams the microphone of the voice assistant when the user needs privacy protection (Fig. 2(a)). It offers a middle ground between the privacy and usability constraints of the previous interventions. Users can engage/disengage the jamming remotely, obviating the need to physically interact with the privacy-preserving intervention. When disengaging the jamming, the user can immediately interact with the voice assistant. Obfuscator creates high-power noise at the voice assistant’s microphones but does not affect its operation otherwise.
In the following, we describe the inner workings of Obfuscator and the design process that led to the final prototype of Obfuscator (Fig. 1(c)).
A jamming signal with a frequency within the audible range would annoy the users and render the intervention unusable. Fortunately, recent research has discovered that commodity microphones exhibit non-linear effects that cause them to capture ultrasound signals as signals in the audible range (Roy et al., 2018b, a; Zhang et al., 2017; Gao et al., 2018). Hence, a commodity microphone “hears” an ultrasound signal even though it lies outside the audible range and is inaudible to the user. At the microphone’s side, this jamming signal will interfere with the user’s audible speech (conversations and commands), preventing the voice assistant from listening to the user’s conversations.
Fig. 2(b) shows the captured signals from a commodity microphone before and after Obfuscator is engaged. Before jamming is invoked, the microphone records a background conversation which is clearly audible at playback. After engaging Obfuscator, the ultrasound jamming signal is recorded at the microphone and completely overwhelms the recorded conversations.
Obfuscator’s circuitry includes a remote-controlled DC power supply (which allows the users to remotely engage and disengage Obfuscator), an ultrasound generator (which produces signals at 27 kHz), and a horn speaker that emits the ultrasound signal.
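The non-linearity argument can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not a model of the actual Obfuscator circuit or of any specific microphone: it assumes a toy second-order microphone response (the coefficients `a1` and `a2` are hypothetical), a 192 kHz simulation rate, and a 1 kHz tone amplitude-modulated onto the carrier. Only the 27 kHz carrier frequency comes from the text.

```python
import math

FS = 192_000          # simulation sample rate in Hz (assumption)
CARRIER_HZ = 27_000   # ultrasound frequency reported in the text
TONE_HZ = 1_000       # hypothetical audible tone carried by the ultrasound
N = 19_200            # 0.1 s of samples; every tone completes whole cycles


def am_ultrasound():
    """Emit a 27 kHz carrier whose amplitude is modulated by a 1 kHz tone."""
    return [
        (1.0 + math.sin(2 * math.pi * TONE_HZ * n / FS))
        * math.cos(2 * math.pi * CARRIER_HZ * n / FS)
        for n in range(N)
    ]


def microphone(samples, a1=1.0, a2=0.5):
    """Toy microphone: linear term plus a second-order non-linear term."""
    return [a1 * x + a2 * x * x for x in samples]


def tone_amplitude(samples, freq_hz):
    """Amplitude of the sinusoidal component at freq_hz (single-bin DFT)."""
    re = sum(x * math.cos(2 * math.pi * freq_hz * n / FS)
             for n, x in enumerate(samples))
    im = sum(x * math.sin(2 * math.pi * freq_hz * n / FS)
             for n, x in enumerate(samples))
    return 2.0 * math.hypot(re, im) / len(samples)


if __name__ == "__main__":
    emitted = am_ultrasound()
    recorded = microphone(emitted)
    # The emitted signal has no energy at 1 kHz; the recorded one does,
    # because squaring the modulated carrier shifts a copy of the tone
    # down into the audible band.
    print("1 kHz amplitude in the air:    ", tone_amplitude(emitted, TONE_HZ))
    print("1 kHz amplitude after the mic: ", tone_amplitude(recorded, TONE_HZ))
```

In this toy model the squared term is what moves energy from the ultrasound band into the audible band, so a listener hears nothing while the recording contains an audible component. A real jammer would likely use a noise-like signal rather than a single tone; the tone is used here only because its amplitude is easy to check.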
We explored different design options for the prototype that houses the circuitry. Designing a prototype of Obfuscator is challenging because of the relatively large footprint of the circuitry as compared to the other prototypes. We already know from previous research (Machuletz et al., 2018) that the user’s choice to deploy privacy-enhancing technologies in the physical world is influenced by perceived control over privacy more than initial concerns about privacy. Other factors relate to the convenience and social acceptability of the privacy-preserving intervention.
Our design process started with an exploration of a privacy metaphor, one that creates the perception of privacy control for the users. Our initial prototype was based on a “cage” metaphor. Here, the Obfuscator intervention is housed in a cage-like structure with a door. When the user closes the cage door, Obfuscator generates the ultrasound obfuscation signal to prevent the voice assistant from listening. The user has to manually open the door to disable obfuscation and communicate with the voice assistant. Closing the door “locks” the device in a cage, providing the user with a perception that the device is not active and their space is private. Opening the door unlocks and opens up the voice assistant to the user’s space.
The first version of the Obfuscator intervention followed the cage metaphor as a 3D printed cylinder (Fig. 3(a)). The cylinder has two compartments; the lower compartment contains the circuit and the ultrasound speaker. The upper compartment has space for the voice assistant as well as the door. The first version has a height of 15.5 cm and a diameter of 12 cm. After several reflective discussions, we determined the prototype to be bulky. We refined the cage-based design into a smaller, lighter and less conspicuous 3D printed cylinder (Fig. 3(b)); the second version has a height of 13 cm and a diameter of 11 cm.
Based on pilot studies, however, we found both versions to be neither user-friendly nor aesthetic. Individuals indicated that this design was not something they would want in their homes. Further, we observed that individuals did not associate with the privacy metaphor. First, they did not favor the idea of physically interacting with the prototype as it takes away from the convenience of using a hands-free device. Second, covering the voice assistant inside the cylinder deprives the users of the ability to observe the visual cue. Finally, they thought that the actions of opening and closing the prototype door label them as privacy conscious in the eyes of others, which they do not prefer.
We factored these opinions into designing a third version of the prototype. We considered three aspects that the users were not fond of: physical interactions with the door, covering the voice assistant, and the aesthetics. The third version of the prototype (Fig. 3(c)) features a platform-like solution which addresses those shortcomings. This version has a glass cylinder which houses the circuit and is covered by decorative sand; its height is 11 cm and its diameter is 12.5 cm. The platform, where the voice assistant sits, is encased with synthetic leather. The user can engage/disengage the jamming signal via a remote control, obviating the need for physical interaction. This version of the Obfuscator intervention follows a different privacy metaphor: “virtual veil.” By engaging the jamming signal, Obfuscator creates a virtual privacy dome around the voice assistant, preventing it from listening to the conversations. Our subsequent discussions and reflections about this version revealed that the open nature of the prototype might not enforce the privacy metaphor; users are less likely to perceive privacy control over the voice assistant.
The design exploration process led to our final prototype of the Obfuscator intervention, as shown in Fig. 1. We substantially reduced the form factor of the final version and greatly improved its aesthetics. The new prototype houses the same circuitry in a glass candle holder. The glass is filled with decorative sand and sealed with burgundy burlap. The user only needs to place the prototype next to the voice assistant. Following the “virtual veil” metaphor, the final version of Obfuscator, in a sense, overpowers the voice assistant to create a virtual barrier between the voice assistant and the user’s space.
Similar to previous studies based on technology probes (Odom et al., 2012), we do not seek to evaluate the actual design of the Obfuscator intervention. Doing so requires running a long-term study with the intervention deployed in users’ homes. Such a study allows for better analyzing the social contexts of introducing physical objects in the user’s space. Instead, in this study, we aim to understand how the users of voice assistants react to different privacy-enhancing technologies using proof-of-concept prototypes. A final design will likely have to feature different form factors, shapes, materials, and colors to cater to the tastes of different individuals.
We recruited 15 participants (some with families and others as single individuals) within a 15-mile radius of a university campus. We conducted two pilot interviews to refine our study protocol and the interventions. The results reported in the rest of the paper are based on interviews with the remaining 13 participants. This approach exhibits several limitations, of which the most important is the sampling of participants from a campus community; the reported results are less likely to generalize to another population of users. This limited set of participants, however, provides an opportunity for an in-depth investigation of the privacy perceptions regarding voice assistants and the design space of future privacy-preserving technologies.
Table 2 summarizes the demographics of the main participants (excluding participating family members and pilot studies) in our study. There were 7 male and 6 female participants in our study. The youngest of the main participants is 27 years old and the oldest is 60 years old. All the participants hold at least a Bachelor’s degree and are employed. The occupations of the participants ranged from students to skilled labor to professors. Six of the participants took part in the study with family members; seven had pets (cats, dogs and a parrot) in the same room where the study was conducted. This diverse set of participants offers a breadth of experiences that are useful to analyze user interactions with the voice assistants and the interventions.
We conducted all interviews in the participants’ homes at a time of their convenience. Each interview lasted for one hour on average, and the participants were compensated for their time ($40 USD per participant). The study protocol was approved by our Institutional Review Board (IRB).
3.3. Interview Flow
We conducted the interview in three stages: environment exploration, interaction with interventions, and concluding discussions.
3.3.1. Environment Exploration
The interview began with the participants providing a brief tour of their home, with specific focus on the rooms where the voice assistants are placed. Then, the experimenter and the participants convened in the room with the most used voice assistant. After obtaining informed consent, the experimenter provided each participant with an online questionnaire to fill out. The objective of this questionnaire is to establish baselines about the participant’s privacy stance, before any priming occurs.
The experimenter followed with deeper probing about the voice assistant’s role in the participant’s daily life. The questions focused on frequency, time and the purpose of usage. Also, the questions covered the conversations and activities participants have around their voice assistants. Then, the experimenter asked about the participant’s degree of trust in these devices (in terms of recording their conversations), their manufacturers, and hypothetical third parties (with whom the recordings might be shared). The experimenter probed whether individuals have read news or heard anecdotes about voice assistants misbehaving.
3.3.2. Interaction with Interventions
Then, the experimenter randomly chose one intervention at a time, and briefly narrated its capabilities and expected behavior. The participants were given time to familiarize themselves with the intervention and set it up. The random ordering of interventions across participants helps reduce the bias towards a single intervention. In settings where family members were present with the participant, the experimenter asked the different family members to interact with the intervention. After setting up the intervention, the experimenter asked the participant to issue voice commands after engaging/disengaging the intervention. At each step, the experimenter probed the participant about their comfort with the intervention and its effect on using the voice assistants. The participants were encouraged to envision future use-cases for each intervention and test the elasticity of the intervention’s functionality.
After interacting with each intervention, the experimenter inquired about the participant’s trust level in the intervention. Based on the nature of the response, the experimenter probed the user to arrive at the root of their trust level. The probing questions were designed to elicit critical reflections on the intervention, perceived privacy control, trust level, convenience, and aesthetics.
3.3.3. Concluding Discussions
After interacting with the three interventions, the experimenter engaged the participants in an open-ended discussion about the interventions and their impact on the voice assistants’ privacy. Finally, the experimenter administered another survey before compensating the participants.
3.4. Data Sources
We rely on two data sources in our study: questionnaires and transcribed interviews. At each interview, we administered pre-interview and post-interview questionnaires. The pre-interview questionnaire asked about the participant’s: privacy baseline; number of deployed voice assistants; frequency and purpose of using the voice assistants; and understanding of the voice assistant’s operation. To gauge the participant’s privacy baseline, the questionnaire contains a set of four questions on a five-point Likert scale (from Strongly Disagree = 1 to Strongly Agree = 5). We utilize the “Concern for Privacy” (Milne and Culnan, 2004) scale, which is modeled after the well-known “Information Privacy Concern scale” of Smith et al. (Smith et al., 1996). The four questions are: “I’m concerned that online services are collecting too much personal information about me,” “It bothers me to give personal information to so many online services,” “When online services ask me for personal information, I sometimes think twice before providing it,” and “It usually bothers me when online services ask me for personal information.” The post-interview questionnaire asks the same privacy-related questions after rephrasing them to refer to voice assistants instead of online services. The second source of data is the transcribed interviews, corresponding to nearly 15 hours of recordings.
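One plausible way to turn the four Likert answers into a single concern score is to sum them, which gives a total between 4 and 20. The sketch below illustrates that scoring rule; the summation and the function name `total_concern` are assumptions made for illustration, since the text does not spell out how the answers are aggregated.

```python
LIKERT_MIN = 1   # Strongly Disagree
LIKERT_MAX = 5   # Strongly Agree
NUM_ITEMS = 4    # the four "Concern for Privacy" statements


def total_concern(responses):
    """Sum four Likert answers into one score in the range 4..20.

    Assumed aggregation rule (hypothetical): a plain sum of the items.
    """
    if len(responses) != NUM_ITEMS:
        raise ValueError(f"expected {NUM_ITEMS} answers, got {len(responses)}")
    for answer in responses:
        if not isinstance(answer, int) or not LIKERT_MIN <= answer <= LIKERT_MAX:
            raise ValueError(f"answer out of range: {answer!r}")
    return sum(responses)


if __name__ == "__main__":
    # Made-up answers for illustration only, not participant data.
    print(total_concern([2, 2, 1, 3]))
```

Under this rule, a respondent who answers "disagree" to every statement scores 8 and one who answers "agree" to every statement scores 16.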
3.5. Grounded Theory Analysis
We transcribed, coded and analyzed the interviews using a Grounded Theory approach (Charmaz and Belgrave, 2012; Glaser and Strauss, 2017). We started the analysis with an open-coding stage to identify more than 200 informal codes that define key ideas in the interview transcripts. Using these informal codes, we extracted recurrent themes within the transcripts and converged on a set of 56 formal codes. We further refined the formal codes into 16 axial codes. Finally, we organized the codes into six major themes, as shown in Table 3. These six themes can further be grouped into two broader categories: participants’ attitudes towards privacy (the first four themes) and participants’ attitudes towards the voice assistant/intervention (the last two themes).
In this section, we elaborate upon the six major themes that emerged from our coding process (cf. Table 3). In particular, we discuss the observations and findings from our visits to the homes of the 15 participants, and correlate them with the broader themes we discovered.
Any statement/quote made by the participant is in quotation marks followed by the participant identifier in parentheses. An example would be “It’s always sunny in Philadelphia” (). Additional text to provide context, not spoken by the participant, is in square brackets, i.e., [ ]. An example would be “It’s always sunny [in Philadelphia]” (). Text not relevant to the discussion will be represented with an ellipsis. An example would be “ … sunny in Philadelphia” ().
|Theme||Description|
|Deployment environment||Characterizes the number of voice assistants in the participant’s home; their location(s); how, for what, and how frequently they use their device(s); who set the device(s) up; nature of discussions near the device(s)|
|Sensitivity towards listening and recording||Characterizes the privacy perceptions pertaining to conversations being recorded|
|Security/privacy perception towards listeners||Characterizes the participants’ opinion of the device with regards to its privacy features and manufacturer (specifically, whether the manufacturer is a (un)known violator of privacy). It also characterizes the participants’ opinions of third parties with whom any potentially recorded data can be shared.|
|Intervention’s effects on privacy/utility trade-off||Characterizes the perceived privacy/utility trade-off of the voice assistants; also characterizes the participants’ expectations on how to use the interventions|
|Built-in versus bolt-on privacy controls||Characterizes the usability burden of accommodating and deploying the intervention; acceptance of privacy-preserving technology (either built-in or bolt-on); trust in the privacy-enhancing technology produced by researchers|
| ||Characterizes predominantly the appearance and form factor of the interventions|
4.1. Privacy Baseline
Our pre-interview and post-interview questionnaires reveal an interesting trend in how the participants view the privacy threats from online service providers and voice assistants. All participants indicated that they have privacy concerns when interacting with service providers. Except for two, all participants either agreed or strongly agreed with the four statements of the privacy baseline. When asked the same questions about their privacy concerns regarding voice assistants, we notice a sharp decline in the concern level; most of the participants disagreed or strongly disagreed with the four statements. We conclude that the participants exhibit lower privacy concerns with voice assistants as compared to online service providers. This conclusion is confirmed in our discussions with the participants and is consistent with earlier studies on the topic (Lau et al., 2018).
We delve deeper into the privacy concerns of different groups regarding voice assistant privacy. In particular, we examine whether age, gender (male vs. female), education level (college vs. post-graduate), or occupation (STEM vs. other) has a significant relation with the privacy concerns. Comparing the total privacy concern (Milne and Culnan, 2004) between the different groups using Welch’s t-test, we did not observe a significant relation of age, occupation, or education with participants’ privacy concerns for voice assistants. The only exception is gender: female participants have significantly higher privacy concerns regarding voice assistants than their male counterparts. The average privacy concern value for males is 7.8 and that of females is 13.5; males answered the privacy concern statements with disagree/strongly disagree, while females answered with agree/strongly agree.
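For readers unfamiliar with Welch’s t-test, the following minimal sketch computes its test statistic and the Welch–Satterthwaite degrees of freedom for two groups with possibly unequal variances. The function name `welch_t` and the example scores are hypothetical and chosen only for illustration; they are not the study’s data, and the sketch stops short of computing a p-value.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t, df) for Welch's unequal-variance t-test.

    t  = (mean_a - mean_b) / sqrt(s_a^2 / n_a + s_b^2 / n_b)
    df = Welch-Satterthwaite approximation
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    # Squared standard errors of each group mean (sample variance / n).
    se_a = variance(group_a) / n_a
    se_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


if __name__ == "__main__":
    # Made-up total concern scores for two groups (illustration only).
    scores_group_1 = [6, 8, 7, 9, 8, 10, 7]
    scores_group_2 = [12, 14, 13, 15, 14, 13]
    t, df = welch_t(scores_group_1, scores_group_2)
    print(f"t = {t:.3f}, df = {df:.2f}")
```

Welch’s variant is a reasonable choice for small groups of unequal size because it does not assume that the two groups share the same variance. The p-value would then be read from a t distribution with `df` degrees of freedom, for example with a statistics package.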
4.2. Deployment Environment
Understanding the deployment environment is essential to understanding how to design a privacy-preserving intervention, specifically its aesthetics and method of use. For example, an intervention to be placed in the bedroom needs to be designed differently from one placed in the living room.
The experimenter posed an initial set of questions aimed at gauging the number of devices (a term we use interchangeably with voice assistants) each participant uses, and the location of these devices. We found that most users have fewer than 6 devices, with one participant being the anomaly with 15 devices. A large number of these devices do not provide any specific utility, but are more of a novelty feature for the participants, as we allude to later; some participants claim that these voice assistants are points of conversation in social gatherings. With such a large number of devices in a home, at least one device is always within earshot of the participant. This has unforeseen implications; during our interviews, we observed that a command directed at a particular voice assistant (say, in the living room) was responded to by the voice assistant in the adjacent room (say, the kitchen). When asked about the large number of devices in his home, the participant stated that it was “just so where ever I am at in the house, I can, you know, they can hear me.” There is no specific location where the devices are preferably located; locations range from the bedroom to the living room and the kitchen.
However, the common trend we observe with all participants in this study is that they use the voice assistant for simple queries such as inquiring about the weather, or the time, or recommendations for restaurants etc. The responses to all these queries can be obtained using their smart-phones, but participants prefer vocal interactions to physically interacting with their devices. This is a recurrent theme in our analysis, one which we will highlight in subsequent sections. While one participant has integrated the voice assistant with his home control (for lighting), he claims that while not always using vocal commands to illuminate his home, “… there’s like you know maybe that 30% of time well it’s really nice like you don’t want to get up or you’re outside the house or something like that so yeah at times it’s really convenient” (). This also suggests that, for the most part, these devices are sparingly interacted with on a daily basis; in the times they are not interacted with, these devices can potentially stealthily record the users.
The vast majority of the participants set up their devices themselves. However, this is not always the case: “didn’t set this up and recently my daughter and her husband left” (). This is an important observation; participants who set up the device had some understanding of its functioning, i.e., they believe that the vocal commands are processed locally or off-site, and an appropriate response is returned. The participant who did not set up the device was unaware of this: “if I were to set it up I think I would have looked more into like what is it really doing” (). However, all the participants believe that the voice assistant is always listening, but only for specific words: “I don’t believe that they are listening, they’re only listening for the wake word” (). (The wake word is the word that triggers the voice assistant’s functionality.)
The final observation we make is that people are comfortable having specific forms of sensitive conversations, pertaining to marital and financial issues, around the voice assistant, with the exception of one participant: “I don’t think we’re ever like discussing account numbers account numbers…” However, most of these participants are unaware of the repercussions of extensive data mining. Thus, these participants have a very extreme definition of what constitutes a sensitive conversation. As we allude to in the next section, these perceptions change for some of the participants when they are made aware of the implications of continuous recording, and the entities that are capable of performing these recordings.
4.3. Sensitivity towards listening and recording
The participant’s ability to distinguish between a conversation being overheard and recorded serves as a very coarse approximation of their overall privacy perceptions. To elaborate, we conjecture that someone who believes their conversation is recorded is someone who believes something malicious can be done with the recording, and consequently will attempt to safeguard against such a recording process.
All the participants claim to be comfortable when their conversations are overheard or listened to by the voice assistant. However, they display lower levels of comfort when these conversations are being recorded, often disbelieving that this could occur. As explained earlier, most of the participants are unaware of the various latent features that can be mined from the recordings. However, participants believe that any malicious conversation they may inadvertently have may be recorded, which results in their discomfort. One participant provides an example: “…like let’s just say that someone beats their kid or something like that you know … there’s certain things that you expect to be private…” (The authors stress that they observed no forms of violence or abuse in any of their interviews.)
The reason for the participants’ comfort towards conversation overhearing was explored in detail. Here, several themes consistently emerged.
Participant believes it is for the greater good. Here, the participant believes that overhearing and recording conversations makes the voice assistant smarter. One participant states, “why not [record a conversation], Amazon already sort of does that for training its AI”. Another states, “..what I’m saying, …, going to be feedback …”
Participant feels their conversations are not important. Here, the participant feels that their daily conversations are routine and inane, and reveal nothing of significant interest to any hypothetical party interested in the participant.
Participant feels they (as a single individual/family) are not important. Here, the participant feels that they are one of many users of the voice assistant, and that this provides them some degree of comfort. To quote one participant:
“I’m one of probably millions who are doing this and talking constantly … the anonymity of it I suppose.”
Participant feels they are monitored/tracked on other platforms. Here, the participant is acclimated to various forms of tracking. To quote one participant, “… someone had phone, and they were having a conversation about a particular subject, and then the following days then they started getting you know the spam on the side related to that”. Another says, “ … I also realized that there’s so many other things that we do online and everywhere that can give information …” A third says, “… when I’m typing if I’m typing things in … I sort of think about somebody tracking that a lot more”
Participant feels that their information is not valuable. To state this succinctly, one participant says, “I don’t know why anybody would [overhear/record me]”
4.4. Security/privacy perception towards listeners
We observed that the participants eventually perceived three different types of threats. The first type is software bugs in the voice assistant that cause it to behave erroneously or maliciously. While possible, participants believe these are improbable. When it was suggested that the device can be programmed to record the participant, one participant with ties to one of the device manufacturers strongly disagreed: “I’m sure there was some technical glitch…” This was the prevalent opinion shared by most of the participants, i.e., that the device itself is not compromised, and that it only reacts to specific wake words (as described in §4.2). Participants were unaware of recent compromises to device hardware and software (Greenwald, 2014; Google Home Help, 2017), which further explains their beliefs.
The second and third threats can broadly be classified as listeners. We define listeners to be the parties that are capable of listening to the participants’ voice commands. These could be the device manufacturers such as Google and Amazon – threat 2, or any party these manufacturers can potentially share this data with – threat 3.
To provide insight into threat 2, the experimenter asked whether the participants believed that a human in the loop was listening to or recording their conversations; one participant believes that it’s “just an API”. This sentiment is shared among other participants; one states that “… it’s simply an algorithm that listens for keywords and then sends that sound file off for analysis …. there is no human being listening to it…” A majority believe that the device manufacturers will not perform stealthy recording. They believe that these organizations would be subject to heavy legal penalties should they record information stealthily. This opinion persists despite the participants being made aware of recent news in which a couple was recorded stealthily, and the recording was shared with one of their contacts (Horcher, 2018). Another opinion that participants hold is that one organization is more likely to perform recording than the other: “I think that their [Google] market is different than Amazon. They they are out to sell data”. However, a minority believe that these organizations warehouse the information they collect, and can process it at a later, more convenient time, perhaps when the laws are amended: “… I have no idea what sort of power they were sharing the next five or ten years so to me I would just I think I would be equally cautious of whoever is recording…”
When it comes to third parties, most participants are particularly wary of hypothetical situations in which governmental agencies hold on to recorded information. As noted by one participant, “in today’s political climate it seems like they’re more likely to want and utilize recorded information…” The participant goes on to say that she believes that device manufacturers, should they record information, will eventually feel compelled to share this information with the government. Another participant feels the exact opposite: “I don’t think that Amazon is going to want to pass on information that they gather.” A small subset of the participants feels that recorded information, whether held by third parties or by manufacturers, is no cause for concern: “.. it’s a little bit comforting because I know like if I guess if anything should happen that maybe there would be a record”.
There is one overarching theme; while the majority of the participant pool tolerates stealthy overhearing of conversations, a vast majority will consider stealthy recording a problem and will actively safeguard themselves against it.
4.5. Intervention’s effects on privacy/utility trade-off
To understand the privacy/utility trade-off, the experimenter first tried to ascertain the frequency with which the participants use their voice assistants. We observe that the participants use the voice assistants sparingly: on average fewer than a dozen times a day, each time for a few seconds. With this as the premise, we observe that some of the participants believe that using the remote-controlled Obfuscator, though moderately convenient, is not ideal. As one participant comments, “if I wasn’t like completely not using that [the voice assistant] and unplugging it, sure I would like that it was there. Well I’m just kind of around and using it [both the voice assistant and Obfuscator] sporadically.” Other users feel that the inbuilt feature is more tolerable as it does not require additional outlets, and exposes no additional cords. We observed that the ultrasound generated by Obfuscator did not bother the pets in any of the participants’ homes. This suggests promise in such a design.
The mode of operation is another point of contention. Some participants feel that, should they use an intervention, it must be enabled by default so that they can selectively disable it. Another subset feels that the inverse should be the case. This is, philosophically, similar to the white-listing vs. black-listing approaches in security, each with its own pros and cons. Further usability analysis is required to understand the privacy/utility trade-off for this particular point.
4.6. Built-in versus bolt-on privacy controls
A vast majority of the participants trust our bolt-on interventions more than the built-in intervention provided by the device manufacturer. The reason for this, as eloquently put by one participant, is “it’s kind of, like, having the wolf watch the sheep.” As expected, both the inbuilt mute button and the Obfuscator intervention were rated the most usable. This stems from the fact that Blackout causes a delay every time the device is rebooted, and participants are unwilling to tolerate the wait time. However, as we highlight next, the mode of interaction and the location of the intervention play a notable role in guiding future design iterations. An ideal solution, as defined by multiple participants, is one that can be easily attached to the voice assistant. One can envision such an intervention to be similar to V.1 of our Obfuscator intervention, but sleeker, similar to a sleeve for the voice assistant. To elaborate, “… what if it [Obfuscator and the voice assistant] were fit together, on top of it or below it”. When asked about such a design, one participant remarked, “that might be interesting”. While this is, in principle, bolt-on, it does contain flavors of being built in. Other participants believe that smaller interventions that can easily be integrated with the voice assistant are more suitable. As noted by one participant, “even if it were like some like a coaster …. that you place on top or something, so that it looks like it fits.”
The next facet of these privacy controls that we wished to explore was their usability. We confirmed that initial versions of our Obfuscator intervention would not be very usable. One participant comments, “if I have to come over and physically engage with it, then it’s less likely I would want that.” Surprisingly, we observed that the remote-based Obfuscator (and Blackout) was also not very convenient for most participants. The remote could be one of several in a home, and could easily be misplaced. Participants also felt that a single remote will not scale to control multiple devices. They feel that a central hub of control would be ideal as the number of devices grows. As one participant notes, “I wouldn’t mind that [hub], yeah like I think that would probably be the best solution for somebody like me who has multiple units [voice assistants].” When asked about the hub, several participants preferred it as an app on their smartphone, citing that they did not want yet another device to account for. Participants also felt that using a remote disrupted the normal operational flow of issuing a vocal command, i.e., the participants felt the additional step of disabling Obfuscator and Blackout was cumbersome. Most participants preferred a vocal command to enable privacy control.
Based on extensive discussion with the participants, we conclude that visual cues in the form of lights on the voice assistant or the interventions provide very little utility in indicating the state of the device, i.e., on and listening vs. not listening. This stems from the various locations where the voice assistant and the interventions are placed. For various reasons, the participants prefer to keep outlets concealed, and since Blackout is a remote-controlled outlet, the light on it is often not visible. Our Obfuscator intervention does not have any visual cue of its own, and relies on the visual cue from the remote-controlled outlet. Therefore, problems related to the Blackout intervention naturally extend here. The participants believe that auditory cues, such as the voice assistant not responding to a query, are more informative of its state. This is a surprising observation; auditory responses can be delayed or absent for a variety of reasons, including improper user queries, poor network connectivity, or background noise.
With regards to the aesthetics of both Blackout and Obfuscator, three observations can be made.
Aesthetics are irrelevant. Here, the participants feel that the primary purpose of the intervention is to preserve privacy, and as long as this is achieved, aesthetics are secondary. An example of a similar useful, but unappealing device would be a WiFi router.
The form factor needs to be reduced, and the intervention needs to be better integrated with home decor. The participants felt that the Obfuscator intervention particularly stands out, and could be designed in such a way as to integrate better with existing furniture. An example of this could be integrating the required circuitry with a swing arm table lamp, or with a decorative showpiece.
The existing aesthetic features are acceptable, and do not require subsequent refining. The form factor of the interventions need not be improved, and they can be deployed in homes as is.
We observed that a majority of the female participants fell into the second category above, while the male participants fell into the remaining two. However, a popular opinion shared among a majority of the participants was that of a bolt-on solution that is easier to integrate with the voice assistant. We iterate over several ideas for such a design in §5.
5.1. Privacy/Utility Trade-off Revisited
One of the main conclusions from this study is that the privacy/utility trade-off for voice assistants is not mature yet. Almost none of the participants were concerned that voice assistants pose privacy threats, because of their underlying assumptions about the mode of operation. Eight participants believe that the voice assistant only listens when the user utters a wake word. Some participants believe that captured audio is discarded completely after the commands are recognized. Others shrugged off recent cases of privacy threats as benign malfunctions of the device. None of the participants raised the issue of a malicious compromise of the voice assistant as a possible privacy threat. Another reason has to do with the lack of anecdotal evidence about clear privacy threats to voice assistants. Recent cases involving voice assistants have been caused by buggy behavior of the device; there have not been any popular cases of compromised voice assistants, as there have been with webcams.
Many participants indicated that if privacy were a real issue, they would “ditch” the voice assistant completely. This suggests that participants view voice assistants as a novelty, not as a necessity. The same does not apply for web browsing, for example, where users are willing to use privacy-enhancing technologies such as private browsing modes, tracking blockers or VPNs. Participants indicated that they have not used the built-in privacy control in voice assistants (some are not even aware of its existence). Those who have used it have done so to prevent the voice assistant from mistakenly responding, i.e., for convenience rather than privacy.
According to other studies (Zeng et al., 2017), this issue appears to be a common theme across the users of smart home systems. They are more concerned about physical security than about their privacy. More research is needed into user education methods that make users more aware of possible privacy threats from physical devices.
5.2. Future Design Considerations
In our study, we observed convenience to be a unifying theme for all the participants. Their main purpose in using a voice assistant is convenience, which they are not willing to compromise. Hence, future privacy-preserving solutions have to include convenience as a primary design objective. We identify three design dimensions to ensure convenience. First, the privacy solution has to be aesthetically pleasing and blend in with the voice assistant. A common theme in our study is that interviewed individuals favored a “sleeve”-like design that is modern-looking. The voice assistant sits inside the sleeve, which should have a small form factor. The second dimension is ease of deployment; several participants commented on Obfuscator requiring an extra power plug. The ideal privacy-preserving solution should have one plug that powers both the privacy system and the voice assistant. Installation also needs to be a one-time effort. The third design dimension is a seamless experience while using the voice assistant. Users want neither to press a button nor to use a remote to enable/disable their devices. An ideal design should have a voice interface that allows the user to control it as they control their voice assistants. This final requirement effectively makes the privacy-preserving solution listen all the time. The design therefore has to be realized in a manner that does not erode the user’s trust, for example by omitting a network interface so as to give users privacy assurances.
We designed two privacy-preserving bolt-on interventions and, combining them with the inbuilt mute button, aimed to understand owners’ privacy perceptions of voice assistants. Through our interactions, we understand that female participants are more wary of threats posed by voice assistants. Male participants, with lower privacy concern, are more open to having a bulkier intervention at home, should they have one at all. A large subset of the participants placed no trust in the device manufacturer to preserve their privacy, while all of them trusted a privacy-preserving intervention from a reputable third party, including researchers from academic institutions. The participants unanimously feel that a voice-activated intervention which can be easily attached to the voice assistant would be ideal. Based on these observations, we discuss how the privacy/utility trade-off impacts the design of such an ideal intervention. Ultimately, we hope this study will inspire future research into how awareness of various privacy threats can be made more pervasive, and how privacy-preserving interventions can break into the average consumer’s home.
- Apthorpe et al. (2017) Noah Apthorpe, Dillon Reisman, Srikanth Sundaresan, Arvind Narayanan, and Nick Feamster. 2017. Spying on the smart home: Privacy attacks and defenses on encrypted iot traffic. arXiv preprint arXiv:1708.05044 (2017).
- Brush et al. (2011) AJ Brush, Bongshin Lee, Ratul Mahajan, Sharad Agarwal, Stefan Saroiu, and Colin Dixon. 2011. Home automation in the wild: challenges and opportunities. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2115–2124.
- Charmaz and Belgrave (2012) Kathy Charmaz and Liska Belgrave. 2012. Qualitative interviewing and grounded theory analysis. The SAGE handbook of interview research: The complexity of the craft 2 (2012), 347–365.
- Choe et al. (2011) Eun Kyoung Choe, Sunny Consolvo, Jaeyeon Jung, Beverly Harrison, and Julie A Kientz. 2011. Living in a glass house: a survey of private moments in the home. In Proceedings of the 13th international conference on Ubiquitous computing. ACM, 41–44.
- Court (2017) Jamie Court. 2017. Google, Amazon Patent Filings Reveal Digital Home Assistant Privacy Problems. (December 2017). http://www.consumerwatchdog.org/sites/default/files/2017-12/Digital%20Assistants%20and%20Privacy.pdf
- Feng et al. (2017) Huan Feng, Kassem Fawaz, and Kang G Shin. 2017. Continuous authentication for voice assistants. In Proceedings of the 23rd Annual International Conference on Mobile Computing and Networking. ACM, 343–355.
- Gao et al. (2018) Chuhan Gao, Varun Chandrasekaran, Kassem Fawaz, and Suman Banerjee. 2018. Traversing the Quagmire that is Privacy in your Smart Home. (2018).
- Glaser and Strauss (2017) Barney G Glaser and Anselm L Strauss. 2017. Discovery of grounded theory: Strategies for qualitative research. Routledge.
- Google Home Help (2017) Google Home Help. 2017. [Fixed issue] Google Home Mini touch controls behaving incorrectly. (October 2017). https://support.google.com/googlehome/answer/7550221?hl=en
- Greenwald (2014) Glenn Greenwald. 2014. Glenn Greenwald: how the NSA tampers with US-made internet routers. (May 2014). https://www.theguardian.com/books/2014/may/12/glenn-greenwald-nsa-tampers-us-internet-routers-snowden
- Hill and Mattu (2018) Kashmir Hill and Surya Mattu. 2018. The House That Spied on Me. (February 2018). https://gizmodo.com/the-house-that-spied-on-me-1822429852
- Horcher (2018) Gary Horcher. 2018. Woman says her Amazon device recorded private conversation, sent it out to random contact. (May 2018). https://kiro.tv/2J5QLwP
- Hutchinson et al. (2003) Hilary Hutchinson, Wendy Mackay, Bo Westerlund, Benjamin B. Bederson, Allison Druin, Catherine Plaisant, Michel Beaudouin-Lafon, Stéphane Conversy, Helen Evans, Heiko Hansen, Nicolas Roussel, and Björn Eiderbäck. 2003. Technology Probes: Inspiring Design for and with Families. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’03). ACM, New York, NY, USA, 17–24. https://doi.org/10.1145/642611.642616
- Kaaz et al. (2017) Kim J Kaaz, Alex Hoffer, Mahsa Saeidi, Anita Sarma, and Rakesh B Bobba. 2017. Understanding user perceptions of privacy, and configuration challenges in home automation. In Visual Languages and Human-Centric Computing (VL/HCC), 2017 IEEE Symposium on. IEEE, 297–301.
- Lau et al. (2018) Josephine Lau, Benjamin Zimmerman, and Florian Schaub. 2018. “Alexa, Stop Recording”: Mismatches between Smart Speaker Privacy Controls and User Needs. Poster at the 14th Symposium on Usable Privacy and Security (SOUPS 2018) (2018).
- Machuletz et al. (2018) Dominique Machuletz, Stefan Laube, and Rainer Böhme. 2018. Webcam Covering As Planned Behavior. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI ’18). ACM, New York, NY, USA, Article 180, 13 pages. https://doi.org/10.1145/3173574.3173754
- McCreary et al. (2016) Faith McCreary, Alexandra Zafiroglu, and Heather Patterson. 2016. The contextual complexity of privacy in smart homes and smart buildings. In International Conference on HCI in Business, Government and Organizations. Springer, 67–78.
- Milne and Culnan (2004) George R. Milne and Mary J. Culnan. 2004. Strategies for reducing online privacy risks: Why consumers read (or don’t read) online privacy notices. Journal of Interactive Marketing 18, 3 (2004), 15 – 29. https://doi.org/10.1002/dir.20009
- NPR and edison research (2018) NPR and edison research. 2018. The smart audio report. (July 2018). https://www.nationalpublicmedia.com/smart-audio-report/latest-report/
- Odom et al. (2012) William Odom, Richard Banks, David Kirk, Richard Harper, Siân Lindley, and Abigail Sellen. 2012. Technology Heirlooms?: Considerations for Passing Down and Inheriting Digital Materials. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’12). ACM, New York, NY, USA, 337–346. https://doi.org/10.1145/2207676.2207723
- Palmer (2018) Danny Palmer. 2018. Amazon’s Alexa could be tricked into snooping on users, say security researchers. (April 2018). https://zd.net/2qXadSj
- Pew Research Center (2017) Pew Research Center. 2017. Nearly half of Americans use digital voice assistants, mostly on their smartphones. (December 2017). http://pewrsr.ch/2kquZ8H
- Roy et al. (2018a) Nirupam Roy, Haitham Hassanieh, and Romit Roy Choudhury. 2018a. BackDoor: Sounds that a microphone can record, but that humans can’t hear. GetMobile: Mobile Computing and Communications 21, 4 (2018), 25–29.
- Roy et al. (2018b) Nirupam Roy, Sheng Shen, Haitham Hassanieh, and Romit Roy Choudhury. 2018b. Inaudible Voice Commands: The Long-Range Attack and Defense. In 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18). USENIX Association, 547–560.
- Sivaraman et al. (2015) Vijay Sivaraman, Hassan Habibi Gharakheili, Arun Vishwanath, Roksana Boreli, and Olivier Mehani. 2015. Network-level security and privacy control for smart-home IoT devices. In Wireless and Mobile Computing, Networking and Communications (WiMob), 2015 IEEE 11th International Conference on. IEEE, 163–167.
- Smith et al. (1996) H. Jeff Smith, Sandra J. Milberg, and Sandra J. Burke. 1996. Information Privacy: Measuring Individuals’ Concerns about Organizational Practices. MIS Quarterly 20, 2 (1996), 167–196. http://www.jstor.org/stable/249477
- Zeng et al. (2017) Eric Zeng, Shrirang Mare, and Franziska Roesner. 2017. End user security & privacy concerns with smart homes. In Symposium on Usable Privacy and Security (SOUPS).
- Zhang et al. (2017) Guoming Zhang, Chen Yan, Xiaoyu Ji, Tianchen Zhang, Taimin Zhang, and Wenyuan Xu. 2017. DolphinAttack: Inaudible voice commands. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. ACM, 103–117.
- Zheng et al. (2018) Serena Zheng, Marshini Chetty, and Nick Feamster. 2018. User Perceptions of Privacy in Smart Homes. arXiv preprint arXiv:1802.08182 (2018).