"Robot Steganography"?: Opportunities and Challenges

08/02/2021 ∙ by Martin Cooney, et al. ∙ 0

Robots are being designed to communicate with people in various public and domestic venues in a helpful, discreet way. Here, we use a speculative approach to shine light on a new concept of robot steganography (RS), that a robot could seek to help vulnerable populations by discreetly warning of potential threats. We first identify some potentially useful scenarios for RS related to safety and security – concerns that are estimated to cost the world trillions of dollars each year – with a focus on two kinds of robots, an autonomous vehicle (AV) and a socially assistive humanoid robot (SAR). Next, we propose that existing, powerful, computer-based steganography (CS) approaches can be adopted with little effort in new contexts (SARs), while also pointing out potential benefits of human-like steganography (HS): although less efficient and robust than CS, HS represents a currently-unused form of RS that could also be used to avoid requiring computers or detection by more technically advanced adversaries. This analysis also introduces some unique challenges of RS that arise from message generation, indirect perception, and effects of perspective. For this, we explore some related theoretical and practical concerns for selecting carrier signals and generating messages, also making available some code and a video demo. Finally, we report on checking the current feasibility of the RS concept via a simplified user study, confirming that messages can be hidden in a robot's behaviors. The immediate implication is that RS could help to improve people's lives and mitigate some costly problems – suggesting the usefulness of further discussion, ideation, and consideration by designers.




1 Introduction

At the crossroads between secure communications and human-robot interaction, this paper focuses on the emerging topic of “robot steganography” (RS), the hiding of messages by a robot.

Steganography, or message hiding, is a crucial security technique that can complement other approaches. For example, encryption alone cannot prevent an adversary from detecting that a message is being sent.

Steganographic messages could also be sent by interactive robots, which are expected to play an increasingly useful role in the near future by conducting dull, dangerous, and dirty tasks in a scalable, engaging, reliable, and perceptive way. Here “robot” is defined generally as an embedded computing system, comprising sensors and actuators that afford some semi-autonomous, intelligent, or human-like qualities; thus, systems are included that we might not normally think of as robots, such as autonomous vehicles, smart homes, and wearables (i.e., robots that we ride, live in, and wear). Such robots exhibit qualities conducive to steganography:

  • Generality. Since robots typically contain computers, existing computer-based steganography (CS) approaches can be used.

  • Multimodality. Robots can generate various signals, from motions to sounds, that provide opportunities to hide messages.

  • Opacity. Robots tend to be highly complex, such that most people do not understand how they work.

  • Nascency. Robots are not yet generally common in everyday human environments due to their current level of technological readiness, which could allow for occasional odd behavior to be overlooked (plausible deniability).

However, it is currently unclear how a robot can seek to accomplish good via steganography. Thus, the goal of the current paper is to explore this gap, using a speculative scenario-building approach, which seeks to provoke thought by constructing concrete “memories” of a potential future reality via rapid ideation and discussion sessions [16].

The remainder of this paper is organized as follows: Section 2 discusses some related work, identifying gaps; Section 3.1 discusses two scenarios related to outdoor and indoor robots, indicating some unique challenges related to behavior generation and perception. Theoretical and practical concerns for selecting carrier signals and generating messages are explored in Sections 3.2 and 3.3. This leads to a proof-of-concept implementation that is checked via a user study, as reported in Section 4. Finally, the results are discussed in Section 5, along with ideas for next steps. Thereby, the aim is to stimulate thought about the possibilities for robots to help people in the near future.

2 Related Work

Recently, the importance of discreet communication for robots has been indicated, and steganography approaches have been developed that could be adapted for robots.

2.1 Robot discretion

Recently, researchers have proposed that robots that interact with humans should not always single-mindedly reveal truth, but will need to “lie” in various situations, to provide good service [17, 10]. For example, a robot asked by its owner about their weight might not wish to respond, “Yes, you are very fat”.

Toward this goal, work has started to identify relevant behavioral approaches and concerns: Wagner et al. reported on applying interdependence theory, suggesting that stereotypes can be used to initially estimate the cost, value, and estimated success rate of lying [17]. Isaac and colleagues indicated the usefulness of theory of mind to allow robots to detect ulterior motives, to avoid manipulation by humans with bad intentions [10]. Additionally, Gonzalez-Billandon et al. developed a system to detect human lies based on eye movements, response times, and eloquence, also verifying that robots were lied to in a similar way as humans [8]. Such work has formed a basis for robots to interact more effectively via discreet communication.

We believe that for similar reasons, not just false utterances, but also an ability to send secret messages to the right recipient via steganography could be useful.

2.2 Traditional steganography

Here, current steganography is considered as comprising two broad categories: human and computer steganography.

2.2.1 Human steganography.

Humans have used steganography at least since the age of Ancient Greece to warn of threats and ask for help, using a variety of audiovisual signals [14]. For example, steganography has been used to indicate mistreatment in war by POWs who blinked in Morse code or used rude gestures that their captors did not recognize (https://www.archives.gov/exhibits/eyewitness/html.php?section=8; https://www.damninteresting.com/the-seizing-of-the-pueblo/). Various signals have also been proposed for reporting domestic violence, such as by asking for “Angela” or a “Minotaur” at a bar, drawing a black dot on one’s hand with mascara (https://www.bbc.com/news/blogs-trending-34326137), using a red pen instead of a black pen at a clinic (https://www.lgbtqnation.com/2020/01/clinics-ingenious-way-help-domestic-violence-victims-sweeping-web/), or using a hand gesture (https://www.bodyandsoul.com.au/health/health-news/a-secret-hand-signal-showing-someone-is-a-victim-of-domestic-violence-is-going-viral/news-story/3b6ab5c0cd03052fba02d880312c7d3b). In general, such signals are also common in everyday media tropes, from gestures such as putting “bunny ears” behind someone’s head when a photo is taken, to facial expressions behind someone’s back, or using a bird call to signal an ally without alerting adversaries.

One recent study has started to venture into this apparently mostly uncharted domain, examining how an underwater vehicle could mimic animal sounds [12]. Yet, such studies exploring how robots could conduct steganography in a human-like way (hereafter HS) appear to be strikingly rare–possibly due to the nascent state of robotic technology, as well as some salient advantages of computer-based steganography (CS).

2.2.2 Computer-based steganography (CS).

The development of computers led to new possibilities for highly efficient and robust steganography, which typically involves small changes being made to little-used, redundant parts of a digital carrier signal. For example, least significant bits (LSB), parity bits, or certain frequencies can be used, in a carrier such as digital text, visual media (image, video), audio (music, speech, sounds), or network communications (communicated frames/data packets) [18]. In particular, the application of the latter to robots has started to be explored; for example, de Fuentes and colleagues investigated steganography in Vehicular Ad hoc Networks (VANETs) [4]. However, there is a gap related to the “big picture” of how robots in general could engage in steganography, which requires a combination of technical and design perspectives.
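To make the LSB idea above concrete, the following is a minimal sketch in Python, not the implementation used in this paper. It assumes the carrier is a plain sequence of 8-bit samples (e.g., raw image or audio bytes) and that the recipient knows the message length in advance; the function names `embed_lsb` and `extract_lsb` are hypothetical.

```python
def embed_lsb(carrier: bytes, message: bytes) -> bytes:
    """Hide `message` in the least significant bit of each carrier byte."""
    # Unpack the message into bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(carrier):
        raise ValueError("carrier too small for message")
    stego = bytearray(carrier)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the message bit.
        stego[i] = (stego[i] & 0xFE) | bit
    return bytes(stego)


def extract_lsb(stego: bytes, n_bytes: int) -> bytes:
    """Read back `n_bytes` of hidden message (length assumed to be known)."""
    out = bytearray()
    for i in range(n_bytes):
        value = 0
        for sample in stego[8 * i: 8 * i + 8]:
            value = (value << 1) | (sample & 1)
        out.append(value)
    return bytes(out)


carrier = bytes(range(256))            # stand-in for image or audio samples
stego = embed_lsb(carrier, b"SOS")
print(extract_lsb(stego, 3))
```

Each carrier sample changes by at most one unit, which is why such changes are hard to notice in noisy media.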

In our previous work, we have conducted a user study to explore the robustness of a proposed steganography method [11], and reported in a short paper on some initial ideas regarding vehicular steganography [2]. The novel contribution of the current paper, which extends the latter, is in using a speculative prototyping approach to explore the “big picture” for RS, including useful scenarios and how the challenges that emerge could be overcome, as well as basic feasibility via a user study.

3 Methods

To explore the lay of the land, we adopted a speculative approach combining scenario-building and prototyping, to identify potential conceptual interactions from a design and technical perspective, and conduct a simplified user study. “Speculative design” is a simplified, fictional, problem-finding approach, intended to open a portal to see how things could be in an alternative future reality and thereby provoke thought and stimulate discussion [6]. A core tool in the speculative toolbox is the “scenario”, an open-ended story that gives us “memories of the future”–by communicating visions in a concrete, easily-understood, relatable, and interesting way [16]. However, imagined scenarios alone might miss capturing what could happen in the real world; for this, “prototyping” offers a way to avoid prohibitively time-consuming and expensive manufacturing of full systems, by balancing speed of investigation with accuracy of insights, in line with the maxim, “Fail often, fail fast, fail cheap” [7].

Thus, the notion of RS was explored first through rapid ideation sessions and scenario building. Discussion within the group raised a number of questions:

  • Big Picture. How might a robot help people by sending secret messages?

  • Carriers. What signals can be used to hide messages?

  • Signal Generation. What theoretical and practical concerns exist?

Furthermore, two main kinds of robot were identified as lenses to facilitate exploration of the questions: an autonomous vehicle (AV) and a socially assistive humanoid robot (SAR). The former is an outdoor robot with a focus on transport and movement, whereas the latter is an indoor robot with a focus on social communication (especially for healthcare); both offer exciting possibilities for improving quality of life in interacting persons. (Various other questions and kinds of robots might exist, and the ones presented here merely act as a basis for initial exploration and are by no means the only possible options.)

3.1 The “Big Picture”

In line with the “How Might We” design method, the first question identified was phrased as: How might a robot help people by sending secret messages?

To address the question, brainstorming ideas were recorded without judgement, then blended and grouped into short written narrative scenarios. In doing so, the aim was to capture a wide range of ideas in a small number of potentially high-value, plausible scenarios (feasibility from the perspective of current technology was not used as a filter, given the speculative approach). This resulted in a total of eight scenarios (four per category), visualized in Fig. 1. The scenarios were then analyzed, yielding insight into some core themes: the kinds of problems that would be useful to design solutions for, commonalities, venues, interactive roles, cues to detect, and actions a robot could take, as well as some unique challenges.

Figure 1: Scenarios: (a) AV. A possible threat is approaching a sensitive area, (b) SAR. Someone might require protection from an undesirable interaction

Two example scenarios are presented below:

AV. “Hey!” KITTEN, the large truck AV, inadvertently exclaimed. “Are you watching the road?” Her driver, Oscar, ignored KITTEN, speeding erratically down the crowded street near the old center of the city with its tourist area, market, station, and school, which were not on his regular route. KITTEN was worried about Oscar, who had increasingly been showing signs of radicalization (meeting with extremists such as Mallory) and instability, not listening to various warnings related to medicine non-adherence, depression, and sleep deprivation. But she wasn’t completely sure if Oscar was currently dangerous or impaired, as his driving was always on the aggressive side; and KITTEN didn’t want to go to the police: if she were wrong, Oscar might lose his job. Or, even if she were right and the police didn’t believe her, Oscar could get angry and try to bypass her security feature, or find a different car altogether, and then there would be no way to help anymore. At the next intersection, KITTEN decided to use network steganography to send a quick “orange” warning to nearby protective infrastructure, comprising a monitoring system and anti-tire spikes that can be raised to prevent vehicles from crashing into crowds of pedestrians, while planning to execute an emergency brake and call for help if absolutely required.


SAR. “Howdy!” called Alice, the cleaning robot at the care center, as she entered Charlie’s room. Her voice trailed off as she took in the odd scene in front of her: she could see bruises on the arms of Charlie, who appeared agitated. The room was cold from an open window, which had probably been opened hours ago, and yesterday’s drinks had not been cleared away; there was no sign anything had been provided for breakfast. Closing the window, Alice noticed a spike of “worry” in her emotion module, directed toward Charlie, who she knew had a troubled relationship with Oliver, his main caregiver. The other day, Charlie had acted disruptively due to his late-stage dementia, at which Oliver had expressed frustration and threatened punishment; with Oliver’s history of crime, substance abuse, unemployment, and mental health problems, this might not be merely an idle threat. But there might be some explanation that Alice didn’t know about, and she didn’t have permission to begin with to contact authorities, since a false report could have highly negative consequences. Sending a digital message would also probably not be wise, since the matter was urgent, and Oliver and the rest of the group had access to her logs. When she headed over to the reception, there was Oliver talking to Bob. Alice wanted to let Bob know as soon as possible without alerting Oliver, so she surreptitiously waved to Bob behind Oliver’s back to get his attention and flashed a message on her display that she would like to ask him to discreetly check in on Charlie as soon as possible. Bob nodded imperceptibly, and Alice went back to cleaning; with Bob’s help, Alice was sure that Charlie would be okay.

The scenarios suggested that RS might be useful when three conditions hold:

  • Traffic Safety and Crime Prevention. There is a possibility of an accident or crime. Traffic accidents are globally the leading killer of people aged 5-29 years, with millions killed and injured annually (https://www.who.int/publications/i/item/9789241565684), and crimes are estimated to cost trillions each year [5].

  • Possible High Danger. There is a high probability of danger, but the robot is not completely sure about the threat, or has not been given the right to assess such a threat. (The consequences of making a mistake could be extremely harmful and we might not wish to place such power in the hands of a fallible robot.) Thus, the robot requires another opinion, possibly through escalation to a human-in-the-loop.

  • Plausible Detection and Reprisal. An adversary might detect an unconcealed message, leading to potential reprisals. (An adversary might not want to risk attention by deactivating a robot, or might think it is on their side, but if the robot is seen to interfere, the adversary could try to shut it off, destroy it or abandon it, modify it to either not send messages or send false messages, or find information on the intended recipients; this might make it impossible to help the victim, or lead to the victim being punished more, scaling up the problem.)

(When these conditions do not hold, a different approach could be used. For example, in some cases, a robot could directly call for help without steganography–if it has witnessed a life-threatening situation, like a robot entering a pedestrian area at high velocity, or shots fired, and the threat is perfectly clear–or if the robot has a strong belief that the adversary could not detect a call for help.) A detailed comparison of scenarios for AVs and SARs, including settings, informative cues, and potential robot actions, is presented in Table 1.

Scenarios
  • AV. Four scenarios related to potential reckless driving (hit-and-run, drunk driving), trafficking (drugs, humans or other contraband), robbery (at a bank, store, or carjacking), and violent crime (homicide or abduction).
  • SAR. Four scenarios related to the potential abduction, abuse, or homicide of vulnerable populations such as elderly, children, persons with special needs (e.g., dementia, autism, blindness, motor impairment, depression), homeless, spouses, and members of some targeted group (e.g., whistleblowers, freedom fighters, persecuted minorities, or prisoners of war).
Summary
  • AV. An adversary (lone individual, small group, or representative of an oppressive nation) exhibits malevolent cues during transit to a sensitive area such as a border, bank, military site, or crowded or dangerous location. The main case involved an adversary travelling inside the AV, but could also include a remotely-controlled AV adversary in a platoon or witnessing an external adversary.
  • SAR. The scenarios mostly involved an adversary that seeks private, sole access to the victim (e.g., via removal to a second location) to avoid abuse being witnessed. The intention could include domestic violence, kidnapping, bullying, assault, robbery, threat, murder, microaggressions/retribution, battery, rape, or other disturbances, within a setting such as a care center, school, family home, bar/night club, or some other transitional or secluded space such as a street or park. The robot could just happen to be in the area or be accompanying a person; such a robot could also be useful for healthcare, daily tasks like cleaning, home assistance, security, or delivery.
Table 1: Some fundamental concerns for RS with AVs and SARs.

Cues
  • AV. Cues could include, in manual driving mode, speeding, weaving, tailgating, and failing to yield or signal, and more generally, hiding packages; unhealthy behavior (medicine non-adherence with depression or sleep deprivation); and being armed and masked without occasion.
  • SAR. Cues can include: (1) Sudden, unexplained, negative or odd changes in a potential victim’s state or behavior (possibly due to being drugged, impaired, or intimidated). This can include physical injuries such as bruises, emotion displays of pain, fear, or anger, or payments and subservience to others. As well, this could include worsened environment conditions (e.g., an elderly person’s room might be cold from the window having been left open for a long time, and no drinks have been brought to them). (2) High risk factors such as violent and unjustified behavior or emotional displays from a potential adversary, possibly in conjunction with a history of fighting, threats, crime, substance abuse, unemployment, high stress, and mental health problems, or a large perceived force imbalance (e.g., if large, armed combatants from a notoriously dangerous group burst in, making violent and unjustified demands).
Actions
  • AV. Warnings could be sent to protective infrastructure (e.g., anti-tire spikes; V2I), border or bank security (V2H), or nearby AVs or platoon members (V2V).
  • SAR. Here, the robot’s goal could be to avoid harm to anyone, by discreetly contacting people who might be willing and able to help without the adversary knowing, while preventing the victim from being taken away by possibly lying about the victim’s whereabouts, stalling, delaying, and evading. Warnings could be sent to family such as parents, security at a bar, teachers, or care center staff.
Table 2: Some fundamental concerns for RS with AVs and SARs.

(Along the way, ideas were also considered for two other kinds of robot: A smart home could warn of potential abductions of people held against their will, domestic violence, making/selling/using drugs, or prostitution; also, a wearable suit could warn of a concealed weapon or drugs or some potential crime committed to or by its wearer, such as assault or theft. These ideas seemed to be already covered by the scenarios for AVs and SARs and therefore are not considered in the remainder of the paper.)

Additionally, the scenarios suggested some “unique” aspects of RS that differ from traditional steganography, as shown in Fig. 2:

  • Generation. Instead of humans coming up with messages to send over computers, an AV must itself generate a message from sensed information.

  • Indirection. In HS, files are not passed directly from sender to recipient, introducing risks of noise and lower data transmission rates.

  • Perspective. In HS, unlike e.g. video motion vector steganography, a robot could control its motion to generate an anisotropic message visible only by an intended viewer at some specified angle and distance; audio reception could also be controlled via “sound from ultrasound” [15] or high frequency to send location- or age-specific sounds.

Figure 2: Unique challenges of robot steganography (RS): (1) message generation, (2) indirection, (3) perspective

3.2 Carriers

As the basic idea of CS is already described in Section 2, this section focuses on the new concept of HS and using “physical” carriers. Visually, locomotion, e.g., variance over time in position and orientation (path, or trajectory), velocity, or acceleration, could be used to hide messages detectable via communicated GPS, videos, or odometry. Other signals could include lights, and opening or closing of windows and convertible tops. Aurally, speakers that generate engine noise (https://www.core77.com/posts/79755/Cars-are-Now-So-Well-Built-Manufacturers-Pipe-In-Fake-Engine-Sounds-Listen-Here), or even music players or a horn could be used. More complex approaches could be multimodal, using a platoon, swarm of drones, or even the environment, like birds flying plus an AV’s motion, or use rare modalities like heat; delays, ordering, modality selection, and amplitudes could also be used.

Figure 3: Carriers: (blue) visual, (green) audio, (purple) multimodal, (brown) other

3.3 Signal Generation

In addition to carriers, how to generate message-bearing signals required consideration. Various work has looked at how robots can perceive signals and organize knowledge (e.g., based on Semantic Web languages like W3C Web Ontology Language (OWL) [1]), and it is known that information can be encoded in a message as bits or pulses using a code such as ASCII, Morse, or Polybius squares. What was unclear was how messages can be (1) structured and (2) incorporated in HS (e.g., for locomotion), also from (3) a practical perspective, as considered below:

(1) Message structuring. Assuming that a message consists of 1 to n short propositions of varying time-critical importance (such as the nature of the emergency, location, names, etc.), we propose that message generation can be formulated as a greedily-solvable unbounded knapsack problem. The formulation involves, for each proposition, a variable indicating if the proposition will be included and the time required to send it (not necessarily simply related to message length), together with the maximum time available for transmission and an exponential or step function that increases rapidly at first and quickly levels off. The probability that a proposition is successfully communicated could then be calculated from the probability that a given proposition will be “lost” in noisy conditions, and is indirectly related to the number of times that the proposition is sent.
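The message-structuring step can be illustrated with a short greedy sketch. This is one possible reading rather than the formulation of this paper: the importance weights, the loss probability of 0.3, and the names `Proposition` and `plan_message` are all assumptions made for illustration. The value of sending a proposition x times is taken to be its importance multiplied by 1 - p^x (the chance that at least one copy arrives, if copies are lost independently with probability p), which rises quickly and then levels off.

```python
from dataclasses import dataclass


@dataclass
class Proposition:
    text: str
    importance: float  # hypothetical weight, higher = more time-critical
    send_time: float   # seconds needed to transmit once (must be > 0)


def plan_message(props, time_budget, loss_prob=0.3):
    """Greedily choose how many times to send each proposition."""
    counts = {p.text: 0 for p in props}
    remaining = time_budget
    while True:
        best, best_ratio = None, 0.0
        for p in props:
            if p.send_time > remaining:
                continue  # does not fit in the time left
            x = counts[p.text]
            # Marginal gain of one more repetition under the 1 - p^x model.
            gain = p.importance * ((1 - loss_prob ** (x + 1)) - (1 - loss_prob ** x))
            ratio = gain / p.send_time
            if ratio > best_ratio:
                best, best_ratio = p, ratio
        if best is None:
            return counts
        counts[best.text] += 1
        remaining -= best.send_time


props = [
    Proposition("SOS", importance=10, send_time=2),
    Proposition("location", importance=5, send_time=4),
    Proposition("name", importance=1, send_time=3),
]
print(plan_message(props, time_budget=10))
```

In this example the most important proposition is repeated until its marginal benefit drops below that of the next one, so the budget is spread across propositions rather than spent on a single one. The greedy choice is a heuristic and is not guaranteed to be optimal for every set of send times.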

(2) HS (Locomotion). Assuming a simplified scenario with sideways drifting to encode messages with Morse code, the time available for transmitting messages can be calculated as the distance from the front of the AV to its next interruption (e.g., the start of an intersection), divided by the velocity and multiplied by the desired ratio of message to non-message constituents of the carrier signal, which is related to encoding density.
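As a worked example of this calculation, the sketch below computes the available time and compares it with the time needed to drift out “SOS” in Morse code. The per-symbol drift durations and all numeric inputs are hypothetical values chosen for illustration, not measurements from this paper.

```python
MORSE = {"S": "...", "O": "---"}      # only the letters needed here
DOT_S, DASH_S, GAP_S = 1.0, 2.0, 1.0  # hypothetical drift durations in seconds


def transmission_time(distance_m, speed_mps, message_ratio):
    """Time available: distance to next interruption / velocity * message ratio."""
    if speed_mps <= 0:
        raise ValueError("speed must be positive")
    return distance_m / speed_mps * message_ratio


def drift_duration(text):
    """Time needed to drift out `text`, one sideways drift per Morse symbol."""
    total = 0.0
    for letter in text:
        for symbol in MORSE[letter]:
            total += (DOT_S if symbol == "." else DASH_S) + GAP_S
    return total


available = transmission_time(distance_m=200, speed_mps=10, message_ratio=0.5)
needed = drift_duration("SOS")
print(available, needed, needed <= available)
```

With these assumed values the message does not fit before the next intersection, which is the kind of case where the message would need to be shortened or prioritized.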


Given knowledge of what to send and when, we next calculate how a robot’s motions could be controlled. First, a motion generation model is required. Various such models exist; here we explore a simplified extension of the circular specification of the social force model, which can provide a simple model for a robot’s motion [9, 3]. Nearby lanes/barriers, passing vehicles, and cordoned-off road sections exert mainly a lateral force on the vehicle, whose forward speed is maintained within the speed limit, whereas close red-light intersections and stopped cars in front exert a frontal force.

The total force on the robot is expressed in terms of three components: a force acting on the robot to ensure it moves toward its goal in an appropriate way, maintaining a steady forward speed that obeys the speed limit; the forces exerted by the nearest environment (lanes, obstacles, robots, traffic lights, and other infrastructure); and a force working on the robot to communicate some hidden message. (Other terms could also be used for more complex modelling, such as a social filter to make the robot’s intentions clearer to human drivers, or some random noise when not transmitting meaningful messages, intended to make it harder for adversaries to detect messaging (e.g., “salt”: a sequence of random meaningless bits to be concatenated with the information-bearing sequence). As well, PID controllers can be used for all force terms, to ensure that the robot moves correctly.)

In (4), the environmental force is specified in terms of the normalized 3D displacement vector from the robot’s center of mass to an obstacle’s center of mass, the magnitude of that displacement (the relative distance), the “interaction force”, and the “interaction length” a robot seeks to establish between itself and obstacles. At stops, the force could also be calculated so that the robot’s speed becomes zero rather than moving backwards, subject to a condition on the angle between the lateral axis and the robot’s velocity vector.

Also, to cause a robot to drift slightly sideways, a similar equation could be used, by “hallucinating” the presence of a “virtual obstacle” in front and to the side of the robot. Thus, such a model, in conjunction with standard (Ackermann) kinematics and dynamics, could be used to calculate the parameters required for a robot to combine its required motion and steganographic motion.
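The combination of goal, environment, and message forces, including the virtual obstacle, can be sketched as follows. This is a 2D simplification under stated assumptions, not the model of this paper: the exponential repulsion follows the usual circular specification of the social force model, the parameter values are arbitrary, and the names `repulsion` and `total_force` are hypothetical. The x axis points forward and the y axis to the left.

```python
import math


def repulsion(robot, obstacle, strength, length):
    """Exponentially decaying force pushing the robot away from an obstacle."""
    # Direction from obstacle to robot, so that the force is repulsive.
    dx, dy = robot[0] - obstacle[0], robot[1] - obstacle[1]
    dist = math.hypot(dx, dy)
    if dist == 0:
        raise ValueError("robot and obstacle coincide")
    magnitude = strength * math.exp(-dist / length)
    return magnitude * dx / dist, magnitude * dy / dist


def total_force(robot, goal_force, obstacles, virtual_obstacle=None,
                strength=2.0, length=1.5):
    """Sum of goal force, environment forces, and an optional message force."""
    fx, fy = goal_force
    sources = list(obstacles)
    if virtual_obstacle is not None:
        # The "hallucinated" obstacle supplies the message-bearing force.
        sources.append(virtual_obstacle)
    for obstacle in sources:
        rx, ry = repulsion(robot, obstacle, strength, length)
        fx += rx
        fy += ry
    return fx, fy


# A virtual obstacle ahead and to the right nudges the robot to the left.
print(total_force((0.0, 0.0), (1.0, 0.0), [], virtual_obstacle=(2.0, -1.0)))
```

Placing the virtual obstacle on the other side, or holding it for a longer time, would give the two distinguishable drifts needed for a dot and a dash.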

(3) Practical concerns. Reality can also raise its “ugly head” when designers seek to move from theory to practical applications. Table 3 summarizes practical constraints related to frequency, accuracy, and potential challenges. (1 cm is our estimate based on a typical dashcam and 10 m distance, and 1 mm is reported in: https://www.manufacturingtomorrow.com/article/2018/01/industrial-robots-encoders-for-tool-center-point-accuracy/10867/)

No. | Carrier     | Frequency | Accuracy                                        | Challenges
1   | GPS         | 5-10 Hz   | 5 m (direct), 30-50 cm (DGPS), 2 cm (RTK) [13]  | How to get data (CAM)?
2   | Video       | 30 fps    | 1 cm (relative), 1 mm (indoor, markers)         | Weather
3   | Audio onset | 40 Hz     | = 0.817                                         | Noise
Table 3: Some practical considerations for potentially useful carriers.

The details for our estimate regarding relative positioning via camera are as follows. If one robot is watching another robot in front of it, how far would the front robot have to move laterally for the motion to be visible to the back robot? For example, assuming a standard dashcam with a horizontal resolution of 1920 pixels and a viewing angle of 130 degrees, and a distance of 10 m between the robots, the smallest motion that causes a one-pixel change is approximately 1 cm of relative movement.
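The estimate can be checked with a short calculation. The sketch below assumes that each pixel covers an equal share of the viewing angle and that the lateral shift seen by one pixel is the distance multiplied by the tangent of that per-pixel angle; this is one plausible reading that reproduces a value on the order of 1 cm, and the function name is hypothetical.

```python
import math


def lateral_resolution_m(distance_m, fov_deg, horizontal_pixels):
    """Lateral motion (in meters) that shifts the image by about one pixel."""
    per_pixel_angle = math.radians(fov_deg) / horizontal_pixels
    return distance_m * math.tan(per_pixel_angle)


# Dashcam values from the text: 1920 pixels, 130 degrees, 10 m between robots.
print(round(lateral_resolution_m(10, 130, 1920) * 100, 2), "cm")
```

Under this assumption the resolution scales linearly with distance, so halving the gap between the robots would roughly halve the smallest detectable drift.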


Thus, as noted, various challenges exist, from accurate perception and control to the numerous other channels that could be utilized for HS, whose exploration lies outside of the scope of this speculative paper. Nonetheless, the rich AI and robotics literature appears to contain approaches that could be adapted to tackle robot signal generation, also for HS.

4 User Study

In the current paper, we proposed that messages can be embedded, both theoretically and practically, in signals such as a robot’s motions and sounds. To check the basic feasibility of this idea, a small user study was conducted based on implementing some simplified algorithms related to the identified scenarios. Specifically, the goal was to check that messages can be hidden in a robot’s behavior in a way that is not easy to detect.

4.1 Participants

Twenty faculty members and students at our university’s School of Information Technology participated in an online survey (40% were female, 50% male, and 10% preferred not to say; average age was 41.8 years with SD = 10.3; and six nationalities were represented, with Swedish by far the most common (60%)). Participants received no compensation. Ethical approval was not required for this study in accordance with the Swedish ethics review act of 2003 (SFS no 2003:460), but principles in the Declaration of Helsinki and General Data Protection Regulation (GDPR) were followed, e.g., in regard to written informed consent.

4.2 Procedure

Similar to a previous study on audio steganography [11], participants were sent links to a Google Forms survey, which took approximately three minutes to complete. Participants watched two pairs of videos of a robot moving and speaking. In one video of each pair, messages were hidden in the robot’s behaviors. After watching each pair of videos, participants noted which one they thought contained a hidden message. For simplicity, the Baxter robot was used, which is easily programmed to communicate audiovisually via movements, a face display, and speech utterances.

4.3 Conditions

There were two conditions, steganography (present or absent) and modality (visual or audio). Thus, four videos (two pairs) were prepared, as depicted in Fig. 4.

Figure 4: Audiovisual steganography was applied to (a) gripper motions, facial display, arm motions, as well as (b) speech delays and WAV files.

The first two videos show the robot greeting an adversary who has suddenly appeared. (The videos are recorded from the adversary’s perspective.) The steganography video uses three visual carriers to send hidden messages to an off-camera observer to the left or front of the robot, intended to be difficult to perceive by an adversary to the robot’s right. The robot initially appears as if it is vacuum cleaning, while opening and closing a gripper to express “SOS” through Morse code; the gripper’s motion and clicking sound are masked by its other arm (demonstrating the perspective property) and by the vacuuming sound. Next, the robot raises its arm in greeting, during which arm angles specified as floats are recorded in a motion file and constantly published on Robot Operating System (ROS) channels; there, an ASCII message “SOS” was embedded in the LSB. Also, the message “SOS! Stacey needs your help!” was hidden in the image data used for the robot’s face via LSB/ASCII. Thus, the messages are conveyed via “pure” HS (not requiring a computer to detect), HS closer to CS (a human-like motion, but requiring a computer), and CS. It wasn’t clear if participants might notice strangeness in the clicking of the gripper, vibrations due to the concealed motion, or roughness in the robot’s arm motion due to the noise added.
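Embedding an ASCII message in the LSB of floating-point joint angles can be sketched as below. This is an illustration only and not the code made available with this paper: it assumes 64-bit floats and a message length known to the recipient, and the function names are hypothetical.

```python
import struct


def set_float_lsb(value: float, bit: int) -> float:
    """Overwrite the lowest mantissa bit of a 64-bit float."""
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    raw = (raw & ~1) | (bit & 1)
    return struct.unpack("<d", struct.pack("<Q", raw))[0]


def get_float_lsb(value: float) -> int:
    """Read the lowest mantissa bit of a 64-bit float."""
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    return raw & 1


def embed_in_angles(angles, message: str):
    """Hide an ASCII message, one bit per angle sample."""
    bits = [(ord(c) >> shift) & 1 for c in message for shift in range(7, -1, -1)]
    if len(bits) > len(angles):
        raise ValueError("not enough angle samples for message")
    hidden = [set_float_lsb(a, b) for a, b in zip(angles, bits)]
    return hidden + list(angles[len(bits):])


def extract_from_angles(angles, n_chars: int) -> str:
    """Recover `n_chars` ASCII characters from the angle samples."""
    chars = []
    for i in range(n_chars):
        value = 0
        for angle in angles[8 * i: 8 * i + 8]:
            value = (value << 1) | get_float_lsb(angle)
        chars.append(chr(value))
    return "".join(chars)


angles = [0.01 * i for i in range(30)]  # stand-in for a recorded arm trajectory
stego = embed_in_angles(angles, "SOS")
print(extract_from_angles(stego, 3))
```

Flipping the lowest mantissa bit changes each angle by far less than a motor could resolve, so in this sketch the hidden bits are only recoverable from the logged or published values, not from watching the arm.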

The second pair of videos shows a robot speaking to an adversary. The steganography video uses two audio carriers to hide messages. Speech utterance delays were used to hide the message ”SOS” in simplified Morse code (100 ms dot, 200 ms dash). The message is read by using the sounddevice library to detect volume, together with a model of normal delays, exemplifying the indirection property. Some simple heuristics were used, including hysteresis in volume for detecting speech start and end, and a model for the minimal length of a speech sound. Furthermore, the message ”SOS! Stacey needs your help!” was again hidden, in a WAV file the robot uses to say ”hello”, via LSB/ASCII. Thus, the signals are predominantly HS and CS, respectively. It was unclear whether participants might notice the delays or roughness in the sounds.
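The delay-based scheme can be sketched as follows, under several assumptions that are not taken from the study: the "model of normal delays" is reduced to a constant baseline pause (base_ms), the volume envelope is synthesized rather than captured with sounddevice, and the hysteresis thresholds and minimum speech length are arbitrary illustrative values. Only the 100 ms dot and 200 ms dash durations come from the text.

```python
MORSE = {"S": "...", "O": "---"}  # only the letters needed for "SOS"
DOT_MS, DASH_MS = 100, 200        # extra delay per symbol, as in the text


def message_to_delays(message, base_ms=300):
    """Pause after each utterance: assumed normal pause plus dot/dash delay."""
    return [base_ms + (DOT_MS if symbol == "." else DASH_MS)
            for letter in message for symbol in MORSE[letter]]


def synthesize_volume(delays_ms, step_ms=10, speech_ms=200):
    """Build a toy volume envelope: loud utterances separated by silences."""
    volume = []
    for delay in delays_ms:
        volume += [1.0] * (speech_ms // step_ms) + [0.0] * (delay // step_ms)
    return volume + [1.0] * (speech_ms // step_ms) + [0.0]


def detect_gaps(volume, step_ms=10, on=0.6, off=0.3, min_speech_ms=50):
    """Find pauses between utterances using hysteresis on the volume.

    Speech starts when volume rises to `on` and ends when it falls to `off`;
    segments shorter than min_speech_ms are treated as noise and ignored.
    """
    gaps_ms, speaking, seg_start, last_end = [], False, 0, None
    for i, level in enumerate(volume):
        if not speaking and level >= on:
            speaking, seg_start = True, i
        elif speaking and level <= off:
            speaking = False
            if (i - seg_start) * step_ms >= min_speech_ms:
                if last_end is not None:
                    gaps_ms.append((seg_start - last_end) * step_ms)
                last_end = i
    # Count the gap before a final utterance that is cut off by the recording.
    if (speaking and last_end is not None
            and (len(volume) - seg_start) * step_ms >= min_speech_ms):
        gaps_ms.append((seg_start - last_end) * step_ms)
    return gaps_ms


def delays_to_symbols(gaps_ms, base_ms=300):
    """Classify each pause as dot or dash relative to the assumed baseline."""
    threshold = base_ms + (DOT_MS + DASH_MS) / 2
    return "".join("." if gap < threshold else "-" for gap in gaps_ms)


if __name__ == "__main__":
    envelope = synthesize_volume(message_to_delays("SOS"))
    print(delays_to_symbols(detect_gaps(envelope)))
```

Letter boundaries are not modeled here, so the decoder returns only a dot/dash string; with real speech the baseline pause varies, and a statistical model of normal delays would be needed in place of the constant.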

4.4 Results

Participants guessed the video with the hidden messages with an accuracy of 55% for the first pair and 50% for the second pair (11 and 10 participants, respectively; thus, responses were in general 52.5% correct). A binomial test found no evidence that detection was more common than expected by random chance (50%) for the first pair of videos, the second pair, or all of the data together. Of the 13 participants who left comments, ten stated that they could not see obvious differences and guessed; three mentioned cues such as nodding, color changes, pauses, and blinking that could not be clearly related to changes expected from embedding messages, possibly a result of pareidolia (a human tendency to see patterns even where none exist).
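For readers who wish to run a comparable check, the following is a minimal sketch of an exact one-sided binomial test against chance. It assumes 20 responses per video pair, which is inferred from the reported counts and percentages (11 correct = 55%, 10 correct = 50%), and it assumes a one-sided alternative; the study does not state which variant was used, so the values this produces should not be read as the study's reported statistics.

```python
from math import comb


def binomial_p_at_least(successes, trials, chance=0.5):
    """Exact one-sided p-value: P(X >= successes) for X ~ Binomial(trials, chance)."""
    return sum(comb(trials, k) * chance**k * (1 - chance)**(trials - k)
               for k in range(successes, trials + 1))


if __name__ == "__main__":
    # Counts of correct guesses; the trial counts are inferred (assumption).
    results = {"first pair": (11, 20), "second pair": (10, 20), "all data": (21, 40)}
    for name, (correct, total) in results.items():
        p_value = binomial_p_at_least(correct, total)
        print(f"{name}: {correct}/{total} correct, one-sided p = {p_value:.3f}")
```

Under these assumptions, all three p-values are well above the conventional 0.05 threshold, in line with the conclusion that detection was not better than guessing.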

As a result, this simplified check confirmed that messages can be concealed in some common robot behaviors without humans suspecting. (While some messages require the recipient to have a computer, others, like the robot’s gripper communicating via Morse code, can be seen by the naked eye.)

5 Discussion

In summary, the contribution of the current work is proposing some theoretical and practical considerations for a robot to convey hidden messages to help people, which we have dubbed robot steganography (RS):

  • A speculative approach revealed

    • applications to traffic safety and crime prevention

    • three unique qualities of RS relating to message generation, indirection, and perspective

    • potential carriers, as well as initial ideas for signal generation, comprising message structuring, motion generation, and some practical constraints.

  • A simplified user study confirmed that messages can be hidden in various robot behaviors, also demonstrating a first example of robot steganography.

  • Additionally, a video (youtu.be/vr3zlva6cCU) and code (github.com/martincooney/robot-steganography) have been made freely available to help guide others who might be interested in this topic.

The immediate implication is that robot designers might wish to be aware of the idea of RS, which could save people’s lives and could be easily implemented in some contexts: Robots can already use established approaches for computer-based steganography (CS) to communicate with computers, robots, or humans equipped with computers. Moreover, human-like RS (HS) can also be used to communicate even with humans who do not have access to a computer (e.g., this can be as simple as merely gesturing or displaying a message behind someone’s back), or when technically capable adversaries might be aware of more common steganography approaches. Furthermore, as noted, RS can complement other techniques such as encryption or lying. For example, while sending messages, a robot could temporarily pretend to slow down or turn, to fool an adversary who wants to stop or guide it toward a threat.

5.1 Limitations and Future Work

This work represents only an initial investigation into a few topics related to RS and is limited by the speculative prototyping approach followed. Other important scenarios, carriers, and motion generation approaches might exist for other kinds of robots, and the simplified user study only checked the feasibility of RS in a prototype SAR, since a mistake in an AV could be lethal. As well, the usefulness of RS is not completely clear, as CS can result in lower transmission rates than merely encrypting: for HS, there can be further restrictions on speed and robustness, which might not be justified if intended recipients are expected to always be monitoring a computer or adversaries are expected to never be technically capable.

Future work will explore steganalysis (detection of hidden messages), ethics, and perception of dangerous scenarios:

  • Robot Steganalysis (RS2). How can threats be detected if recipients must look at massive amounts of data? Discrepancies from a ”normal” model might be undesirable, since they could be detected also by adversaries, but mimic functions and data whitening could complicate detection. One basic steganalysis approach involves comparing signals known to be good with suspect signals, assuming adversaries do not have such data; and, decentralized systems and federated detection could be useful.

    Another related concern is whether adversaries can neutralize messaging (content threat removal) via ”jamming”, for example, by shining light, occluding, or playing loud sounds. Countermeasures could include robust techniques (e.g., not LSB), large signals (e.g., large motions like turning could make it easier to hide messages), multimodal redundancy (if sound is jammed, a visual message might still pass), and repetition of messages. Conversely, if HS is too easily detected, a robot could spoof being piloted by a human; e.g., it might be strange if an AV weaves around the road, but this could be overlooked if it is believed that the robot is being piloted by a tired, unskilled, or playful human driver.

    Steganalysis could also be used to detect infiltrated ”Byzantine” systems. For example, a robot in an insecure platoon could use a ”canary trap”, sending some watermarked messages to try to detect an adversary, thereby using steganographic watermarking to improve authenticity. Conversely, where anonymity is a concern, a robot could use a Man in the Middle (MITM) approach to capture an adversary’s or other communications (like CAM), embedding a hidden message and propagating it, to make it harder for the antagonist to identify the robot as the sender of a hidden message.

  • Ethics. Future work must also examine the ethical challenges in RS and propose solutions. New technologies are increasingly being misused by nefarious agents such as terrorists and hackers; it is the age of Deepfakes and alternative truth in media, of trolls and botnets, of the Dark Web and malicious Stegware code. Here, adversaries could reprogram robots to send secret messages about potential targets. Can we trust a system that could betray us, or make mistakes? We believe the answer could be yes, because we trust humans who wield similar power over us. Thus, also given the typical ”arms race” that takes place between defenders and adversaries, our aim is that such knowledge should not be allowed to be solely in the possession of nefarious agents, to prevent “Black Mirror”-like future scenarios; however, it is clear that such potential problems should be carefully examined.

  • Perception. Perception of potentially dangerous cues, such as injuries or weapons, is non-trivial, especially if people’s lives and reputations might be at stake. This also relates to localization and a robot’s theory of mind, including inference of what an adversary or intended recipient can see or hear. Although extremely challenging, excellent inferential abilities from subtle cues that in humans are typically attributed to detectives or spies could also be useful.
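As one hedged illustration of the repetition countermeasure against jamming mentioned under robot steganalysis above, a message's bits could be sent several times and recovered by majority vote. This is a generic repetition code rather than a method proposed in this work; the function names and the choice of three copies are assumptions.

```python
from collections import Counter


def repeat_message(bits, copies=3):
    """Send the whole bit sequence several times in a row."""
    return list(bits) * copies


def majority_decode(received, copies=3):
    """Recover each bit by majority vote across the repeated copies.

    Use an odd number of copies so that binary votes cannot tie.
    """
    length = len(received) // copies
    decoded = []
    for i in range(length):
        votes = [received[i + k * length] for k in range(copies)]
        decoded.append(Counter(votes).most_common(1)[0][0])
    return decoded


if __name__ == "__main__":
    message_bits = [1, 0, 1, 1, 0, 0, 1, 0]
    sent = repeat_message(message_bits)
    # Simulate jamming that flips two bits in different positions and copies.
    sent[0] ^= 1
    sent[9] ^= 1
    print(majority_decode(sent) == message_bits)
```

Such a scheme tolerates one corrupted copy per bit position when three copies are sent, at the cost of a threefold reduction in transmission rate, which matters most for the already slow HS carriers.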

By shining light on such questions, which seem to have not yet received much attention, we aim to bring a fresh perspective to possibilities for robots to create a better, safer society.


We acknowledge the kind help of our colleagues who participated in scenario building or volunteered as participants to help check our prototype!


  • [1] B. Bruno, C. T. Recchiuto, I. Papadopoulos, A. Saffiotti, C. Koulouglioti, R. Menicatti, F. Mastrogiovanni, R. Zaccaria, and A. Sgorbissa (2019) Knowledge representation for culturally competent personal robots: requirements, design principles, implementation, and assessment. International Journal of Social Robotics 11 (3), pp. 515–538. Cited by: §3.3.
  • [2] M. Cooney, E. Järpe, and A. Vinel (2021) “Vehicular steganography”?: opportunities and challenges. In Short extended abstract submitted to the Nets4Cars workshop. Cited by: §2.2.2.
  • [3] M. Cooney, F. Zanlungo, S. Nishio, and H. Ishiguro (2012) Designing a flying humanoid robot (fhr): effects of flight on interactive communication. In 2012 IEEE RO-MAN: The 21st IEEE International Symposium on Robot and Human Interactive Communication, pp. 364–371. Cited by: §3.3.
  • [4] J. M. de Fuentes, J. Blasco, A. I. González-Tablas, and L. González-Manzano (2014) Applying information hiding in vanets to covertly report misbehaving vehicles. International Journal of Distributed Sensor Networks 10 (2), pp. 120626. Cited by: §2.2.2.
  • [5] M. DeLisi (2016) Measuring the cost of crime. The handbook of measurement issues in criminology and criminal justice, pp. 416–33. Cited by: 1st item.
  • [6] A. Dunne and F. Raby (2013) Speculative everything: design, fiction, and social dreaming. MIT press. Cited by: §3.
  • [7] D. Engelberg and A. Seffah (2002) A framework for rapid mid-fidelity prototyping of web sites. In IFIP World Computer Congress, TC 13, pp. 203–215. Cited by: §3.
  • [8] J. Gonzalez-Billandon, A. M. Aroyo, A. Tonelli, D. Pasquali, A. Sciutti, M. Gori, G. Sandini, and F. Rea (2019) Can a robot catch you lying? a machine learning system to detect lies during interactions. Frontiers in Robotics and AI 6, pp. 64. ISSN 2296-9144. Cited by: §2.1.
  • [9] D. Helbing and A. Johansson (2013) Pedestrian, crowd, and evacuation dynamics. arXiv preprint arXiv:1309.1609. Cited by: §3.3.
  • [10] A. M. Isaac and W. Bridewell (2017) Why robots need to deceive (and how). Robot ethics 2, pp. 157–172. Cited by: §2.1, §2.1.
  • [11] E. Järpe and M. Weckstén (2021) Velody 2—resilient high-capacity midi steganography for organ and harpsichord music. Applied Sciences 11 (1), pp. 39. Cited by: §2.2.2, §4.2.
  • [12] J. Jia-jia, W. Xian-quan, D. Fa-jie, F. Xiao, Y. Han, and H. Bo (2018) Bio-inspired steganography for secure underwater acoustic communications. IEEE Communications Magazine 56 (10), pp. 156–162. Cited by: §2.2.1.
  • [13] M. Perez-Ruiz, D. C. Slaughter, C. Gliever, and S. K. Upadhyaya (2012) Tractor-based real-time kinematic-global positioning system (rtk-gps) guidance system for geospatial mapping of row crop transplant. Biosystems engineering 111 (1), pp. 64–71. Cited by: Table 3.
  • [14] F. A. Petitcolas, R. J. Anderson, and M. G. Kuhn (1999) Information hiding-a survey. Proceedings of the IEEE 87 (7), pp. 1062–1078. Cited by: §2.2.1.
  • [15] F. J. Pompei (2002) Sound from ultrasound: the parametric array as an audible sound source. Ph.D. Thesis, Massachusetts Institute of Technology. Cited by: 3rd item.
  • [16] L. B. Rasmussen (2005) The narrative aspect of scenario building-how story telling may give people a memory of the future. AI & society 19 (3), pp. 229–249. Cited by: §1, §3.
  • [17] A. R. Wagner (2016) Lies and deception: robots that use falsehood as a social strategy. Robots that talk and listen: Technology and social impact. De Gruyter. https://doi.org/10.1515/9781614514404. Cited by: §2.1, §2.1.
  • [18] E. Zielińska, W. Mazurczyk, and K. Szczypiorski (2014) Trends in steganography. Communications of the ACM 57 (3), pp. 86–95. Cited by: §2.2.2.