Service robots are robots that operate semi- or fully autonomously to perform services useful to the well-being of humans and equipment. The International Federation of Robotics (IFR) predicts that 32 million service robots will be deployed between 2018 and 2022. Popular service robot applications include elderly care, house cleaning, cooking, patrolling, reception, entertainment, and education. Well-known examples of service robots include Roomba by iRobot, Pepper by SoftBank Robotics, “the robotic chef” by Moley Robotics, and SpotMini and Atlas by Boston Dynamics.
Unlike industrial robots, service robots must interact and cooperate with people safely in dynamic, unstructured environments. Two key requirements for service robot operation are therefore (1) accurate, general visual perception and (2) intelligent, dynamic, human-compliant robot control.
With recent breakthroughs in deep neural networks and robot learning, visual perception and intelligent control for robots in unstructured environments have become readily available. However, these learning-based technologies carry a high computational cost, which makes them hard to deploy directly on native robot controllers with limited computing power. In our previous work on gesture-based semaphore mirroring with a humanoid robot, we addressed this problem by moving deep-learning-based gesture inference into the cloud.
However, scalable cloud robotic systems incur high network communication costs in the form of privacy, security, bandwidth, latency, and variability. In particular, network latency and variability, which are lower-bounded by the speed of light and aggravated by inconsistent network routing, prevent a cloud-based robot controller from directly controlling dynamic robots in interactive, human-compliant tasks, especially tasks that require visual feedback.
In this work, we combine powerful cloud services with agile edge devices to build an intelligent Fog Robotic control system. The system is implemented on the Human Augmented Robotic Intelligence Platform (HARI) provided by Cloudminds Inc. We integrate it with Igor, a dynamic, dual-arm, dual-leg, self-balancing robot made by HEBI Robotics.
Using a “heartbeat” communication protocol between the cloud and the edge, we can teleoperate Igor robustly through HARI. We choose a task common in warehouse logistics: picking up a box from a human carrier.
While teleoperating Igor during navigation is intuitive, we found it extremely difficult and inefficient to teleoperate Igor to pick up an object as simple as a box. To make cloud-based teleoperation more intuitive, we program a dynamic, automatic box-pickup module based on visual detection of an AprilTag. AprilTag detection is performed in the cloud to emulate how a cloud-based deep-learning object recognition system would affect the performance of this Fog Robotic visual feedback loop. Furthermore, to avoid time-consuming camera calibration and registration before every box pickup, we implement the module using 2D Image Based Visual Servoing (IBVS). With Fog Robotic IBVS, we demonstrate that Igor can reliably and automatically pick up boxes from a human carrier in unstructured environments.
The main contributions of this work are as follows:
A “heartbeat” protocol that enables robust teleoperation of a dynamic robot in Fog Robotics.
A Fog Robotic visual servoing module that performs automatic box pickups to assist cloud teleoperators.
Automatic box pickups from a human carrier that demonstrate dynamic human-robot interaction (HRI) in unstructured service environments.
III Related Work
Cloud Robotics encompasses any robot or automation system that relies on data or code from a network to support its operation. The term was introduced by James Kuffner in 2010, and the field evolved from Networked Robotics. Well-known Cloud Robotic systems include RoboEarth’s Rapyuta, motion planning services at both the cloud and the edge, Berkeley Robotics and Automation as a Service (BRASS), and Dex-Net as a Service (DNaaS). However, network costs in the form of privacy, security, latency, bandwidth, and reliability remain a challenge in Cloud Robotics.
Fog Robotics was recently introduced by Goldberg et al. and is defined as an extension of Cloud Robotics that balances storage, compute, and networking resources between the Cloud and the Edge. It is inspired by Fog Computing, introduced by Cisco Systems in 2012. In Fog Robotics, cloud computing resources are brought closer to the robot so that learning can take place near where the data is created. It has been applied to service robots that learn surface grasps in nearby unstructured environments. In this work, we use Fog Robotics (1) for cloud-based teleoperation of a dynamic robot and (2) to host a visual servoing server that provides object recognition and localization feedback to the robot in unstructured environments.
Robotic vision for service robots in unstructured environments is challenging with traditional industrial approaches, because those methods were developed for precision in highly structured manufacturing environments. Registration and calibration are typically performed before each robotic task; they can be time-consuming and require additional expertise to maintain the system.
Image Based Visual Servoing (IBVS) uses the camera’s 2D image space to measure the relative distance between the robot and the target. These measurements are independent of the exact 3D locations of the robot and the target, so camera registration and calibration are not required before each robotic task. This property is desirable when deploying service robots for human-robot interaction in unstructured environments.
Furthermore, recent deep-learning-based vision systems allow recognition, object detection, segmentation, and human gesture recognition to be performed in unstructured environments in semi-real time (5-10 Hz) on a GPU server. Previously, we implemented a cloud-based gesture perception system for a humanoid robot gesture mirroring task. More advanced robot learning systems based on visual feedback have been developed for grasping, visual servoing, guided policy search, visual foresight, and domain randomization for transfer learning from simulation to reality. All of these can be deployed in Fog Robotics to provide visual feedback for large-scale service robot deployments.
IV Self-Balancing Robot Igor
We use Igor, a 14-degree-of-freedom (DoF), dual-arm, dual-leg, dynamic self-balancing robot designed and built by HEBI Robotics (shown in Fig. 3). Each DoF is driven by a self-contained, series elastic X-series servo module. These servos accept position, velocity, and torque commands simultaneously and report accurate measurements of all three quantities to a central computer at a high rate (1 kHz) with minimal latency. The modules are connected via Ethernet to an on-board Intel NUC computer in the metal control box, which runs the self-balancing control.
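Self-balancing is achieved by modeling the system as an inverted pendulum (see Fig. 3, bottom). To estimate the center of mass (CoM) of the robot, the CoMs of the two arms and two legs are first computed in real time from forward kinematics through HEBI’s API. The total CoM is then estimated as the average of the CoMs of the four extremities and the control box, weighted by their masses:
$$\mathbf{p}_{\mathrm{CoM}} = \frac{\sum_{i} m_i\,\mathbf{p}_i}{\sum_{i} m_i},\qquad i \in \{\text{left arm},\ \text{right arm},\ \text{left leg},\ \text{right leg},\ \text{control box}\},$$
where $m_i$ and $\mathbf{p}_i$ are the mass and CoM of part $i$.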
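Igor also uses accelerometer measurements from the four servo modules attached to the control box to estimate the direction of gravity $\hat{\mathbf{g}}$ at all times. From Igor’s CoM $\mathbf{p}_{\mathrm{CoM}}$, the center of the wheels $\mathbf{p}_w$, and the direction of gravity, we compute the length and direction of the inverted pendulum:
$$\mathbf{l} = \mathbf{p}_{\mathrm{CoM}} - \mathbf{p}_w,\qquad \ell = \lVert \mathbf{l} \rVert.$$
The lean angle $\theta$, the angle between the gravity direction and the inverted pendulum, can then be estimated in real time:
$$\theta = \arccos\!\left(\frac{\mathbf{l}\cdot\hat{\mathbf{g}}}{\lVert \mathbf{l} \rVert}\right),$$
with $\hat{\mathbf{g}}$ taken as the unit vertical direction sensed by the accelerometers, so that $\theta = 0$ when the robot is upright.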
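The CoM and lean-angle estimates above can be sketched in a few lines of Python. This is a minimal illustration, not HEBI’s API or the authors’ controller: the function names are hypothetical, the per-part masses and CoMs are assumed to be already available from forward kinematics, and the vertical reference `up` is assumed to be the accelerometer-sensed direction pointing away from the ground, so that an upright robot has zero lean angle.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def weighted_com(masses: Sequence[float], coms: Sequence[Vec3]) -> Vec3:
    """Mass-weighted average of per-part CoMs (arms, legs, control box)."""
    total = sum(masses)
    return tuple(
        sum(m * c[k] for m, c in zip(masses, coms)) / total for k in range(3)
    )


def lean_angle(com: Vec3, wheel_center: Vec3, up: Vec3) -> Tuple[float, float]:
    """Return (pendulum length, lean angle in radians).

    `up` is assumed to be the accelerometer-sensed vertical (pointing away
    from the ground); it does not need to be normalized.
    """
    pend = [c - w for c, w in zip(com, wheel_center)]  # wheel center -> CoM
    length = math.sqrt(sum(p * p for p in pend))
    up_norm = math.sqrt(sum(u * u for u in up))
    cos_theta = sum(p * u for p, u in zip(pend, up)) / (length * up_norm)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against round-off
    return length, math.acos(cos_theta)
```

In a real loop these would run at the low-level controller rate, with the lean angle fed to the balance controller; the small-angle linearization described next only holds while this angle stays near zero.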
To keep the robot balanced, we assume that the lean angle $\theta$ is small so that the system can be linearized:
$$\sin\theta \approx \theta,$$
and a torque $\tau$ in the direction of falling is applied to the wheels, of radius $r$ and angular velocity $\omega$, to counteract the effect of gravity on the robot’s center of mass:
where $v$ is the velocity of the robot’s CoM. Furthermore, the rate of change of the lean angle, $\dot{\theta}$, can be controlled by a proportional controller with coefficient $K_p$, and is related to the velocity of the robot as follows:
Real-time measurements of both the robot CoM and the direction of gravity are important, because the Igor controller must compensate for dynamic movements of the four extremities to balance robustly. We can also rotate the robot by varying the velocities applied to the two wheels, and control the rotation angle in real time based on inertial measurement unit (IMU) readings.
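Rotation control from IMU yaw readings could look like the following sketch. It assumes a simple proportional law on the wrapped yaw error and a symmetric split of the turn rate between the two wheels; `k_yaw`, `track_width`, and `max_turn_rate` are hypothetical placeholders, not Igor’s parameters, and the outputs are wheel ground speeds (divide by the wheel radius to get angular velocities).

```python
import math
from typing import Tuple


def wrap_to_pi(angle: float) -> float:
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def wheel_velocities(forward: float, yaw: float, yaw_target: float,
                     k_yaw: float = 1.5, track_width: float = 0.4,
                     max_turn_rate: float = 1.0) -> Tuple[float, float]:
    """Left/right wheel ground speeds (m/s) for turning toward yaw_target.

    A proportional law on the wrapped IMU yaw error gives a turn rate (rad/s),
    clamped to +/- max_turn_rate and split symmetrically between the wheels
    around the forward speed requested by the balance controller.
    """
    error = wrap_to_pi(yaw_target - yaw)
    turn_rate = max(-max_turn_rate, min(max_turn_rate, k_yaw * error))
    half_diff = 0.5 * turn_rate * track_width
    return forward - half_diff, forward + half_diff
```

Wrapping the error makes the robot take the shorter way around; for example, from a yaw of 3.0 rad to a target of -3.0 rad it turns through about 0.28 rad rather than 6 rad.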
V An Intelligent Fog Robotic Controller
V-A Edge Controllers
There are two edge controllers in our system. The first is an Intel NUC computer in Igor’s control box. It collects all sensor information from the 14 modular servos and controls all servos in real time. It hosts high-speed feedback control loops (200 Hz or above) to maintain robot posture and self-balancing. We refer to it as the low-level controller in Fig. 1.
The other edge device is the robot command unit (RCU), an Android smartphone with a private LTE connection and a 2D camera (Fig. 1). The RCU serves as the gateway between the high-level cloud robotic platform and the low-level self-balancing controller. It uses the private LTE connection to stream live video to the cloud, and it receives high-level control commands from the cloud and forwards them to the low-level controller with minimal delay. The RCU works both indoors and outdoors wherever LTE reception is good.
Furthermore, we are integrating HEBI’s self-balancing controller into the smartphone. The phone could then replace the native Intel NUC and serve as both the RCU and the low-level controller. A more compact edge controller would leave more room for batteries in the control box, extending the robot’s operating time.
V-B Cloud Controller
A high-level intelligent robot controller runs in the cloud and works with the edge controllers (see Fig. 1). It operates at a lower rate (3-5 Hz), but it commands the robot using HARI’s Artificial Intelligence (AI) and Human Intelligence (HI) services, which are critical for robots operating in unstructured environments. Depending on the situation, it either derives commands from the object recognition server or forwards commands sent by a cloud teleoperator. These high-level commands are sent to the RCU and executed in a different form by the low-level controller.
V-C Hybrid Control with “Heartbeat”
When controlling Igor, commands sent from the high-level cloud controller act as perturbations to a time-invariant, stable system maintained by the low-level self-balancing controller at the edge. This is a form of hybrid control where discrete signals are sent from the cloud to control a dynamical system at the edge.
Minimal delay in delivering robot commands from the cloud to the edge is desirable for intuitive teleoperation, so we use the asynchronous UDP protocol for cloud-edge communication. However, UDP does not guarantee delivery, so packets can be lost in transit, and they can arrive at the destination in an order different from the one in which they were sent. Both problems can produce irregular control signals at the edge controller, which can destabilize self-balancing. They also degrade the teleoperation experience and can endanger people around the robot.
We solve these problems with a “heartbeat” signal implemented at the edge controller (shown in Fig. 4). The “heartbeat” is a switch signal that turns on when the first packet for a command arrives at the edge. It then stays on for a time window $t$ and turns off only if no packet for that command is received during the window. The “heartbeat” can thus be viewed as a “convolution” of the received signal with a moving window. Finally, we convert the “heartbeat” into an edge control signal using a ramp function at the beginning and end of the command, so that the controlled perturbation starts and stops smoothly when it acts on the stable self-balancing system.
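A minimal sketch of the heartbeat gate for a single command is shown below, assuming arrival timestamps in seconds are passed in explicitly (which keeps it deterministic and testable). The 250 ms window is the value given in the Discussion; the linear ramp shapes and the `ramp_up` and `ramp_down` durations are assumptions, and `HeartbeatGate` is a hypothetical name. Because the gate reacts only to when packets arrive, lost or reordered packets within the window do not switch the command off.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeartbeatGate:
    """Edge-side "heartbeat" gate for one teleoperation command (sketch)."""

    window: float = 0.25    # heartbeat window t in seconds (250 ms in the text)
    ramp_up: float = 0.2    # assumed ramp-in duration (s)
    ramp_down: float = 0.1  # assumed, sharper ramp-out duration (s)
    _on_since: Optional[float] = None
    _last_packet: Optional[float] = None

    def alive(self, now: float) -> bool:
        """True while the most recent packet arrived within the window."""
        return self._last_packet is not None and now - self._last_packet <= self.window

    def on_packet(self, now: float) -> None:
        """Call on every received packet for this command; order is irrelevant."""
        if not self.alive(now):
            # (Re)start: begin the ramp-in from the current output level so a
            # packet arriving during ramp-out does not cause a jump.
            self._on_since = now - self.gain(now) * self.ramp_up
        self._last_packet = now

    def gain(self, now: float) -> float:
        """Scale factor in [0, 1] applied to the commanded perturbation."""
        if self._last_packet is None or self._on_since is None:
            return 0.0
        off_time = self._last_packet + self.window
        ramp_in = (min(now, off_time) - self._on_since) / self.ramp_up
        level = min(1.0, max(0.0, ramp_in))
        if now <= off_time:
            return level
        return level * max(0.0, 1.0 - (now - off_time) / self.ramp_down)
```

On the edge, `gain(now)` would be evaluated at the low-level loop rate and multiplied into the commanded velocity. The stop delay after the last packet is the window plus the ramp-out, which is the delayed reaction discussed in Section VIII.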
VI Dynamic Visual Servoing
To assist teleoperation with automation, we use Fog Robotics to control a dynamic robot with Image Based Visual Servoing (IBVS) so that it picks up a box automatically. We choose IBVS because it avoids extensive camera calibration, which is hard to maintain on a dynamic robotic system. The goal of our IBVS is to navigate the robot to an optimal box-pickup location where the AprilTag lies within the green target box, which has the same size as the AprilTag (Fig. 5).
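The aim of visual-servoing-based control is to minimize the error between the measured target position $\mathbf{s}$ and the desired target position $\mathbf{s}^*$:
$$\mathbf{e}(t) = \mathbf{s}\big(\mathbf{m}(t), \mathbf{a}\big) - \mathbf{s}^*,$$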
where $\mathbf{m}(t)$ is a set of image measurements and $\mathbf{a}$ is a set of parameters, such as the camera intrinsics, that represent additional knowledge about the system. $\mathbf{s}(\mathbf{m}(t), \mathbf{a})$ contains the measured values of the image features or object locations, such as pixel coordinates in the image frame, and $\mathbf{s}^*$ contains their desired values.
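The rate of change of the feature error $\dot{\mathbf{e}}$ and the camera velocity $\mathbf{v}_c = (v_x, v_y, v_z, \omega_x, \omega_y, \omega_z)$, which stacks the camera’s linear and angular velocities, are related by the interaction matrix $\mathbf{L}_e$:
$$\dot{\mathbf{e}} = \mathbf{L}_e\,\mathbf{v}_c.$$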
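For IBVS, which operates in 2D image space, a 3D point $(X, Y, Z)$ in the camera frame is projected onto the 2D image plane with normalized coordinates $(x, y)$:
$$x = \frac{X}{Z},\qquad y = \frac{Y}{Z},$$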
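which yields the interaction matrix for 2D image-based servoing of a point feature:
$$\mathbf{L}_e = \begin{bmatrix} -\frac{1}{Z} & 0 & \frac{x}{Z} & xy & -(1+x^2) & y \\ 0 & -\frac{1}{Z} & \frac{y}{Z} & 1+y^2 & -xy & -x \end{bmatrix}.$$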
Given the interaction matrix, the camera velocity can be estimated from the observed feature motion as
$$\mathbf{v}_c = \mathbf{L}_e^{+}\,\dot{\mathbf{e}},$$
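where $\mathbf{L}_e^{+}$ is the Moore-Penrose pseudo-inverse of $\mathbf{L}_e$, which, when $\mathbf{L}_e$ has full column rank, is
$$\mathbf{L}_e^{+} = \left(\mathbf{L}_e^{\top}\mathbf{L}_e\right)^{-1}\mathbf{L}_e^{\top}.$$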
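The final control law imposes an exponential decrease of the error, $\dot{\mathbf{e}} = -\lambda\,\mathbf{e}$ with gain $\lambda > 0$, and sets the robot velocity command opposite to the camera velocity, because the target moves in the direction opposite to the camera in the image frame:
$$\mathbf{v} = -\lambda\,\mathbf{L}_e^{+}\,\mathbf{e}.$$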
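The IBVS law above can be sketched in dependency-free Python. This follows the standard point-feature formulation of Chaumette and Hutchinson rather than the authors’ code; the function names are hypothetical, and mapping the resulting 6-DoF camera twist onto Igor’s available motions (driving, turning, and knee height) is robot-specific and not shown. With the four corners of an AprilTag stacked, the interaction matrix is 8x6 and the least-squares form of the pseudo-inverse applies.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def interaction_matrix(x: float, y: float, z: float) -> Matrix:
    """2x6 interaction matrix of a normalized image point (x, y) at depth z."""
    return [
        [-1.0 / z, 0.0, x / z, x * y, -(1.0 + x * x), y],
        [0.0, -1.0 / z, y / z, 1.0 + y * y, -x * y, -x],
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]


def inverse(a: Matrix) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting (small square matrices)."""
    n = len(a)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[piv][c]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[c], aug[piv] = aug[piv], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def pinv(a: Matrix) -> Matrix:
    """Moore-Penrose pseudo-inverse of a full-rank matrix."""
    at = transpose(a)
    if len(a) <= len(a[0]):                        # full row rank
        return matmul(at, inverse(matmul(a, at)))  # A^T (A A^T)^-1
    return matmul(inverse(matmul(at, a)), at)      # (A^T A)^-1 A^T


def ibvs_velocity(points: Sequence[Tuple[float, float]],
                  desired: Sequence[Tuple[float, float]],
                  depths: Sequence[float], gain: float = 0.5) -> List[float]:
    """Camera twist (vx, vy, vz, wx, wy, wz) from v = -gain * L^+ e."""
    rows: Matrix = []
    error: List[float] = []
    for (x, y), (xd, yd), z in zip(points, desired, depths):
        rows.extend(interaction_matrix(x, y, z))
        error.extend([x - xd, y - yd])
    return [-gain * sum(l * e for l, e in zip(row, error)) for row in pinv(rows)]
```

For a single point, `L @ v` equals `-gain * e`, so the feature error shrinks exponentially as intended; when the measured features already match the desired ones, the commanded velocity is zero.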
Notice that the interaction matrix depends only on $x$ and $y$, the 2D image coordinates of the target, and on $Z$, the depth of the target. In our system, $Z$ is estimated from the size of the AprilTag in the image. The IBVS measurement is therefore independent of the exact 3D position of the target, which is attractive for our system because exact 3D camera registration is not required.
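One common way to turn an AprilTag’s apparent size into a depth estimate is the pinhole relation $Z \approx f S / s$, where $S$ is the physical tag side length, $s$ its side length in pixels, and $f$ the focal length in pixels. The sketch below is an illustration under stated assumptions, not the authors’ method: `tag_size_m` and `focal_px` are hypothetical parameters (a nominal focal length, for example from a datasheet, is assumed), the tag is assumed to roughly face the camera, and the corner ordering follows the tag’s perimeter.

```python
import math
from typing import Sequence, Tuple


def tag_side_pixels(corners: Sequence[Tuple[float, float]]) -> float:
    """Mean side length, in pixels, of a tag given its 4 corners in order."""
    return sum(
        math.hypot(corners[(i + 1) % 4][0] - corners[i][0],
                   corners[(i + 1) % 4][1] - corners[i][1])
        for i in range(4)
    ) / 4.0


def depth_from_tag(corners: Sequence[Tuple[float, float]],
                   tag_size_m: float, focal_px: float) -> float:
    """Pinhole estimate Z = f * S / s for a roughly fronto-parallel tag."""
    return focal_px * tag_size_m / tag_side_pixels(corners)
```

Halving the apparent tag size doubles the estimated depth, which is why a target size in the image (the green box) can stand in for a target depth.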
VI-A IBVS Implementations for Automatic Box Pickup
The automatic visual servoing controller executes a box pickup in three phases. Phase 1 uses IBVS to move the robot to a position where the AprilTag has the same size as the green box shown on the left side of Fig. 5. By the end of Phase 1, the robot must also place the AprilTag on the purple center line of the video frame, though not necessarily at the center of the frame. In Phase 2, the robot adjusts its height by changing the joint angles of its two “knee” joints until the AprilTag lies at the center of the video frame, where the green box is. Once the robot reaches this optimal picking position, Phase 3 begins: the controller commits to a pre-defined, hard-coded, dual-arm grasping motion.
The size of the green target box encodes the depth $Z$ of the target. We find this optimal size by teleoperating the robot to different positions close to the box and selecting, from these locations, the position with the highest box-pickup success rate under the hard-coded grasping motion.
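The three-phase pickup can be expressed as a small state machine. This is a sketch under stated assumptions: the purple line is taken to be the vertical center line of the frame (Phase 2 changes only the robot’s height, which moves the tag vertically in the image), and `Phase`, `TagObservation`, `next_phase`, and the pixel tolerances are hypothetical names and values, not the authors’ implementation.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Phase(Enum):
    APPROACH = auto()       # Phase 1: IBVS until size matches, tag on center line
    ADJUST_HEIGHT = auto()  # Phase 2: bend the "knees" until the tag is centered
    GRASP = auto()          # Phase 3: hard-coded dual-arm grasp (terminal)


@dataclass
class TagObservation:
    u: float     # tag center, horizontal pixel coordinate
    v: float     # tag center, vertical pixel coordinate
    size: float  # apparent tag side length in pixels


def next_phase(phase: Phase, tag: TagObservation, target_size: float,
               center_u: float, center_v: float,
               size_tol: float = 5.0, px_tol: float = 10.0) -> Phase:
    """Advance the pickup state machine by one observation."""
    if phase is Phase.APPROACH:
        size_ok = abs(tag.size - target_size) <= size_tol
        on_center_line = abs(tag.u - center_u) <= px_tol
        return Phase.ADJUST_HEIGHT if size_ok and on_center_line else phase
    if phase is Phase.ADJUST_HEIGHT:
        centered = abs(tag.v - center_v) <= px_tol
        return Phase.GRASP if centered else phase
    # Once committed, the grasp is not interrupted, even if the tag moves.
    return phase
```

Making `GRASP` terminal mirrors the hard-coded final phase, and it also reproduces the failure case discussed in Section VIII: moving the box after the commit goes unnoticed.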
VI-B Fog Robotic IBVS
Although we use simple AprilTags for object recognition, we aim to anticipate the design of a deep-learning-based Fog Robotic vision system for robotic pickups. To emulate the latency effects of such a system, we therefore run AprilTag recognition in the cloud and use the “heartbeat” protocol to stream the AprilTag’s geometric location to the low-level controller via the RCU. Together, these components form a robust Fog Robotic IBVS controller for box pickups.
VII Experiments and Results
With the “heartbeat” design, we can navigate the self-balancing robot reliably from the cloud-based teleoperation interface (see video). We also pre-program arm actions such as clapping and “disco” for human-robot communication and entertainment (video: https://youtu.be/1H1VEpkbG_E).
We further hard-code a box-pickup motion for the two arms and attempt to pick up a box via cloud-based teleoperation. However, even with a reliable teleoperation module and pre-programmed pickup motions, we find it extremely difficult and inefficient to pick up a box through cloud teleoperation. We suspect this is caused by the teleoperator’s lack of natural, immersive 3D visual perception.
To quantify this observation, we perform two teleoperation experiments with 10 trials each: (1) controlling Igor locally, so that the operator can see the robot and the box; and (2) controlling Igor from the cloud to pick up the box. In both cases, the box is placed in a stable position on a table, and the robot starts about 2 meters from the object, facing the front of the box (see video https://youtu.be/b0mr5GHHjBg). We observe that the local teleoperator (standing at least 2 meters from the target) performs box pickups much faster and with a higher success rate than the cloud teleoperator (see Table I).
After implementing the automatic IBVS module, we repeat the experiment and benchmark the automatic module with a human in the loop. The cloud operator first teleoperates the robot as quickly as possible to a location where the AprilTag is recognizable, which can be up to 20 degrees from the surface normal of the AprilTag. The operator then triggers the Fog Robotic IBVS module, and the robot picks up the box automatically. The speed in this third case is on par with local human teleoperation, and the reliability is even higher, at 100% (see Table I).
Table I: Average Duration (s) and Success Rate of box pickups.
Finally, we leverage the flexibility of visual servoing to perform a proximate human-robot interaction (HRI) task. During the task, a human carries the box with an AprilTag and can show Igor the tag while moving around. Igor can recognize and localize the tag at distances of up to 6 meters. As soon as the robot recognizes the AprilTag, the teleoperator can release the robot into automatic mode. We observe that, as long as the robot can recognize the AprilTag and the box is at a height the robot can reach, the robot follows the human around and eventually picks up the box from the person with a high success rate (see video: https://youtu.be/K9R4y2w1uPw).
VIII Discussion and Future Work
In this work, we take advantage of both an intelligent cloud robotic platform and edge controllers to build an intelligent Fog Robotic system that performs human-compliant, automatic box pickups using visual servoing.
We introduce a “heartbeat” protocol with asynchronous communication to mitigate the effects of network latency and variability on the dynamic hybrid self-balancing controller in Fog Robotics. However, the current “heartbeat” protocol is not perfect: it adds delay at the end of a command, so the robot reacts late after the last command packet is received (red arrow in Fig. 4). This raises a significant safety concern, because even with a short “heartbeat” window (250 ms in our case), the robot does not stop completely until 250 ms plus the ramp-down period have elapsed. To compensate, we use a sharper ramp function at the stop than at the start, but the 250 ms window remains a hard lower bound on the stopping delay in the current system.
Our future work includes better modeling of packet drops in asynchronous communication, so that we can build a probability model of the variability of time intervals between packets. We could then further reduce this delayed reaction by adjusting the “heartbeat” window size based on the predicted arrival time of the last packet.
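The probability model is left to future work. As a simple stand-in, and an assumption rather than the authors’ design, the window could be set to a high empirical quantile of observed packet inter-arrival gaps plus a small margin. The stop delay is then the window plus the ramp-down; for example, the 250 ms window with a hypothetical 100 ms ramp-down gives 350 ms. The function names and default values below are hypothetical.

```python
import math
from typing import Sequence


def worst_case_stop_delay(window: float, ramp_down: float) -> float:
    """Delay from the last received packet until the command fully ramps out."""
    return window + ramp_down


def adaptive_window(arrival_times: Sequence[float], quantile: float = 0.99,
                    margin: float = 0.01) -> float:
    """Window = empirical `quantile` of inter-arrival gaps plus a safety margin."""
    gaps = sorted(b - a for a, b in zip(arrival_times, arrival_times[1:]))
    if not gaps:
        raise ValueError("need at least two packet arrival times")
    idx = min(len(gaps) - 1, max(0, math.ceil(quantile * len(gaps)) - 1))
    return gaps[idx] + margin
```

The trade-off is direct: a lower quantile shortens the stop delay but makes it more likely that an ordinary gap between packets exceeds the window and interrupts a command that the teleoperator is still sending.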
Like other service robots, Igor needs to interact and cooperate with humans. We demonstrate two advantages of visual servoing: (1) it requires no calibration before each robotic task, and (2) it can handle dynamic human-robot interaction, such as following a person to pick up a box from them. Our automatic system works even in unstructured environments with obstacles between the human and the robot. The self-balancing control and the compliant servos allow the robot to recover when it encounters small obstacles, and the human box carrier or the cloud teleoperator can help the robot avoid obstacles by guiding it along a path with more clearance.
One failure case occurs when the human carrier tricks the robot, that is, when the target is moved after the robot commits to the final, hard-coded phase of the box pickup.
In the future, we plan to program a more dynamic automatic object pickup so that the robot can pick up a moving object in one continuous motion, without hard-coded motions. We also plan to deploy deep-learning recognition pipelines such as Mask R-CNN together with intelligent grasping systems such as Dex-Net on Fog Robotic systems, so that they can guide both dynamic robots such as Igor and static robots such as YuMi and HSR to perform generalized, human-compliant object pickups and manipulation.
We thank members of the AUTOLAB (Ron Berenstein and Ajay Tanwani) for discussions, and members of Cloudminds (Arvin Zhang, Havelet Zhang, and Mel Zhao) for their technical support. Special thanks to Prof. Joseph Gonzalez for discussions on hybrid synchronous and asynchronous systems. We thank Matthew Tesch, David Rollinson, Curtis Layton, and Prof. Howie Choset from HEBI Robotics for continuous support, training, and discussions on the Igor self-balancing research robot.
Funding from the Office of Naval Research, NSF EPCN, and Cloudminds Inc. is acknowledged. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the Sponsors.
-  International Federation of Robotics, Introduction into Service Robots, 2016 (accessed September 15, 2018). [Online]. Available: https://ifr.org/img/office/Service_Robots_2016_Chapter_1_2.pdf
-  ——, Executive Summary World Robotics 2017 Service Robots, 2017 (accessed September 15, 2018). [Online]. Available: https://ifr.org/downloads/press/Executive_Summary_WR_Service_Robots_2017.pdf
-  R. Girshick, “Fast R-CNN,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440–1448.
-  https://github.com/matterport/Mask_RCNN, 2017.
-  Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh, “Realtime multi-person 2d pose estimation using part affinity fields,” in CVPR, 2017.
-  J. Mahler, J. Liang, S. Niyaz, M. Laskey, R. Doan, X. Liu, J. A. Ojea, and K. Goldberg, “Dex-net 2.0: Deep learning to plan robust grasps with synthetic point clouds and analytic grasp metrics,” arXiv preprint arXiv:1703.09312, 2017.
-  S. Levine and P. Abbeel, “Learning neural network policies with guided policy search under unknown dynamics,” in Advances in Neural Information Processing Systems, 2014, pp. 1071–1079.
-  S. Levine, P. Pastor, A. Krizhevsky, and D. Quillen, “Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection,” arXiv preprint arXiv:1603.02199, 2016.
-  C. Finn and S. Levine, “Deep visual foresight for planning robot motion,” in Robotics and Automation (ICRA), 2017 IEEE International Conference on. IEEE, 2017, pp. 2786–2793.
-  A. X. Lee, S. Levine, and P. Abbeel, “Learning visual servoing with deep features and fitted q-iteration,” arXiv preprint arXiv:1703.11000, 2017.
-  N. Tian, B. Kuo, X. Ren, M. Yu, R. Zhang, B. Huang, K. Goldberg, and S. Sojoudi, “A cloud-based robust semaphore mirroring system for social robots,” learning, vol. 12, p. 14.
-  E. Olson, “AprilTag: A robust and flexible visual fiducial system,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA). IEEE, May 2011, pp. 3400–3407.
-  J. Wang and E. Olson, “AprilTag 2: Efficient and robust fiducial detection,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), October 2016.
-  B. Kehoe, S. Patil, P. Abbeel, and K. Goldberg, “A survey of research on cloud robotics and automation,” IEEE Transactions on Automation Science and Engineering, vol. 12, no. 2, pp. 398–409, 2015.
-  J. J. Kuffner et al., “Cloud-enabled robots,” in IEEE-RAS international conference on humanoid robotics, Nashville, TN, 2010.
-  G. Mohanarajah, D. Hunziker, R. D’Andrea, and M. Waibel, “Rapyuta: A cloud robotics platform,” IEEE Transactions on Automation Science and Engineering, vol. 12, no. 2, pp. 481–493, 2015.
-  A. Vick, V. Vonásek, R. Pěnička, and J. Krüger, “Robot control as a service—towards cloud-based motion planning and control for industrial robots,” in Robot Motion and Control (RoMoCo), 2015 10th International Workshop on. IEEE, 2015, pp. 33–39.
-  J. Ichnowski, J. Prins, and R. Alterovitz, “Cloud based motion plan computation for power constrained robots,” in 2016 Workshop on the Algorithmic Foundations of Robotics (WAFR), 2016.
-  N. Tian, M. Matl, J. Mahler, Y. X. Zhou, S. Staszak, C. Correa, S. Zheng, Q. Li, R. Zhang, and K. Goldberg, “A cloud robot system using the dexterity network and berkeley robotics and automation as a service (brass),” in Robotics and Automation (ICRA), 2017 IEEE International Conference on. IEEE, 2017, pp. 1615–1622.
-  P. Li, B. DeRose, J. Mahler, J. A. Ojea, A. K. Tanwani, and K. Goldberg, “Dex-net as a service (dnaas): A cloud-based robust robot grasp planning system.”
-  A. K. Tanwani, N. Mor, J. Kubiatowicz, and K. Goldberg, “A fog robotics architecture for distributed learning of surface decluttering,” Robotics and Automation Letters (under review).
-  F. Bonomi, R. Milito, J. Zhu, and S. Addepalli, “Fog computing and its role in the internet of things,” in Proceedings of the first edition of the MCC workshop on Mobile cloud computing. ACM, 2012, pp. 13–16.
-  P. J. Besl and N. D. McKay, “Method for registration of 3-d shapes,” in Sensor Fusion IV: Control Paradigms and Data Structures, vol. 1611. International Society for Optics and Photonics, 1992, pp. 586–607.
-  R. Tsai, “A versatile camera calibration technique for high-accuracy 3d machine vision metrology using off-the-shelf tv cameras and lenses,” IEEE Journal on Robotics and Automation, vol. 3, no. 4, pp. 323–344, 1987.
-  S. Hutchinson, G. D. Hager, and P. I. Corke, “A tutorial on visual servo control,” IEEE transactions on robotics and automation, vol. 12, no. 5, pp. 651–670, 1996.
-  F. Chaumette and S. Hutchinson, “Visual servo control. i. basic approaches,” IEEE Robotics & Automation Magazine, vol. 13, no. 4, pp. 82–90, 2006.
-  J. Mahler, F. T. Pokorny, B. Hou, M. Roderick, M. Laskey, M. Aubry, K. Kohlhoff, T. Kroger, J. Kuffner, and K. Goldberg, “Dex-net 1.0: A cloud-based network of 3d objects for robust grasp planning using a multi-armed bandit model with correlated rewards,” in Proc. IEEE Int. Conference on Robotics and Automation (ICRA), 2016.
-  J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel, “Domain randomization for transferring deep neural networks from simulation to the real world,” in Intelligent Robots and Systems (IROS), 2017 IEEE/RSJ International Conference on. IEEE, 2017, pp. 23–30.
-  OpenAI, M. Andrychowicz, B. Baker, M. Chociej, R. Jozefowicz, B. McGrew, J. Pachocki, A. Petron, M. Plappert, G. Powell, A. Ray, J. Schneider, S. Sidor, J. Tobin, P. Welinder, L. Weng, and W. Zaremba, “Learning Dexterous In-Hand Manipulation,” ArXiv e-prints, Aug. 2018.
-  M. Laskey, J. Lee, R. Fox, A. Dragan, and K. Goldberg, “Dart: Noise injection for robust imitation learning,” arXiv preprint arXiv:1703.09327, 2017.