I Introduction
The training and maintenance of a traditional guide dog presents challenges to the elderly, frail, and visually impaired. Each guide dog must be trained individually in a time- and labor-intensive process, and the skills gained by one dog cannot be transferred to another. In addition, guide dogs may fall ill or need to retire, forcing the user to obtain a replacement dog that may not be a good match [1]. An autonomous robot that could lead people in need of assistance through a multi-floor building would ease the burdens that come with a traditional guide dog. Most previous robotic guides are cumbersome and struggle to maneuver in narrow, complex spaces because of their bulky size, or they rely solely on physical interaction, requiring the user to hold a leash or rigid arm with no way to verbally command the robot to reroute or stop [2]–[4]. In addition, none of these guide robots can navigate across multiple floors. In early 2021, Xiao et al. implemented a quadrupedal robot that successfully guided a subject; however, the system relied solely on physical interaction through a leash and gave the person being led no way to communicate directly with the robot [5]. A small quadrupedal robot that can both listen for and respond to commands from the person being guided, in addition to providing a leash, would address these issues. We seek to accomplish this by using a Unitree A1 quadrupedal robot [6] to autonomously guide a visually impaired person through a multi-floor environment, developing algorithms that support a custom wake-up word and communicate with the user via cloud text-to-speech (TTS) and speech-to-text (STT) services.
II Methodology
The robot communicates vocally with, and understands, the user through text-to-speech and speech-to-text pipelines. For speech output, we started from basic open-source code [7] that integrates Amazon Polly, a cloud service: the robot sends a string of text to Amazon Web Services, Amazon Polly synthesizes an audio stream from that text, and the stream is retrieved and played through a speakerphone installed on the robot. We then adapted this code to the robot's infrastructure, which is built on the Robot Operating System (ROS). For the robot to understand what the user is saying, we use the Google Cloud Speech-to-Text Application Programming Interface (API), which receives audio data from a source and converts it into a string of text. To use this API, we adopted open-source code from GitHub that is compatible with ROS and configured it for the robot's infrastructure [8]. Audio is captured by the speakerphone on the robot and forwarded to Google Cloud. The returned transcript is passed to the STT algorithm, which checks whether the wake-up word, which is customizable, has been said. If not, the algorithm ignores the utterance and resumes listening. When the wake-up word is detected, the transcript is sent to a word-dictionary function that searches for keywords and maps them to preset coordinates. Once the algorithm has determined where the user wants to go, it publishes those coordinates to the navigation goal node. The STT node also publishes a string of text to the TTS node so that the robot can respond to the user. The robot's navigation subscribes to that publisher and creates a path to the target point.
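The wake-word check and keyword-to-coordinate lookup described above can be illustrated with a short ROS node. The sketch below is only a minimal illustration under stated assumptions, not the actual implementation: the topic names (/speech_text, /tts_text, /move_base_simple/goal), the wake-up word string, and the location coordinates are placeholders, and we assume the navigation goal node accepts geometry_msgs/PoseStamped goals on a move_base-style topic.

```python
#!/usr/bin/env python
# Minimal sketch of the wake-word + word-dictionary node described above.
# Topic names, frame ids, and coordinates are illustrative assumptions,
# not the values used on the actual robot.
import rospy
from std_msgs.msg import String
from geometry_msgs.msg import PoseStamped

WAKE_WORD = "hey a1"  # customizable wake-up word

# Keyword -> preset (x, y) map coordinates (assumed values).
LOCATIONS = {
    "lab": (4.2, -1.5),
    "office": (10.0, 3.7),
}

class SpeechInterface(object):
    def __init__(self):
        # Navigation goal and TTS response publishers (assumed topic names).
        self.goal_pub = rospy.Publisher("/move_base_simple/goal", PoseStamped, queue_size=1)
        self.tts_pub = rospy.Publisher("/tts_text", String, queue_size=1)
        # Transcripts arriving from the STT node.
        rospy.Subscriber("/speech_text", String, self.on_transcript)

    def on_transcript(self, msg):
        text = msg.data.lower()
        if WAKE_WORD not in text:
            return  # ignore any speech that lacks the wake-up word
        for keyword, (x, y) in LOCATIONS.items():
            if keyword in text:
                goal = PoseStamped()
                goal.header.frame_id = "map"
                goal.header.stamp = rospy.Time.now()
                goal.pose.position.x = x
                goal.pose.position.y = y
                goal.pose.orientation.w = 1.0
                self.goal_pub.publish(goal)
                # Ask the TTS node to confirm the command back to the user.
                self.tts_pub.publish(String(data="Okay, navigating to the %s." % keyword))
                return

if __name__ == "__main__":
    rospy.init_node("speech_interface")
    SpeechInterface()
    rospy.spin()
```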
III Results
We tested the speech interface in simulation using a navigation map that the robot would build with its onboard lidar, shown in Figure 1. In this simulation, the user said to the robot, "Hey A1, take me to the lab." The speech interface successfully heard the user's command and translated it into a string of text. It then published the preset coordinates of the laboratory from the dictionary to the navigation goal node. The robot's navigation subscribed to that node and created a path to the goal location, shown in Figure 2. Finally, the robot responded to the user, "Okay, navigating to the lab." The user then said, "Take me to the office." The speech interface correctly ignored this speech, even though it could be a command, because the user did not use the wake-up word, which was set to "Hey A1." The robot's navigation was not affected and no response was given. Only when the user repeated the sentence with the wake-up word did the algorithm recognize it as a valid command. The speech interface then sent the coordinates of the office to the robot's navigation pipeline, which created a new path, shown in Figure 3.
IV Discussion and Future Work
These results demonstrate that the TTS and STT engines can be integrated with the algorithm we created. The algorithm communicates with both engines and with the robot's infrastructure. The robot ignored all irrelevant speech, sending the string of text to the dictionary function only when the wake-up word was said. Unlike previous robots with a speech interface, we support a custom wake-up word and do not rely on an Amazon Echo device [2]. We were able to set a navigation goal solely by verbally issuing a command to the robot. Our previous work, while using a leash, relied on an external computer to input commands and did not allow the users themselves to communicate with the robot [5]. This work improves the user experience by allowing explicit interaction, not just the implicit interaction of a leash.
We are currently extending the guide dog robot to operate an elevator so that it can navigate across multiple floors. To facilitate this, we are restructuring the robot's navigation to take floors into account. A multi-floor setting also means that the speech interface must be extended to send coordinates that encode which floor a navigation goal is on, as sketched below. The speech interface will also be developed to support more commands, such as telling the robot to stop at its current position, and to give the user instructions when needed. We further need to make it easier to add new commands and new locations to the algorithm.
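One possible way to extend the word dictionary for multi-floor goals is to attach a floor index to each preset location so that the navigation layer can tell when an elevator ride is needed. The sketch below only illustrates this idea; the location names, floor numbers, coordinates, and helper function are assumptions, not part of the current system.

```python
# Illustrative sketch of a floor-aware location dictionary (assumed values).
# Each entry maps a spoken keyword to a floor index and (x, y) coordinates,
# so the navigation layer can decide whether an elevator ride is required.
from collections import namedtuple

Location = namedtuple("Location", ["floor", "x", "y"])

LOCATIONS = {
    "lab": Location(floor=1, x=4.2, y=-1.5),
    "office": Location(floor=2, x=10.0, y=3.7),
}

def resolve_goal(transcript, current_floor):
    """Return (location, needs_elevator) for the first keyword found, else None."""
    text = transcript.lower()
    for keyword, loc in LOCATIONS.items():
        if keyword in text:
            return loc, loc.floor != current_floor
    return None
```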
V Conclusion
Having a speech interface makes it simpler for the user to send commands to the robot. We developed and tested a speech interface algorithm that communicates with the TTS and STT engines as well as with the robot's navigation pipeline. The main advantages of this work are that the wake-up word is customizable, because we built our own speech interface, and that custom commands can be created easily by adding them to the word dictionary. We are also able to combine this speech interface with a leash on a small, maneuverable robot.
Acknowledgment
This work was supported by the Hopper Dean Foundation and National Science Foundation Award #1757690. The Transfer-to-Excellence program is sponsored by the National Science Foundation and the Center for Energy Efficient Electronics (NSF #0939514). I would like to thank my mentor, Zhongyu Li, for his guidance and support throughout this experience. I also want to thank my Principal Investigator, Koushil Sreenath, for giving me the opportunity to be a part of his research group. I would also like to thank Nicole McIntyre, Tony Vo Hoang, Sam Mountain, Gary Yang, and the Hybrid Robotics Group for their constant support.
References
- [1] J. Lloyd, C. Budge, S. La Grow, and K. Stafford, "An investigation of the complexities of successful and unsuccessful guide dog matching and partnerships," Frontiers in Veterinary Science, vol. 3, 2016.
- [2] Z. Li and R. Hollis, "Toward A Ballbot for Physically Leading People: A Human-Centered Approach," 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019.
- [3] A. Wachaja, P. Agarwal, M. Zink, M. R. Adame, K. Möller, and W. Burgard, "Navigating blind people with walking impairments using a smart walker," Autonomous Robots, vol. 41, no. 3, pp. 555–573, 2016.
- [4] D. R. Bruno, M. H. de Assis, and F. S. Osorio, "Development of a Mobile Robot: Robotic Guide Dog for Aid of Visual Disabilities in Urban Environments," 2019 Latin American Robotics Symposium (LARS), 2019 Brazilian Symposium on Robotics (SBR) and 2019 Workshop on Robotics in Education (WRE), 2019.
- [5] A. Xiao, W. Tong, L. Yang, J. Zeng, Z. Li, and K. Sreenath, "Robotic Guide Dog: Leading a Human with Leash-Guided Hybrid Physical Interaction," arXiv preprint, arXiv:2103.14300, 2021.
- [6] "Unitree Robotics A1," Unitree Robotics. [Online]. Available: https://www.unitree.com/products/a1/. [Accessed: 02-Aug-2021].
- [7] AWS Robotics, "aws-robotics/tts-ros1," GitHub. [Online]. Available: https://github.com/aws-robotics/tts-ros1. [Accessed: 02-Aug-2021].
- [8] X. Tan, "SUCCESS-MURI/success_google_stt," GitHub. [Online]. Available: https://github.com/SUCCESS-MURI/success_google_stt. [Accessed: 02-Aug-2021].