Augmenting common items with digital content to extend their functionality has been a focus of past research in the domain of tangible user interfaces. In such systems, objects are tracked by a system that displays visual cues or extends the functionality of the object itself. By rotating, repositioning, or placing objects at defined positions, user-defined actions can be triggered. Thus, common items are augmented with functionality they do not provide by themselves.
Two modalities for displaying such augmented content have emerged. Smart glasses, such as the Microsoft HoloLens (www.microsoft.com/en-us/hololens, last accessed 2019-05-17), enable the mobile use of augmented reality to display additional supporting content. Furthermore, in-situ projection systems enable the augmentation of stationary workstations that can be used for practical exercises (see teaser figure). While smart glasses are preferred in mobile contexts, in-situ projection is suitable for stationary settings: mobile augmentation was preferred during practical physics exercises that required students to move around, whereas industrial use cases and social housing organizations [14, 15] found stationary settings more suitable. Furthermore, object augmentation provides cognitive alleviation, which has the potential to boost overall user performance and productivity [11, 12].
Both modalities use camera-based systems to recognize objects and enrich them with additional content. However, seamless object detection and augmentation poses challenges across use cases. In this work, we present the object detection strategies we employed in past research projects to enable object detection and augmentation, and we discuss the advantages and disadvantages of the different tracking modalities. Furthermore, we present how user-defined tangibles can be created from everyday items by augmenting them with in-situ projections. We conclude with challenges that have to be considered when integrating ubiquitous object augmentation.
Figure 1: Object detection using SURF. The positioned object is compared to a reference image; feature extraction, such as provided by the SURF algorithm, reveals the similarity between the images. (a) Correctly positioned object. (b) A rotated object is not guaranteed to be detected relative to the reference image.
2 Object Tracking
To enable interaction with common items, suitable tracking systems and algorithms need to be employed. In the following, we present three object tracking strategies we have employed in past research.
2.1 Speeded Up Robust Features
The Speeded Up Robust Features (SURF) algorithm recognizes points and areas of interest in images. Due to its efficient implementation, it enables the processing of images in real time. The algorithm has been used for object detection by comparing points of interest in a captured image with those of a reference image. SURF can be employed with inexpensive hardware since it operates on regular color images. However, this matching approach is not invariant to rotation and perspective changes, which requires objects to be placed in a position similar to the one expected by the system (see Figure 1).
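The matching stage of such a pipeline can be sketched without any imaging library: given descriptor vectors extracted from the captured and the reference image (e.g., by SURF), a candidate correspondence is kept only if it passes Lowe's ratio test, i.e., the best reference descriptor is clearly closer than the second-best one. The descriptor values below are toy data, not real SURF output:

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ratio_match(query, reference, ratio=0.75):
    """Match query descriptors against reference descriptors using
    Lowe's ratio test: keep a match only if the nearest reference
    descriptor is clearly closer than the second-nearest one."""
    matches = []
    for qi, q in enumerate(query):
        dists = sorted((euclidean(q, r), ri) for ri, r in enumerate(reference))
        if len(dists) >= 2 and dists[0][0] < ratio * dists[1][0]:
            matches.append((qi, dists[0][1]))
    return matches


# An unambiguous descriptor is matched; an ambiguous one is rejected.
matched = ratio_match([[0.0, 0.0]], [[0.1, 0.0], [5.0, 5.0]])
ambiguous = ratio_match([[3.0, 3.0]], [[2.9, 3.0], [3.1, 3.0]])
```

A sufficient number of such surviving correspondences between the captured and the reference image then indicates that the expected object is present.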
2.2 Depth Sensing
Depth sensors, such as the Intel RealSense (www.intel.com/content/www/us/en/architecture-and-technology/realsense-overview.html, last accessed 2019-05-17) or the Kinect v2 (developer.microsoft.com/en-us/windows/kinect, last accessed 2019-05-17), provide a 3D representation of the objects they are pointed at. Objects are recognized by analyzing their shape. Two relevant measurement methods have emerged. The first projects an infrared pattern onto a surface. The depth sensor then measures the perspective distortion of the pattern, which allows the distance to each pattern point to be triangulated and the 3D structure of the surface to be reconstructed.
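The geometry behind this first method can be illustrated in a few lines: the pattern projector and the infrared camera form a stereo pair, so the observed pixel shift (disparity) of a pattern point encodes its depth via triangulation. The focal length and baseline values below are made-up example numbers, not specifications of a particular sensor:

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Triangulated depth for a structured-light sensor: projector and
    infrared camera form a stereo pair with baseline b, so a pattern
    point shifted by `disparity_px` pixels lies at depth z = f * b / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Illustrative numbers: 600 px focal length, 7.5 cm baseline.
near = depth_from_disparity(600.0, 0.075, 60.0)
far = depth_from_disparity(600.0, 0.075, 30.0)
```

Note the inverse relation: nearer surfaces produce larger disparities, so depth resolution degrades with distance.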
The second method uses a time-of-flight approach: the round-trip time of emitted artificial light (i.e., infrared light) between the sensor and a point on the surface is measured. From the captured reflection, a 3D representation of the surface is created.
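The core of the time-of-flight computation is a one-liner: the light pulse travels to the surface and back, so halving the round-trip distance yields the depth. A minimal sketch:

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def tof_distance(round_trip_seconds):
    """Distance to a surface point from the measured round-trip time of
    an emitted infrared light pulse: the light travels to the surface
    and back, so the one-way distance is c * t / 2."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0


# A 10 ns round trip corresponds to a surface roughly 1.5 m away.
depth_m = tof_distance(10e-9)
```

The tiny time scales involved (nanoseconds per metre) are why time-of-flight sensors require specialized hardware rather than ordinary cameras.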
Depth sensing is insensitive to lighting conditions. However, changes in perspective and rotation of objects may affect the overall detection quality. Thus, depth sensing is suitable for use cases where objects reside in stable positions.
2.3 You Only Look Once
The algorithm "You Only Look Once" (YOLO) is a deep learning approach that detects objects regardless of their perspective and position (see Figure 2). It applies a single neural network to an image, which predicts bounding boxes and class probabilities for the objects they contain. By evaluating these probabilities, the likelihood of a correct detection is calculated. While YOLO represents a robust real-time method to detect objects regardless of their positioning and perspective, it requires an extensive training set beforehand. Furthermore, training a neural network on a large data set takes time and, depending on the use case, fast computational hardware to speed up the training process.
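The post-processing that turns such raw network predictions into a final set of detections can be sketched as confidence filtering followed by non-maximum suppression over the predicted bounding boxes. The boxes and scores below are illustrative, not real network output:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0


def filter_detections(detections, conf_threshold=0.5, iou_threshold=0.5):
    """Keep confident, non-overlapping detections (greedy NMS).
    `detections` is a list of (box, confidence) pairs."""
    kept = []
    for box, conf in sorted(detections, key=lambda d: -d[1]):
        if conf < conf_threshold:
            continue  # discard low-confidence predictions
        if all(iou(box, k) < iou_threshold for k, _ in kept):
            kept.append((box, conf))  # no strong overlap with a kept box
    return kept


# Two near-duplicate boxes, one distinct box, one low-confidence box.
dets = [((0, 0, 10, 10), 0.9), ((1, 1, 10, 10), 0.8),
        ((20, 20, 30, 30), 0.6), ((0, 0, 5, 5), 0.3)]
kept = filter_detections(dets)
```

The duplicate (IoU 0.81 with the best box) and the low-confidence prediction are suppressed, leaving two detections.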
3 Object Augmentation
Objects can be used as visual cues for interaction or as interaction devices themselves. In the following, we show implementations of tangible object augmentation we have realized in the past.
3.1 Ambient Augmentation
After recognizing the type of object, cues can be used to implicitly guide the user through a series of actions. Figure 3 shows an augmented workspace that uses in-situ projection to guide the user through a series of assembly steps. By detecting the user’s actions and the items on the workspace, in-situ projections are placed on the currently relevant bin or the final assembly spot. Besides boosting the overall performance of workers in industrial environments, in-situ projections also benefit people with dementia and memory loss.
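The guidance logic itself can be sketched as a small state machine: highlight the bin of the current step and advance only when the detected pick matches it. The bin identifiers are hypothetical, and a real system would derive pick events from the camera rather than from a prepared list:

```python
def guide_assembly(steps, pick_events):
    """Simulate in-situ step guidance: highlight the bin of the current
    step and advance only when the detected pick matches it. Returns the
    sequence of bins highlighted at each detected pick."""
    highlights = []
    current = 0
    for picked_bin in pick_events:
        if current >= len(steps):
            break  # all assembly steps completed
        highlights.append(steps[current])  # projection shown at this moment
        if picked_bin == steps[current]:
            current += 1  # correct pick detected, move to the next step
    return highlights


# The wrong pick ("bin_c") keeps the highlight on the expected bin.
shown = guide_assembly(["bin_a", "bin_b"], ["bin_a", "bin_c", "bin_b"])
```

Keeping the highlight in place after a wrong pick is what provides the implicit error feedback described above.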
3.2 User-defined Tangibles
Regular objects can be registered as user-defined tangibles that are made available for interaction. For example, rotating (see Figure 4(a)) or repositioning (see Figure 4(b)) an object can be used to change the speaker volume.
After registering an object, a series of options is made available to the user, who can choose to interact with existing objects or register new ones. Such objects can be everyday items that do not implement any logic of their own. This transforms objects that are already around the user into user-defined tangibles with just-in-time interaction.
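A minimal sketch of such a registration mechanism, assuming the tracking layer reports an object identifier together with a tracked property such as its rotation (all names here are illustrative, not part of our implementation):

```python
class TangibleRegistry:
    """Minimal registry of user-defined tangibles: each registered object
    id is bound to a handler that turns a tracked property (e.g. the
    object's rotation) into an application action."""

    def __init__(self):
        self.bindings = {}

    def register(self, object_id, handler):
        self.bindings[object_id] = handler

    def update(self, object_id, value):
        """Called by the tracking layer; returns the handler's result,
        or None for objects that were never registered."""
        handler = self.bindings.get(object_id)
        return handler(value) if handler else None


def rotation_to_volume(angle_degrees):
    """Map a detected rotation (degrees) to a 0-100 volume level;
    one full turn sweeps the whole range."""
    return round((angle_degrees % 360) / 360 * 100)


# A coffee mug becomes a volume knob: half a turn means half volume.
registry = TangibleRegistry()
registry.register("coffee_mug", rotation_to_volume)
volume = registry.update("coffee_mug", 180)
```

Binding handlers per object id keeps the mapping user-defined: the same rotation gesture can control volume on one object and, say, lamp brightness on another.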
4 Challenges and Future Work
Seamless object detection and augmentation in home and workplace settings is prone to certain challenges. In this work, we presented three strategies to detect and augment objects. However, choosing the right detection modality depends on the environment as well as on the properties of the object itself. For example, a depth sensor will struggle to detect flat objects, as they offer few 3D features. While a regular color camera can solve this problem, it is sensitive to the overall environmental illumination. In future work, we want to combine the definition and detection of user-defined tangibles by using an approach that fuses color and depth images, as the combination of both provides a better approximation of the object type.
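This planned fusion of color and depth cues could, as one possible scheme, be approximated as a late fusion of per-class scores from two classifiers; the class names and scores below are made up for illustration:

```python
def fuse_scores(color_scores, depth_scores, depth_weight=0.5):
    """Late fusion of per-class scores from a color-image classifier and
    a depth-image classifier: a weighted average per class. Lowering
    `depth_weight` favors the color cue, e.g. for flat objects whose
    depth classifier is unreliable."""
    classes = set(color_scores) | set(depth_scores)
    fused = {c: (1 - depth_weight) * color_scores.get(c, 0.0)
                + depth_weight * depth_scores.get(c, 0.0)
             for c in classes}
    return max(fused, key=fused.get), fused


# The color cue prefers "mug", the depth cue prefers "bowl";
# with equal weights, the stronger color evidence wins.
label, fused = fuse_scores({"mug": 0.7, "bowl": 0.3},
                           {"mug": 0.4, "bowl": 0.6})
```

Weighting the modalities per object class, rather than globally, is one direction such a combined approach could take.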
Furthermore, privacy and ethical considerations have to be taken into account. With the presented camera-based approaches, public and private spaces are recorded during user interaction. While users can give consent to the processing of the collected data in private settings, public spaces and workplaces are more sensitive to privacy-related issues. In future work, we want to investigate these ethical ramifications. Ultimately, we will investigate design guidelines for conducting camera-based augmentation while minimally invading the user’s privacy.
In this work, we presented three strategies to detect objects which we have employed in past research projects, and outlined the advantages and disadvantages we encountered with each. We showed how object detection and user-defined tangibles can be implemented to provide ambient or explicit interaction. Finally, we discussed challenges that have to be tackled before seamless object tracking in home and work settings can be enabled. Since common objects do not implement any logic, we believe that external object augmentation paves the way for ubiquitous tangible interaction at home, in public spaces, and at workplaces.
- (2012) Kinect depth sensor evaluation for computer vision applications. Aarhus University, pp. 1–37.
- (2006) SURF: Speeded Up Robust Features. In Computer Vision – ECCV 2006, A. Leonardis, H. Bischof, and A. Pinz (Eds.), Berlin, Heidelberg, pp. 404–417.
- (2010) Fast and efficient FPGA-based feature detection employing the SURF algorithm. In 2010 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, pp. 3–10.
- (2005) Virtual and augmented reality support for discrete manufacturing system simulation. Computers in Industry 56 (4), pp. 371–383.
- (2017) Working with augmented reality? A long-term analysis of in-situ instructions at the assembly workplace. In Proceedings of the 10th ACM International Conference on PErvasive Technologies Related to Assistive Environments, New York, NY, USA.
- (2014) An augmented workplace for enabling user-defined tangibles. In CHI ’14 Extended Abstracts on Human Factors in Computing Systems, CHI EA ’14, New York, NY, USA, pp. 1285–1290.
- (2016) Interactive worker assistance: comparing the effects of in-situ projection, head-mounted displays, tablet, and paper instructions. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pp. 934–939.
- (2004) A time-of-flight depth sensor - system description, issues and solutions. In 2004 Conference on Computer Vision and Pattern Recognition Workshop, pp. 35–35.
- (2006) Getting a grip on tangible interaction: a framework on physical space and social interaction. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’06, New York, NY, USA, pp. 437–446.
- (2007) reacTIVision: a computer-vision framework for table-based tangible interaction. In Proceedings of the 1st International Conference on Tangible and Embedded Interaction, TEI ’07, New York, NY, USA, pp. 69–74.
- (2017) One size does not fit all - challenges of providing interactive worker assistance in industrial settings. In Proceedings of the 2017 ACM International Joint Conference on Pervasive and Ubiquitous Computing.
- (2018) Identifying cognitive assistance with mobile electroencephalography: a case study with in-situ projections for manual assembly. In Proceedings of the 10th ACM SIGCHI Symposium on Engineering Interactive Computing Systems.
- (2016) Comparing tactile, auditory, and visual assembly error-feedback for workers with cognitive impairments. In Proceedings of the 18th International ACM SIGACCESS Conference on Computers & Accessibility.
- (2019) The digital cooking coach: using visual and auditory in-situ instructions to assist cognitively impaired during cooking. In Proceedings of the 12th ACM International Conference on PErvasive Technologies Related to Assistive Environments, New York, NY, USA.
- (2018) Smart kitchens for people with cognitive impairments: a qualitative study of design requirements. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI ’18, New York, NY, USA.
- (2011) Sparse distance learning for object recognition combining RGB and depth information. In 2011 IEEE International Conference on Robotics and Automation, pp. 4007–4013.
- (2016) You only look once: unified, real-time object detection. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- (2018) Physics Holo.Lab learning experience: using smartglasses for augmented reality labwork to foster the concepts of heat conduction. European Journal of Physics 39 (3), pp. 035703.