Agents that need to act on their surroundings can significantly benefit from perceiving their interaction possibilities, or affordances. The concept of affordance is innately tied to egocentric perception; the term was coined by James J. Gibson within the field of ecological perception. For Gibson, affordances are action opportunities in the environment that are directly perceived by the observer. Under this view, the goal of vision is to recognise the affordances in a scene rather than its elements or objects. The concept of affordances calls for an approach to visual perception that is free from non-action representations and that exists to help the agent interact with the world. Following Gibson's call for a direct perception of affordances, and motivated by studies in neuroscience showing that affordance detection does not require semantic reasoning, we hypothesise that geometric information, or shape, provides enough information for an agent to directly perceive the interaction opportunities in its surroundings. Examples of our geometry-driven affordance detection approach are shown in Fig. 1. As detailed later on, the detection is agnostic to semantics and complex representations of the input scene.
Related work
Much of the attention given to the problem of affordances has focused on the classification of object instances in the world, internal symbolic relationships, or semantic category information, which strongly undermines the idea of direct and economical perception of affordances proposed by Gibson. Directly determining affordances, however, has faced many challenges, namely the difficult problem of visually recovering the relevant properties of the environment in a robust and accurate manner.
We argue that in order to truly perceive affordances in a way that is most useful for agents, there is a need for methods that are agnostic to object categories and free from complex feature representations; methods of a generic nature that allow for the simple yet robust description of multiple affordances. We hypothesise that geometry on its own provides enough information to robustly and generically characterise affordances.
2 Our approach
We concentrate on the subclass of affordances between rigid objects: affordances such as hanging ("where can I hang this?"), placing, riding, filling, and similar. We do this by specifying a geometry-driven interaction tensor that aims to capture the way in which the affordance manifests between a pair of objects. In contrast with previous approaches, our algorithm is able to generalise from a single training example to completely novel environments, i.e. one-shot learning. Here we describe the core of our approach, namely the affordance representation (Interaction Tensor) and the algorithm that allows for fast one-shot detections.
The Interaction Tensor
The Interaction Tensor is a vector field representation able to characterise the static interactions between two generic entities (e.g. objects) in 3D space. The proposed representation builds on the Interaction Bisector Surface (IBS) concept and extends it in three main ways:
- Proposing a representation suitable for visually generated data, e.g. point clouds
- Increasing robustness by encoding the locations on the interacting entities that contributed to the computation of their bisector (provenance vectors)
- Introducing a straightforward descriptor that allows for real-time prediction of affordance candidate locations on RGB-D data (affordance keypoints)
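To make the representation concrete, the sketch below approximates the bisector surface between two point clouds by rejection sampling and records provenance vectors for each bisector point. The sampling strategy and function names here are illustrative assumptions; the actual method computes the IBS from a Voronoi diagram.

```python
# Sketch: approximating the Interaction Bisector Surface (IBS) between two
# point clouds and recording provenance vectors. Rejection sampling is an
# illustrative simplification of the Voronoi-based IBS computation.
import numpy as np
from scipy.spatial import cKDTree

def approximate_ibs(query_obj, scene_obj, n_samples=5000, tol=0.01):
    """Sample 3D points roughly equidistant (within tol) to both objects.

    query_obj, scene_obj: (N, 3) arrays of 3D points.
    Returns the bisector points and the provenance vectors, i.e. the
    vectors from each bisector point to the contributing point on each object.
    """
    tree_q, tree_s = cKDTree(query_obj), cKDTree(scene_obj)
    # Sample candidate points inside the joint bounding box of both objects.
    both = np.vstack([query_obj, scene_obj])
    lo, hi = both.min(axis=0), both.max(axis=0)
    cand = np.random.uniform(lo, hi, size=(n_samples, 3))
    d_q, i_q = tree_q.query(cand)   # nearest point on the query object
    d_s, i_s = tree_s.query(cand)   # nearest point on the scene object
    # Keep candidates roughly equidistant to both objects: the bisector.
    on_bisector = np.abs(d_q - d_s) < tol
    bisector = cand[on_bisector]
    # Provenance vectors: bisector point -> contributing point on each entity.
    prov_q = query_obj[i_q[on_bisector]] - bisector
    prov_s = scene_obj[i_s[on_bisector]] - bisector
    return bisector, prov_q, prov_s
```

For two parallel planes, for example, the recovered bisector points cluster around the mid-plane between them, and the provenance vectors point from the bisector towards each plane.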
Using direct, sparse sampling over the Interaction Tensor allows for the determination of geometrically similar interactions from a single training example; this sampling comprises what we call affordance keypoints, which serve to quickly judge the likelihood of an affordance at a test point in a scene. The Interaction Tensor is straightforward to compute and tolerates changes in geometry well, which provides good generalisation to unseen scenes from a single example. The key steps include simulating an example affordance interaction, computing the IBS between an object (query-object) and a scene (or scene-object), and estimating the provenance vectors, i.e. the vectors used in the computation of points on the bisector surface. The top row (Training example) in Fig. 2 shows the elements and the process involved in computing an affordance for Placing a bowl.
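The keypoint selection step can be sketched as follows. Uniform random sampling is an illustrative assumption here; the paper's actual strategy may weight points differently (e.g. by their proximity to the interacting surfaces).

```python
# Sketch: selecting a sparse set of "affordance keypoints" from the dense
# bisector/provenance representation. Uniform sampling without replacement
# is an illustrative choice, not necessarily the paper's exact strategy.
import numpy as np

def sample_keypoints(bisector_pts, provenance_vecs, k=512, seed=0):
    """Pick k bisector points and their provenance vectors as the descriptor."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(bisector_pts), size=min(k, len(bisector_pts)),
                     replace=False)
    return bisector_pts[idx], provenance_vecs[idx]
```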
Fast one-shot affordance detection
To achieve fast affordance detection in a novel scenario without computing the full descriptor, we approximate the descriptor via a Nearest Neighbour (NN) search. This is done by taking advantage of the provenance vectors from the training example; these vectors account for the regions in the scene that contributed to the computation of the bisector surface. The proposed algorithm uses this information to investigate whether those regions exist in a novel scenario, since such regions would allow computing the same or a similar Interaction Tensor. In this sense, the NN-search is used to investigate if the scene point required to compute a point on the Interaction Tensor exists; or more precisely, if that scene point is where it is expected to be. The detection pipeline is illustrated in the bottom diagram (Detection) of Fig. 2. A full description of our methods is available at  and .
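A minimal sketch of this NN-search test, under the assumption that the expected scene points are stored as offsets relative to the candidate test point: for each offset, query the scene's KD-tree and count how many expected points are actually present. The function name and the fraction-based score are illustrative, not the exact implementation.

```python
# Sketch of the fast one-shot detection step: given a candidate test point
# in a novel scene, check whether the scene contains surface points where
# the training example's provenance vectors expect them to be.
import numpy as np
from scipy.spatial import cKDTree

def affordance_score(test_point, expected_offsets, scene_tree, tol=0.03):
    """Fraction of expected scene points found near their predicted location.

    test_point: (3,) candidate location in the scene.
    expected_offsets: (K, 3) offsets from the training example, i.e. where
        scene geometry contributed to the bisector, relative to the test point.
    scene_tree: cKDTree built over the scene point cloud.
    """
    expected = test_point + expected_offsets   # predicted scene points
    dists, _ = scene_tree.query(expected)      # NN search in the novel scene
    return np.mean(dists < tol)                # fraction of points recovered
```

A candidate location whose surrounding geometry matches the training example scores close to 1; candidates over empty space or mismatched geometry score near 0, so only high-scoring locations need the full descriptor comparison.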
3 For Robots and AR
We leverage a state-of-the-art and publicly available dense mapping system paired with an RGB-D sensor to recover a 3D representation of the scene in front of the camera. Our implementation of the affordance detection algorithm exploits the parallelisation capabilities of commodity desktop hardware. As a reference, running on a PC with a Titan X GPU, our algorithm allows for the simultaneous detection of up to 84 affordances at 10 point locations of the input scene in under 1 second. Fig. 3 summarises the computation times involved in the current implementation.
Our experiments include multiple affordances of everyday objects such as cups, mugs and bowls, as well as detections of human affordances such as Sitting or Riding. Here we show qualitative results of the algorithm applied to robotic systems (Fig. 4) as well as to augmented reality (Fig. 3). The scenes used in our experiments include publicly available data such as , amongst others of our own. Code and data for our core affordance detection approach have been made publicly available at https://github.com/eduard626/interaction-tensor.
We have developed a tensor field representation to characterise the interactions between pairs of objects. This representation and the proposed algorithm allow for real-time detection of multiple affordances in novel environments when training from a single example. In this abstract we showed results of applying the proposed approach to robotic perception and scene augmentation in mixed reality systems. Overall, we see this work as an effort to motivate further advances in approaches to Vision that are more ecological in nature and consider the relationship between the scene and the perceiving agent.
- C. Chuang, J. Li, A. Torralba, and S. Fidler. Learning to act properly: Predicting and explaining affordances from images. In CVPR, pages 975–983, June 2018.
- Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. ScanNet: Richly-annotated 3D reconstructions of indoor scenes. In Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 2017.
- T. Do, A. Nguyen, and I. Reid. AffordanceNet: An end-to-end deep learning approach for object affordance detection. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 1–5, May 2018.
- Eduardo Ruiz and Walterio Mayol-Cuevas. Where can I do this? Geometric affordances from a single example with the interaction tensor. In 2018 IEEE International Conference on Robotics and Automation (ICRA), May 2018.
-  James J. Gibson. The Ecological Approach to Visual Perception. Houghton Mifflin, 1979.
- Richard A. Newcombe, Shahram Izadi, Otmar Hilliges, David Molyneaux, David Kim, Andrew J. Davison, Pushmeet Kohli, Jamie Shotton, Steve Hodges, and Andrew Fitzgibbon. KinectFusion: Real-time dense surface mapping and tracking. In 2011 10th IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pages 127–136. IEEE, 2011.
- Eduardo Ruiz and Walterio W. Mayol-Cuevas. What can I do here? Leveraging deep 3D saliency and geometry for fast and scalable multiple affordance detection. CoRR, abs/1812.00889, 2018.
-  G. Vingerhoets, K. Vandamme, and A. Vercammen. Conceptual and physical object qualities contribute differently to motor affordances. Brain and Cognition, 69(3):481 – 489, 2009.
- Xi Zhao, He Wang, and Taku Komura. Indexing 3D scenes using the interaction bisector surface. ACM Trans. Graph., 33:1–14, 2014.
-  Yuke Zhu, Alireza Fathi, and Li Fei-Fei. Reasoning about Object Affordances in a Knowledge Base Representation, volume 8690 of Lecture Notes in Computer Science, book section 27, pages 408–424. Springer International Publishing, 2014.