Robots and autonomous systems are now used in many areas of industrial production and are increasingly present in everyday life. For instance, social robots [1, 2] are used to welcome and guide people at the entrance of companies, in museums and showrooms, and so on. From a health-care perspective, substantial research has been carried out to develop robots for in-home assistance of elderly people [3]. Furthermore, in hospitals, surgical operations are usually performed with the support of small robots controlled by doctors. In the context of autonomous and intelligent transportation systems, driverless vehicles (cars or flying vehicles) are becoming more popular [4].
Recent applications of robotics concern automation in agriculture and gardening. ‘Green-thumb’ robots are used for automatic planting or harvesting and contribute to increasing the productivity of farming and cultivation infrastructures [5, 6].
In this paper, we present an overview of the EU H2020 funded project TrimBot2020 (http://www.trimbot2020.org), whose aim is to investigate the underlying robotics and vision technologies needed to prototype the next generation of intelligent gardening consumer robots. Figure 1 shows the TrimBot2020 prototype robot, which is described in more detail in the rest of the paper.
2 Challenges in gardening robotics
The peculiar characteristics of gardens, i.e. the highly textured outdoor environment with a large presence of green color and the irregularity of objects and terrain, pose significant challenges for autonomous systems and for computer vision algorithms.
Gardens are dynamic environments: they change over time because of seasonal changes and the natural growth of plants and flowers. Variable lighting conditions, which depend on the time of day and on the weather, also influence the color appearance of objects and the functioning of systems based on cameras and computer vision algorithms [7]. The robot itself causes changes in appearance and geometry by cutting hedges, bushes, etc. This also brings significant challenges, especially for building and maintaining a map of the garden for visual navigation.
Robots for gardening applications are required to navigate on varying, irregular terrain, like grass or pavement, and avoid non-drivable areas, such as pebble stones or woodchips. Navigation strategies also have to take into account the presence of slopes and plan the robot movements accordingly, in order to reach the target objects effectively.
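The navigation constraint described above can be illustrated with a minimal sketch. It assumes the garden is discretized into a hypothetical occupancy grid in which each cell is marked drivable or not, and it uses plain breadth-first search; this is an illustration of the idea only, not the planner used in the project, and it ignores slopes.

```python
from collections import deque


def shortest_path_length(grid, start, goal):
    """Breadth-first search over 4-connected cells.

    grid[r][c] is True when the cell is drivable (e.g. grass or pavement)
    and False otherwise (e.g. pebble stones or woodchips).
    Returns the number of steps from start to goal, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    if not (grid[start[0]][start[1]] and grid[goal[0]][goal[1]]):
        return None
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), steps = queue.popleft()
        if (r, c) == goal:
            return steps
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), steps + 1))
    return None


# Hypothetical 3x3 garden: the middle column is blocked except in the last row.
T, F = True, False
garden = [
    [T, F, T],
    [T, F, T],
    [T, T, T],
]
detour = shortest_path_length(garden, (0, 0), (0, 2))
blocked = shortest_path_length([[T, F, T]], (0, 0), (0, 2))
```

In this toy grid the robot has to drive around the blocked column, so the route is longer than the straight-line distance; when no drivable route exists the function returns None.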
Garden objects, such as topiary and rose bushes, usually have irregular shapes and are difficult to model. Robust and effective representations of plant shapes are, however, needed to facilitate robot operations. For instance, it is challenging to represent the correct shape and size of a topiary bush and subsequently to decide where and how much an overgrown bush needs to be cut. These problems also concern the matching of the shape of observed objects to ideal target shapes, taking into account expert knowledge on plant cutting and geometric constraints.
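As a simplified illustration of matching an observed shape to an ideal target shape, the sketch below assumes the target is a sphere with a known center and radius and measures how far each observed surface point lies outside it. The sphere target, the point values and the function name are hypothetical; real topiary targets and the project's shape representation may differ.

```python
import math


def overgrowth(points, center, target_radius):
    """Return, for each surface point, the distance by which it exceeds the
    target sphere. Points on or inside the sphere yield 0.0 (nothing to cut)."""
    return [max(0.0, math.dist(p, center) - target_radius) for p in points]


# Hypothetical observed surface points of a bush, in meters.
bush_points = [(0.5, 0.0, 0.0), (0.0, 0.3, 0.0), (0.0, 0.0, 0.45)]
excess = overgrowth(bush_points, center=(0.0, 0.0, 0.0), target_radius=0.4)
```

Points with a positive value mark regions where cutting would be needed, and the magnitude indicates how much material lies beyond the target surface.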
Further challenges concern the servoing of cutting tools towards the target objects. These objects are subject to bending, flexing and movements generated by the forces and pressure introduced by cutting tools. Weather conditions, such as wind, also cause movements and deformations of the target objects. Target bushes and flowers therefore have to be modeled dynamically over time.
3 The TrimBot2020 project
The TrimBot2020 project aims at developing the algorithmic concepts that allow a robot to navigate over varying terrains, avoid obstacles, approach hedges, topiary bushes and roses, and trim them to ideal shapes. The project includes the development and integration of robotics, mechatronics and computer vision technologies.
3.1 Platform and camera setup
The TrimBot2020 robotic platform is based on a modified version of the commercial Bosch Indego lawn mower, on which a Kinova robotic arm is mounted. The platform is equipped with stabilizers, which are used during the bush cutting phase to keep the robot steady on the ground. This is necessary to avoid oscillations of the robot chassis and to ensure precise movement of the robotic arm. Figure 1 shows the prototype platform with stabilizers in the Wageningen garden, with a Kinova robotic arm and a bush cutting tool mounted on top.
The robot platform is equipped with a pentagon-shaped rig of five stereo camera pairs, shown from the top in Figure 2. The cameras are arranged in such a way that a view of the surrounding environment is obtained. Each stereo pair is composed of one RGB camera and one grayscale camera, which acquire images at WVGA pixel resolution. Each camera features an image sensor and an inertial measurement unit (IMU). RGB images are required for semantic scene understanding, as color is an important cue. However, the color cameras are less light-sensitive, which can have a detrimental effect if the sun is shining directly into the cameras. For this reason, we use a grayscale camera as the second camera in each stereo pair, which is predominantly used for visual navigation. In Figure 3, we show images acquired by the ten cameras in the pentagon rig inside the Wageningen test garden. The first row shows the color images from the right cameras of the pairs, while the second row shows the corresponding left grayscale images. The acquisition from the ten cameras is synchronized by means of an FPGA, which provides efficient on-board computation of rectified images and stereo depth maps at 10 FPS. For further details on the image acquisition system, we refer the reader to [8].
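To make the link between rectified stereo images and depth maps concrete, the following sketch applies the standard pinhole stereo relation Z = f * B / d, where d is the disparity in pixels, f the focal length in pixels and B the baseline in meters. The focal length and baseline values are hypothetical placeholders, not the calibration of the TrimBot2020 rig, and the on-board FPGA pipeline is not reproduced here.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Convert a disparity (pixels) from a rectified stereo pair into depth (meters).

    Returns None for non-positive disparities, which carry no valid depth.
    """
    if disparity_px <= 0:
        return None
    return focal_px * baseline_m / disparity_px


FOCAL_PX = 400.0    # hypothetical focal length in pixels
BASELINE_M = 0.03   # hypothetical stereo baseline in meters

# Larger disparities correspond to closer points.
depths = [disparity_to_depth(d, FOCAL_PX, BASELINE_M) for d in (0.0, 4.0, 12.0)]
```

The inverse relation between disparity and depth also shows why depth resolution degrades with distance: at small disparities, a one-pixel error changes the estimated depth considerably.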
3.2 Arm and trimming control
The moving platform is equipped with a 6DOF robotic arm, which is used for the operations of bush trimming and rose cutting. Custom designed end-effectors for omnidirectional trimming and rose cutting are mounted on the robotic arm. In Figure 4, we show the prototype end-effectors built by the TrimBot2020 project consortium.
Once the robot has navigated towards a bush or hedge, a 3D reconstructed model of the bush or hedge is computed and used as input for the trimming operation. The model is fitted to a polygonal mesh, which is used to determine the amount of surface to be trimmed. An approximation to the traveling salesman problem is adopted to minimize the path to be followed by the robotic arm in order to trim the bush to the desired shape. The joint use of an omnidirectional end-effector for trimming and a polygonal mesh reduces the complexity of the path planning problem. A demo video of the robotic arm trimming a bush is available on the TrimBot2020 website (http://trimbot2020.webhosting.rug.nl/automatic-cutting-at-work-video/).
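The text does not state which traveling salesman approximation is used. As one possible illustration, the sketch below applies the greedy nearest-neighbor heuristic to a set of hypothetical mesh patch centers that the cutting tool must visit; the function names and coordinates are assumptions made for this example only.

```python
import math


def nearest_neighbor_tour(points, start=0):
    """Greedy TSP approximation: from the current point, always move to the
    closest unvisited point. Ties are broken by the lower index."""
    if not points:
        return []
    unvisited = set(range(len(points)))
    unvisited.remove(start)
    tour = [start]
    while unvisited:
        last = points[tour[-1]]
        nxt = min(unvisited, key=lambda i: (math.dist(last, points[i]), i))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour


def path_length(points, tour):
    """Total Euclidean length of the open path that visits points in tour order."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(tour, tour[1:]))


# Hypothetical centers of mesh patches to be trimmed, in meters.
patch_centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.5, 0.0, 0.0)]
tour = nearest_neighbor_tour(patch_centers)
length = path_length(patch_centers, tour)
```

The heuristic is fast and simple but gives no optimality guarantee; it is shown only to make the idea of ordering trimming targets to shorten the arm path tangible.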
3.3 3D data processing and dynamic reconstruction
While navigating the garden, the robot uses a Simultaneous Localization and Mapping (SLAM) system, which simultaneously estimates a 3D map of the garden (in the form of a sparse point cloud) and the position of the robot with respect to that map. The SLAM system is based on local feature extraction from the images acquired by all ten cameras in the pentagon rig, which are modeled as a single generalized camera. An example of a reconstructed 3D point cloud of the Wageningen test garden is depicted in Figure 5. Recent developments of the visual localization module concern the joint use of geometric and semantic information [10]. The method learns local descriptors using a generative model for semantic scene completion, which allows it to establish correspondences even under strong viewpoint changes or seasonal changes.
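The idea of treating several cameras as one generalized camera can be sketched as follows: every pixel observation is represented as a ray, with an origin and a direction, expressed in a single common rig frame. The rig geometry used here (five mounting positions 72 degrees apart on a circle, a placeholder radius, and a camera frame whose x axis points outward along the optical axis) is an assumption for illustration, not the calibrated TrimBot2020 setup.

```python
import math

# Hypothetical rig geometry: five stereo pairs spaced 72 degrees apart.
PAIR_YAWS_DEG = [0.0, 72.0, 144.0, 216.0, 288.0]
RIG_RADIUS_M = 0.1  # placeholder value, not the real rig dimension


def ray_in_rig_frame(yaw_deg, ray_cam, radius_m=RIG_RADIUS_M):
    """Express a camera-frame viewing ray as (origin, direction) in the rig frame.

    The camera sits on a circle of radius radius_m around the rig's vertical
    z axis and is rotated by yaw_deg about that axis.
    """
    yaw = math.radians(yaw_deg)
    c, s = math.cos(yaw), math.sin(yaw)
    x, y, z = ray_cam
    direction = (c * x - s * y, s * x + c * y, z)  # rotation about z
    origin = (radius_m * c, radius_m * s, 0.0)     # camera center in rig frame
    return origin, direction


# The optical-axis ray of each pair, all expressed in the same rig frame.
rays = [ray_in_rig_frame(yaw, (1.0, 0.0, 0.0)) for yaw in PAIR_YAWS_DEG]
```

Once all observations are rays in one frame, pose estimation can use them jointly, regardless of which physical camera produced each observation.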
For scene understanding and servoing of the robotic arm to the target bushes and roses, TrimBot2020 has developed precise algorithms for disparity computation from monocular images (DeMoN) [11] and from stereo images, based on convolutional neural networks (DispNet), 3D plane labeling [13] and trinocular matching with baseline recovery [14]. An algorithm for optical flow estimation, based on a multi-stage CNN approach with iterative refinement of its own predictions, was also developed [15].
3.4 Scene understanding
Garden navigation and bush/hedge trimming require reliable identification and categorization of different objects in the scene. For instance, analysis of color images gives information about the type of objects (e.g. bushes, roses, trees, hedges, etc.) present in the scene. A drivable area (e.g. grass or pavement) can be distinguished from a non-drivable one (e.g. gravel, pebble stones or woodchips, which the TrimBot2020 robot cannot drive on). Furthermore, varying weather and illumination conditions cause changes in the color appearance of objects in images. The TrimBot2020 computer vision system employs a method for intrinsic image decomposition into reflectance and shading components. The reflectance is the color of the object, which is invariant to illumination conditions and viewpoint, while the shading consists of shadows and reflections that depend on the geometry of the object and the camera viewpoint. TrimBot2020 employs a novel convolutional network architecture to decompose the color images into intrinsic components [17]. The proposed CNN is trained taking into account the physical model of the image formation process.
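The role of the two intrinsic components can be illustrated with the common simplification in which each pixel intensity is the product of reflectance and shading, I = R * S. This is a toy model assumed for illustration and is not necessarily the image formation model used to train the project's CNN; all values are hypothetical.

```python
def compose(reflectance, shading):
    """Form image intensities as the per-pixel product of reflectance and shading."""
    return [r * s for r, s in zip(reflectance, shading)]


def recover_reflectance(image, shading, eps=1e-6):
    """Invert the multiplicative model when the shading is known."""
    return [i / max(s, eps) for i, s in zip(image, shading)]


# Hypothetical per-pixel values for the same three surface points.
reflectance = [0.2, 0.8, 0.5]
shading_sunny = [1.0, 0.9, 0.7]
shading_cloudy = [0.5, 0.4, 0.3]

image_sunny = compose(reflectance, shading_sunny)
image_cloudy = compose(reflectance, shading_cloudy)

# The observed images differ, but the recovered reflectance is the same.
recovered_sunny = recover_reflectance(image_sunny, shading_sunny)
recovered_cloudy = recover_reflectance(image_cloudy, shading_cloudy)
```

In practice the shading is unknown and both components have to be estimated from the image alone, which is what makes the decomposition a learning problem.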
An algorithm for semantic segmentation of images is employed to identify and segment the different objects in the scene. In Figure 6, we show an example of the semantic segmentation output obtained for an image of the Wageningen garden. The segmentation is provided by a convolutional neural network trained on a large data set of synthetic garden images. Grass, topiary bushes, trees and fences are automatically identified and segmented from the original scene.
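A per-pixel label map from a segmentation step like this one can be turned into a drivable-area mask with a simple lookup. The sketch below assumes labels are given as strings and treats grass and pavement as drivable, following the examples given earlier in this section; the label names and the tiny label map are hypothetical and do not reflect the project's actual class list.

```python
# Surfaces treated as drivable, following the examples in the text.
DRIVABLE_LABELS = {"grass", "pavement"}


def drivable_mask(label_map):
    """Return a boolean mask that is True where the pixel label is drivable."""
    return [[label in DRIVABLE_LABELS for label in row] for row in label_map]


def drivable_fraction(label_map):
    """Fraction of pixels in the label map that are drivable."""
    mask = drivable_mask(label_map)
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total if total else 0.0


# Hypothetical 2x3 segmentation output.
labels = [
    ["grass", "grass", "hedge"],
    ["pavement", "woodchips", "pebble stones"],
]
mask = drivable_mask(labels)
fraction = drivable_fraction(labels)
```

Such a mask could serve as input to a grid-based planner, with non-drivable pixels treated as obstacles.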
3.5 Test gardens
In order to test and evaluate the developed technologies in real environments, the TrimBot2020 project has built two test gardens, one at Wageningen University and Research, Netherlands, and the other at the Bosch Campus in Renningen, Germany.
The test garden in Wageningen is approximately meters in size and contains various garden objects, such as boxwoods, hedges, rose bushes and trees, and different terrains, e.g. grass, woodchips and pebble stones. The garden contains a slope, on top of which four topiary bushes are placed. The garden is double fenced for safety reasons. A view of the test garden in Wageningen is shown in Figure 7. The test garden at the Bosch Campus in Renningen has size meters, approximately.
4 Public data sets and challenges
The TrimBot2020 consortium recently published a data set (available at https://gitlab.inf.ed.ac.uk/3DRMS/Challenge2017) to test semantic segmentation and reconstruction algorithms in garden scenes, as part of a challenge held at the 3D reconstruction meets semantics (3DRMS) workshop [18]. The data set contains training and test sequences, composed of calibrated images with the corresponding ground truth semantic labels and a semantically annotated 3D point cloud depicting the areas of the garden that correspond to the sequences. For each sequence, the images taken by four cameras (two color and two grayscale cameras) of the stereo rig are present. In the left column of Figure 8, we depict two example images from the 3DRMS data set, while in the right column we show the corresponding semantic ground truth images. In Table 1, we report details of the composition of the data set.
The data set was released as part of a semantic 3D reconstruction challenge in the 3DRMS workshop. Two submissions to the challenge were received from authors external to the TrimBot2020 consortium. The reconstruction results were evaluated by computing the reconstruction accuracy and completeness for a set of distance thresholds [19, 20], and the semantic quality in terms of the triangles that are correctly labeled. The baseline results for 3D reconstruction were obtained with COLMAP [21], while SegNet [22] was used as the semantic segmentation baseline. In Table 2, we report the results achieved by the methods submitted to the 3DRMS challenge. For further details on the evaluation and analysis of the challenge outcome, we refer the reader to [18].
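One common formulation of threshold-based accuracy and completeness is sketched below: accuracy is the fraction of reconstructed points that lie within a distance threshold of the ground truth, and completeness is the fraction of ground truth points that lie within the threshold of the reconstruction. This is an assumed formulation operating on point sets with a brute-force nearest-neighbor search; the exact protocol used in the challenge may differ.

```python
import math


def fraction_within(source, target, threshold):
    """Fraction of source points whose nearest target point is within threshold."""
    if not source or not target:
        return 0.0
    hits = sum(
        1 for p in source
        if min(math.dist(p, q) for q in target) <= threshold
    )
    return hits / len(source)


def accuracy(reconstruction, ground_truth, threshold):
    """How much of the reconstruction is close to the ground truth."""
    return fraction_within(reconstruction, ground_truth, threshold)


def completeness(reconstruction, ground_truth, threshold):
    """How much of the ground truth is covered by the reconstruction."""
    return fraction_within(ground_truth, reconstruction, threshold)


# Hypothetical point sets, in meters.
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
reconstruction = [(0.0, 0.0, 0.01), (1.0, 0.0, 0.2)]

scores = {
    t: (accuracy(reconstruction, ground_truth, t),
        completeness(reconstruction, ground_truth, t))
    for t in (0.05, 0.5)
}
```

Reporting both measures over several thresholds separates a reconstruction that is precise but sparse from one that is dense but noisy.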
The development of the TrimBot2020 gardening robot continually raises new challenges in computer vision, path planning and arm control. Combining semantic and intrinsic image information with 3D reconstructed structures to improve the SLAM system is one of the objectives of the project. Path planning and visual servoing of the robotic arm are further innovative solutions that the TrimBot2020 project aims to deliver and prototype.
This project received funding from the European Union’s Horizon 2020 research and innovation program under grant No. 688007 (TrimBot2020).
Table 2: 3DRMS challenge results.
[1] K. Charalampous, I. Kostavelis, and A. Gasteratos, “Recent trends in social aware robot navigation: A survey,” Robotics and Autonomous Systems, vol. 93, pp. 85–104, 2017.
[2] A. Tapus, A. Bandera, R. Vazquez-Martin, and L. V. Calderita, “Perceiving the person and their interactions with the others for social robotics – a review,” Pattern Recognition Letters, 2018.
[3] K. M. Goher, N. Mansouri, and S. O. Fadlallah, “Assessment of personal care and medical robots from older adults’ perspective,” Robotics and Biomimetics, vol. 4, no. 1, p. 5, Sep 2017.
[4] J. Janai, F. Güney, A. Behl, and A. Geiger, “Computer vision for autonomous vehicles: Problems, datasets and state-of-the-art,” CoRR, vol. abs/1704.05519, 2017. [Online]. Available: http://arxiv.org/abs/1704.05519
[5] C. W. Bac, E. J. Henten, J. Hemming, and Y. Edan, “Harvesting robots for high-value crops: State-of-the-art review and challenges ahead,” Journal of Field Robotics, vol. 31, no. 6, pp. 888–911, 2014.
[6] C. W. Bac, J. Hemming, B. Tuijl, R. Barth, E. Wais, and E. J. Henten, “Performance evaluation of a harvesting robot for sweet pepper,” Journal of Field Robotics, vol. 34, no. 6, pp. 1123–1139.
[7] A. Gijsenij, T. Gevers, and J. van de Weijer, “Computational color constancy: Survey and experiments,” IEEE Transactions on Image Processing, vol. 20, no. 9, pp. 2475–2489, Sept 2011.
[8] D. Honegger, T. Sattler, and M. Pollefeys, “Embedded real-time multi-baseline stereo,” in IEEE ICRA, 2017.
[9] R. Pless, “Using many cameras as one,” in 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings., vol. 2, June 2003, pp. II–587–93 vol.2.
[10] J. L. Schönberger, M. Pollefeys, A. Geiger, and T. Sattler, “Semantic Visual Localization,” in Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[11] B. Ummenhofer, H. Zhou, J. Uhrig, N. Mayer, E. Ilg, A. Dosovitskiy, and T. Brox, “Demon: Depth and motion network for learning monocular stereo,” in Computer Vision and Pattern Recognition, IEEE International Conference on, 2017.
[12] N. Mayer, E. Ilg, P. Häusser, P. Fischer, D. Cremers, A. Dosovitskiy, and T. Brox, “A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation,” in Computer Vision and Pattern Recognition, IEEE International Conference on, June 2016, pp. 4040–4048.
[13] L. A. Horna and R. B. Fisher, “3d plane labeling stereo matching with content aware adaptive windows,” in 12th Int. Joint Conf. on Computer Vision, Imaging and Computer Graphics Theory and Applications, 2017.
[14] L. Horna and R. B. Fisher, “Plane labeling trinocular stereo matching with baseline recovery,” in The Fifteenth IAPR International Conference on Machine Vision Applications, 2017.
[15] E. Ilg, N. Mayer, T. Saikia, M. Keuper, A. Dosovitskiy, and T. Brox, “Flownet 2.0: Evolution of optical flow estimation with deep networks,” in Computer Vision and Pattern Recognition, IEEE International Conference on, July 2017, pp. 1647–1655.
[16] N. Mayer, E. Ilg, P. Fischer, C. Hazirbas, D. Cremers, A. Dosovitskiy, and T. Brox, “What Makes Good Synthetic Training Data for Learning Disparity and Optical Flow Estimation?” International Journal of Computer Vision, 2018, to appear.
[17] A. S. Baslamisli, H.-A. Le, and T. Gevers, “CNN based Learning using Reflection and Retinex Models for Intrinsic Image Decomposition,” ArXiv e-prints, Dec. 2017.
[18] T. Sattler, R. Tylecek, T. Brox, M. Pollefeys, and R. B. Fisher, “3d reconstruction meets semantics – reconstruction challenge 2017,” ICCV Workshop, Venice, Italy, Tech. Rep., 2017. [Online]. Available: http://trimbot2020.webhosting.rug.nl/wp-content/uploads/2017/11/rms_challenge.pdf
[19] S. M. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski, “A comparison and evaluation of multi-view stereo reconstruction algorithms,” in Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, ser. CVPR ’06, 2006, pp. 519–528.
[20] T. Schöps, J. L. Schönberger, S. Galliani, T. Sattler, K. Schindler, M. Pollefeys, and A. Geiger, “A multi-view stereo benchmark with high-resolution images and multi-camera videos,” in Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[21] J. L. Schönberger, E. Zheng, M. Pollefeys, and J.-M. Frahm, “Pixelwise view selection for unstructured multi-view stereo,” in European Conference on Computer Vision (ECCV), 2016.
[22] V. Badrinarayanan, A. Kendall, and R. Cipolla, “Segnet: A deep convolutional encoder-decoder architecture for image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, pp. 2481–2495, Dec 2017.
[23] Y. Taguchi and C. Feng, “Semantic 3d reconstruction using depth and label fusion,” in 3DRMS Workshop Challenge, ICCV, 2017.
[24] J. Guerry, A. Boulch, B. L. Saux, J. Moras, A. Plyer, and D. Filliat, “Snapnet-r: Consistent 3d multi-view semantic labeling for robotics,” in 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), Oct 2017, pp. 669–678.