Through the Looking Glass: Diminishing Occlusions in Robot Vision Systems with Mirror Reflections

by Kentaro Yoshioka, et al.

The quality of robot vision greatly affects the performance of automation systems, where occlusions stand as one of the biggest challenges. If the target is occluded from the sensor, detecting and grasping it becomes very challenging. For example, when multiple robot arms cooperate in a single workplace, the robot arms themselves create occlusions that hide objects underneath. While occlusions can be greatly reduced by installing multiple sensors, the increase in sensor cost cannot be ignored. Moreover, the sensor placement must be rearranged every time the robot operation routine and layout change. To diminish occlusions, we propose the first robot vision system with tilt-type mirror reflection sensing. By instantly tilting the sensor itself, we obtain two sensing results with different views: conventional direct line-of-sight sensing and non-line-of-sight sensing via mirror reflections. Our proposed system removes occlusions adaptively by detecting the occlusions in the scene and dynamically configuring the sensor tilt angle to sense the detected occluded area. Thus, sensor rearrangement is not required even after changes in robot operation or layout. Since the only required hardware is the tilt-unit and a commercially available mirror, the cost increase is marginal. Through experiments, we show that our system achieves detection accuracy similar to systems with multiple sensors, despite its single-sensor implementation.








I Introduction

Robot manipulation systems (e.g., bin-picking, depalletization, loading/unloading) are rapidly gaining attention in manufacturing and logistics industries due to their high potential for revolutionizing productivity and labor efficiency. The robot vision of such systems perceives the workplace, localizes the target object, and detects its pose; this information is used to decide the robot operation policy. Hence, when objects become occluded, the robot may overlook them or estimate wrong poses. If the robot fails to grasp, not only will it fail to complete the task, but it may also damage the object by dropping it or causing collisions. Therefore, occlusions are one of the biggest issues in robot vision systems.

Fig. 1: Our proposed concept of a robot vision system with tilt-type mirror reflection sensing. While conventional direct sensing suffers from the occlusion caused by the robot arm, by concatenating it with the reflection sensing results, we perceive all objects in the workplace with a single sensor.

Robot vision systems to date have attempted to reduce occlusions as much as possible. As a rule of thumb, occlusions in the scene can be minimized by mounting depth sensors at the zenith of the robot workplace, which many commercial robot systems follow [19][22][25]. However, even with careful arrangement, an occlusion is always created directly below the robot arm, hiding objects underneath. In this paper, we define sensor blind spots as occlusions, which may be created by the robot itself or by another working robot. One of the future goals in robot automation is a configuration in which multiple robots cooperate to perform one or more tasks simultaneously, which can significantly boost the work speed [24][4]. The impact of occlusions is even more critical in such cases, and it is essential to construct a robot vision system enabling a further reduction of occlusions.

The most effective way to reduce occlusions is simply to increase the number of sensors and sense from multiple angles. However, since high-precision depth sensors (active projection cameras [9], LiDARs [26][29]) are far more expensive than 2D cameras, the number of depth sensors should be minimized. Also, since the optimal sensor placement depends on the configuration of the cooperating robots, the sensors must be repositioned every time the robot work content changes, which lacks flexibility.

In this paper, we propose a robot vision system which diminishes occlusions with tilt-type mirror reflection sensing. By combining both the conventional direct sensing result from the zenith sensor and the non-line-of-sight (NLoS) sensing result with mirror reflections, we acquire sensing results with significantly reduced occlusions. The two sensing results are obtained by rapidly configuring the sensor angle with a tilt-unit. Our system detects the occlusions within the workplace and the mirror reflection scan angle is adaptively set to effectively diminish the occlusions. The only required additional hardware is the tilt-unit and a commercially available mirror: compared to adding a depth sensor, the system cost can be greatly reduced.

Our original contributions are summarized below:

  • To the best of our knowledge, this is the first robotic vision system that can adaptively remove occlusions utilizing mirror reflection and sensor tilting.

  • We propose a signal processing pipeline which achieves the concatenation of direct and mirror reflection sensing data with tilt-type sensing.

  • We propose a mirror displacement calibration that can be performed fully automatically without human interaction or oracles.

  • To evaluate the effectiveness of reflection sensing, we create a custom dataset which envisions multi-arm package picking. Compared to sensing only with a single zenith sensor, we confirm that reflection sensing improves the detection accuracy by 24%, and the accuracy is almost on a par with the results obtained using two sensors.

II Related Work

Robot vision systems tackling occlusions: There are two major approaches to reducing occlusions in robot vision systems. One approach is to install multiple sensors and sense from multiple angles [18][31][1]. Although this method does not prolong the sensing time, the sensor cost increases significantly. Another approach is to attach the sensor to the end-effector and control the arm to sense the workspace from multiple angles, which can greatly reduce occlusions [32][2][7]. However, e.g., ref. [32] senses from 15 predefined viewpoints per scene; performing multiple sensing passes may become time-consuming and reduce work efficiency. To summarize, conventional approaches result in a tradeoff between work efficiency and sensor cost.

Our proposed tilt-type system has the following advantages: 1) Compared to end-effector configurations, which move the robot arm to several positions to capture multiple sensing results, tilting the sensor itself requires significantly smaller movements, which improves work efficiency. 2) Since the only required additional components are the tilt-unit and an ordinary mirror, the hardware cost increase is marginal.

Non-line-of-sight (NLoS) imaging: Our work belongs to the family of NLoS imaging, which uses reflection or diffusion to sense occluded objects [16]. Since first demonstrated by ref. [12], which proposed the concept of recovering occluded objects from time-of-flight data, NLoS imaging has been applied to various media such as laser light, acoustic beams [15], and mm-wave radars [23]. While it has been known that NLoS imaging can be realized via mirror reflections, to the best of our knowledge, we are the first to apply NLoS sensing to adaptively reduce occlusions in the robot workspace.

Processing depth sensor data in the presence of mirrors: Mirrors and glass corrupt the sensing results of depth sensors, because light unintentionally reflects off such surfaces but still returns to the sensor. A "virtual" point cloud is generated as if an object existed behind the mirror, which negatively impacts SLAM and mapping tasks. There has been research to compensate for such effects [27][28][11][30][10], where a typical approach is to utilize mirror symmetry and match the virtual point clouds to the real point clouds.

In our problem setup, the mirror is installed at a known location; thus, detecting the reflection data is not required. Also, while recovery of the virtual reflection data is necessary in our system, our signal processing is original because previous works do not utilize sensor tilting.

Improving robot vision performance with mirror reflections: Our work is similar to refs. [13][17], which realize visual servoing through mirror reflections, and refs. [21][20], which use a mirror to realize an imaging system with a virtually wide FoV. However, while these prior works aim to expand the FoV with a fixed camera and mirror, our goal is to adaptively reduce occlusions in robot vision systems by tilting the sensor, which is clearly different.

III Tilt-type Robot Vision System with Mirror Reflections

III-A System Concept

Installing a depth sensor at the zenith of the robot workplace is a common configuration and provides fine sensing quality. However, occlusions cannot be completely eliminated with a single-sensor setup, and the remaining occlusions may cause major issues. For example, if multiple robots work cooperatively in a single workplace, occlusions will continuously appear and may cause oversights. Fig. 1 illustrates the occlusion caused by a robot arm: from the zenith sensor, the tomato below the arm is completely occluded and impossible to detect.

We propose a robot vision system which diminishes occlusions with tilt-type mirror reflection sensing; Fig. 1 shows our main concept. We define "direct sensing" as the strategy where the workplace is sensed directly by the depth sensor, and "reflection sensing" as NLoS sensing via the tilted depth sensor and the light rays reflected by the mirror. By concatenating these two sensing results, we realize a robot vision system with greatly diminished occlusions, since "reflection sensing" provides results as if an extra sensor were installed at the mirror position. Actual sensing results are shown at the bottom of Fig. 1, where the two target objects in the workplace (tomato and green pepper) are clearly visible. Our system attaches a commercial tilt-unit to the sensor, enabling "direct" and "reflection sensing" to be switched freely at high speed by tilting.

Fig. 2: The signal processing pipeline for the tilt-based mirror reflection system is shown. First, the direct sensing is performed and from its results, the occlusion area is detected. Utilizing the detection results, we set the tilt angle so that the occlusion can be efficiently reduced.
Fig. 3: The occlusion detection pipeline is shown. The occlusion area created by the robot arm is detected based on regions exceeding the height threshold.

III-B Signal Processing Pipeline

In actual robot automation systems, the locations of occlusions vary greatly with the work content (e.g., operation scheme, placements), and "reflection sensing" with a fixed tilt angle cannot remove occlusions sufficiently. To handle this, we propose a signal processing pipeline which adaptively detects and diminishes occlusions: the occlusion area is detected from the "direct sensing" results, and the sensor tilt angle is adaptively set to minimize occlusions during "reflection sensing".

Fig. 2 shows our overall signal processing pipeline. First, "direct sensing" is performed to sense the workplace, and the occlusions are detected from its results. To mitigate the occlusions, the tilt-unit is rotated to perform "reflection sensing". After converting the "reflection sensing" data to world coordinates, the mirror virtual images are converted to real images to enable the concatenation of the direct and reflection data. Finally, object detection is carried out on the concatenated data with diminished occlusions. The following sections describe the details of each block.

Detecting occlusions: We utilize a simple yet effective rule-based approach to detect the occluded area from the direct sensing data, as shown in Fig. 3. A height threshold is defined as the maximum height of an object allowed in the workplace plus some margin, and any region surpassing the threshold is classified as an occlusion area. Then, the n largest occlusion areas are kept as the occlusions and the others are filtered out to suppress sensor noise effects, where n is the known number of robots in the workplace. Finally, the centroid of each occlusion is derived to set the optimum sensor tilt angle.

In our system, we assume a robot manipulation task where the robot grasps the target object from above and then moves it to another location. In such tasks, the robot arm must approach from a position sufficiently higher than the object itself to avoid collisions. The arm elbow and wrist, which mainly generate the occlusions, also take positions sufficiently higher than the surrounding objects; thus, occlusions can be reliably detected with height thresholds. While image recognition techniques (e.g., DNNs) could be used to detect occlusions, their execution time is overwhelmingly long. Since the latency of occlusion detection directly prolongs the sensing time, it may degrade the robot's working efficiency.
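As an illustration, the height-threshold occlusion detection described above can be sketched as follows (function and parameter names are our own, not from the paper; a real implementation would likely use a library's connected-component labeling):

```python
import numpy as np

def detect_occlusions(depth_map, height_thresh, n_robots):
    """Return centroids (row, col) of the n_robots largest regions
    whose height exceeds height_thresh (illustrative sketch)."""
    mask = depth_map > height_thresh
    # 4-connected component labeling via a simple flood fill.
    labels = np.zeros(depth_map.shape, dtype=int)
    cur = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue
        cur += 1
        stack = [seed]
        labels[seed] = cur
        while stack:
            r, c = stack.pop()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                        and mask[nr, nc] and not labels[nr, nc]):
                    labels[nr, nc] = cur
                    stack.append((nr, nc))
    # Keep only the n_robots largest regions; tiny ones are sensor noise.
    sizes = [(labels == k).sum() for k in range(1, cur + 1)]
    keep = np.argsort(sizes)[::-1][:n_robots] + 1
    return [tuple(np.mean(np.nonzero(labels == k), axis=1)) for k in keep]
```

The centroid of each surviving region is what feeds the tilt-angle selection in the next step.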

Fig. 4: The derivation of the optimal tilt angle with mirror reflections is shown. By converting the target coordinate to its virtual mirror-image coordinate, the optimal tilt angle can be calculated by connecting the sensor and the virtual coordinate with a straight line.

Setting the optimal tilt angle θ: Fig. 4 shows how the tilt angle θ is set so that occlusions can be removed effectively. In the case of conventional direct sensing, the target object is most efficiently captured when the tilt angle forms a straight line between the sensor and the target coordinate. For simplicity, consider a two-dimensional space where the sensor position is [x_s, z_s] and the target coordinate is [x_t, z_t]:

θ = arctan( (x_t − x_s) / (z_s − z_t) )   (1)

On the other hand, with mirror reflections, three elements are required to set θ: the sensor coordinate, the transformation matrix M which converts the virtual reflection data to real data, and the target coordinate. Let the target coordinate be p_t and the "virtual" mirror-imaged target coordinate p_v. The conversion can be realized with the inverse matrix M⁻¹:

p_v = M⁻¹ p_t   (2)

Thus, as shown in Fig. 4, the optimal θ is obtained when p_v and the sensor are directly connected, which can be calculated in the same manner as Eq. (1).
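The tilt-angle selection above can be sketched in 2D as follows (coordinate conventions and names are our own assumptions, with a vertical mirror plane for simplicity):

```python
import math

# Direct sensing: aim the sensor straight at the target.
def tilt_angle_direct(sensor, target):
    (xs, zs), (xt, zt) = sensor, target
    return math.atan2(xt - xs, zs - zt)  # angle measured from straight-down

# Reflection sensing: mirror the target through a vertical mirror
# plane x = xm, then aim at the resulting "virtual" target.
def tilt_angle_reflection(sensor, target, xm):
    xt, zt = target
    virtual = (2 * xm - xt, zt)  # mirror-image of the target
    return tilt_angle_direct(sensor, virtual)
```

For example, with the sensor 2.1 m high and the mirror 1.2 m away (the distances reported in the experimental setup), a target near the workplace origin maps to a virtual target behind the mirror, and the tilt angle is simply the arctangent toward that virtual point.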

Fig. 5: (a) Definition of the world coordinates in our system. (b) The TiltedSensor coordinate system when the sensor is tilted by θ.

Converting tilted reflection sensing data to world coordinates: In our system, two coordinate transformations are required: (1) translating tilted sensing data to world coordinates, and (2) translating virtual mirror reflection data to real data. First, the translation to world coordinates is explained. To simplify the calculation, we define the world coordinates as shown in Fig. 5(a). Note that the tilt rotation is around the Y-axis with the tilt-unit at the center. Fig. 5(b) illustrates an example when the sensor is tilted by θ. Since the data obtained after tilting is in the TiltedSensor coordinate system, it must be converted to the world coordinate system for processing. The transformation is a rotation by θ about the tilt-unit's Y-axis, combined with the translation induced by the rotation radius r between the tilt axis and the sensor origin. Since θ is given by the system and r is determined by the mechanics of the tilt-unit, both values are known beforehand.
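One plausible form of this TiltedSensor-to-world conversion, under the assumption that the sensor sits at offset r from a Y-axis tilt hinge, is a Y-rotation conjugated with the offset translation (all names and the sign convention are our assumptions):

```python
import numpy as np

def tilt_to_world(theta, r):
    """Homogeneous transform: rotation by theta about the tilt-unit's
    Y axis, with the rotation axis offset by r from the sensor origin."""
    c, s = np.cos(theta), np.sin(theta)
    Ry = np.array([[ c, 0.0,   s, 0.0],
                   [0.0, 1.0, 0.0, 0.0],
                   [ -s, 0.0,   c, 0.0],
                   [0.0, 0.0, 0.0, 1.0]])
    Tr = np.eye(4)
    Tr[2, 3] = r                        # axis -> sensor offset
    return Tr @ Ry @ np.linalg.inv(Tr)  # rotate about the offset axis
```

At θ = 0 the transform reduces to the identity, and the rotation block is always the pure Y-rotation; only the translation part depends on r, matching the statement that both θ and r are known beforehand.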

In addition, as shown in Fig. 4, the data sensed through the mirror forms a "virtual image" and must be converted to a "real image" by a mirror-image transformation. Following the Householder transformation [8], when the mirror plane is represented by

ax + by + cz + d = 0

and the L2 norm of (a, b, c) is unity, the mirror-image transformation M can be expressed in homogeneous coordinates as:

M = | 1−2a²   −2ab    −2ac   −2ad |
    | −2ab   1−2b²    −2bc   −2bd |
    | −2ac    −2bc   1−2c²   −2cd |
    |   0       0       0      1  |
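A minimal sketch of this Householder mirror-image matrix (notation as above; the reflectivity of the transform means M is its own inverse, so Eq. (2)'s inversion is trivial):

```python
import numpy as np

def mirror_matrix(a, b, c, d):
    """Homogeneous mirror-image (Householder) transform for the plane
    a*x + b*y + c*z + d = 0, with (a, b, c) a unit normal."""
    n = np.array([a, b, c])
    M = np.eye(4)
    M[:3, :3] -= 2.0 * np.outer(n, n)   # reflect directions across the plane
    M[:3, 3] = -2.0 * d * n             # account for the plane's offset
    return M
```

Reflecting twice restores the original point (M @ M is the identity), which is why the same matrix serves both to generate virtual coordinates and to recover real ones.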

Sensing error analysis for reflection sensing: In reflection sensing, the effective working distance of the sensor increases, which increases the sensor measurement error. If the direct working distance is l_direct, the distance to the object for reflection sensing, l_refl, is the sum of the sensor-to-mirror and mirror-to-object path lengths:

l_refl = l_sensor→mirror + l_mirror→object > l_direct

While the measurement error of active 3D sensors depends on the architecture (e.g., stereo vision, dToF, iToF) and the lighting conditions, we can expect the measurement error to increase linearly or quadratically with the working distance [9, 29, 6].

Since light decays with mirror reflection, the mirror quality can also become an error source. When the mirror reflectance is ρ, the light emitted from the sensor is attenuated by ρ² compared to the direct path, since it reflects off the mirror on both the outgoing and returning trip. Note that typical commercial mirrors have a high reflectance of about 90%; the distance-dependent error is dominant in our system.
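A back-of-envelope sketch of these two error sources (all names, and the quadratic error model as one of the two cases mentioned above, are illustrative assumptions):

```python
def reflection_error_factor(d_direct, d_sm, d_mo, rho=0.9, power=2):
    """Compare reflection sensing against direct sensing.

    d_direct: direct working distance [m]
    d_sm, d_mo: sensor->mirror and mirror->object path lengths [m]
    rho: mirror reflectance; power: distance exponent of the error model.
    """
    d_refl = d_sm + d_mo                         # total reflected path
    range_factor = (d_refl / d_direct) ** power  # error grows with distance
    signal_factor = rho ** 2                     # out-and-back attenuation
    return d_refl, range_factor, signal_factor
```

With the setup distances reported later (sensor 2.1 m high, mirror 1.2 m away), the reflected path is only moderately longer than the direct one, consistent with the claim that reflection sensing remains usable.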

IV Automatic Mirror Displacement Calibration

for each shoulder angle φ₁ in the search set do
     for each elbow angle φ₂ in the search set do
          Obtain direct and reflection sensing results
          N_all ← N_direct + α · N_refl
          if N_all > N_best then
               N_best ← N_all; save pose (φ₁, φ₂)
          end if
     end for
end for
Algorithm 1 Pseudocode to obtain the calibration-optimal robot pose

As described in Section III, our mirror reflection vision system relies on the transformation matrix M to process mirror reflection data and to set the tilt angle θ. However, with disturbances such as factory vibrations, the position/angle of the mirror may shift, altering M from its predefined value. Even a mirror angle shift of only a few degrees introduces significant error into the sensing results; an automatic mirror displacement calibration is therefore necessary. Moreover, while human-guided calibration (e.g., covering the mirror with non-reflective cloth) is easy to conduct, it is not desirable since it persistently interrupts the robot operation.

To achieve automatic calibration, we propose a calibration method which uses the robot itself as the calibration target. The transformation matrix M is derived by registering the "virtual" point clouds (obtained from reflection sensing) to the "real" point clouds (from direct sensing). However, to achieve high registration accuracy, it is important to configure a robot pose for which a large amount of data is shared between the two sensing results. Since it is challenging to measure the actual amount of common data between the two sensing results, we follow this intuition: when the same object is captured, the amount of common data should scale with the total amount of sensed data. Algorithm 1 shows the flow for deciding the calibration robot pose, where the robot's joints are configured to find the posture which maximizes the total number of robot arm point cloud data N_all. Note that N_all is the weighted sum of the direct (N_direct) and reflection (N_refl) sensing results, and the robot arm point clouds are obtained by the height threshold filtering explained in Fig. 3. Since the ray trajectory is longer in reflection sensing, its point cloud is inherently sparse, so we weight N_refl by a factor α to mitigate the sparsity effect.


During experiments, α was fixed. Importantly, only the robot arm joints which have significant impact on N_all should be searched, to speed up the calibration process. During experiments, we search for the optimal pose using the 2 largest joints (shoulder and elbow) with 10 angle settings each, securing sufficient calibration accuracy.
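The pose search of Algorithm 1 can be sketched as a simple grid search (the sensing call and the weight value below are stand-ins, not the paper's actual implementation):

```python
import itertools

def best_calibration_pose(sense, shoulder_angles, elbow_angles, alpha=2.0):
    """Grid-search two joints for the pose maximizing the weighted
    arm point count N_all = N_direct + alpha * N_refl.

    sense(pose) is a stand-in for the real direct/reflection capture
    and returns (N_direct, N_refl) for the given joint pose."""
    best_pose, best_n = None, -1.0
    for pose in itertools.product(shoulder_angles, elbow_angles):
        n_direct, n_refl = sense(pose)
        n_all = n_direct + alpha * n_refl  # weight the sparser view
        if n_all > best_n:
            best_pose, best_n = pose, n_all
    return best_pose, best_n
```

With 10 angle settings per joint, as in the experiments, this search visits 100 poses; restricting the search to the two largest joints is what keeps the calibration fast.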

V Experiments

Fig. 6: (a) The experimental setup for the tilt-type mirror reflection robot vision system. (b) The two sensor experimental setup used for comparison. (c) Close up view of the sensor mounted on the tilt-unit.

Fig. 6 shows our experimental setup, which is used for all experiments. The sensor is placed 2.1 m above the ground; a FLIR PTR-E46 is used as the tilt-unit, one DENSO COBOTTA is used as a robot arm to create occlusions, and one commercially available mirror is placed 1.2 m away from the sensor in front of the robot. We arbitrarily move the robot location during experiments to confirm the robustness of our system. Note that while we chose the stereo-camera-based Ensenso N35 as the depth sensor due to its precision, we confirmed that LiDARs and iToF cameras can be used in our reflection system as well.

V-A Object detection under occlusions

Fig. 7: The object detection pipeline is shown. The point clouds are first filtered by height to remove the robot arm and then converted to birds-eye-view images. Then, instance segmentation is performed to detect the cardboard.
Fig. 8: Qualitative results of the object detection. With only either direct or reflection sensing, occlusions still remain which cause detection errors (e.g. one box recognized as two boxes, wrong box size). The detection result with direct+reflection sensing has significantly lower detection errors and is comparable to the result using two sensors.
Sensing strategy | Direct only | Reflection only | Direct + Reflection | Two sensors
Mean target data coverage (compared to two sensors) [%] | 70.1 | 83.9 | 96.9 | 100
TABLE I: The object detection accuracy and the target data coverage are evaluated on our custom dataset, where the robot continuously creates occlusions. The accuracy is significantly improved by concatenating mirror reflection sensing and is on a par with the results obtained with two sensors.

The main purpose of this experiment is to demonstrate that our proposed mirror reflection vision system 1) can diminish occlusions and 2) can achieve similar detection accuracy as using two sensors. The setup using two sensors is shown in Fig. 6 (b), where the angle of sensor 2 was carefully adjusted to effectively remove the occlusion.

Here, we evaluate the object detection accuracy of the system by modeling a cardboard box manipulation task with collaborative robots. For evaluation purposes, we create a dataset in which we intentionally produce occlusions by randomly moving the robot arm over the cardboard boxes. Under these circumstances, we acquire data following four sensing strategies: 1) direct sensing only, 2) reflection sensing only, 3) direct + reflection sensing, and 4) two sensors. We collect 8 scenes with different cardboard arrangements, each containing 50 point clouds. Four scenes have only 1 to 3 cardboard boxes (named "easy scenes") and the other 4 scenes contain 4 to 6 boxes (named "hard scenes"). To investigate the effect of occlusions on the object detection accuracy, we construct an object detection pipeline for evaluation, shown in Fig. 7. First, since the robot arm is not an object of interest, we filter out the robot point clouds using the height threshold. Then we project the point clouds into a 2D birds-eye-view image, where the height is mapped to image intensity. Finally, we use a ResNet-50 Mask R-CNN [5], fine-tuned with 800 cardboard images, to perform instance segmentation and detect the cardboard.
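The arm-removal and birds-eye-view projection steps of this pipeline can be sketched as follows (resolution, image size, and names are our assumptions):

```python
import numpy as np

def pointcloud_to_bev(points, height_thresh, res=0.01, shape=(100, 100)):
    """Drop arm points above height_thresh, then rasterize the rest
    into a top-down image whose intensity encodes height.

    points: (N, 3) array of [x, y, z] in meters; res: cell size [m]."""
    pts = points[points[:, 2] <= height_thresh]      # remove the robot arm
    img = np.zeros(shape, dtype=np.float32)
    ix = (pts[:, 0] / res).astype(int)
    iy = (pts[:, 1] / res).astype(int)
    ok = (ix >= 0) & (ix < shape[0]) & (iy >= 0) & (iy < shape[1])
    # Keep the highest surviving point per cell, like a height map.
    np.maximum.at(img, (ix[ok], iy[ok]), pts[ok, 2])
    return img
```

The resulting single-channel height image is what a 2D instance segmentation network such as Mask R-CNN consumes.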

We summarize the qualitative object detection results in Fig. 8 and the quantitative results in Table I. The instance segmentation mean average precision (mAP) [14] is calculated, and results for IoU=50% and IoU=75% are reported. The effect of occlusions was very large with either direct or reflection sensing alone, which yielded poor accuracy. Some failure cases were: the object was completely hidden by the robot arm and overlooked, or a single object was misinterpreted as two objects due to cavities caused by occlusions. With the proposed direct + reflection sensing, the occlusions are greatly reduced, boosting the detection accuracy. Importantly, our proposed method achieves accuracy similar to the results using two sensors, with a difference of only 1%. We also report the mean target data coverage with respect to the two-sensor result (computed from point cloud geometry), and we confirm that the coverage ratio correlates well with the detection accuracy.

V-B Mirror displacement calibrations

Fig. 9: (a) Robot pose used for evaluation versus the number of data points, where Pose 6 was obtained by our calibration method. (b) Robot pose versus translational error. (c) Robot pose versus rotational error.
Fig. 10: Calibration accuracy when the mirror angle is swept from -5 to 5 degrees. Using the proposed optimal pose, the calibration converges with high accuracy under various conditions. On the right, we show point cloud data before and after the displacement calibration.

Finally, the proposed mirror displacement calibration method is evaluated by altering the mirror angle from its initial condition. Here, we follow ref. [3] and evaluate the average calibration accuracy (translational error and rotational error) over ten runs. The initial transformation matrix is set with the mirror placed perpendicular to the ground, and our calibration is carried out using it as the initial value. For registration, we use a standard RANSAC+ICP registration pipeline powered by Open3D [33].
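The closed-form point-to-point alignment at the core of an ICP update can be sketched with the Kabsch/SVD solution; Open3D's registration pipeline wraps this kind of update with correspondence search and a RANSAC initialization, whereas this standalone sketch assumes known correspondences:

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares rigid transform (R, t) with dst ~= src @ R.T + t,
    for matched (N, 3) point sets -- the Kabsch/SVD solution."""
    cs, cd = src.mean(0), dst.mean(0)
    H = (src - cs).T @ (dst - cd)        # cross-covariance of centered sets
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                   # proper rotation (det = +1)
    t = cd - R @ cs
    return R, t
```

In the actual system, src would be the "virtual" reflection point cloud (after mirror-image conversion) and dst the "real" direct-sensing cloud, with ICP alternating this solve with nearest-neighbor matching.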

Fig. 9 reports the robot pose's impact on calibration accuracy. Pose 6 was obtained using our proposed method, configuring the robot pose to maximize the total arm point count N_all. Poses 1-5 are random robot postures with different N_all, shown for comparison. As shown in Fig. 9(a), Pose 6 contains the most point cloud data, while N_all is smaller for Poses 1-5. Fig. 9(b)(c) shows the calibration accuracy: Poses 1-3, which have insufficient N_all, resulted in degraded calibration accuracy. As N_all increases in Poses 4 and 5, accuracy improves. The best calibration accuracy was achieved with Pose 6, showing that our N_all-based criterion contributes to improving the calibration accuracy.

Fig. 10 reports the mirror displacement calibration accuracy under various conditions, where the mirror angle was altered from -5 to 5 degrees. Here, we compare the calibration accuracy between the optimal pose (Pose 6) and Pose 4. With Pose 4, the calibration convergence is unstable: for some mirror angles (e.g., -2 degrees), the calibration does not converge. On the other hand, with the optimal pose, high calibration accuracy was achieved regardless of the mirror angle.

VI Conclusions

We proposed the first robot vision system with tilt-type mirror reflection sensing. By concatenating the direct sensing and NLoS mirror reflection sensing results, occlusions are greatly diminished. Our signal processing pipeline detects the occlusion area and dynamically configures the sensor tilt angle, allowing the system to adaptively remove workplace occlusions. Through experiments, we confirmed that our system can achieve the same detection accuracy as a multi-sensor setup, despite its single-sensor implementation.

In future work, we plan to configure a cooperative robot environment with multiple robots and evaluate our proposed vision system in such scenes. Since occlusions created in such environments are more complex, multiple mirrors should be installed, and techniques to select the optimal mirror for sensing will be required. Moreover, we plan to evaluate the impact of mirror reflections on robot grasp accuracy.


  • [1] A. Causo, Z. H. Chong, R. Luxman, Y. Y. Kok, Z. Yi, W. C. Pang, R. Meixuan, Y. S. Teoh, W. Jing, H. S. Tju, and I. M. Chen (2018) A robust robot design for item picking. IEEE International Conference on Robotics and Automation, pp. 7421–7426.
  • [2] C. Eppner, S. Hofer, R. Jonschkowski, R. Martín-Martín, A. Sieverling, V. Wall, and O. Brock (2017) Lessons from the Amazon Picking Challenge: Four aspects of building robotic systems. International Joint Conference on Artificial Intelligence, pp. 4831–4835.
  • [3] A. Geiger, P. Lenz, and R. Urtasun (2012) Are we ready for autonomous driving? The KITTI vision benchmark suite. IEEE Conference on Computer Vision and Pattern Recognition, pp. 3354–3361.
  • [4] K. Harada, T. Foissotte, T. Tsuji, K. Nagata, N. Yamanobe, A. Nakamura, and Y. Kawai (2012) Pick and place planning for dual-arm manipulators. IEEE International Conference on Robotics and Automation, pp. 2281–2286.
  • [5] K. He, G. Gkioxari, P. Dollár, and R. Girshick (2017) Mask R-CNN. IEEE International Conference on Computer Vision, pp. 2980–2988.
  • [6] Y. He, B. Liang, Y. Zou, J. He, and J. Yang (2017) Depth errors analysis and correction for time-of-flight (ToF) cameras. Sensors 17 (1), pp. 92.
  • [7] D. Holz, A. Topalidou-Kyniazopoulou, J. Stückler, and S. Behnke (2015) Real-time object detection, localization and verification for fast robotic depalletizing. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1459–1466.
  • [8] A. S. Householder (1958) Unitary triangularization of a nonsymmetric matrix. Journal of the ACM 5 (4), pp. 339–342.
  • [9] IDS Imaging Development Systems GmbH. Ensenso 3D Cameras. Accessed 2020-09-23.
  • [10] P. F. Käshammer and A. Nüchter (2015) Mirror identification and correction of 3D point clouds. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 40 (5W4), pp. 109–114.
  • [11] J. Kim and W. Chung (2016) Localization of a mobile robot using a laser range finder in a glass-walled environment. IEEE Transactions on Industrial Electronics 63 (6), pp. 3616–3627.
  • [12] A. Kirmani, T. Hutchison, J. Davis, and R. Raskar (2009) Looking around the corner using transient imaging. IEEE International Conference on Computer Vision, pp. 159–166.
  • [13] C. Kulpate, M. Mehrandezh, and R. Paranjape (2005) An eye-to-hand visual servoing structure for 3D positioning of a robotic arm using one camera and a flat mirror. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1464–1470.
  • [14] T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick (2014) Microsoft COCO: Common objects in context. European Conference on Computer Vision, pp. 740–755.
  • [15] D. B. Lindell, G. Wetzstein, and V. Koltun (2019) Acoustic non-line-of-sight imaging. IEEE Conference on Computer Vision and Pattern Recognition, pp. 6773–6782.
  • [16] T. Maeda, G. Satat, T. Swedish, L. Sinha, and R. Raskar (2019) Recent advances in imaging around corners. arXiv:1910.05613.
  • [17] E. Marchand and F. Chaumette (2017) Visual servoing through mirror reflection. IEEE International Conference on Robotics and Automation, pp. 3798–3804.
  • [18] D. Morrison, A. W. Tow, M. McTaggart, R. Smith, N. Kelly-Boxall, S. Wade-Mccue, J. Erskine, R. Grinover, A. Gurman, T. Hunn, D. Lee, A. Milan, T. Pham, G. Rallos, A. Razjigaev, T. Rowntree, K. Vijay, Z. Zhuang, C. Lehnert, I. Reid, P. Corke, and J. Leitner (2018) Cartman: The low-cost Cartesian manipulator that won the Amazon Robotics Challenge. IEEE International Conference on Robotics and Automation, pp. 7757–7764.
  • [19] MUJIN, Inc. MUJIN PickWorker. Accessed 2020-09-23.
  • [20] H. Noguchi, M. Handa, R. Fukui, M. Shimosaka, T. Mori, T. Sato, and H. Sanada (2012) Capturing device for dense point cloud of indoor people using horizontal LiDAR and pan rotation of vertical LiDAR with mirrors. International Symposium on System Integration (SII), pp. 428–433.
  • [21] K. Okumura, K. Yokoyama, H. Oku, and M. Ishikawa (2015) 1 ms auto pan-tilt: video shooting technology for objects in motion based on saccade mirror with background subtraction. Advanced Robotics 29 (7), pp. 457–468.
  • [22] RightHand Robotics. RightPick2: The robotic piece-picking solution for intralogistics. Accessed 2020-09-23.
  • [23] N. Scheiner, F. Kraus, F. Wei, B. Phan, F. Mannan, N. Appenrodt, W. Ritter, J. Dickmann, K. Dietmayer, B. Sick, et al. (2020) Seeing around street corners: Non-line-of-sight detection and tracking in-the-wild using Doppler radar. IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2068–2077.
  • [24] M. Schwarz, C. Lenz, G. M. Garcia, S. Koo, A. S. Periyasamy, M. Schreiber, and S. Behnke (2018) Fast object learning and dual-arm coordination for cluttered stowing, picking, and packing. IEEE International Conference on Robotics and Automation, pp. 3347–3354.
  • [25] Toshiba Corp. Logistics System Solutions. Accessed 2020-09-23.
  • [26] Velodyne. Velodyne LiDAR Products. Accessed 2020-09-23.
  • [27] S. Yang and C. Wang (2008) Dealing with laser scanner failure: Mirrors and windows. IEEE International Conference on Robotics and Automation, pp. 3009–3015.
  • [28] S. Yang and C. Wang (2010) On solving mirror reflection in LiDAR sensing. IEEE/ASME Transactions on Mechatronics 16 (2), pp. 255–265.
  • [29] K. Yoshioka et al. (2018) A 20-ch TDC/ADC hybrid architecture LiDAR SoC for 240×96 pixel 200-m range imaging with smart accumulation technique and residue quantizing SAR ADC. IEEE Journal of Solid-State Circuits 53 (11), pp. 3026–3038.
  • [30] J. S. Yun and J. Y. Sim (2018) Reflection removal for large-scale 3D point clouds. IEEE Conference on Computer Vision and Pattern Recognition, pp. 4597–4605.
  • [31] A. Zeng, S. Song, K. T. Yu, E. Donlon, F. R. Hogan, M. Bauza, D. Ma, O. Taylor, M. Liu, E. Romo, N. Fazeli, F. Alet, N. Chavan Dafle, R. Holladay, I. Morona, P. Q. Nair, D. Green, I. Taylor, W. Liu, T. Funkhouser, and A. Rodriguez (2019) Robotic pick-and-place of novel objects in clutter with multi-affordance grasping and cross-domain image matching. International Journal of Robotics Research, pp. 3750–3757.
  • [32] A. Zeng, K. T. Yu, S. Song, D. Suo, E. Walker, A. Rodriguez, and J. Xiao (2017) Multi-view self-supervised deep learning for 6D pose estimation in the Amazon Picking Challenge. IEEE International Conference on Robotics and Automation, pp. 1386–1393.
  • [33] Q. Zhou, J. Park, and V. Koltun (2018) Open3D: A modern library for 3D data processing. arXiv:1801.09847.