1 Introduction
Scene flow is defined as a 3D vector field that provides a low-level representation of 3D motion. It is analogous to optical flow, which describes pixel motion on a 2D image plane; optical flow can be considered the projection of scene flow into 2D. Applications such as object detection, object tracking, point cloud registration, correspondence estimation, and motion capture can benefit from this low-level information for better performance.
Although the vector field representation is simple, scene flow estimation is far from an easy task, owing to the requirement of accurate depth estimation and the need to deal with occlusion. Traditionally, scene flow is computed by optimising photometric error [22], or by matching hand-crafted features [3], each applied over multi-view geometry or RGB-D images. With the fast development of deep learning, some works bring CNNs to scene flow estimation, allowing the estimator to benefit from semantic information and powerful deep feature extraction. Recently, PointNet [5] and PointNet++ [29] enabled direct point cloud processing for deep learning. These works are particularly interesting because point cloud-based networks can directly process rich 3D geometric information, rather than implicitly learning 3D geometry from 2D images. Built on top of PointNet++, FlowNet3D [22] tackles scene flow estimation on point clouds directly, achieving state-of-the-art results. Despite these impressive results, point cloud-based scene flow estimation is still at the very beginning of its development. FlowNet3D trains the network with a naive supervision signal: the loss between the predicted flow and the ground truth vectors. In this work, we apply geometric principles from classical point cloud registration algorithms in order to mature deep scene flow estimation beyond a simple norm between prediction and ground truth. In particular, we investigate two geometric constraints: 1) the point-to-plane distance and 2) the cosine distance between the predicted flow vector and the ground truth vector. The point-to-plane distance is a common loss term in the Iterative Closest Point (ICP) [2, 26] algorithm, which is known for fast convergence. The cosine distance penalises the angle between two vectors directly, encouraging correct alignment of the predicted scene flow vectors rather than only matching their magnitudes. These geometrically principled constraints not only improve accuracy over the state of the art, but also improve the convergence speed and stability of training.
Further, we introduce a novel benchmark for investigating the practical performance of scene flow estimators through the proxy task of dynamic 3D reconstruction. Applying scene flow to dynamic reconstruction provides a holistic combination of the individual metrics previously used to evaluate 3D flow estimation. We contribute a novel pipeline for integrating point-based scene flow into a global dense volume.
To summarise, our main contributions are:

We improved the state-of-the-art point cloud-based scene flow estimation accuracy from 57.85% to 63.43% by combining a point-to-plane loss and a cosine distance loss with the original $\ell_2$ loss.

We propose an average angle error metric to evaluate flow direction deviation, supplementing the End Point Error (EPE), which alone is not sufficient to evaluate the angle between two vectors.

We propose dynamic 3D reconstruction, alongside a novel dynamic integration pipeline, as a benchmarking task for scene flow estimation. 3D reconstruction provides a holistic and inherently geometric measure of flow estimation performance. Within our deformable scene flow benchmark task, our FlowNet3D++ achieves up to 15.0% less reconstruction error than FlowNet3D, and up to a 35.2% improvement over KillingFusion.
The remainder of the paper is organised as follows: Section 2 briefly describes related work; Section 3 describes our modifications to FlowNet3D, as well as our method to integrate the sparse scene flow vector field into dense dynamic reconstruction. In the experiment section, we evaluate the effect of adding different geometric constraints to scene flow estimation and present our dynamic reconstruction results on several public datasets. Section 6 concludes our work and describes potential future work.
2 Related Work
2.1 Traditional Scene Flow Estimation
Scene flow is a low-level representation of the 3D motion of points within a scene. It is a 3D extension of 2D optical flow [3], which describes pixel motion on a 2D image plane. Many works have focused on estimating scene flow using multi-view geometry [35] by associating salient image key points. Later works [27, 11, 37, 36] tackle this problem with joint variational optimisation of image registration and motion estimation. [37] compute dense scene flow from stereo cameras and achieved 5 fps on a CPU. SphereFlow [10, 15] is the first real-time scene flow estimation system using RGB-D input. [16] proposed to process rigid and non-rigid segments differently.
2.2 Deep Flow Estimation
The recent development of deep neural networks provides an alternative for associating points over deformed depth maps. One group of deep methods can be viewed as successors of the classic 2D optical flow methods, for example FlowNet [7] and its variants [12]. Instead of using hand-crafted features for tracking pixel locations, these methods rely on learned deep features for tracking, and then back-project into depth maps to fetch the 3D scene flow. For better training and evaluation, Mayer et al. [24] created three synthetic scene flow datasets and proposed a network for disparity and scene flow estimation. [30] assume a dynamic scene contains foreground objects and background, and apply instance segmentation masks to treat foreground and background differently. Similarly, [31] developed a neural network that jointly estimates object segmentation, object trajectories, and object scene flow from two consecutive RGB-D frames. Ilg et al. [13] proposed a network based on FlowNet [7] to estimate occlusions and disparity together. [23] integrates three vision cues, namely segmentation masks, disparity maps and optical flow, extracted by existing networks (Mask R-CNN [9], PSMNet [4], and PWC-Net [34]), to estimate scene flow for rigid objects in self-driving tasks.
All of the above approaches are mostly image-based, so that appearance features can be conveniently extracted using 2D convolution. However, some sensory data, such as laser scans, are unstructured, and conventional convolution is therefore not applicable. To address the problem, Behl et al. [1] evaluated the performance of scene flow estimation when integrating bounding boxes and pixel-wise segmentation into the scene flow estimation pipeline. [1, 21, 8] are designed for scene flow estimation on point clouds. PointFlowNet [1] estimates scene flow, ego-motion and rigid object motion at the same time. In comparison, FlowNet3D [21] and HPLFlowNet [8] are more general scene flow estimation frameworks that do not rely on a rigid object assumption. More specifically, FlowNet3D extracts features with PointNet++ [29], mixes features and computes a coarse scene flow using a flow embedding layer, and propagates the coarse scene flow to a finer level using set upconv layers. HPLFlowNet, instead of using PointNet++, argues that a permutohedral lattice [20] with Bilateral Convolutional Layers (BCL) [17] improves global information extraction and runs faster.
3 Method
FlowNet3D is a neural network for estimating 3D scene flow given two point clouds: a source point cloud $P = \{x_i \in \mathbb{R}^3\}_{i=1}^{n_1}$ and a target point cloud $Q = \{y_j \in \mathbb{R}^3\}_{j=1}^{n_2}$, where $P$ and $Q$ are two sets of unordered 3D points. For generality, the sizes of the two point clouds do not have to be identical, i.e. $n_1 \neq n_2$ is allowed, but the predicted vector field $D = \{d_i\}_{i=1}^{n_1}$ always has the same dimension as the source point cloud. FlowNet3D adopts a Siamese architecture that first extracts downsampled point features for each point cloud using PointNet++, and then mixes the features in a flow embedding layer. Finally, the output features of the flow embedding layer are regularised and upsampled to the same dimensionality as $P$. The network is trained using an $\ell_2$ loss between the predicted flow $D$ and the ground-truth scene flow field $D^* = \{d_i^*\}_{i=1}^{n_1}$.
FlowNet3D has been successfully applied in rigid scenes. In this paper, we further explore its potential when applied to non-static scenes, including scenes dominated by deformable objects. More importantly, we introduce two loss terms that improve the accuracy of the prediction in dynamic scenes while maintaining performance in rigid scenes (measured using the KITTI dataset). The new loss terms also speed up and stabilise the training procedure. Fig. 1 illustrates the general idea of FlowNet3D and the loss terms we apply. More details on the original FlowNet3D structure can be found in its paper.
3.1 Geometric Constraints
Point-to-Plane Loss is inspired by the popular point-to-plane distance metric for point cloud registration, such as in the Iterative Closest Point (ICP) algorithm. Specifically, let the sets $X_t^c$ and $X_t^w$ represent two 3D point clouds at the $t$-th frame, where the labels $c$ and $w$ denote the live camera and the world coordinate system, respectively. Each point is a 3D homogeneous coordinate. A point can be transformed from the world coordinate system into the camera coordinate system using $x^c = T_t x^w$, where $T_t$ is a 3D rigid transformation.
Given $X_t^c$ and $X_t^w$, $T_t$ can be estimated by minimising the following error function from a typical ICP algorithm [26]:
$$E(T_t) = \sum_i \left( \left( T_t x_i^w - \hat{x}_i^c \right) \cdot n\!\left(\hat{x}_i^c\right) \right)^2 \qquad (1)$$
where $n(\cdot)$ is the function that calculates the surface normal at a point, and $\hat{x}_i^c$ is the closest point in $X_t^c$ to $T_t x_i^w$. The dot product between the surface normal and the closest-point offset measures the distance from $T_t x_i^w$ to the plane defined by $\hat{x}_i^c$ and its normal; hence it is known as the point-to-plane metric.
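As a concrete illustration of this metric, the following is a minimal numpy sketch; the function name, the brute-force closest-point search, and the 4x4 homogeneous transform convention are our own illustrative choices, not the paper's implementation:

```python
import numpy as np

def point_to_plane_error(src, dst, dst_normals, T):
    """Point-to-plane ICP error: sum of squared distances from each
    transformed source point to the tangent plane of its closest target point."""
    # Apply the 4x4 rigid transform T to homogeneous source points.
    src_h = np.hstack([src, np.ones((len(src), 1))])
    warped = (T @ src_h.T).T[:, :3]
    # Brute-force closest-point association (O(n*m); fine for a sketch,
    # real systems use a k-d tree).
    d2 = ((warped[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
    nn = d2.argmin(axis=1)
    # Signed distance to the plane defined by the closest point and its normal.
    plane_dist = ((warped - dst[nn]) * dst_normals[nn]).sum(-1)
    return float((plane_dist ** 2).sum())
```

With an identity transform and identical clouds the error is zero, while translating a point along its match's normal yields the squared offset.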
Inspired by the point-to-plane metric, we introduce a new loss for training FlowNet3D, defined as follows:
$$\mathcal{L}_{pp} = \sum_{i=1}^{n_1} \left( \left( x_i + d_i - \hat{y}_i \right) \cdot n\!\left(\hat{y}_i\right) \right)^2 \qquad (2)$$
where $\hat{y}_i$ is the closest point in the target set $Q$ to the source point $x_i$, and $d_i$ is the predicted flow at $x_i$. The scene flow may encode any rigid transformation or simply a non-static motion field, which is ultimately determined by the samples provided during training. During training on FlyingThings, both $P$ and $Q$ are in the same coordinate system, and therefore the trained model naturally learns to represent segments of rigid motion fields. Interestingly, we found that the same model generalises to point clouds extracted from consecutive frames of a deforming object so well that it outperforms the state-of-the-art dynamic fusion algorithm.
Cosine Distance Loss aims at constraining the angle between the predicted flow field and the ground truth. From the scene flow predictions of FlowNet3D, we noticed that some predicted motion vectors differ greatly in direction from the ground truth. We therefore introduce a cosine distance loss, $\mathcal{L}_{\cos} = \sum_i \left( 1 - \frac{d_i \cdot d_i^*}{\|d_i\| \, \|d_i^*\|} \right)$, computed directly between each predicted vector and its ground truth. This provides extra penalisation for vectors whose directions deviate from the ground truth, even when they have the same $\ell_2$ loss. Fig. 2 illustrates the effect of applying the $\ell_2$ loss and the cosine distance together.
Combined Loss includes all three loss terms in a weighted summation:
$$\mathcal{L} = \mathcal{L}_{\ell_2} + \lambda_1 \mathcal{L}_{\cos} + \lambda_2 \mathcal{L}_{pp} \qquad (3)$$
where $\lambda_1$ and $\lambda_2$ are the weights that balance the loss terms $\mathcal{L}_{\cos}$ and $\mathcal{L}_{pp}$ against $\mathcal{L}_{\ell_2} = \sum_i \|d_i - d_i^*\|_2$. Here $D$ is the predicted vector field and $d_i^*$ is the individual ground truth vector corresponding to $d_i$. It is worth noting that the cosine loss and the $\ell_2$ loss weigh the angles and the lengths of the predicted vector field, respectively.
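The combined objective can be sketched in numpy as follows; the function names and the mean reductions are our own illustrative choices, with the per-term forms following the point-to-plane and cosine definitions above:

```python
import numpy as np

def l2_loss(pred, gt):
    # Mean Euclidean norm between predicted and ground-truth flow vectors.
    return np.linalg.norm(pred - gt, axis=1).mean()

def cosine_loss(pred, gt, eps=1e-8):
    # 1 - cos(angle) between corresponding vectors; zero when directions agree.
    cos = (pred * gt).sum(1) / (np.linalg.norm(pred, axis=1)
                                * np.linalg.norm(gt, axis=1) + eps)
    return (1.0 - cos).mean()

def point_to_plane_loss(src, pred, tgt, tgt_normals):
    # Distance from each warped source point to the tangent plane of the
    # closest target point (brute-force association, as in Eq. 2).
    warped = src + pred
    nn = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(-1).argmin(1)
    plane = ((warped - tgt[nn]) * tgt_normals[nn]).sum(-1)
    return (plane ** 2).mean()

def combined_loss(src, pred, gt, tgt, tgt_normals, lam1=1.0, lam2=1.0):
    # Weighted sum of the three terms, as in Eq. 3.
    return (l2_loss(pred, gt)
            + lam1 * cosine_loss(pred, gt)
            + lam2 * point_to_plane_loss(src, pred, tgt, tgt_normals))
```

In a real training pipeline these terms would be written with differentiable tensor operations; the numpy version only shows the arithmetic.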
3.2 Scene Flow for Dynamic 3D Reconstruction
The performance of flow field estimation on a dynamic scene is normally evaluated by counting inliers, which are determined by a set threshold. However, this evaluation scheme depends heavily on the threshold, which must be set heuristically. We propose to benchmark scene flow frameworks within a state-of-the-art dynamic 3D reconstruction system, so that the scene flow can be evaluated by inspecting the reconstructed 3D model. This provides a more holistic performance measure, as well as a practical application for 3D flow estimation.
Dynamic 3D reconstruction has recently been introduced for recovering non-static objects, including deformable objects such as moving animals or human beings [25, 14, 32, 33, 18, 19]. KillingFusion [32] and its variant SobolevFusion [33] represent the state-of-the-art dynamic fusion methods, directly estimating a dense vector field between two TSDF volumes. However, this variational optimisation process is easily trapped in local minima when the search space is large. Our benchmark framework, which can itself be considered a dynamic reconstruction system, significantly outperforms KillingFusion in quality, with a 35.2% reduction in mean error.
Specifically, our benchmark framework takes a sequence of point clouds with corresponding scene flow predictions and recovers a 3D model. The reconstruction error can be visualised by comparison with the ground truth model. In our experiments, FlowNet3D++ reduces error by up to 15.0% in the dynamic reconstruction task compared to FlowNet3D.
An overview of the whole pipeline is illustrated in Fig. 1; the essential steps are:

Predict the scene flow between the live and the canonical point clouds. In this paper we experimented with FlowNet3D [22] and our FlowNet3D++.

Warp the live point cloud using the scene flow computed in the previous step, and create a synthetic depth map by projecting the warped live point cloud into the compensated camera pose.

Construct a live TSDF volume from the synthetic depth map using the widely used depth-to-volume integration method first introduced in KinectFusion [26].

Refine the vector field between the live volume and the canonical volume using a simple variational, voxel-based scene flow refinement.

Update the canonical volume by taking the voxel-wise weighted average of the canonical and live TSDF values, and accumulate the integration weights [6].
Step 2 uses the deep scene flow to warp the live point cloud, producing a virtual TSDF volume that is much easier for the KillingFusion-style optimisation to handle, which reduces the computational complexity and improves the quality of the recovered model. Steps 3, 4 and 5 form our novel scene flow integrator, which lifts a scene flow at point cloud resolution to the full TSDF volume resolution with very few artefacts.
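Step 5 corresponds to the classic running weighted average of volumetric fusion [6]; a minimal sketch, with illustrative array names:

```python
import numpy as np

def fuse_tsdf(phi_canon, w_canon, phi_live, w_live=1.0):
    """Voxel-wise weighted average of canonical and live TSDF values,
    accumulating the integration weights (Curless-and-Levoy style update)."""
    w_new = w_canon + w_live
    phi_new = (phi_canon * w_canon + phi_live * w_live) / w_new
    return phi_new, w_new
```

For example, fusing a live value of 0.0 into a canonical voxel holding 1.0 with equal weights yields 0.5 with an accumulated weight of 2.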
Scene Flow Integrator merges multiple point clouds into a single 3D volumetric representation from which the 3D model can be extracted. Specifically, let $V_t = \{v_i\}$ represent the point cloud of the live frame in camera coordinates and $C = \{c_j\}$ represent the raycast point cloud from the canonical model; the scene flow predictor computes a flow field $D = \{d_i\}$ that associates $V_t$ with $C$. The warping from $V_t$ towards $C$ can be formulated as follows:
$$\tilde{v}_i = v_i + d_i \qquad (4)$$
where $\tilde{v}_i \in \tilde{V}_t$ is the warped live point cloud. Note that $D$ and $V_t$ share the same resolution, whereas the resolutions of $V_t$ and $C$ differ. Therefore, our target is to integrate $\tilde{V}_t$ into the canonical volume smoothly.
Naively integrating the warped live point cloud directly into the global TSDF volume seems a reasonable solution; however, in our experiments we discovered that this causes significant artefacts. This is because scene flow computed on point clouds can only infer motion on the object surface, i.e. the zero level set. To deform a volumetric TSDF, the vector field has to cover the entire 3D region within the truncated band while keeping the TSDF a precise level set function, so that artefacts are minimised.
Therefore, we tackle the problem by: 1) creating a synthetic depth map by projecting the warped live point cloud onto the image plane; and 2) creating a synthetic live volume from that depth map, which is equivalent to integrating a depth map into an empty TSDF volume.
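Step 1) amounts to a pinhole projection of the warped cloud with a simple z-buffer. A minimal sketch, under assumed intrinsics (fx, fy, cx, cy) and with zero marking empty pixels (both conventions are our own illustrative choices):

```python
import numpy as np

def render_synthetic_depth(points, fx, fy, cx, cy, h, w):
    """Project a 3D point cloud (camera coordinates, assumed z > 0) into a
    depth map, keeping the nearest point per pixel (simple z-buffer)."""
    depth = np.full((h, w), np.inf)
    z = points[:, 2]
    # Pinhole projection to pixel coordinates.
    u = np.round(points[:, 0] * fx / z + cx).astype(int)
    v = np.round(points[:, 1] * fy / z + cy).astype(int)
    ok = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    for ui, vi, zi in zip(u[ok], v[ok], z[ok]):
        depth[vi, ui] = min(depth[vi, ui], zi)
    depth[np.isinf(depth)] = 0.0  # empty pixels
    return depth
```

A production renderer would splat points with a footprint to avoid holes; the sketch only shows the projection and depth test.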
By converting the deformed point cloud into a synthetic live TSDF volume $\tilde{\phi}_t$, we acquire a coarse alignment between $\tilde{\phi}_t$ and the canonical volume $\phi_c$. The next step is to refine this coarse alignment with a simple variational vector field refinement.
Voxel-Based Vector Field Refinement: the concept of running variational optimisation directly on a TSDF volume was first introduced in KillingFusion [32] and simplified in SobolevFusion [33]. It solves for the vector field by evolving the source TSDF into the target TSDF iteratively. This approach has the advantage of handling topological changes, but a drawback of this variational SDF evolution is that it can easily get trapped in local minima, because it lacks explicit correspondences associating the level set functions. By providing a good initial solution from our deep scene flow estimator, only a few iterations of voxel-based vector field refinement are needed. Specifically, for a voxel at position $x$ and a 3D vector $u(x)$ associated with this voxel, our energy is simply defined as:
$$E(u) = \frac{1}{2} \sum_{x} \left( \tilde{\phi}_t\!\left(x + u(x)\right) - \phi_c(x) \right)^2 \qquad (5)$$
where $\tilde{\phi}_t(\cdot)$ and $\phi_c(\cdot)$ represent the TSDF values of the synthetic live volume and the canonical volume at a voxel centre. The energy can be easily optimised using gradient descent:
$$u^{(k+1)}(x) = u^{(k)}(x) - \alpha \, \nabla_{u} E\!\left(u^{(k)}(x)\right) \qquad (6)$$
where $u^{(k)}$ represents the vector field at the $k$-th iteration, $\alpha$ is the step size, and $\nabla_u E$ is the gradient of the energy with respect to $u$, which can be computed efficiently using the following calculus of variations:
$$\nabla_u E = \left( \tilde{\phi}_t\!\left(x + u(x)\right) - \phi_c(x) \right) \nabla \tilde{\phi}_t\!\left(x + u(x)\right) \qquad (7)$$
where $\nabla \tilde{\phi}_t$ is the spatial gradient of the live volume at the warped voxel position.
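This update can be sketched as a dense gradient descent over the voxel grid. The following minimal numpy version uses nearest-voxel sampling and numerical gradients, an illustrative simplification; a real implementation would use trilinear interpolation and restrict the update to the truncated band:

```python
import numpy as np

def refine_vector_field(phi_live, phi_canon, iters=10, alpha=0.1):
    """Evolve a per-voxel displacement field u so that the warped live TSDF
    matches the canonical TSDF (data term only, no regulariser)."""
    shape = phi_live.shape
    u = np.zeros(shape + (3,))
    # Spatial gradient of the live TSDF (central differences).
    grad = np.stack(np.gradient(phi_live), axis=-1)
    idx = np.stack(np.meshgrid(*[np.arange(s) for s in shape],
                               indexing="ij"), axis=-1)
    for _ in range(iters):
        # Nearest-voxel lookup of the warped position x + u(x).
        pos = np.clip(np.round(idx + u).astype(int), 0, np.array(shape) - 1)
        sampled = phi_live[pos[..., 0], pos[..., 1], pos[..., 2]]
        g = grad[pos[..., 0], pos[..., 1], pos[..., 2]]
        residual = sampled - phi_canon          # data term of Eq. 5
        u -= alpha * residual[..., None] * g    # gradient step, Eqs. 6-7
    return u
```

When the live and canonical volumes already agree, the residual vanishes and the field stays at zero, matching the expectation that a good deep-flow initialisation leaves little work for the refinement.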
It is worth noting that the vector field computed from above optimisation is only meaningful in local regions and the purpose is twofold: (i) to register a roughly aligned live volume to canonical model; (ii) to remove artefacts introduced in the coarse nonrigid point cloud registration. Thanks to the quality of the deep scene flow estimator, we no longer require a regularisation term, such as those in KillingFusion and SobolevFusion.
The above energy produces a scene flow vector for each SDF voxel. In general, the magnitude of this vector field should be small, because the main motion has already been compensated when warping the live point cloud to the canonical point cloud. As a result, with a small number of iterations, typically ranging from 3 to 70, we can mitigate the artefacts and noise introduced by the scene flow predictor.
We are aware that the variational refinement may affect the deep scene flow benchmarking results. However, this refinement is necessary for complex tasks like dynamic reconstruction; without it, tracking can fail after a few frames due to the large accumulated error. To eliminate the effect of the variational refinement on benchmarking, we explicitly fix the iteration number across all experiments. For the Snoopy and Duck datasets, we use 30 iterations for all deep scene flow benchmarking.
4 Experiments
In this section, we evaluate our modifications to FlowNet3D and validate their effectiveness quantitatively in two subsections. In the first, we benchmark FlowNet3D++ on the existing scene flow datasets FlyingThings and KITTI, which are preprocessed and provided by FlowNet3D; for preprocessing details, we refer readers to the FlowNet3D supplementary material. We also provide a graph analysing the time taken for our training to converge. In the second, we quantitatively evaluate the performance of FlowNet3D++ in our novel dynamic reconstruction benchmark, performed on two reconstruction datasets (Snoopy and Duck), both provided by KillingFusion [32]. Further qualitative results can be found in our supplementary material.
To enable the point-to-plane loss term in Eq. 2, we also precompute per-point surface normals for the FlyingThings dataset, but we do not use surface normals as input features.
Our model is trained from scratch on the training split of the FlyingThings dataset, and testing is performed on the test split. We directly transfer the model trained on FlyingThings to KITTI without any fine-tuning. For the dynamic reconstruction benchmark, we also directly deploy the model trained on FlyingThings in the pipeline, again without fine-tuning. For hyperparameters, in most experiments we use exactly the same hyperparameters as FlowNet3D, to show the effectiveness of our loss terms. For the best result shown in Table 1, we trained for 200 epochs.
4.1 Metrics
We report our results using the 3D End-Point-Error (EPE) and an accuracy metric (ACC) with two thresholds; these metrics are also used in FlowNet3D, providing a fair comparison. We additionally propose the average angle deviation error (ADE) for the evaluation of the predicted scene flow vectors' direction. EPE: the $\ell_2$ norm between an estimated flow vector and its ground truth vector. ADE: we define the ADE as $\frac{1}{n_1} \sum_i \arccos\!\left( \frac{d_i \cdot d_i^*}{\|d_i\| \, \|d_i^*\|} \right)$, where $d_i$ and $d_i^*$ are a predicted vector and its ground truth vector.
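A minimal numpy sketch of the two metrics (the function names are our own, and the small epsilon guarding against zero-length vectors is an implementation assumption):

```python
import numpy as np

def epe(pred, gt):
    # End-point error: mean Euclidean distance between flow vectors.
    return np.linalg.norm(pred - gt, axis=1).mean()

def ade(pred, gt, eps=1e-8):
    # Average angle deviation error, in degrees, between corresponding vectors.
    cos = (pred * gt).sum(1) / (np.linalg.norm(pred, axis=1)
                                * np.linalg.norm(gt, axis=1) + eps)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))).mean()
```

For example, two perpendicular unit vectors give an ADE of 90 degrees regardless of their lengths, which is exactly the directional information the EPE alone does not capture.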
4.2 FlyingThings Dataset

Table 1: scene flow results on the FlyingThings test set (ACC thresholds follow FlowNet3D; ADE in degrees).

Input  Model  ACC (0.05)  ACC (0.1)  EPE  ADE
xyz  F3D  23.71%  56.05%  0.1705  22.83
xyz  F3D++  28.50%  60.39%  0.1553  20.78
rgb  F3D  25.37%  57.85%  0.1694  22.58
rgb  F3D++  30.33%  63.43%  0.1369  21.14
In FlowNet3D++, we apply both the cosine distance loss and the point-to-plane loss alongside the original $\ell_2$ loss. The results in Table 1 show that our modifications improve every metric we test. In fact, the geometry-only XYZ FlowNet3D++ even outperforms the RGB FlowNet3D, which is allowed to incorporate colour information. The weights $\lambda_1$ and $\lambda_2$ were fixed for this test, and we found the same setting generates good results in the general case. As FlowNet3D did not evaluate ADE, we compute FlowNet3D's ADE with the pre-trained model provided by [22].
4.3 KITTI Dataset
As the KITTI scene flow dataset only provides colourless, LiDAR-scanned point clouds, we only show results for the geometry-only models.
Table 2: results on KITTI (ADE in degrees).

Model  Outlier  EPE  ADE
F3D (with our eval script)  7.53%  0.3259  42.60
F3D++  4.81%  0.2530  36.86
We propose a simpler evaluation procedure on the KITTI dataset than was used in [22]. Instead of cutting the KITTI point cloud into numerous chunks and having to deal with overlapping regions, we resize the KITTI scenes to the spatial range of the FlyingThings scenes before feeding them to the networks. Although this produces different results from [22], we ensure a fair comparison by training both FlowNet3D and FlowNet3D++ on FlyingThings and transferring to our resized KITTI without fine-tuning. We report our results in Table 2.
4.4 Dynamic Dense Reconstruction
In this section, we demonstrate the effectiveness of FlowNet3D++ within our proposed dynamic dense reconstruction benchmark.
4.4.1 Configuration
Our depth-only dynamic reconstruction system is implemented on top of InfiniTAM [28], an open-sourced RGB-D dense SLAM system with modern CUDA support. The volume resolution is fixed, with a voxel size of 3 mm or 5 mm for all of our experiments: 3 mm for small scenes such as the Snoopy and Duck datasets [32], and 5 mm for the VolumeDeform datasets [14]. The truncation distance is set to a fixed multiple of the voxel size, and the optimiser uses a fixed step size. We also implement a SobolevFusion system for comparison (comparison images can be found in the Appendix). As with the KITTI scenes, applying FlowNet3D++ to videos captured with different cameras requires the scenes to be resized to the range of the FlyingThings data. The choice of scaling factors depends on the voxel size, the SDF volume size and the camera intrinsics; in practice, we found a rough estimation of the scaling factors works well for all the experiments, and fixed factors are used for the Snoopy sequence. The good results achieved through this resizing method in dynamic reconstruction provide evidence that the resizing in the KITTI evaluation is also valid.
4.4.2 Results
The KillingFusion datasets (Snoopy and Duck) provide ground truth meshes, so we can quantitatively analyse the benefit of adding deep scene flow estimation to dynamic reconstruction. We also present further images from running our system on the VolumeDeform dataset and on a video sequence we recorded ourselves in the Appendix, to illustrate the benefit of our pipeline qualitatively. Our Snoopy and Duck evaluation results are reported in Table 3 and Fig. 3.
Table 3: mean error to ground truth (mm).

Scene  ?  KillingFusion [32]  Ours (FlowNet3D)  Ours (FlowNet3D++)
Snoopy  4.205  3.543  2.348  2.297
Duck  5.362  3.896  3.012  2.561
5 Ablation study
To validate the individual benefit derived from each of our geometric constraints, as well as their combination, we perform ablation tests for both the geometry-only models and the colour models. Unless otherwise stated, we use exactly the same training procedure as described in [21].
Results are shown in Table 4 and Table 5. For the bottom rows of Table 4 and Table 5, we trained for 200 epochs instead of 150 epochs.
In addition to the overall performance of the geometric loss terms, it is also worth noting that in the RGB setting, simply combining both geometric losses with the $\ell_2$ loss does not yield the best ACC and EPE after 150 epochs of training; instead, the best result after this schedule is a model trained with a single geometric loss term. However, we found that the accuracy of that model plateaus after 150 epochs, whereas the accuracy of the model with both geometric losses still grows until 200 epochs. Therefore, the combination of geometric losses in the RGB setting provides our best configuration. In the XYZ setting, the combination of geometric loss terms provides the best result even after the 150 epoch schedule.



Table 4: ablation on the geometry-only (XYZ) models; the bottom row is trained for 200 epochs.

Model  ACC (0.05)  ACC (0.1)  EPE  ADE
F3D  23.71%  56.05%  0.1705  22.83
F3D + $\mathcal{L}_{pp}$  27.79%  60.06%  0.1567  21.96
F3D + $\mathcal{L}_{\cos}$  25.30%  58.15%  0.1615  21.17
F3D + $\mathcal{L}_{pp}$ + $\mathcal{L}_{\cos}$  28.22%  60.11%  0.1556  20.75
F3D + $\mathcal{L}_{pp}$ + $\mathcal{L}_{\cos}$ (200 epochs)  28.50%  60.39%  0.1553  20.77



Table 5: ablation on the RGB models; the bottom row is trained for 200 epochs.

Model  ACC (0.05)  ACC (0.1)  EPE  ADE
F3D  25.37%  57.85%  0.1694  22.58
F3D + $\mathcal{L}_{pp}$  28.52%  62.75%  0.1391  21.74
F3D + $\mathcal{L}_{\cos}$  26.84%  61.57%  0.1454  20.96
F3D + $\mathcal{L}_{pp}$ + $\mathcal{L}_{\cos}$  26.05%  60.53%  0.1492  21.27
F3D + $\mathcal{L}_{pp}$ + $\mathcal{L}_{\cos}$ (200 epochs)  30.33%  63.43%  0.1369  21.14
6 Conclusion
In this paper, we introduced FlowNet3D++, which to the best of our knowledge is the state-of-the-art point cloud-based deep scene flow estimator. We contribute two principled geometric constraints that together improve the accuracy of the state of the art in point cloud-based deep scene flow from 57.85% to 63.43%. We also contribute a novel geometry-based scene flow benchmark pipeline in a dynamic reconstruction context. Within our deformable scene flow benchmark, our FlowNet3D++ achieves up to 15.0% less reconstruction error than FlowNet3D, and up to a 35.2% improvement over KillingFusion alone.
Acknowledgements
We gratefully acknowledge the European Commission project Multiple-Actors Virtual Empathic CARegiver for the Elder (MoveCare) for financially supporting the authors of this work.
References

[1] (2019) PointFlowNet: Learning Representations for Rigid Motion Estimation from Point Clouds. In Proceedings of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR).
[2] (2013) Sparse Iterative Closest Point. Computer Graphics Forum 32(5), pp. 113–123.
[3] (2010) Large Displacement Optical Flow: Descriptor Matching in Variational Motion Estimation. IEEE Trans. Pattern Anal. Machine Intell. (PAMI).
[4] (2018) Pyramid Stereo Matching Network. In Proceedings of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR).
[5] (2017) PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 77–85.
[6] (1996) A Volumetric Method for Building Complex Models from Range Images. In Proceedings of ACM SIGGRAPH, pp. 303–312.
[7] (2015) FlowNet: Learning Optical Flow with Convolutional Networks. In Proceedings of Intl. Conf. on Computer Vision (ICCV).
[8] (2019) HPLFlowNet: Hierarchical Permutohedral Lattice FlowNet for Scene Flow Estimation on Large-scale Point Clouds. In Proceedings of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR).
[9] (2017) Mask R-CNN. In Proceedings of Intl. Conf. on Computer Vision (ICCV).
[10] (2014) SphereFlow: 6 DoF Scene Flow from RGB-D Pairs. In Proceedings of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR).
[11] (2007) A Variational Method for Scene Flow Estimation from Stereo Sequences. In Proceedings of Intl. Conf. on Computer Vision (ICCV).
[12] (2017) FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks. In Proceedings of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 2462–2470.
[13] (2018) Occlusions, Motion and Depth Boundaries with a Generic Network for Disparity, Optical Flow or Scene Flow Estimation. In Proceedings of the European Conference on Computer Vision (ECCV).
[14] (2016) VolumeDeform: Real-time Volumetric Non-rigid Reconstruction. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 362–379.
[15] (2015) A Primal-Dual Framework for Real-time Dense RGB-D Scene Flow. In Proceedings of IEEE Intl. Conf. on Robotics and Automation (ICRA).
[16] (2015) Motion Cooperation: Smooth Piece-wise Rigid Scene Flow from RGB-D Images. In Intl. Conf. on 3D Vision (3DV).
[17] (2016) Learning Sparse High Dimensional Filters: Image Filtering, Dense CRFs and Bilateral Neural Networks. In Proceedings of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR).
[18] (2017) Panoptic Studio: A Massively Multiview System for Social Interaction Capture. IEEE Trans. Pattern Anal. Machine Intell. (PAMI).
[19] (2018) Total Capture: A 3D Deformation Model for Tracking Faces, Hands, and Bodies. In Proceedings of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 8320–8329.
[20] (2015) Permutohedral Lattice CNNs. In Proceedings of Intl. Conf. on Learning Representations (ICLR).
[21] (2019) FlowNet3D: Learning Scene Flow in 3D Point Clouds. In Proceedings of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR).
[22] (2019) FlowNet3D: Learning Scene Flow in 3D Point Clouds. In Proceedings of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR).
[23] (2019) Deep Rigid Instance Scene Flow. In Proceedings of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR).
[24] (2016) A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation. In Proceedings of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR).
[25] (2015) DynamicFusion: Reconstruction and Tracking of Non-rigid Scenes in Real-Time. In Proceedings of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR).
[26] (2011) KinectFusion: Real-Time Dense Surface Mapping and Tracking. In Proceedings of IEEE/ACM Intl. Symposium on Mixed and Augmented Reality (ISMAR).
[27] (2007) Multi-view Stereo Reconstruction and Scene Flow Estimation with a Global Image-based Matching Score. Intl. Journal of Computer Vision (IJCV) 72.
[28] (2017) InfiniTAM v3: A Framework for Large-Scale 3D Reconstruction with Loop Closure. arXiv preprint arXiv:1708.00783.
[29] (2017) PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. In Proceedings of Neural Information Processing Systems (NIPS), pp. 2–6.
[30] (2017) Cascaded Scene Flow Prediction Using Semantic Segmentation. In Intl. Conf. on 3D Vision (3DV).
[31] (2018) Motion-Based Object Segmentation Based on Dense RGB-D Scene Flow. IEEE Robotics and Automation Letters 3.
[32] (2017) KillingFusion: Non-rigid 3D Reconstruction without Correspondences. In Proceedings of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 5474–5483.
[33] (2018) SobolevFusion: 3D Reconstruction of Scenes Undergoing Free Non-rigid Motion. In Proceedings of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 2646–2655.
[34] (2018) PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume. In Proceedings of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR).
[35] (1999) Three-Dimensional Scene Flow. In Proceedings of Intl. Conf. on Computer Vision (ICCV), Vol. 2, pp. 722–729.
[36] (2011) 3D Scene Flow Estimation with a Rigid Motion Prior. In Proceedings of Intl. Conf. on Computer Vision (ICCV).
[37] (2008) Efficient Dense Scene Flow from Sparse or Dense Stereo Data. In Proceedings of the European Conference on Computer Vision (ECCV), Lecture Notes in Computer Science.