1 Introduction
Understanding motion is fundamental to many applications in a variety of fields, such as human-computer interaction, robotics, and autonomous driving. The information absorbed within a temporal window is not only a collection of images or a representation of an outcome, but also a description of a process.
Decades ago, computer vision tackled the task of motion estimation, searching for a flow between two images [4, 7, 14, 22, 41].
One significant leap forward in understanding the motion of a scene, defined as scene flow, came with the availability of 3D geometry. It liberates us from treating color as the main correspondence feature and allows examining the structure itself to understand the motion. Axiomatic concepts of rigidity [3, 6] provided fast and accurate results, but once piecewise movements [9, 10, 39] or non-rigidity [1, 15, 23] was allowed, the scene flow estimation problem became ill-posed and, unfortunately, hard to solve.
The rise of artificial intelligence [19] gives hope that the 3D flow estimation problem can be solved with a network architecture. Indeed, in the last few years, learning-based methods [12, 21, 31, 37, 43] have steadily improved, outperforming those that relied on optimization. More importantly, these learned models are fast and robust.

One of the most prominent clients of scene flow methods is the autonomous driving industry, where LiDAR data is used for perception of the environment. However, LiDAR sensors suffer from sparseness, which directly affects deep-learning flow algorithms that require knowledge of the objects' spatio-temporal neighborhood. In other words, once the structures do not heavily overlap, the process fails. In an attempt to overcome this limitation, we have recently seen all-to-all mechanisms both for images [36] and geometry [31]. However, these methods consume large amounts of memory and tend to produce outliers, as nearby points can now be aligned with inconsistent temporal positions.
In this work, we focus on the scene flow problem, where large deviations between the scenes can occur. A small set of points is used to guide the alignment in an all-to-all approach, and a recurrent refinement block is then unrolled to learn movement differentiators. We train our network to predict a single step at a time and converge iteratively toward the final flow solution, as illustrated in Fig. 1. Although unrolled for K iterations during training, our network can run inference with a larger number of iterations to handle more significant and complicated deformations.
Trained on synthetic data only, our method improves the state-of-the-art results on the self-supervised KITTI benchmark by a considerable margin. Our architecture is further tested in a fully-supervised framework and achieves slightly better results than prior art while benefiting from memory efficiency.
The key contributions of this work are as follows:

We present the first recurrent architecture for non-rigid scene flow.

We provide a memory-efficient all-to-all correlation pipeline by merging low-resolution correlation with an unrolled iterative refinement process.

Our proposed network achieves large improvements over existing self-supervised methods on both the FlyingThings3D and KITTI benchmarks.
2 Related Work
Scene Flow Estimation on Point Clouds. Scene flow estimation was first introduced in [38], which suggested computing 3D scene flow from 2D optical flow using a linear algorithm. Later approaches used stereo sequences [16], RGB-D [13], and LiDAR [6]. With the rise of new methods for deep learning on point clouds [32, 33, 35, 42] and the increasing popularity of range data in the autonomous driving domain, more recent approaches suggest learning the 3D scene flow directly from the raw spatial positions.
Liu et al. [21] were the first to introduce a correlation layer that aggregates features of different point clouds, based on PointNet [32]. However, the correlation layer was applied at a particular scale with a fixed neighborhood radius, only capturing the correlation of that specific level of features between the point clouds, and thus allowing only a small deformation between them. Gu et al. [12] tackled those limitations by introducing multi-resolution correlation layers and suggested using a Bilateral Convolutional layer [17, 18, 35]. Inspired by classical pyramid approaches, Wu et al. [43] further improved multi-resolution flows by applying them in a coarse-to-fine manner and showed superior results. However, multi-resolution methods require many learnable parameters and are limited to deformations smaller than the correlation neighborhood. In [37], the authors suggested splitting the movement into rigid ego-motion and non-rigid refinement components, relying on the same architecture as [12]. A different approach, suggested in [31], focuses on all-to-all correlation and uses optimal transport tools to estimate the scene flow, showing excellent results. However, an all-to-all correlation matrix for a large-scale point cloud is inefficient.
We adopt the all-to-all correlation concept, but unlike [31], we use it efficiently in a much deeper, lower-resolution space.
Self-supervised Learning.
Learning to estimate scene flow in a self-supervised manner is an active field of research. Mittal et al. [29] showed that cycle-consistency and nearest-neighbor losses can be used for self-supervision of scene flow learning, using a FlowNet3D [21] backbone pre-trained in a fully supervised manner on synthetic data. Tishchenko et al. [37] combined the same self-supervised losses with a fully supervised loss into a 'hybrid loss', while [43] suggested a fully self-supervised process, combining Chamfer, nearest-neighbor, and Laplacian losses.

We follow [43] and choose the Chamfer loss as the data-term loss of our training. Still, inspired by classical non-rigid alignment algorithms, we claim that this data term is not sufficient for a one-shot, end-to-end solution. Hence, we suggest an iterative approach for scene flow estimation and emphasize the need for a strong regularization loss term.
Algorithm Unrolling.
While the vast majority of deep learning approaches propose a purely data-driven, one-shot solution, there is a rising trend of combining iterative algorithms with neural network architectures to take advantage of both learning and prior knowledge. Recent works showed promising results for signal and image processing tasks [8, 20, 25, 28, 30, 44, 45] by unrolling either an explicit iterative solution of an energy minimization problem or a model. A contemporary approach named RAFT [36] suggested model unrolling for 2D optical flow estimation, performing lookups on a 4D all-to-all correlation volume.

We suggest unrolling a single-step flow estimation model. Inspired by [36], we also adopt the idea of using a gated recurrent unit for iterative updates. An essential concept of our method, which differs from [36], is the computation of new features for the warped scene at every iteration. This is necessary since point cloud convolution methods are not rotation invariant, so the features of the source change as it is being rotated toward the target. We consider this process a critical component of learning differentiators iteratively.

3 Problem Definition
Scene flow is the 3D motion field of points in a scene. Given two sets of points $X = \{x_i \in \mathbb{R}^3\}_{i=1}^{n_1}$ and $Y = \{y_j \in \mathbb{R}^3\}_{j=1}^{n_2}$, sampled from a dynamic scene at two consecutive time frames, we denote by $f_i \in \mathbb{R}^3$ the translational motion vector of a point $x_i$ from the first frame toward its new location in the second frame. Our goal is to estimate the scene flow $F = \{f_i\}_{i=1}^{n_1}$ that describes the best non-rigid transformation which aligns $X$ toward $Y$. Of note, due to both the sparsity of the 3D data and possible occlusions, a point $x_i$ may not be present in $Y$. Therefore, we do not learn the correspondence between $X$ and $Y$, but a flow representation for each point $x_i$.

In general, every point may carry additional information such as color or geometric features. The number of points in the source may differ from the number of points in the target, i.e., $n_1$ and $n_2$ are not necessarily equal.
4 Architecture
We suggest an iterative system (Fig. 3) that predicts a flow sequence $\{F^1, \dots, F^K\}$, where $F^K$ is our final flow estimation. First, we use a global correlation unit (Sec. 4.2) to guide the alignment in an all-to-all approach. Next, we unroll a local update unit (Sec. 4.3) to learn movement refinements. Our local update unit implements a single conceptual iteration of the Iterative Closest Point (ICP) algorithm, replacing its two phases (a. finding correspondence and b. estimating the best smooth transformation based on that correspondence) with learned components.
The number of iterations K is a hyperparameter and can be larger during inference than during training, to handle more complicated and larger deformations, as discussed in Sec. 6.3.
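The iterate-and-refine control flow described above can be sketched as follows; `global_flow` and `refine_step` are hypothetical stand-ins for the learned units (here reduced to toy nearest-neighbor pulls), not the paper's actual components:

```python
import numpy as np

def estimate_scene_flow(source, target, K=4):
    """Toy sketch of the iterative scheme: a global first guess,
    then K-1 local refinement steps applied to the warped source."""
    flow = global_flow(source, target)             # F^1 from the global correlation unit
    for k in range(1, K):
        warped = source + flow                     # warp source by the current estimate
        flow = flow + refine_step(warped, target)  # learned residual update (stand-in)
    return flow

# Hypothetical stand-ins so the sketch runs: pull each source point a
# fraction of the way toward its nearest target point.
def global_flow(src, tgt):
    return 0.5 * (nearest(tgt, src) - src)

def refine_step(warped, tgt):
    return 0.5 * (nearest(tgt, warped) - warped)

def nearest(tgt, pts):
    d = np.linalg.norm(pts[:, None, :] - tgt[None, :, :], axis=-1)
    return tgt[d.argmin(axis=1)]
```

Because the refinement is a residual update, the same `refine_step` can simply be applied more times at inference than during training.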
4.1 Local And Global Features Encoding
Local features of a point encode the geometric features of its relatively small neighborhood and are useful for local alignment refinements. On the other hand, global features capture highlevel information regarding the relative position of the point in the scene, using a larger receptive field and deeper encoding. A crucial part of our method is the distinction between the local and the global features of a point cloud.
We use the set_conv layer suggested by FlowNet3D [21] as our convolution mechanism and the furthest_point_sampling method for downsampling. Our local encoder consists of only two set_conv layers, capturing a relatively small receptive field, so that its output encodes shallow features of the input point cloud at a moderately reduced resolution. Local encoding is first applied on both the source and target input point clouds, and then applied again at every iteration on the warped source point cloud.

In order to extract global features, the local feature descriptors are injected into an additional, deeper encoder, which produces a deeper, coarser-resolution representation of the source and the target. Both the local and the global encoders share weights across all their appearances.
4.2 Global Correlation Unit
We use a global correlation unit to estimate the initial scene flow based on a deep, coarse all-to-all mechanism, illustrated in Fig. 4.
Coarse All-to-all Correlation Matrix. As the first step of our global correlation unit, we use the coarse global feature descriptors of the source and the target, denoted $g_i$ and $h_j$ respectively, to calculate a coarse all-to-all correlation matrix. Inspired by FLOT [31], we calculate the cosine similarity between the feature vectors:

$$D_{ij} = \frac{\langle g_i, h_j \rangle}{\|g_i\|_2 \, \|h_j\|_2}, \tag{1}$$

and then use an exponential function to derive from it a soft correlation matrix:

$$C_{ij} = \frac{\exp(D_{ij}/t)}{\sum_{j'} \exp(D_{ij'}/t)}. \tag{2}$$

Thus, every entry $C_{ij}$ describes the correlation between the $i$-th coarse source point and the $j$-th coarse target point, and the softmax temperature $t$ is a hyperparameter.

Unlike [31], we calculate the all-to-all correlation matrix at a much lower resolution, thereby significantly reducing the required memory.

Global Flow Estimation. In order to use the calculated correlation matrix for global flow embedding in Euclidean space, we apply a simple matrix multiplication:

$$\tilde{f}_i = \sum_j C_{ij} \left( \hat{y}_j - \hat{x}_i \right), \tag{3}$$

so that $\tilde{f}_i$ is the average displacement from $\hat{x}_i$ to all points $\hat{y}_j$, weighted by their correlation magnitudes, where $\hat{x}$ and $\hat{y}$ are the coarse versions of the source and target coordinates after the encoder's downsampling. The first-iteration flow is regressed out of $\tilde{f}$ by a set of set_up_conv layers [33].
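As a concrete illustration, here is a minimal numpy sketch of the coarse correlation and global flow embedding steps (Eqs. (1)-(3)); the function name and the temperature value are our assumptions, not the paper's settings:

```python
import numpy as np

def global_flow_embedding(gx, gy, x_coarse, y_coarse, t=0.03):
    """Sketch of Eqs. (1)-(3): cosine similarity between coarse global
    features, a temperature softmax over target points, and a
    correlation-weighted average displacement per source point."""
    gx_n = gx / np.linalg.norm(gx, axis=1, keepdims=True)   # (n, d) unit features
    gy_n = gy / np.linalg.norm(gy, axis=1, keepdims=True)   # (m, d) unit features
    D = gx_n @ gy_n.T                                       # Eq. (1): cosine similarity
    E = np.exp((D - D.max(axis=1, keepdims=True)) / t)      # stabilized exponential
    C = E / E.sum(axis=1, keepdims=True)                    # Eq. (2): soft correlation
    # Eq. (3): per-point average displacement, weighted by correlation
    return C @ y_coarse - x_coarse
```

Because the rows of the soft correlation matrix sum to one, the result is a convex combination of displacements toward the coarse target points.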
4.3 Local Update Unit
We use an iterative update procedure, starting from the global flow estimation and estimating the rest of the flow sequence based on local information.
Warp and Encode. At each iteration $k$, we use the flow estimated in the previous iteration to warp the points of the source, i.e., $x_i^k = x_i + f_i^{k-1}$. Next, using the local encoder, we extract a new local feature descriptor for the warped source, which we will later use for local correlation calculation (Fig. 3, top).
Local Correlation. To derive the correlation between the local features of the warped source and the target, we adopt the flow_embedding correlation layer proposed by FlowNet3D [21]. This correlation layer aggregates feature similarity and spatial relationships of points within a local neighborhood and is therefore suitable for local refinements. Specifically, at each iteration, we calculate a flow embedding for every point in the warped source toward the target.
Gated Recurrent Unit (GRU). Inspired by RAFT [36], we use a gated activation unit based on the design of a GRU cell [5] as our updating mechanism. Given the previous iteration's hidden state $h^{k-1}$, together with the current iteration's information $x^k$, it produces an updated hidden state $h^k$:

$$z^k = \sigma\left(\mathrm{conv}\left([h^{k-1}, x^k]; W_z\right)\right) \tag{4}$$

$$r^k = \sigma\left(\mathrm{conv}\left([h^{k-1}, x^k]; W_r\right)\right) \tag{5}$$

$$\tilde{h}^k = \tanh\left(\mathrm{conv}\left([r^k \odot h^{k-1}, x^k]; W_h\right)\right) \tag{6}$$

$$h^k = (1 - z^k) \odot h^{k-1} + z^k \odot \tilde{h}^k \tag{7}$$

where $\odot$ is the Hadamard product, $[\cdot, \cdot]$ is a concatenation, and $\sigma$ is the sigmoid activation function.

We define $x^k$ to be the concatenation of the warped source's local features, the local flow embedding, the previous iteration's flow, and the previous iteration's flow features. The flow features are obtained by passing the previous flow through two set_conv layers called flow_enc, as shown in Fig. 5.

For the initialization of the first iteration's hidden state, we pass the local features of the source point cloud through two set_conv layers.
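The GRU update can be sketched with dense matrix products standing in for the point convolutions; the weight names and shapes below are our assumptions, not the paper's layers:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_update(h_prev, x, Wz, Wr, Wh):
    """Minimal dense stand-in for a GRU cell update: update gate z,
    reset gate r, candidate state, and gated interpolation."""
    hx = np.concatenate([h_prev, x], axis=-1)
    z = sigmoid(hx @ Wz)                                              # update gate
    r = sigmoid(hx @ Wr)                                              # reset gate
    h_tilde = np.tanh(np.concatenate([r * h_prev, x], axis=-1) @ Wh)  # candidate
    return (1 - z) * h_prev + z * h_tilde                             # new hidden state
```

The update gate interpolates between the previous hidden state and the candidate, so the output stays in the same bounded range as a tanh activation.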
Scene flow prediction. Given the new hidden state produced by the GRU cell, we use a flow regressor consisting of two set_conv layers to estimate a flow refinement, which is added to the previous iteration's flow to obtain the updated flow.

Regressing flow refinements at totally different scales with the same CNN component is challenging. Hence, to encourage our system to learn coarse displacements in the first iterations, we multiply the magnitude of each predicted scene flow by an iteration-dependent factor governed by a hyperparameter C.
5 Training Loss Functions
To train our iterative system, we unroll K iterations and apply a loss function to each iteration's prediction:

$$\mathcal{L} = \sum_{k=1}^{K} \alpha_k \, \ell^k, \tag{8}$$

where $\ell^k$ is the loss on the $k$-th prediction and $\alpha_k$ are per-iteration loss weights.
Each iteration's loss in the sequence can be chosen to be either a self-supervised (Sec. 5.1) or a fully-supervised (Sec. 5.2) loss.
Table 1. Evaluation results on the FlyingThings3D and KITTI datasets (Self = self-supervised, Full = fully-supervised).

| Dataset | Method | Sup. | EPE3D ↓ | Acc3DS ↑ | Acc3DR ↑ | Outliers3D ↓ |
|---|---|---|---|---|---|---|
| FlyingThings3D | ICP [2] | Self | 0.4062 | 0.1614 | 0.3038 | 0.8796 |
| | Ego-motion [37] | Self | 0.1696 | 0.2532 | 0.5501 | 0.8046 |
| | PointPWC-Net [43] | Self | 0.1246 | 0.3068 | 0.6552 | 0.7032 |
| | Ours | Self | 0.0876 | 0.5064 | 0.8101 | 0.4507 |
| | FlowNet3D [21] | Full | 0.1136 | 0.4125 | 0.7706 | 0.6016 |
| | HPLFlowNet [12] | Full | 0.0804 | 0.6144 | 0.8555 | 0.4287 |
| | PointPWC-Net [43] | Full | 0.0588 | 0.7379 | 0.9276 | 0.3424 |
| | FLOT [31] | Full | 0.0520 | 0.7320 | 0.9270 | 0.3570 |
| | Ours | Full | 0.0455 | 0.8162 | 0.9614 | 0.2165 |
| KITTI | ICP [2] | Self | 0.5181 | 0.0669 | 0.1667 | 0.8712 |
| | Ego-motion [37] | Self | 0.4154 | 0.2209 | 0.3721 | 0.8096 |
| | PointPWC-Net [43] | Self | 0.2549 | 0.2379 | 0.4957 | 0.6863 |
| | Ours | Self | 0.1212 | 0.6553 | 0.7942 | 0.2938 |
| | FlowNet3D [21] | Full | 0.1767 | 0.3738 | 0.6677 | 0.5271 |
| | HPLFlowNet [12] | Full | 0.1169 | 0.4783 | 0.7776 | 0.4103 |
| | PointPWC-Net [43] | Full | 0.0694 | 0.7281 | 0.8884 | 0.2648 |
| | FLOT [31] | Full | 0.0560 | 0.7550 | 0.9080 | 0.2420 |
| | Ours | Full | 0.0546 | 0.8051 | 0.9254 | 0.1492 |
5.1 Self-supervised Loss
Due to the lack of labeled data for 3D scene flow, we designed our solution to be trained in a self-supervised manner, i.e., without the need for ground-truth flow.
Chamfer Loss. We follow previous works [11, 40, 43] and choose the Chamfer distance, which pushes the source toward the target according to mutual closest points, as our self-supervised data loss:

$$\ell_C^k = \sum_{x \in X^k} \min_{y \in Y} \|x - y\|_2^2 \;+\; \sum_{y \in Y} \min_{x \in X^k} \|x - y\|_2^2, \tag{9}$$

where $X^k$ is the warped source according to the predicted flow at iteration $k$.
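A minimal numpy sketch of this Chamfer data term (squared Euclidean distances assumed):

```python
import numpy as np

def chamfer_loss(warped_src, tgt):
    """Symmetric Chamfer distance: for every warped source point, the
    squared distance to its nearest target point, plus the mirror term
    from every target point to its nearest warped source point."""
    d2 = np.sum((warped_src[:, None, :] - tgt[None, :, :]) ** 2, axis=-1)
    return d2.min(axis=1).sum() + d2.min(axis=0).sum()
```

Note that the two clouds may have different numbers of points, matching the problem setting.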
Regularization Loss. Since the Chamfer distance has multiple local minima, it is crucial to regularize it in order to reach sufficient convergence. Another reason our system requires strong regularization is that we warp the source according to the predicted flow before encoding it again. Hence, we need to carefully preserve the objects' structures so that encoding the warped scene produces meaningful local geometric features (Fig. 8).
Motivated by [1, 11, 34, 43], we propose a strong Laplacian regularization, i.e., we enforce the source to preserve its Laplacian when warped according to the predicted flow. We approximate the Laplacian at a point $x_i$ by its mean offset from a set of neighboring points $N(x_i)$:

$$\delta(x_i) = \frac{1}{|N(x_i)|} \sum_{x_j \in N(x_i)} (x_j - x_i). \tag{10}$$

We use the squared $\ell_2$ norm for regularization, so that our regularization loss is:

$$\ell_L^k = \frac{1}{n_1} \sum_{i=1}^{n_1} \left\| \delta\!\left(x_i + f_i^k\right) - \delta(x_i) \right\|_2^2, \tag{11}$$

where $f_i^k$ is the value of the predicted scene flow at point $x_i$ and $n_1$ is the number of points in the source.

To reduce the computational overhead of nearest-neighbor search over a large neighborhood, we construct $N(x_i)$ as the union of the nearest neighbors of $x_i$ and points randomly sampled in a Euclidean ball around $x_i$.
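The Laplacian regularization can be sketched as follows, using a plain k-nearest-neighbor neighborhood in place of the mixed kNN-plus-ball sampling; the squared penalty is an assumption:

```python
import numpy as np

def knn(points, k):
    """Indices of the k nearest neighbors of each point (self excluded)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return np.argsort(d, axis=1)[:, 1:k + 1]

def laplacian(points, neighbors):
    """Approximate Laplacian: mean offset from each point to its neighborhood."""
    return points[neighbors].mean(axis=1) - points

def laplacian_reg_loss(src, flow, k=4):
    """Penalize change of the approximate Laplacian when the source is
    warped by the predicted flow; neighborhoods are fixed on the
    un-warped source."""
    nbrs = knn(src, k)
    diff = laplacian(src + flow, nbrs) - laplacian(src, nbrs)
    return np.mean(np.sum(diff ** 2, axis=-1))
```

A useful sanity check is that a rigid translation leaves the Laplacian unchanged, so the penalty vanishes for globally translated motion and only non-structure-preserving deformations are penalized.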
The overall self-supervised loss is a weighted sum of the Chamfer and regularization losses over all sequence iterations:

$$\mathcal{L}_{\text{self}} = \sum_{k=1}^{K} \alpha_k \left( \ell_C^k + \lambda \, \ell_L^k \right), \tag{12}$$

where $\ell_C^k$ and $\ell_L^k$ are the Chamfer and Laplacian regularization terms at iteration $k$, and $\alpha_k$, $\lambda$ are loss weights.
5.2 Fully-supervised Loss
To demonstrate our architecture's efficiency, we further train our system in a fully-supervised manner, using the loss:

$$\ell_{\text{sup}}^k = \frac{1}{n_1} \sum_{i=1}^{n_1} \left\| f_i^k - f_i^{gt} \right\|_2, \tag{13}$$

where $f_i^{gt}$ is the value of the ground-truth scene flow at point $x_i$, $f_i^k$ is the predicted flow at iteration $k$, and $n_1$ is the number of source points.
Unlike previous methods, we add a Laplacian regularization loss to our fully-supervised training to encourage our system to preserve objects' structures and approach the target over the iterations. The regularization loss is the same as in the self-supervised case, Eq. (11).
The overall fully-supervised loss is a weighted sum of the supervised and regularization losses over all sequence iterations:

$$\mathcal{L}_{\text{sup}} = \sum_{k=1}^{K} \alpha_k \left( \ell_{\text{sup}}^k + \lambda \, \ell_L^k \right), \tag{14}$$

where $\ell_{\text{sup}}^k$ is the supervised term of Eq. (13) and $\ell_L^k$ is the regularization term of Eq. (11).
6 Experiments
Following the experimental setup suggested in [12, 21, 31, 37, 43], we first train and evaluate our model on the synthetic FlyingThings3D dataset [24] (Sec. 6.1) using both self-supervised and fully-supervised approaches. Then, we test the models' performance on the real-world KITTI scene flow dataset [26, 27] without any fine-tuning (Sec. 6.2). Finally, in Sec. 6.3, we conduct ablation studies regarding the number of inference iterations and the importance of the regularization loss.
Evaluation Metrics.
We use the same scene flow evaluation metrics proposed by [21] and adopted by [12, 31, 43]:

EPE3D (m): end-point error, averaged over all points.

Acc3DS: percentage of points whose EPE3D < 0.05 m or relative error < 5%.

Acc3DR: percentage of points whose EPE3D < 0.1 m or relative error < 10%.

Outliers3D: percentage of points whose EPE3D > 0.3 m or relative error > 10%.
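These metrics can be computed as follows; the threshold values are the ones conventionally used by the cited works, and the implementation is our sketch:

```python
import numpy as np

def flow_metrics(pred, gt):
    """Standard scene flow metrics: mean end-point error plus the usual
    thresholded accuracy and outlier ratios."""
    err = np.linalg.norm(pred - gt, axis=-1)           # per-point end-point error
    rel = err / (np.linalg.norm(gt, axis=-1) + 1e-20)  # relative error
    return {
        "EPE3D": err.mean(),
        "Acc3DS": np.mean((err < 0.05) | (rel < 0.05)),
        "Acc3DR": np.mean((err < 0.1) | (rel < 0.1)),
        "Outliers3D": np.mean((err > 0.3) | (rel > 0.1)),
    }
```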
6.1 Evaluation on FlyingThings3D
Due to the difficulty of acquiring dense scene flow data, we follow previous methods [12, 21, 31, 43] and train our system only on the synthetic FlyingThings3D dataset, using the same pre-processing methodology as [12].
First, we focus on a self-supervised approach, which does not require any labeled data. Then, to demonstrate our system's efficiency, we also conduct experiments using a fully-supervised loss.
Implementation Details.
The FlyingThings3D dataset contains 19,640 pairs of point clouds in the training set and 3,824 pairs in the validation set. To speed up training, we first train our system on one quarter of the training data (4,910 pairs) and then fine-tune on the full training set. We used the FlyingThings3D validation set for all our hyperparameter tuning.
We unroll K = 4 iterations in all training procedures, using 8 GTX 2080Ti GPUs. Pre-training is done for 90 epochs, with a learning rate that is halved at fixed epoch milestones. The self-supervised model is fine-tuned for 21 epochs and the fully-supervised model for 35 epochs, each with its own learning rate schedule. To reduce outliers, we limit the distance of corresponding points to a reasonable displacement range by zeroing every entry of the coarse all-to-all correlation matrix whose pair of points is farther apart than a predefined threshold.
Lastly, we used the FlyingThings3D validation set to determine the best number of iterations for our model at inference time. As discussed in Sec. 6.3, we set K = 4 for our self-supervised model and K = 5 for the fully-supervised model in all tests.
All loss weights of all training procedures, and a detailed scheme of our architecture, can be found in the supplementary materials.
Results. We compare our selfsupervised method’s results with IterativeClosestPoint (ICP) [2], PointPWCNet [43], and the recent selfsupervised method introduced by Tishchenko [37], and our fullysupervised method’s results with FlowNet3D [21], HPLFlowNet [12], PointPWCNet [43], and FLOT [31].
As shown in Table 1, our method outperforms all existing methods on all evaluation metrics on the FlyingThings3D dataset, in both the self-supervised and fully-supervised frameworks. Moreover, ours is the only self-supervised method with EPE3D below 0.1 on the FlyingThings3D dataset.
6.2 Generalization on KITTI
To examine the generalization ability of our method to real-world data, we evaluate a model trained on FlyingThings3D on the real-scan KITTI Scene Flow 2015 dataset [26, 27], without any fine-tuning. Following [12, 43], we evaluate our model on all 142 scenes with available 3D data in the training set, and remove the ground points from the point clouds by a height threshold.
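The ground removal step can be sketched as below; the axis convention (z-up) and the threshold value are placeholders, not the paper's exact setting:

```python
import numpy as np

def remove_ground(points, height_threshold=0.3):
    """Keep only points above a height threshold. A z-up coordinate
    frame and a 0.3 m threshold are assumed here for illustration."""
    return points[points[:, 2] > height_threshold]
```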
Our self-supervised method demonstrates great generalization ability, outperforming all existing self-supervised methods by a large margin.
Our fully-supervised model achieves EPE3D on par with the state-of-the-art method [31], the highest accuracy, and the lowest outlier ratio, while benefiting from memory efficiency (Fig. 2).
Table 2. KITTI results of our self-supervised model pre-trained under different regularization schemes.

| Regularization | EPE3D ↓ | Outliers3D ↓ |
|---|---|---|
| Under-regularization | 0.3183 | 0.7698 |
| Over-regularization | 0.2706 | 0.8941 |
| Chosen regularization | 0.1443 | 0.3736 |
6.3 Ablation Studies
In this section, we examine the performance of our system with different numbers of inference iterations and the influence of the regularization loss.
Number of iterations. Although we unrolled four iterations for training, we tested inference with larger K values. Fig. 7 shows the EPE3D of inference with different numbers of iterations. Interestingly, both our models keep slightly improving for a few more iterations on the KITTI test set. On the FlyingThings3D validation set, our fully-supervised model keeps improving in the fifth iteration as well, whereas our self-supervised model shows the best results for K = 4, as in training. Fig. 6 shows a qualitative example of our self-supervised method during four inference iterations.
Regularization. Since our method re-encodes the warped source at every iteration, it is crucial to train it using a regularization loss. While training with under-regularization may distort the objects' structure, over-regularization may lead to semi-rigid motion predictions, which results in imperfect alignment. To demonstrate the importance of choosing the regularization loss weights wisely, we pre-train our self-supervised model with three different regularization schemes, changing only the loss weights, and then evaluate each of them on the KITTI test set. We show the quantitative results in Table 2 and a qualitative example in Fig. 8.
7 Conclusions
In this work, we proposed and studied a novel approach for scene flow estimation, called FlowStep3D, which unrolls an iterative scheme using a recurrent architecture that learns the optimal steps toward the solution. We showed the benefit of approaching the solution in a few steps by enforcing strong regularization and re-encoding the warped scene, contrary to all previous learning-based solutions. Experiments performed on synthetic data and real LiDAR scans showed great generalization capability, especially for self-supervised training, improving upon previous methods by a large margin.
References

[1] B. Amberg, S. Romdhani, and T. Vetter. Optimal step nonrigid ICP algorithms for surface registration. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, 2007.
 [2] P. J. Besl and N. D. McKay. A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2):239–256, 1992.
 [3] P. J. Besl and N. D. McKay. A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2):239–256, 1992.
 [4] Thomas Brox and Jitendra Malik. Large displacement optical flow: descriptor matching in variational motion estimation. IEEE transactions on pattern analysis and machine intelligence, 33(3):500–513, 2010.
 [5] Kyunghyun Cho, Bart Van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259, 2014.
 [6] Ayush Dewan, Tim Caselitz, Gian Diego Tipaldi, and Wolfram Burgard. Rigid scene flow for 3d lidar scans. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1765–1770. IEEE, 2016.
 [7] Alexey Dosovitskiy, Philipp Fischer, Eddy Ilg, Philip Hausser, Caner Hazirbas, Vladimir Golkov, Patrick Van Der Smagt, Daniel Cremers, and Thomas Brox. Flownet: Learning optical flow with convolutional networks. In Proceedings of the IEEE international conference on computer vision, pages 2758–2766, 2015.
 [8] Nariman Farsad, Nir Shlezinger, Andrea J Goldsmith, and Yonina C Eldar. Data-driven symbol detection via model-based machine learning. arXiv preprint arXiv:2002.07806, 2020.
 [9] J. Feldmar and N. Ayache. Rigid, affine and locally affine registration of free-form surfaces. International Journal of Computer Vision, 18:99–119, 2004.
 [10] V. Golyanik, K. Kim, R. Maier, M. Niessner, D. Stricker, and J. Kautz. Multiframe scene flow with piecewise rigid motion. In International Conference on 3D Vision (3DV), Qingdao, China, October 2017.
 [11] Thibault Groueix, M. Fisher, V. Kim, Bryan C. Russell, and Mathieu Aubry. 3dcoded: 3d correspondences by deep deformation. In ECCV, 2018.
 [12] Xiuye Gu, Y. Wang, Chongruo Wu, Y. Lee, and Panqu Wang. HPLFlowNet: Hierarchical permutohedral lattice FlowNet for scene flow estimation on large-scale point clouds. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 3249–3258, 2019.
 [13] S. Hadfield and R. Bowden. Kinecting the dots: Particle based scene flow from depth sensors. In 2011 International Conference on Computer Vision, pages 2290–2295, 2011.
 [14] Berthold KP Horn and Brian G Schunck. Determining optical flow. In Techniques and Applications of Image Understanding, volume 281, pages 319–331. International Society for Optics and Photonics, 1981.
 [15] QiXing Huang, Bart Adams, Martin Wicke, and Leonidas J Guibas. Nonrigid registration under isometric deformations. In Proceedings of the Symposium on Geometry Processing, pages 1449–1457, 2008.
 [16] F. Huguet and F. Devernay. A variational method for scene flow estimation from stereo sequences. In 2007 IEEE 11th International Conference on Computer Vision, pages 1–7, 2007.
 [17] V. Jampani, Martin Kiefel, and P. Gehler. Learning sparse high dimensional filters: Image filtering, dense crfs and bilateral neural networks. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4452–4461, 2016.
 [18] Martin Kiefel, Varun Jampani, and Peter V Gehler. Permutohedral lattice cnns. arXiv preprint arXiv:1412.6618, 2014.
 [19] Yann LeCun, Bernhard Boser, John S Denker, Donnie Henderson, Richard E Howard, Wayne Hubbard, and Lawrence D Jackel. Backpropagation applied to handwritten zip code recognition. Neural computation, 1(4):541–551, 1989.
 [20] Y. Li, M. Tofighi, V. Monga, and Y. C. Eldar. An algorithm unrolling approach to deep image deblurring. In ICASSP 2019  2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 7675–7679, 2019.
 [21] Xingyu Liu, Charles R Qi, and Leonidas J Guibas. Flownet3d: Learning scene flow in 3d point clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 529–537, 2019.
 [22] B. D. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In IJCAI, 1981.
 [23] Jiayi Ma, Ji Zhao, and Alan L Yuille. Non-rigid point set registration by preserving global and local structures. IEEE Transactions on Image Processing, 25(1):53–64, 2015.
 [24] Nikolaus Mayer, Eddy Ilg, Philip Hausser, Philipp Fischer, Daniel Cremers, Alexey Dosovitskiy, and Thomas Brox. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
 [25] Tim Meinhardt, Michael Moller, Caner Hazirbas, and Daniel Cremers. Learning proximal operators: Using denoising networks for regularizing inverse imaging problems. In Proceedings of the IEEE International Conference on Computer Vision, pages 1781–1790, 2017.
 [26] Moritz Menze, Christian Heipke, and Andreas Geiger. Joint 3d estimation of vehicles and scene flow. In Proc. of the ISPRS Workshop on Image Sequence Analysis (ISA), 2015.
 [27] Moritz Menze, Christian Heipke, and Andreas Geiger. Object scene flow. ISPRS Journal of Photogrammetry and Remote Sensing, 2018.
 [28] Chris Metzler, Ali Mousavi, and Richard Baraniuk. Learned D-AMP: Principled neural network based compressive image recovery. In Advances in Neural Information Processing Systems, pages 1772–1783, 2017.
 [29] Himangi Mittal, Brian Okorn, and David Held. Just go with the flow: Self-supervised scene flow estimation. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11174–11182, 2020.
 [30] Vishal Monga, Yuelong Li, and Yonina C Eldar. Algorithm unrolling: Interpretable, efficient deep learning for signal and image processing. arXiv preprint arXiv:1912.10557, 2019.
 [31] Gilles Puy, Alexandre Boulch, and Renaud Marlet. Flot: Scene flow on point clouds guided by optimal transport. ArXiv, abs/2007.11142, 2020.
 [32] Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 652–660, 2017.
 [33] Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in neural information processing systems, pages 5099–5108, 2017.
 [34] Olga Sorkine. Laplacian Mesh Processing. In Yiorgos Chrysanthou and Marcus Magnor, editors, Eurographics 2005  State of the Art Reports. The Eurographics Association, 2005.
 [35] Hang Su, Varun Jampani, Deqing Sun, Subhransu Maji, Evangelos Kalogerakis, MingHsuan Yang, and Jan Kautz. Splatnet: Sparse lattice networks for point cloud processing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2530–2539, 2018.
 [36] Zachary Teed and Jia Deng. RAFT: Recurrent all-pairs field transforms for optical flow. ArXiv, abs/2003.12039, 2020.
 [37] Ivan Tishchenko, S. Lombardi, M. Oswald, and M. Pollefeys. Self-supervised learning of non-rigid residual flow and ego-motion. ArXiv, abs/2009.10467, 2020.
 [38] Sundar Vedula, Simon Baker, Peter Rander, Robert Collins, and Takeo Kanade. Three-dimensional scene flow. In Proceedings of the Seventh IEEE International Conference on Computer Vision, volume 2, pages 722–729. IEEE, 1999.
 [39] Christoph Vogel, Konrad Schindler, and Stefan Roth. Piecewise rigid scene flow. In Proceedings of the IEEE International Conference on Computer Vision, pages 1377–1384, 2013.
 [40] Nanyang Wang, Yinda Zhang, Zhuwen Li, Yanwei Fu, Wei Liu, and YuGang Jiang. Pixel2mesh: Generating 3d mesh models from single rgb images. In Proceedings of the European Conference on Computer Vision (ECCV), pages 52–67, 2018.
 [41] Philippe Weinzaepfel, Jerome Revaud, Zaid Harchaoui, and Cordelia Schmid. Deepflow: Large displacement optical flow with deep matching. In Proceedings of the IEEE international conference on computer vision, pages 1385–1392, 2013.
 [42] Wenxuan Wu, Zhongang Qi, and Li Fuxin. Pointconv: Deep convolutional networks on 3d point clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9621–9630, 2019.
 [43] Wenxuan Wu, Zhiyuan Wang, Zhuwen Li, Wei Liu, and Li Fuxin. PointPWC-Net: A coarse-to-fine network for supervised and self-supervised scene flow estimation on 3D point clouds. arXiv preprint arXiv:1911.12408, 2019.
 [44] Y. Yang, J. Sun, H. Li, and Z. Xu. ADMM-CSNet: A deep learning approach for image compressive sensing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(3):521–538, 2020.

[45] K. Zhang, L. Van Gool, and R. Timofte. Deep unfolding network for image super-resolution. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 3214–3223, 2020.