The Implicit Values of A Good Hand Shake: Handheld Multi-Frame Neural Depth Refinement

11/26/2021

Modern smartphones can continuously stream multi-megapixel RGB images at 60 Hz, synchronized with high-quality 3D pose information and low-resolution LiDAR-driven depth estimates. During a snapshot photograph, the natural unsteadiness of the photographer's hands offers millimeter-scale variation in camera pose, which we can capture along with RGB and depth in a circular buffer. In this work we explore how, from a bundle of these measurements acquired during viewfinding, we can combine dense micro-baseline parallax cues with kilopixel LiDAR depth to distill a high-fidelity depth map. We take a test-time optimization approach and train a coordinate MLP to output photometrically and geometrically consistent depth estimates at the continuous coordinates along the path traced by the photographer's natural hand shake. The proposed method brings high-resolution depth estimates to 'point-and-shoot' tabletop photography and requires no additional hardware, artificial hand motion, or user interaction beyond the press of a button.
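To give a sense of scale for the micro-baseline parallax cue described above, here is a minimal sketch based on the standard pinhole stereo relation (disparity = focal length × baseline / depth). It is not the paper's method, which optimizes a coordinate MLP at test time. The focal length, baseline, and depth values are illustrative assumptions and are not taken from the paper.

```python
def disparity_px(focal_px: float, baseline_m: float, depth_m: float) -> float:
    """Pixel shift of a point at depth_m between two views separated by baseline_m."""
    return focal_px * baseline_m / depth_m


def depth_error_m(focal_px: float, baseline_m: float, depth_m: float,
                  disparity_error_px: float) -> float:
    """First-order depth uncertainty caused by a given disparity error."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


# Hypothetical values chosen only for illustration.
FOCAL_PX = 1500.0    # assumed focal length in pixels
BASELINE_M = 0.005   # assumed 5 mm of hand-shake translation
DEPTH_M = 0.5        # assumed tabletop object distance of 0.5 m

shift = disparity_px(FOCAL_PX, BASELINE_M, DEPTH_M)
error = depth_error_m(FOCAL_PX, BASELINE_M, DEPTH_M, disparity_error_px=0.1)
print(f"parallax shift: {shift:.2f} px, depth error: {error * 1000:.2f} mm")
```

Under these assumed values the parallax is about 15 pixels, and a 0.1-pixel matching error corresponds to a depth error of a few millimeters. Because the error grows with the square of the depth, a cue this small is most useful at close range, which fits the tabletop setting the abstract targets.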
