👻DeVRF: Fast Deformable Voxel Radiance Fields for Dynamic Scenes

Abstract

  • Task

    Free-viewpoint photorealistic view synthesis

  • Technical Challenges For Previous Methods

    Slow convergence (NeRF)

  • Key Insight / Motivation

    static → dynamic learning paradigm

    Efficient learning of deformable radiance fields

  • Technical Contributions

    Model both the 3D canonical space and the 4D deformation field of a dynamic, non-rigid scene with explicit and discrete voxel-based representations.

    static → dynamic learning paradigm

  • Experiment

Introduction

  • Task and Application

    Free-viewpoint photorealistic view synthesis techniques from a set of captured images unleash new opportunities for immersive applications such as virtual reality, telepresence, and 3D animation production.

  • Technical Challenges For Previous Methods

    Multi-plane images & NeRF:

    Mainly focus on static scenes

    Volume-Deform (unified volumetric representation to encode both the scene’s geometry and its motion),

    Neural Volumes (represents dynamic objects with a 3D voxel grid plus an implicit warp field),

    D-NeRF (learns a deformation field that maps coordinates in a dynamic scene to a NeRF-based canonical space),

    HyperNeRF (models the motion in a higher-dimensional space, representing the time-dependent radiance field by slicing through the hyperspace):

    Require days of GPU training time

    DVGO (explicit and discretized volume representations),

    Plenoxels (employs sparse voxel grids as the scene representation and uses spherical harmonics to model view-dependent appearance),

    Instant-ngp (multiresolution hash encoding):

    Mainly focus on static scenes

  • Our Pipeline

    In the first stage, DeVRF learns a 3D volumetric canonical prior (b) from multi-view static images.

    In the second stage, a 4D deformation field (d) is jointly optimized from few-view dynamic sequences (c) and the 3D canonical prior.

  • Demos & Application

Method

  • Overview

    Specific task. Input, output. First stage, second stage

  • 3D Volumetric Canonical Space.

    Motivation

    We take inspiration from the volumetric representation of DVGO.

    Method

    \(\text{Tri-Interp}([x, y, z], V_p) : \R^3, \R^{C \times N_x \times N_y \times N_z} \to \R^C,\ \forall p \in \{\text{density}, \text{color}\}\)

    where \(C\) is the dimension of the scene property. The properties learned are density and color.

    We employ a softplus activation with post-activation for \(V_{density}\): raw grid values are trilinearly interpolated first and then activated (see the sketch below).

    We also apply a shallow MLP to \(V_{color}\) to enable view-dependent color effects.

    Advantage

    Efficiently query the scene property of any 3D point
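
    A minimal PyTorch sketch of such a trilinear voxel query (illustrative only, not DeVRF's released code; the `tri_interp` helper and grid shapes are assumptions), including the post-activated softplus density query described above:

    ```python
    # A sketch of a trilinear voxel query via grid_sample (illustrative, not DeVRF's code).
    import torch
    import torch.nn.functional as F

    def tri_interp(points, voxel_grid):
        """points: (M, 3) coords normalized to [-1, 1]; voxel_grid: (C, Nx, Ny, Nz).
        Returns (M, C) interpolated scene properties."""
        # grid_sample takes a (N, C, D, H, W) volume and an (N, D, H, W, 3) grid;
        # the grid's last dim is ordered (x, y, z) against the (W, H, D) axes.
        grid = points.view(1, -1, 1, 1, 3)           # (1, M, 1, 1, 3)
        vol = voxel_grid.unsqueeze(0)                # (1, C, Nx, Ny, Nz)
        out = F.grid_sample(vol, grid, mode='bilinear', align_corners=True)
        return out.view(voxel_grid.shape[0], -1).T   # (M, C)

    # Post-activated density query: interpolate the raw grid first, then softplus.
    V_density = torch.rand(1, 64, 64, 64)                     # C = 1 raw density grid
    pts = torch.tensor([[0.0, 0.0, 0.0], [0.5, -0.2, 0.1]])   # two query points
    sigma = F.softplus(tri_interp(pts, V_density))            # (2, 1) densities
    ```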

  • 4D Voxel Deformation Field

    Motivation

    Method

    The 3D motion \(\Delta X_{t \to 0} = \{\Delta X^{t \to 0}_i\}\) (where \(i\) indexes the sampled points) can be efficiently queried through quadruple interpolation of their neighboring voxels at neighboring time steps in the 4D backward deformation field.

    \(\text{Quad-Interp}([x, y, z, t], V_{motion}) : \R^4, \R^{N_t \times C \times N_x \times N_y \times N_z} \to \R^C\)

    \(C\) is the number of degrees of freedom (DoFs)

    \(N_t\) is the number of key time steps

    The deformation field is backward because we need to find the corresponding original position in the static canonical scene.

    Advantage

    Scene properties of \(X_t\) can then be obtained by querying their corresponding canonical points \(X_0\) through trilinear interpolation.

    Efficient; see the sketch below.
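
    A hedged sketch of the quadruple interpolation, assuming the 4D field is stored as one 3D grid per key time step, looked up trilinearly and then blended linearly in time (`quad_interp` and all shapes are illustrative assumptions, not DeVRF's actual implementation):

    ```python
    # A sketch of quadruple interpolation in the 4D deformation field
    # (illustrative shapes and names, not DeVRF's actual implementation).
    import torch
    import torch.nn.functional as F

    def quad_interp(points, t, V_motion, key_times):
        """points: (M, 3) coords in [-1, 1]; t: query time (float);
        V_motion: (Nt, C, Nx, Ny, Nz); key_times: (Nt,) sorted key time steps.
        Returns (M, C) backward motion vectors Delta X_{t->0}."""
        def tri(vol):  # trilinear lookup in one key-time-step voxel grid
            grid = points.view(1, -1, 1, 1, 3)
            out = F.grid_sample(vol.unsqueeze(0), grid,
                                mode='bilinear', align_corners=True)
            return out.view(vol.shape[0], -1).T      # (M, C)

        # locate the two key time steps bracketing t, then blend linearly in time
        i = int(torch.searchsorted(key_times, torch.tensor([t]))[0])
        i = min(max(i, 1), len(key_times) - 1)
        t0, t1 = key_times[i - 1], key_times[i]
        w = (t - t0) / (t1 - t0)                     # linear weight along time
        return (1 - w) * tri(V_motion[i - 1]) + w * tri(V_motion[i])

    # Example: query the backward motion (C = 3 DoFs) of two points at t = 0.25
    V_motion = torch.zeros(10, 3, 32, 32, 32)        # Nt = 10 key time steps
    key_times = torch.linspace(0.0, 1.0, 10)
    pts = torch.tensor([[0.0, 0.0, 0.0], [0.5, -0.2, 0.1]])
    dx = quad_interp(pts, 0.25, V_motion, key_times)  # (2, 3)
    ```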

  • Coarse-to-Fine Optimization
  • 4D Deformation Cycle Consistency
  • Optical Flow Supervision

    We first compute the corresponding 3D points of \(X_0\) at time step \(t-1\) via the forward motion, obtaining \(\tilde X_{t-1}\). After that, we project \(\tilde X_{t-1}\) onto the reference camera to get their pixel locations \(\tilde P_{t-1}\), and compute the induced optical flow with respect to the pixel locations \(P_t\) from which the rays of \(X_t\) are cast.

    Then we compare the induced optical flow with the ground-truth flow, as sketched below.
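
    A minimal sketch of this supervision under an assumed pinhole camera model; `project`, `induced_flow_loss`, and the matrix conventions are illustrative stand-ins rather than DeVRF's code:

    ```python
    # A sketch of the induced optical flow supervision under an assumed pinhole
    # camera model (project() / induced_flow_loss() and conventions are illustrative).
    import torch

    def project(pts_world, K, w2c):
        """Pinhole projection: pts_world (M, 3), K (3, 3), w2c (3, 4) -> (M, 2) pixels."""
        pts_h = torch.cat([pts_world, torch.ones_like(pts_world[:, :1])], dim=-1)
        cam = pts_h @ w2c.T                    # world -> camera space, (M, 3)
        uv = cam @ K.T                         # homogeneous pixel coordinates
        return uv[:, :2] / uv[:, 2:3]          # perspective divide

    def induced_flow_loss(X0, fwd_motion, P_t, K, w2c, ref_flow):
        """X0: canonical points (M, 3); fwd_motion: forward motion to t-1, (M, 3);
        P_t: pixel locations casting the rays of X_t, (M, 2); ref_flow: reference flow."""
        X_tm1 = X0 + fwd_motion                # forward-warped points  ~X_{t-1}
        P_tm1 = project(X_tm1, K, w2c)         # their pixel locations  ~P_{t-1}
        induced = P_t - P_tm1                  # induced flow; sign must match ref_flow
        return (induced - ref_flow).abs().mean()   # e.g. an L1 flow penalty
    ```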

Experiments

  • Comparison Experiments
  • Ablation Study

Limitations

The model size is large due to the large number of parameters in the explicit voxel representations.

DeVRF currently does not synchronously optimize the 3D canonical space prior during the second stage, and thus may not be able to model drastic deformations.