iNVS : Repurposing Diffusion Inpainters for Novel View Synthesis
Accepted to SIGGRAPH Asia, 2023
Yash Kant
University of Toronto
Aliaksandr Siarohin
Snap Research
Michael Vasilkovsky
Snap Research
Riza Alp Guler
Snap Research
Jian Ren
Snap Research
Sergey Tulyakov
Snap Research
Igor Gilitschenski
University of Toronto
Overview
- We present a method for generating consistent novel views from a single source image that focuses on maximizing the reuse of pixels visible in the source image.
- We use a monocular depth estimator to transfer visible pixels from the source view to the target view, and then train a diffusion inpainter on the Objaverse dataset to fill in the missing pixels.
Method: iNVS
Depth-based Splatting to create Partial Views (Left). We use ZoeDepth to unproject the source view into 3D, and apply depth-based Softmax Splatting to create a partial target view.
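A minimal sketch of this splatting step is given below, under simplifying assumptions: the depth map is taken from some monocular estimator (such as ZoeDepth, whose call is not shown), the splat is a nearest-pixel accumulation with exponential depth-based weights standing in for the differentiable softmax splatting operator, and the function names, `alpha` temperature, and camera conventions are illustrative rather than taken from our code.

```python
import torch

def unproject(depth, K):
    """Lift each source pixel to a 3D point using its depth and intrinsics K (3x3)."""
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).float()    # (H, W, 3) homogeneous pixels
    rays = pix @ torch.inverse(K).T                                   # K^-1 [u v 1]^T
    return rays * depth[..., None]                                    # camera-frame 3D points

def splat_to_target(src_rgb, depth, K, T_src_to_tgt, alpha=10.0):
    """Warp source pixels into the target camera and splat them with depth-based weights.
    Simplified nearest-pixel splat; the actual method uses a softmax splatting op."""
    H, W, _ = src_rgb.shape
    pts = unproject(depth, K).reshape(-1, 3)
    pts_h = torch.cat([pts, torch.ones(len(pts), 1)], dim=-1)
    pts_tgt = (pts_h @ T_src_to_tgt.T)[:, :3]                         # points in target camera frame
    proj = pts_tgt @ K.T
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)                   # target pixel coordinates
    z = pts_tgt[:, 2]

    # Importance weights: exponentiate negative (relative) depth so nearer points dominate
    # when several source pixels land in the same target pixel (softmax-splatting style).
    w = torch.exp(-alpha * (z - z.min()))

    u, v = uv[:, 0].round().long(), uv[:, 1].round().long()
    valid = (u >= 0) & (u < W) & (v >= 0) & (v < H) & (z > 0)
    idx = v[valid] * W + u[valid]

    num = torch.zeros(H * W, 3).index_add_(0, idx, src_rgb.reshape(-1, 3)[valid] * w[valid, None])
    den = torch.zeros(H * W).index_add_(0, idx, w[valid])
    partial = (num / den.clamp(min=1e-8)[:, None]).reshape(H, W, 3)
    mask = (den > 0).reshape(H, W)                                    # empty pixels must be inpainted
    return partial, mask
```

The returned `mask` marks which target pixels received at least one source pixel; its complement is exactly the region the diffusion inpainter is asked to complete.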
Training Inpainter to complete Partial Views (Right). While training on Objaverse, we use an inpainting mask based on epipolar lines, which allows our model to better discover object boundaries.
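A minimal sketch of how such an epipolar mask could be built, assuming the relative pose (R, t) maps source-camera points to the target camera and the mask is the union of thickened epipolar lines of selected source pixels; the pixel-selection strategy, band width, and helper names here are illustrative assumptions, not the exact training recipe.

```python
import numpy as np
import cv2

def skew(t):
    """Cross-product matrix [t]_x of a 3-vector t."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

def epipolar_mask(src_pixels, K, R, t, H, W, thickness=5):
    """Union of epipolar lines (in the target view) of the given source pixels.
    src_pixels: (N, 2) array of (u, v) source coordinates, e.g. pixels on the object."""
    F = np.linalg.inv(K).T @ skew(t) @ R @ np.linalg.inv(K)   # fundamental matrix
    mask = np.zeros((H, W), dtype=np.uint8)
    for (u, v) in src_pixels:
        a, b, c = F @ np.array([u, v, 1.0])                   # line a*x + b*y + c = 0 in target view
        if abs(b) < 1e-8:                                     # (near-)vertical line
            if abs(a) > 1e-8:
                x = int(np.clip(-c / a, -1e6, 1e6))
                cv2.line(mask, (x, 0), (x, H - 1), 255, thickness)
            continue
        y0 = int(np.clip(-c / b, -1e6, 1e6))                  # intersection with left border
        y1 = int(np.clip(-(a * (W - 1) + c) / b, -1e6, 1e6))  # intersection with right border
        cv2.line(mask, (0, y0), (W - 1, y1), 255, thickness)  # cv2 clips the line to the image
    return mask > 0
```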
Qualitative Result: Baseline Comparisons
Compared to baselines, our method preserves sharp details (text and texture) much better.
Qualitative Result: Ablation Study
We ablate the key design choices of our method to demonstrate their importance.
Qualitative Result: Multiple Novel Views
We find that iNVS can generate consistent views across a range of viewpoints, provided the monocular depth estimator is accurate.
Qualitative Result: Failure Modes
We find that iNVS struggles most when the monocular depth estimator produces inaccurate depth.
The website template was borrowed from Michaël Gharbi.