iNVS : Repurposing Diffusion Inpainters for Novel View Synthesis
Accepted to SIGGRAPH Asia, 2023
Yash Kant
University of Toronto
Aliaksandr Siarohin
Snap Research
Michael Vasilkovsky
Snap Research
Riza Alp Guler
Snap Research
Jian Ren
Snap Research
Sergey Tulyakov
Snap Research
Igor Gilitschenski
University of Toronto
Overview
- We present a method for generating consistent novel views from a single source image that focuses on maximizing the reuse of pixels visible in the source image.
- We use a monocular depth estimator to transfer visible pixels from the source view to the target view, and then train a diffusion inpainter on the Objaverse dataset to fill in the missing pixels.
Method: iNVS
Depth-based Splatting to create Partial Views (Left). We use ZoeDepth to unproject the source view into 3D, and apply depth-based Softmax Splatting to create a partial target view.
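A minimal sketch of this splatting step is given below, under simplifying assumptions: the depth map is taken from some monocular estimator (such as ZoeDepth, whose call is not shown), the splat is a nearest-pixel accumulation with exponential depth-based weights standing in for the differentiable softmax splatting operator, and the function names, `alpha` temperature, and camera conventions are illustrative rather than taken from our code.

```python
import torch

def unproject(depth, K):
    """Lift each source pixel to a 3D point using its depth and intrinsics K (3x3)."""
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).float()    # (H, W, 3) homogeneous pixels
    rays = pix @ torch.inverse(K).T                                   # K^-1 [u v 1]^T
    return rays * depth[..., None]                                    # camera-frame 3D points

def splat_to_target(src_rgb, depth, K, T_src_to_tgt, alpha=10.0):
    """Warp source pixels into the target camera and splat them with depth-based weights.
    Simplified nearest-pixel splat; the actual method uses a softmax splatting op."""
    H, W, _ = src_rgb.shape
    pts = unproject(depth, K).reshape(-1, 3)
    pts_h = torch.cat([pts, torch.ones(len(pts), 1)], dim=-1)
    pts_tgt = (pts_h @ T_src_to_tgt.T)[:, :3]                         # points in target camera frame
    proj = pts_tgt @ K.T
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)                   # target pixel coordinates
    z = pts_tgt[:, 2]

    # Importance weights: exponentiate negative (relative) depth so nearer points dominate
    # when several source pixels land in the same target pixel (softmax-splatting style).
    w = torch.exp(-alpha * (z - z.min()))

    u, v = uv[:, 0].round().long(), uv[:, 1].round().long()
    valid = (u >= 0) & (u < W) & (v >= 0) & (v < H) & (z > 0)
    idx = v[valid] * W + u[valid]

    num = torch.zeros(H * W, 3).index_add_(0, idx, src_rgb.reshape(-1, 3)[valid] * w[valid, None])
    den = torch.zeros(H * W).index_add_(0, idx, w[valid])
    partial = (num / den.clamp(min=1e-8)[:, None]).reshape(H, W, 3)
    mask = (den > 0).reshape(H, W)                                    # empty pixels must be inpainted
    return partial, mask
```

The returned `mask` marks which target pixels received at least one source pixel; its complement is exactly the region the diffusion inpainter is asked to complete.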
Training Inpainter to complete Partial Views (Right). While training on Objaverse, we use an inpainting mask based on epipolar lines, which allows our model to better discover object boundaries.
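A minimal sketch of how such an epipolar mask could be built, assuming the relative pose (R, t) maps source-camera points to the target camera and the mask is the union of thickened epipolar lines of selected source pixels; the pixel-selection strategy, band width, and helper names here are illustrative assumptions, not the exact training recipe.

```python
import numpy as np
import cv2

def skew(t):
    """Cross-product matrix [t]_x of a 3-vector t."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

def epipolar_mask(src_pixels, K, R, t, H, W, thickness=5):
    """Union of epipolar lines (in the target view) of the given source pixels.
    src_pixels: (N, 2) array of (u, v) source coordinates, e.g. pixels on the object."""
    F = np.linalg.inv(K).T @ skew(t) @ R @ np.linalg.inv(K)   # fundamental matrix
    mask = np.zeros((H, W), dtype=np.uint8)
    for (u, v) in src_pixels:
        a, b, c = F @ np.array([u, v, 1.0])                   # line a*x + b*y + c = 0 in target view
        if abs(b) < 1e-8:                                     # (near-)vertical line
            if abs(a) > 1e-8:
                x = int(np.clip(-c / a, -1e6, 1e6))
                cv2.line(mask, (x, 0), (x, H - 1), 255, thickness)
            continue
        y0 = int(np.clip(-c / b, -1e6, 1e6))                  # intersection with left border
        y1 = int(np.clip(-(a * (W - 1) + c) / b, -1e6, 1e6))  # intersection with right border
        cv2.line(mask, (0, y0), (W - 1, y1), 255, thickness)  # cv2 clips the line to the image
    return mask > 0
```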
Qualitative Result: Baseline Comparisons
Compared to baselines, our method preserves sharp details (text and texture) much better.
Qualitative Result: Ablation Study
We ablate the key design choices of our method to demonstrate their importance.
Qualitative Result: Multiple Novel Views
We find that iNVS can generate consistent views across a range of viewpoints, provided the monocular depth estimator is accurate.
Qualitative Result: Failure Modes
We find that iNVS struggles most when the monocular depth estimator produces inaccurate depth.
The website template was borrowed from Michaël Gharbi.