iNVS: Repurposing Diffusion Inpainters for Novel View Synthesis
Accepted to SIGGRAPH Asia, 2023





  • We present a method for generating consistent novel views from a single source image, which focuses on maximizing the reuse of visible pixels from the source image.
  • We use a monocular depth estimator to transfer visible pixels from the source view to the target view, and then train a diffusion inpainter on the Objaverse dataset to fill in the missing pixels.

Novel view synthesis results for unseen objects. Our system synthesizes novel views of unseen objects from a single image. We obtain detailed generations while respecting the appearance of the region visible in the input image, by maximizing the reuse of source pixels.

Method: iNVS

Depth-based Splatting to create Partial Views (Left). We use ZoeDepth to unproject the source view into 3D, and apply depth-based Softmax Splatting to create a partial target view.
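
The warping step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses plain Python lists for a grayscale image, a pinhole camera given as a hypothetical tuple `K = (fx, fy, cx, cy)`, nearest-pixel splatting instead of bilinear splatting, and the weight `exp(-alpha * depth)` as an assumed importance metric so that nearer points dominate where several source pixels land on the same target pixel. The depth map would come from a monocular estimator such as ZoeDepth; here it is simply passed in.

```python
import math

def splat_partial_view(image, depth, K, R, t, alpha=10.0):
    """Forward-warp a source view into a target view (illustrative sketch).

    image, depth: H x W nested lists of floats (grayscale intensity, metric depth).
    K: (fx, fy, cx, cy) pinhole intrinsics, assumed shared by both views.
    R, t: rotation (3x3 nested list) and translation mapping source-camera
          coordinates to target-camera coordinates.
    Returns (target, mask); mask[v][u] is False where no source pixel landed,
    i.e. where an inpainter would have to fill in content.
    """
    fx, fy, cx, cy = K
    H, W = len(image), len(image[0])
    num = [[0.0] * W for _ in range(H)]  # weighted sum of intensities
    den = [[0.0] * W for _ in range(H)]  # sum of weights

    for v in range(H):
        for u in range(W):
            d = depth[v][u]
            # Unproject the pixel into 3D in the source camera frame.
            p = ((u - cx) * d / fx, (v - cy) * d / fy, d)
            # Move the point into the target camera frame.
            q = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
            if q[2] <= 0:
                continue  # behind the target camera
            # Project into the target image and snap to the nearest pixel.
            ut = int(math.floor(fx * q[0] / q[2] + cx + 0.5))
            vt = int(math.floor(fy * q[1] / q[2] + cy + 0.5))
            if 0 <= ut < W and 0 <= vt < H:
                w = math.exp(-alpha * q[2])  # softmax-style weight: near wins
                num[vt][ut] += w * image[v][u]
                den[vt][ut] += w

    mask = [[den[v][u] > 0.0 for u in range(W)] for v in range(H)]
    target = [[num[v][u] / den[v][u] if mask[v][u] else 0.0 for u in range(W)]
              for v in range(H)]
    return target, mask
```

With an identity pose the target reproduces the source; with a sideways translation the content shifts and the uncovered pixels show up as `False` entries in `mask`, which form the partial view that the inpainter completes.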


Training Inpainter to complete Partial Views (Right). While training on Objaverse, we use an inpainting mask based on epipolar lines, which allows our model to discover object boundaries better.
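
To make the idea of an epipolar mask concrete, here is a hedged sketch of one way such a mask could be built. The exact construction used for iNVS is not given on this page, so everything below is an assumption for illustration: each source foreground pixel is swept along its camera ray between hypothetical `near` and `far` depths, each sample is projected into the target view, and the union of the resulting line segments is marked as the region where the object may appear. The function names and parameters are invented for this example.

```python
import math

def project_at_depth(u, v, d, K, R, t):
    """Project source pixel (u, v), placed at depth d, into the target image."""
    fx, fy, cx, cy = K
    p = ((u - cx) * d / fx, (v - cy) * d / fy, d)
    q = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
    if q[2] <= 0:
        return None  # point falls behind the target camera
    return fx * q[0] / q[2] + cx, fy * q[1] / q[2] + cy

def epipolar_mask(fg_pixels, K, R, t, H, W, near=0.5, far=4.0, samples=64):
    """Union of epipolar line segments of the given source pixels.

    fg_pixels: iterable of (u, v) source pixels belonging to the object.
    Returns an H x W nested list of booleans for the target view.
    """
    mask = [[False] * W for _ in range(H)]
    for u, v in fg_pixels:
        for s in range(samples):
            # Step uniformly in inverse depth, which gives roughly even
            # spacing along the epipolar line in the target image.
            inv = 1.0 / near + (1.0 / far - 1.0 / near) * s / (samples - 1)
            pt = project_at_depth(u, v, 1.0 / inv, K, R, t)
            if pt is None:
                continue
            ut = int(math.floor(pt[0] + 0.5))
            vt = int(math.floor(pt[1] + 0.5))
            if 0 <= ut < W and 0 <= vt < H:
                mask[vt][ut] = True
    return mask
```

For a purely horizontal camera translation the epipolar lines are horizontal, so a single source pixel marks a short run of pixels in one row of the target mask.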

Qualitative Result: Baseline Comparisons

Compared to baselines, our method preserves sharp details (text and texture) much better.

Qualitative Result: Ablation Study

We ablate several of our design choices to demonstrate their importance.

Qualitative Result: Multiple Novel Views

We find that iNVS can generate consistent views across a range of viewpoints, provided the monocular depth estimator is accurate.

Qualitative Result: Failure Modes

We find that iNVS struggles most when the monocular depth estimator produces inaccurate depth.

