NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction
2021, NeurIPS, 135 citations.
Paper arxiv Link, Paper PDF link
Peng Wang$^1$, Lingjie Liu$^2$ $^∗$ , Yuan Liu$^1$, Christian Theobalt$^2$, Taku Komura$^1$, Wenping Wang$^3$ $^∗$
$^1$The University of Hong Kong $^2$Max Planck Institute for Informatics $^3$Texas A&M University
$^1${pwang3,yliu,taku}@cs.hku.hk $^2${lliu,theobalt}@mpi-inf.mpg.de $^3$wenping@tamu.edu
∗ Corresponding Authors.
Rebuttal Reading
For the rebuttal and peer reviews, see link. They contain some important points that are not mentioned in the main text.
- Why choose the logistic density function?
Theoretically, the density function used in our framework can be any unimodal density function. We chose the logistic distribution because the CDF of logistic distribution can be analytically computed with a Sigmoid function, while a commonly-used Gaussian distribution has no analytical equation for CDF.
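The analytic-CDF argument can be checked numerically. The following is a minimal sketch and is not taken from the NeuS code: the function names and the scale parameter `s` are assumptions chosen for illustration. It compares the closed-form logistic density with a finite-difference derivative of the Sigmoid.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable Sigmoid; this is the CDF of the standard logistic distribution."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def logistic_cdf(x: float, s: float) -> float:
    """CDF of a zero-mean logistic distribution with scale 1/s (`s` is an assumed name)."""
    return sigmoid(s * x)


def logistic_pdf(x: float, s: float) -> float:
    """Closed-form density: s * sigmoid(s x) * (1 - sigmoid(s x))."""
    c = sigmoid(s * x)
    return s * c * (1.0 - c)


def numeric_derivative(f, x: float, h: float = 1e-5) -> float:
    """Central finite difference, used only to check the closed form."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


s = 10.0
for x in (-0.3, 0.0, 0.2):
    approx = numeric_derivative(lambda t: logistic_cdf(t, s), x)
    print(x, logistic_pdf(x, s), approx)
```

The density is unimodal with its peak at zero, and both the CDF and the density need only one Sigmoid evaluation. A Gaussian CDF would instead need the error function.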
- Why does a model trained with mask supervision sometimes perform worse than one trained without it?
R1-Q7. Why do the results for some scenes get worse when using the mask supervision?
Among the 15 scenes in DTU dataset, there are two cases (scan 40, 63) where the quantitative results with mask supervision are observably worse than those without mask supervision. As shown in Fig. 10 of the supplementary material, in row 1 (scan 40) and row 3 (scan 63) the results with mask supervision have more incorrect concave surfaces on textureless regions than those without mask supervision. We speculate that it is because imposing mask loss encourages the surface to shrink to coincide with the mask, which results in concave surfaces on textureless regions.
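In other words, the authors speculate that enforcing the mask loss tends to shrink the surface toward the masked region, which produces concave surfaces in textureless regions.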
Open Review Ratings
| Time Spent Reviewing | Rating | Confidence |
| --- | --- | --- |
| 8 | 7 | 4 |
| 4 | 8 | 4 |
| 4 | 8 | 5 |
| 2 | 8 | 4 |
If pytorch==1.7.1 is not installed, the following error occurs: RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling cublasCreate(handle)
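- Use a signed distance field (SDF) to represent the surface of the geometry.
- Design an unbiased weight function.
- "Unbiased" means that the contribution to the rendered color should peak exactly at the object surface.
- Design an occlusion-aware weight function.
- If a ray passes through several surfaces, the first surface it hits should contribute the most to the final rendered color, and the surfaces behind it should contribute less.
- For opaque objects, and even emissive ones (e.g. a screen), the front surface contributes most of the light. The occluded back surface contributes nothing to the rendering.
- For transparent or thin objects (e.g. a curtain), the front surface contributes part of the light, and the back surface should also let some light through.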
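The two properties above can be illustrated on a toy ray. The following is a minimal sketch, not the authors' implementation: the discrete opacity follows the form given in the NeuS paper, alpha_i = max((Phi_s(f_i) - Phi_s(f_{i+1})) / Phi_s(f_i), 0), while the two-sphere scene, the sample spacing, and the value `s=64` are assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Numerically stable Sigmoid, used as Phi_s(x) = sigmoid(s * x)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def two_sphere_sdf(t):
    """SDF along a ray through the centres of two spheres of radius 0.5 (toy scene).
    The centres sit at depths 2 and 5, so the front surfaces are at t=1.5 and t=4.5."""
    return min(abs(t - 2.0) - 0.5, abs(t - 5.0) - 0.5)


def neus_style_weights(sdf_values, s):
    """w_i = T_i * alpha_i, with T_i the product of (1 - alpha_j) over j < i."""
    weights = []
    transmittance = 1.0
    for f_cur, f_next in zip(sdf_values[:-1], sdf_values[1:]):
        phi_cur = sigmoid(s * f_cur)
        phi_next = sigmoid(s * f_next)
        # Opacity is positive only where the SDF decreases (the ray enters a surface).
        alpha = max((phi_cur - phi_next) / phi_cur, 0.0)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights


depths = [i * 0.01 for i in range(601)]
sdf_values = [two_sphere_sdf(t) for t in depths]
weights = neus_style_weights(sdf_values, s=64.0)

peak_index = max(range(len(weights)), key=weights.__getitem__)
first_surface = sum(w for t, w in zip(depths, weights) if 1.0 <= t < 2.0)
second_surface = sum(w for t, w in zip(depths, weights) if 4.0 <= t < 5.0)
print(depths[peak_index], first_surface, second_surface)
```

In this toy setup the weight peaks where the SDF crosses zero at the first sphere (unbiased), and almost no weight is left for the second sphere because the transmittance has already dropped to nearly zero (occlusion-aware).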
1. Rendering takes too long.
All-Q3. Training & inferencing time. As described in Section 4.1, the training time of each scene is around 14 hours (w/ background modeled by NeRF++) or 16 hours (w/o background) for 300k iterations. At inference time, rendering an image at the resolution of 1600x1200 takes around 320 seconds (w/ background modeled by NeRF++) or 250 seconds (w/o background).
We also tested a new sampling strategy by first applying sphere tracing to find the regions near the surfaces and only sampling points in those regions. With this strategy, rendering an image at 1600x1200 pixels only needs 60 seconds (w/ background modeled by NeRF++) or 30 seconds (w/o background), which is comparable to that by IDR (30 seconds per image, w/o background). Another acceleration strategy is to incorporate the sparse voxel structures as done in NSVF and PlenOctree. We will add this discussion to the revision.
Cited from the authors' reply to the NeurIPS Program Chairs. Link Page
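The sampling strategy described in the reply (sphere tracing first, then sampling only near the surface) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: an analytic sphere stands in for the SDF network, and the names `sphere_trace` and `samples_near_surface`, the band width, and the sample count are all hypothetical.

```python
import math


def sphere_sdf(p, radius=0.5):
    """Signed distance to a sphere at the origin (toy stand-in for the SDF network)."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius


def sphere_trace(origin, direction, sdf, max_steps=64, eps=1e-4, far=10.0):
    """Step along the ray by the current SDF value until the surface is reached.
    Returns the hit depth, or None if the ray leaves the [0, far] range."""
    t = 0.0
    for _ in range(max_steps):
        p = [o + t * d for o, d in zip(origin, direction)]
        dist = sdf(p)
        if dist < eps:
            return t
        t += dist
        if t > far:
            return None
    return None


def samples_near_surface(t_hit, half_width=0.05, n_samples=8):
    """Evenly spaced depths in a narrow band around the hit point."""
    lo = max(t_hit - half_width, 0.0)
    hi = t_hit + half_width
    step = (hi - lo) / (n_samples - 1)
    return [lo + i * step for i in range(n_samples)]


t_hit = sphere_trace([0.0, 0.0, -2.0], [0.0, 0.0, 1.0], sphere_sdf)
band = samples_near_surface(t_hit)
miss = sphere_trace([0.0, 0.0, -2.0], [0.0, 1.0, 0.0], sphere_sdf)
print(t_hit, band, miss)
```

Rays that miss the object need no volume-rendering samples at all, and rays that hit it need only a few samples in the band. That is the intuition behind the reported speed-up.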
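- Render several pixels per pass, or render patch by patch.
- One could even try adding Attention or Transformer structures to the renderer to improve rendering efficiency.
- Rendering relies on an MLP, and a fairly deep one. I wonder whether MLPs are simply slow to run because the hidden layers are fully connected.
- To consider: what acceleration strategies do Mobile Transformer and Yolo v4 / Yolo v5 use?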
2. Rendering targets rigid objects.
Second, representing non-watertight manifolds and/or manifolds with boundaries, such as zero thickness surfaces, is not possible with an SDF.
Cited from the conclusion of “Volume Rendering of Neural Implicit Surfaces”.
4. Multiple surfaces
5. Density functions for multiple surfaces
- [ x ] Marching Cubes
- [ ] IDR. The basic architecture of this paper comes from here.
- Multiview neural surface reconstruction by disentangling geometry and appearance.
- Lior Yariv, Yoni Kasten, Dror Moran, Meirav Galun, Matan Atzmon, Ronen Basri, and Yaron Lipman. Multiview neural surface reconstruction by disentangling geometry and appearance. Advances in Neural Information Processing Systems, 33, 2020.
- [ ] Eikonal Loss. This makes the learned SDF more regular.
- Implicit geometric regularization for learning shapes.
- Amos Gropp, Lior Yariv, Niv Haim, Matan Atzmon, and Yaron Lipman. Implicit geometric regularization for learning shapes. arXiv preprint arXiv:2002.10099, 2020.
- [ x ] NeuMesh
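- [ ] Poisson surface reconstruction, from point cloud to surface. Michael Kazhdan and Hugues Hoppe. Screened poisson surface reconstruction. ACM Trans. Graph., 32(3), July 2013.
- [ ] DeepSDF. The practice of adding a skip connection every four layers already appears here. VolSDF also seems to use a skip connection every 4 layers.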
- Deepsdf: Learning continuous signed distance functions for shape representation.
- Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. Deepsdf: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 165–174, 2019.
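- [ ] COLMAP. A classical method based on multi-view geometry. It recovers a point cloud, so Poisson reconstruction is still needed to recover the object's mesh surface.
- Pixelwise view selection for unstructured multi-view stereo.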
- Johannes L Schönberger, Enliang Zheng, Jan-Michael Frahm, and Marc Pollefeys. Pixelwise view selection for unstructured multi-view stereo. In European Conference on Computer Vision, pages 501–518. Springer, 2016.
- [ ] BlendedMVS. A challenging dataset. The image resolution is 768 x 576, and the number of images ranges from 31 to 143.
- Blendedmvs: A large-scale dataset for generalized multi-view stereo networks.
- Yao Yao, Zixin Luo, Shiwei Li, Jingyang Zhang, Yufan Ren, Lei Zhou, Tian Fang, and Long Quan. Blendedmvs: A large-scale dataset for generalized multi-view stereo networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1790–1799, 2020.
- [ ] DVR. Another work that was already doing neural implicit surface reconstruction in 2020.
- Differentiable volumetric rendering: Learning implicit 3d representations without 3d supervision.
- Michael Niemeyer, Lars Mescheder, Michael Oechsle, and Andreas Geiger. Differentiable volumetric rendering: Learning implicit 3d representations without 3d supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3504–3515, 2020.
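For the Eikonal Loss item in the list above, the regularizer penalizes the deviation of the SDF gradient norm from 1, i.e. the mean of (||∇f(p)|| - 1)^2 over sampled points. The following is a minimal sketch of that idea, with assumptions labeled: an analytic sphere stands in for the network, the gradient is taken by finite differences (a real implementation would use autograd), and the sample points are arbitrary.

```python
import math


def sphere_sdf(p, radius=0.5):
    """Exact signed distance to a sphere at the origin; its gradient norm is 1."""
    return math.sqrt(sum(c * c for c in p)) - radius


def gradient(f, p, h=1e-5):
    """Central finite-difference gradient (assumption: stands in for autograd)."""
    grad = []
    for axis in range(len(p)):
        hi = list(p)
        lo = list(p)
        hi[axis] += h
        lo[axis] -= h
        grad.append((f(hi) - f(lo)) / (2.0 * h))
    return grad


def eikonal_loss(f, points):
    """Mean of (||grad f(p)|| - 1)^2 over the sample points."""
    total = 0.0
    for p in points:
        g = gradient(f, p)
        norm = math.sqrt(sum(c * c for c in g))
        total += (norm - 1.0) ** 2
    return total / len(points)


points = [(0.3, 0.1, -0.2), (1.0, -0.5, 0.4), (-0.7, 0.2, 0.9)]
loss_true_sdf = eikonal_loss(sphere_sdf, points)
# A function with the same zero level set but gradient norm 2 is penalized.
loss_scaled = eikonal_loss(lambda p: 2.0 * sphere_sdf(p), points)
print(loss_true_sdf, loss_scaled)
```

Both functions describe the same surface, but only the first is a valid distance field. The loss separates the two, which is why it keeps the learned SDF well behaved.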