标题
NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view
2021, NIPS, 135 citations.
链接
Paper arxiv Link, Paper PDF link
作者
Peng Wang$^1$, Lingjie Liu$^2$ $^∗$ , Yuan Liu$^1$, Christian Theobalt$^2$, Taku Komura$^1$, Wenping Wang$^3$ $^∗$
$^1$The University of Hong Kong $^2$Max Planck Institute for Informatics $^3$Texas A&M University
$^1${pwang3,yliu,taku}@cs.hku.hk $^2${lliu,theobalt}@mpi-inf.mpg.de $^3$wenping@tamu.edu
∗ Corresponding Authors.
华人为主的论文,香港大学为主。通讯作者是马克斯·普朗克信息学研究所的(没见过)。另一个通讯作者是德克萨斯农工大学的(也没见过)。
Rebuttal Reading
Rebuttal,Peer Review 参见 link,有一些重要的点,在正文中没有提到。
- 为什么选择logistic density function?
Theoretically, the density function used in our framework can be any unimodal density function. We chose the logistic distribution because the CDF of logistic distribution can be analytically computed with a Sigmoid function, while a commonly-used Gaussian distribution has no analytical equation for CDF.
原因是,他们的工作需要使用单峰概率密度函数。常见的单峰函数,高斯函数,没有显式的累计概率密度的表达式。
- 为什么有时候带有mask的模型,反而比没有mask的模型更差了?
R1-Q7. Why do the results for some scenes get worse when using the mask supervision?
Among the 15 scenes in DTU dataset, there are two cases (scan 40, 63) where the quantitative results with mask supervision are observably worse than those without mask supervision. As shown in Fig. 10 of the supplementary material, in row 1 (scan 40) and row 3 (scan 63) the results with mask supervision have more incorrect concave surfaces on textureless regions than those without mask supervision. We speculate that it is because imposing mask loss encourages the surface to shrink to coincide with the mask, which results in concave surfaces on textureless regions.
其实没有解释清楚为什么会出现带有mask的模型效果更差了。
他说,> 我们猜测,这是由于强制加入mask损失函数,容易让模型的表面收缩到有掩膜的区域,这导致了在无纹理的区域中出现凹面。
Open Review Ratings
Time Spent Reviewing | Rating | Confidence |
---|---|---|
8 | 7 | 4 |
4 | 8 | 4 |
4 | 8 | 5 |
2 | 8 | 4 |
代码环境搭建
需要安装pytorch==1.7.1
如果需要用到torchvision,需要手动指定版本编号为0.8.2,否则会默认升级torch的版本为1.13.1。
torchvision=0.8.2
如果不安装pytorch==1.7.1的版本,会导致出现RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling cublasCreate(handle)
的问题。详情参见Blog。
想法
优点
- 采用有向距离场,表示几何物体的表面。
- 设计无偏的权重函数。
- 所谓无偏,指的是在物体表面的发光程度应该最大。
- 设计能够注意到物体遮挡的权重函数。
- 如果光束穿过物体的表面,并且横穿多个物体表面,第一个物体的表面对最后的颜色渲染贡献最高,后面的表面对渲染颜色的贡献度应该更低。
- 对于不透明物体,甚至是发光物体(例如屏幕),前表面贡献了最多的发光量。被遮挡的后表面对物体渲染不产生贡献。
- 对于透明物体、薄物体,前表面贡献部分光量,后表面也应当投射过来一些产光量。(例如窗帘)。
缺点
1. 渲染时间过久。
All-Q3. Training & inferencing time. As described in Section 4.1, the training time of each scene is around 14 hours (w/ background modeled by NeRF++) or 16 hours (w/o background) for 300k iterations. At inference time, rendering an image at the resolution of 1600x1200 takes around 320 seconds(w/ background modeled by NeRF++) or 250 seconds(w/o background).
We also tested a new sampling strategy by first applying sphere tracing to find the regions near the surfaces and only sampling points in those regions. With this strategy, rendering an image at 1600x1200 pixels only needs 60 seconds(w/ background modeled by NeRF++) or 30 seconds(w/o background), which is comparable to that by IDR (30 seconds per image, w/o background). Another acceleration strategy is to incorporate the sparse voxel structures as done in NSVF and PlenOctree. We will add this discussion to the revision.
Cited from the authors reply to NeurIPS Program Chairs. Link Page
我认为,渲染时间过久的问题,主要来自于查询的非并行性所导致的。
第一,主要的问题在于,渲染过程,每次都要逐个像素、逐个视点进行渲染,分辨率越高,渲染的时间越久。一个合理的解决方案,是设计一种批量渲染的策略。
- 每次渲染的时候包含多个像素,或者逐个Patch进行渲染。
- 甚至在渲染的时候,可以试图加入Attention、Transformer的结构进去,提高渲染的效率。
第二,渲染的时间受到MLP的影响。
- 渲染的过程中,需要用到MLP,而且MLP的层数也比较深。我在想,是不是MLP的加载速度都比较慢,因为中间神经元是全连接的状态。
- 思考,Mobile Transformer,Yolo v4/ Yolo v5都用了什么样的加速策略。
2. 渲染针对刚性物体。
例如窗帘、镜面、显示器、亚克力板,这些模型的渲染重建结果,想来未必尽如人意。
这一点在另一篇论文中得到了印证,并且这种想法被扩展了。
Second, representing non-watertight manifolds and/or manifolds with boundaries, such as zero thickness surfaces, is not possible with an SDF.
Cited from the conclusion of “Volume Rendering of Neural Implicit Surfaces”,
3.一阶近似
这个假设是在物体的表面一段距离内进行的,而且是采用的一阶近似。这种近似是否有范围限制,并且近似的程度有多高,近似的效果有多好,都还要考虑一下。
4. 多个表面
虽然是对多个表面的射线穿透进行假设,而且能够证明,穿过多个表面时候的近似证明也不错。但是对不对呢?还需要再思考。
5. 多个表面的密度函数
密度函数,由于训练的过程中,1/s会缩减为接近零的情况,接近物体表面时候,密度是最大的,方差是最小的。第一个表面、第二个表面,如何影响这种方差呢?还要再考虑。
后续要读的文章
- [ x ] Marching Cubes
- [ ] IDR。文章的基本架构就来自于这里。
- Multiview neural surface reconstruction by disentangling geometry and appearance.
- Lior Yariv, Yoni Kasten, Dror Moran, Meirav Galun, Matan Atzmon, Basri Ronen, and Yaron Lipman. Multiview neural surface reconstruction by disentangling geometry and appearance. Advances in Neural Information Processing Systems, 33, 2020.
- [ ] UNISURF
- [ ] Eikonal Loss。这会让学习到的SDF更加规整。
- Implicit geometric regularization for learning shapes.
- Amos Gropp, Lior Yariv, Niv Haim, Matan Atzmon, and Yaron Lipman. Implicit geometric regularization for learning shapes. arXiv preprint arXiv:2002.10099, 2020
- [ x ] NeuMesh
- [ ] 点云到表面的泊松表面重建。Michael Kazhdan and Hugues Hoppe. Screened poisson surface reconstruction. ACM Trans. Graph., 32(3), July 2013.
- [ ] Deepsdf。此前已经有每隔四层走一个跳跃连接的做法。VolSDF似乎也采用每4层做跳跃连接的做法。
- Deepsdf: Learning continuous signed distance functions for shape representation.
- Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. Deepsdf: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 165–174, 2019.
- [ ] COLMAP。一个经典的、基于多视图几何的方法。该方法恢复出来的是点云,还需要用泊松重建的方法,恢复物体的mesh表面。
- Pixelwise view selection for unstructured multi-view stereo. I
- Johannes L Schönberger, Enliang Zheng, Jan-Michael Frahm, and Marc Pollefeys. Pixelwise view selection for unstructured multi-view stereo. In European Conference on Computer Vision, pages 501–518. Springer, 2016.
- [ ] BlendMVS. 挑战性的数据集,分辨率是768 x 576,图片的数目是31 − 143 。
- Blendedmvs: A large-scale dataset for generalized multi-view stereo networks.
- Yao Yao, Zixin Luo, Shiwei Li, Jingyang Zhang, Yufan Ren, Lei Zhou, Tian Fang, and Long Quan. Blendedmvs: A large-scale dataset for generalized multi-view stereo networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1790–1799, 2020.
- [ ] DVR. 另一篇2020年就在进行神经隐式表面重建的工作。
- Differentiable volumetric rendering: Learning implicit 3d representations without 3d supervision.
- Michael Niemeyer, Lars Mescheder, Michael Oechsle, and Andreas Geiger. Differentiable volumetric rendering: Learning implicit 3d representations without 3d supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3504–3515, 2020.