- id: huang2025enerverse
title: 'EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation'
authors: Siyuan Huang, Liliang Chen, Pengfei Zhou, Shengcong Chen, Zhengkai Jiang,
Yue Hu, Peng Gao, Hongsheng Li, Maoqing Yao, Guanghui Ren
year: '2025'
abstract: 'We introduce EnerVerse, a comprehensive framework for embodied future
space generation specifically designed for robotic manipulation tasks. EnerVerse
seamlessly integrates convolutional and bidirectional attention mechanisms for
inner-chunk space modeling, ensuring low-level consistency and continuity. Recognizing
the inherent redundancy in video data, we propose a sparse memory context combined
with a chunkwise unidirectional generative paradigm to enable the generation of
infinitely long sequences. To further augment robotic capabilities, we introduce
the Free Anchor View (FAV) space, which provides flexible perspectives to enhance
observation and analysis. The FAV space mitigates motion modeling ambiguity, removes
physical constraints in confined environments, and significantly improves the
robot''s generalization and adaptability across various tasks and settings. To
address the prohibitive costs and labor intensity of acquiring multi-camera observations,
we present a data engine pipeline that integrates a generative model with 4D Gaussian
Splatting (4DGS). This pipeline leverages the generative model''s robust generalization
capabilities and the spatial constraints provided by 4DGS, enabling an iterative
enhancement of data quality and diversity, thus creating a data flywheel effect
that effectively narrows the sim-to-real gap. Finally, our experiments demonstrate
that the embodied future space generation prior substantially enhances policy
predictive capabilities, resulting in improved overall performance, particularly
in long-range robotic manipulation tasks.
'
project_page: https://sites.google.com/view/enerverse
paper: https://arxiv.org/pdf/2501.01895.pdf
code: null
video: null
tags:
- Dynamic
- Project
- Robotics
thumbnail: assets/thumbnails/huang2025enerverse.jpg
publication_date: '2025-01-03T17:00:33+00:00'
- id: longhini2024clothsplatting
title: 'Cloth-Splatting: 3D Cloth State Estimation from RGB Supervision'
authors: Alberta Longhini, Marcel Büsching, Bardienus Pieter Duisterhof, Jens Lundell,
Jeffrey Ichnowski, Mårten Björkman, Danica Kragic
year: '2024'
abstract: Recently, 3D Gaussian Splatting (3DGS) has revolutionized radiance field
reconstruction, manifesting efficient and high-fidelity novel view synthesis.
We introduce Cloth-Splatting, a method for estimating 3D states
of cloth from RGB images through a prediction-update framework. Cloth-Splatting
leverages an action-conditioned dynamics model for predicting future states and
uses 3D Gaussian Splatting to update the predicted states. Our key insight is
that coupling a 3D mesh-based representation with Gaussian Splatting allows us
to define a differentiable map between the cloth's state space and the image space.
This enables the use of gradient-based optimization techniques to refine inaccurate
state estimates using only RGB supervision. Our experiments demonstrate that Cloth-Splatting
not only improves state estimation accuracy over current baselines but also reduces
convergence time by ~85%.
project_page: https://kth-rpl.github.io/cloth-splatting/
paper: https://arxiv.org/pdf/2501.01715.pdf
code: https://github.com/KTH-RPL/cloth-splatting
video: null
tags:
- Code
- Meshing
- Project
- Rendering
thumbnail: assets/thumbnails/longhini2024clothsplatting.jpg
publication_date: '2025-01-03T09:17:30+00:00'
date_source: arxiv
- id: zhang2025crossviewgs
title: 'CrossView-GS: Cross-view Gaussian Splatting For Large-scale Scene Reconstruction'
authors: Chenhao Zhang, Yuanping Cao, Lei Zhang
year: '2025'
abstract: 3D Gaussian Splatting (3DGS) has emerged as a prominent method for scene
representation and reconstruction, leveraging densely distributed Gaussian primitives
to enable real-time rendering of high-resolution images. While existing 3DGS methods
perform well in scenes with minor view variation, large view changes in cross-view
scenes pose optimization challenges for these methods. To address these issues,
we propose a novel cross-view Gaussian Splatting method for large-scale scene
reconstruction, based on dual-branch fusion. Our method reconstructs
models from aerial and ground views as two independent branches to establish the
baselines of Gaussian distribution, providing reliable priors for cross-view reconstruction
during both initialization and densification. Specifically, a gradient-aware regularization
strategy is introduced to mitigate smoothing issues caused by significant view
disparities. Additionally, a unique Gaussian supplementation strategy is utilized
to incorporate the complementary information of the two branches into the cross-view model.
Extensive experiments on benchmark datasets demonstrate that our method achieves
superior performance in novel view synthesis compared to state-of-the-art methods.
project_page: null
paper: https://arxiv.org/pdf/2501.01695.pdf
code: null
video: null
tags:
- Large-Scale
- Optimization
thumbnail: assets/thumbnails/zhang2025crossviewgs.jpg
publication_date: '2025-01-03T08:24:59+00:00'
- id: wang2025pgsag
title: 'PG-SAG: Parallel Gaussian Splatting for Fine-Grained Large-Scale Urban Buildings
Reconstruction via Semantic-Aware Grouping'
authors: Tengfei Wang, Xin Wang, Yongmao Hou, Yiwei Xu, Wendi Zhang, Zongqian Zhan
year: '2025'
abstract: 3D Gaussian Splatting (3DGS) has emerged as a transformative method in
the field of real-time novel view synthesis. Based on 3DGS, recent advancements cope
with large-scale scenes via spatial-based partition strategy to reduce video memory
and optimization time costs. In this work, we introduce a parallel Gaussian splatting
method, termed PG-SAG, which fully exploits semantic cues for both partitioning
and Gaussian kernel optimization, enabling fine-grained building surface reconstruction
of large-scale urban areas without downsampling the original image resolution.
First, the Cross-modal model - Language Segment Anything is leveraged to segment
building masks. Then, the segmented building regions are grouped into sub-regions
according to the visibility check across registered images. The Gaussian kernels
for these sub-regions are optimized in parallel with masked pixels. In addition,
the normal loss is re-formulated for the detected edges of masks to alleviate
the ambiguities in normal vectors on edges. Finally, to improve the optimization
of 3D Gaussians, we introduce a gradient-constrained balance-load loss that accounts
for the complexity of the corresponding scenes, effectively minimizing the thread
waiting time in the pixel-parallel rendering stage as well as the reconstruction
loss. Extensive experiments on various urban datasets demonstrate the superior
performance of our PG-SAG on building surface reconstruction compared to several
state-of-the-art 3DGS-based methods.
project_page: null
paper: https://arxiv.org/pdf/2501.01677.pdf
code: null
video: null
tags:
- Large-Scale
- Meshing
- Optimization
thumbnail: assets/thumbnails/wang2025pgsag.jpg
publication_date: '2025-01-03T07:40:16+00:00'
- id: gao2025easysplat
title: 'EasySplat: View-Adaptive Learning makes 3D Gaussian Splatting Easy'
authors: Ao Gao, Luosong Guo, Tao Chen, Zhao Wang, Ying Tai, Jian Yang, Zhenyu Zhang
year: '2025'
abstract: 3D Gaussian Splatting (3DGS) techniques have achieved satisfactory 3D
scene representation. Despite their impressive performance, they confront challenges
due to the limitations of structure-from-motion (SfM) methods in acquiring accurate
scene initialization, or the inefficiency of the densification strategy. In this paper,
we introduce a novel framework EasySplat to achieve high-quality 3DGS modeling.
Instead of using SfM for scene initialization, we employ a novel method to release
the power of large-scale pointmap approaches. Specifically, we propose an efficient
grouping strategy based on view similarity, and use robust pointmap priors to
obtain high-quality point clouds and camera poses for 3D scene initialization.
After obtaining a reliable scene structure, we propose a novel densification approach
that adaptively splits Gaussian primitives based on the average shape of neighboring
Gaussian ellipsoids, utilizing a KNN scheme. In this way, the proposed method tackles
the limitations of initialization and optimization, leading to an efficient and
accurate 3DGS modeling. Extensive experiments demonstrate that EasySplat outperforms
the current state-of-the-art (SOTA) in handling novel view synthesis.
project_page: null
paper: https://arxiv.org/pdf/2501.01003.pdf
code: null
video: null
tags:
- Acceleration
- Densification
- Rendering
thumbnail: assets/thumbnails/gao2025easysplat.jpg
publication_date: '2025-01-02T01:56:58+00:00'
- id: yang2024storm
title: 'STORM: Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes'
authors: Jiawei Yang, Jiahui Huang, Yuxiao Chen, Yan Wang, Boyi Li, Yurong You,
Apoorva Sharma, Maximilian Igl, Peter Karkus, Danfei Xu, Boris Ivanovic, Yue Wang,
Marco Pavone
year: '2024'
abstract: We present STORM, a spatio-temporal reconstruction model designed for
reconstructing dynamic outdoor scenes from sparse observations. Existing dynamic
reconstruction methods often rely on per-scene optimization, dense observations
across space and time, and strong motion supervision, resulting in lengthy optimization
times, limited generalization to novel views or scenes, and degenerated quality
caused by noisy pseudo-labels for dynamics. To address these challenges, STORM
leverages a data-driven Transformer architecture that directly infers dynamic
3D scene representations--parameterized by 3D Gaussians and their velocities--in
a single forward pass. Our key design is to aggregate 3D Gaussians from all frames
using self-supervised scene flows, transforming them to the target timestep to
enable complete (i.e., "amodal") reconstructions from arbitrary viewpoints at
any moment in time. As an emergent property, STORM automatically captures dynamic
instances and generates high-quality masks using only reconstruction losses. Extensive
experiments on public datasets show that STORM achieves precise dynamic scene
reconstruction, surpassing state-of-the-art per-scene optimization methods (+4.3
to 6.6 PSNR) and existing feed-forward approaches (+2.1 to 4.7 PSNR) in dynamic
regions. STORM reconstructs large-scale outdoor scenes in 200ms, supports real-time
rendering, and outperforms competitors in scene flow estimation, improving 3D
EPE by 0.422m and Acc5 by 28.02%. Beyond reconstruction, we showcase four additional
applications of our model, illustrating the potential of self-supervised learning
for broader dynamic scene understanding.
project_page: null
paper: https://arxiv.org/pdf/2501.00602.pdf
code: null
video: https://jiawei-yang.github.io/STORM/
tags:
- Autonomous Driving
- Dynamic
- Large-Scale
- Video
thumbnail: assets/thumbnails/yang2024storm.jpg
publication_date: '2024-12-31T18:59:58+00:00'
- id: mao2024dreamdrive
title: 'DreamDrive: Generative 4D Scene Modeling from Street View Images'
authors: Jiageng Mao, Boyi Li, Boris Ivanovic, Yuxiao Chen, Yan Wang, Yurong You,
Chaowei Xiao, Danfei Xu, Marco Pavone, Yue Wang
year: '2024'
abstract: Synthesizing photo-realistic visual observations from an ego vehicle's
driving trajectory is a critical step towards scalable training of self-driving
models. Reconstruction-based methods create 3D scenes from driving logs and synthesize
geometry-consistent driving videos through neural rendering, but their dependence
on costly object annotations limits their ability to generalize to in-the-wild
driving scenarios. On the other hand, generative models can synthesize action-conditioned
driving videos in a more generalizable way but often struggle with maintaining
3D visual consistency. In this paper, we present DreamDrive, a 4D spatial-temporal
scene generation approach that combines the merits of generation and reconstruction,
to synthesize generalizable 4D driving scenes and dynamic driving videos with
3D consistency. Specifically, we leverage the generative power of video diffusion
models to synthesize a sequence of visual references and further elevate them
to 4D with a novel hybrid Gaussian representation. Given a driving trajectory,
we then render 3D-consistent driving videos via Gaussian splatting. The use of
generative priors allows our method to produce high-quality 4D scenes from in-the-wild
driving data, while neural rendering ensures 3D-consistent video generation from
the 4D scenes. Extensive experiments on nuScenes and street view images demonstrate
that DreamDrive can generate controllable and generalizable 4D driving scenes,
synthesize novel views of driving videos with high fidelity and 3D consistency,
decompose static and dynamic elements in a self-supervised manner, and enhance
perception and planning tasks for autonomous driving.
project_page: null
paper: https://arxiv.org/pdf/2501.00601.pdf
code: null
video: null
tags:
- Autonomous Driving
- Dynamic
- Feed-Forward
thumbnail: assets/thumbnails/mao2024dreamdrive.jpg
publication_date: '2024-12-31T18:59:57+00:00'
- id: wang2024sgsplatting
title: 'SG-Splatting: Accelerating 3D Gaussian Splatting with Spherical Gaussians'
authors: Yiwen Wang, Siyuan Chen, Ran Yi
year: '2024'
abstract: '3D Gaussian Splatting is emerging as a state-of-the-art technique in
novel view synthesis, recognized for its impressive balance between visual quality,
speed, and rendering efficiency. However, reliance on third-degree spherical harmonics
for color representation introduces significant storage demands and computational
overhead, resulting in a large memory footprint and slower rendering speed. We
introduce SG-Splatting with Spherical Gaussians based color representation, a
novel approach to enhance rendering speed and quality in novel view synthesis.
Our method first represents view-dependent color using Spherical Gaussians, instead
of third-degree spherical harmonics, which largely reduces the number of parameters
used for color representation, and significantly accelerates the rendering process.
We then develop an efficient strategy for organizing multiple Spherical Gaussians,
optimizing their arrangement to achieve a balanced and accurate scene representation.
To further improve rendering quality, we propose a mixed representation that combines
Spherical Gaussians with low-degree spherical harmonics, capturing both high-
and low-frequency color information effectively. SG-Splatting also has plug-and-play
capability, allowing it to be easily integrated into existing systems. This approach
improves computational efficiency and overall visual fidelity, making it a practical
solution for real-time applications.
'
project_page: null
paper: https://arxiv.org/pdf/2501.00342.pdf
code: null
video: null
tags:
- Acceleration
thumbnail: assets/thumbnails/wang2024sgsplatting.jpg
publication_date: '2024-12-31T08:31:52+00:00'
- id: cha2024perse
title: 'PERSE: Personalized 3D Generative Avatars from A Single Portrait'
authors: Hyunsoo Cha, Inhee Lee, Hanbyul Joo
year: '2024'
abstract: We present PERSE, a method for building an animatable personalized generative
avatar from a reference portrait. Our avatar model enables facial attribute editing
in a continuous and disentangled latent space to control each facial attribute,
while preserving the individual's identity. To achieve this, our method begins
by synthesizing large-scale synthetic 2D video datasets, where each video contains
consistent changes in the facial expression and viewpoint, combined with a variation
in a specific facial attribute from the original input. We propose a novel pipeline
to produce high-quality, photorealistic 2D videos with facial attribute editing.
Leveraging this synthetic attribute dataset, we present a personalized avatar
creation method based on the 3D Gaussian Splatting, learning a continuous and
disentangled latent space for intuitive facial attribute manipulation. To enforce
smooth transitions in this latent space, we introduce a latent space regularization
technique by using interpolated 2D faces as supervision. Compared to previous
approaches, we demonstrate that PERSE generates high-quality avatars with interpolated
attributes while preserving the identity of the reference person.
project_page: https://hyunsoocha.github.io/perse/
paper: https://arxiv.org/pdf/2412.21206v1.pdf
code: null
video: https://youtu.be/zX881Zx03o4
tags:
- Avatar
- GAN
- Project
- Video
thumbnail: assets/thumbnails/cha2024perse.jpg
publication_date: '2024-12-30T18:59:58+00:00'
- id: yang20244d
title: '4D Gaussian Splatting: Modeling Dynamic Scenes with Native 4D Primitives'
authors: Zeyu Yang, Zijie Pan, Xiatian Zhu, Li Zhang, Yu-Gang Jiang, Philip H. S.
Torr
year: '2024'
abstract: Dynamic 3D scene representation and novel view synthesis from captured
videos are crucial for enabling immersive experiences required by AR/VR and metaverse
applications. However, this task is challenging due to the complexity of unconstrained
real-world scenes and their temporal dynamics. In this paper, we frame dynamic
scenes as a spatio-temporal 4D volume learning problem, offering a native explicit
reformulation with minimal assumptions about motion, which serves as a versatile
dynamic scene learning framework. Specifically, we represent a target dynamic
scene using a collection of 4D Gaussian primitives with explicit geometry and
appearance features, dubbed as 4D Gaussian splatting (4DGS). This approach can
capture relevant information in space and time by fitting the underlying spatio-temporal
volume. Modeling the spacetime as a whole with 4D Gaussians parameterized by anisotropic
ellipses that can rotate arbitrarily in space and time, our model can naturally
learn view-dependent and time-evolved appearance with 4D spherindrical harmonics.
Notably, our 4DGS model is the first solution that supports real-time rendering
of high-resolution, photorealistic novel views for complex dynamic scenes. To
enhance efficiency, we derive several compact variants that effectively reduce
memory footprint and mitigate the risk of overfitting. Extensive experiments validate
the superiority of 4DGS in terms of visual quality and efficiency across a range
of dynamic scene-related tasks (e.g., novel view synthesis, 4D generation, scene
understanding) and scenarios (e.g., single object, indoor scenes, driving environments,
synthetic and real data).
project_page: null
paper: https://arxiv.org/pdf/2412.20720v1.pdf
code: null
video: null
tags:
- Compression
- Dynamic
- Large-Scale
thumbnail: assets/thumbnails/yang20244d.jpg
publication_date: '2024-12-30T05:30:26+00:00'
- id: cai2024dust
title: 'Dust to Tower: Coarse-to-Fine Photo-Realistic Scene Reconstruction from
Sparse Uncalibrated Images'
authors: Xudong Cai, Yongcai Wang, Zhaoxin Fan, Deng Haoran, Shuo Wang, Wanting
Li, Deying Li, Lun Luo, Minhang Wang, Jintao Xu
year: '2024'
abstract: Photo-realistic scene reconstruction from sparse-view, uncalibrated images
is highly required in practice. Although some successes have been made, existing
methods are either Sparse-View but require accurate camera parameters (i.e., intrinsic
and extrinsic), or SfM-free but need densely captured images. To combine the advantages
of both methods while addressing their respective weaknesses, we propose Dust
to Tower (D2T), an accurate and efficient coarse-to-fine framework to optimize
3DGS and image poses simultaneously from sparse and uncalibrated images. Our key
idea is to first construct a coarse model efficiently and subsequently refine
it using warped and inpainted images at novel viewpoints. To do this, we first
introduce a Coarse Construction Module (CCM) which exploits a fast Multi-View
Stereo model to initialize a 3D Gaussian Splatting (3DGS) and recover initial
camera poses. To refine the 3D model at novel viewpoints, we propose a Confidence
Aware Depth Alignment (CADA) module to refine the coarse depth maps by aligning
their confident parts with estimated depths by a Mono-depth model. Then, a Warped
Image-Guided Inpainting (WIGI) module is proposed to warp the training images
to novel viewpoints by the refined depth maps, and inpainting is applied to fulfill
the ``holes" in the warped images caused by view-direction changes, providing
high-quality supervision to further optimize the 3D model and the camera poses.
Extensive experiments and ablation studies demonstrate the validity of D2T and
its design choices, achieving state-of-the-art performance in both tasks of novel
view synthesis and pose estimation while keeping high efficiency. Codes will be
publicly available.
project_page: null
paper: https://arxiv.org/pdf/2412.19518.pdf
code: null
video: null
tags:
- Inpainting
- Poses
- Sparse
thumbnail: assets/thumbnails/cai2024dust.jpg
publication_date: '2024-12-27T08:19:34+00:00'
- id: yao2024reflective
title: Reflective Gaussian Splatting
authors: Yuxuan Yao, Zixuan Zeng, Chun Gu, Xiatian Zhu, Li Zhang
year: '2024'
abstract: 'Novel view synthesis has experienced significant advancements owing to
increasingly capable NeRF- and 3DGS-based methods. However, reflective object
reconstruction remains challenging, lacking a proper solution to achieve real-time,
high-quality rendering while accommodating inter-reflection. To fill this gap,
we introduce a Reflective Gaussian splatting (Ref-Gaussian) framework
characterized by two components: (I) physically based deferred rendering
that empowers the rendering equation with pixel-level material properties by
formulating a split-sum approximation; (II) Gaussian-grounded inter-reflection
that realizes the desired inter-reflection function within a Gaussian splatting
paradigm for the first time. To enhance geometry modeling, we further introduce
material-aware normal propagation and an initial per-Gaussian shading stage, along
with 2D Gaussian primitives. Extensive experiments on standard datasets demonstrate
that Ref-Gaussian surpasses existing approaches in terms of quantitative metrics,
visual quality, and compute efficiency. Further, we show that our method serves
as a unified solution for both reflective and non-reflective scenes, going beyond
the previous alternatives focusing on only reflective scenes. Also, we illustrate
that Ref-Gaussian supports more applications such as relighting and editing.
'
project_page: https://fudan-zvg.github.io/ref-gaussian/
paper: https://arxiv.org/pdf/2412.19282.pdf
code: null
video: null
tags:
- Meshing
- Project
- Ray Tracing
- Relight
thumbnail: assets/thumbnails/yao2024reflective.jpg
publication_date: '2024-12-26T16:58:35+00:00'
- id: qian2024weathergs
title: 'WeatherGS: 3D Scene Reconstruction in Adverse Weather Conditions via Gaussian
Splatting'
authors: Chenghao Qian, Yuhu Guo, Wenjing Li, Gustav Markkula
year: '2024'
abstract: 3D Gaussian Splatting (3DGS) has gained significant attention for 3D scene
reconstruction, but still suffers from complex outdoor environments, especially
under adverse weather. This is because 3DGS treats the artifacts caused by adverse
weather as part of the scene and will directly reconstruct them, largely reducing
the clarity of the reconstructed scene. To address this challenge, we propose
WeatherGS, a 3DGS-based framework for reconstructing clear scenes from multi-view
images under different weather conditions. Specifically, we explicitly categorize
the multi-weather artifacts into the dense particles and lens occlusions that
have very different characteristics: the former are caused by snowflakes and
raindrops in the air, while the latter arise from precipitation on the camera
lens. In light of this, we propose a dense-to-sparse preprocessing strategy, which
sequentially removes the dense particles by an Atmospheric Effect Filter (AEF)
and then extracts the relatively sparse occlusion masks with a Lens Effect Detector
(LED). Finally, we train a set of 3D Gaussians by the processed images and generated
masks for excluding occluded areas, and accurately recover the underlying clear
scene by Gaussian splatting. We conduct a diverse and challenging benchmark to
facilitate the evaluation of 3D reconstruction under complex weather scenarios.
Extensive experiments on this benchmark demonstrate that our WeatherGS consistently
produces high-quality, clean scenes across various weather scenarios, outperforming
existing state-of-the-art methods.
project_page: null
paper: https://arxiv.org/pdf/2412.18862.pdf
code: https://github.com/Jumponthemoon/WeatherGS
video: null
tags:
- Code
- In the Wild
thumbnail: assets/thumbnails/qian2024weathergs.jpg
publication_date: '2024-12-25T10:16:57+00:00'
- id: lyu2024facelift
title: 'FaceLift: Single Image to 3D Head with View Generation and GS-LRM'
authors: Weijie Lyu, Yi Zhou, Ming-Hsuan Yang, Zhixin Shu
year: '2024'
abstract: We present FaceLift, a feed-forward approach for rapid, high-quality,
360-degree head reconstruction from a single image. Our pipeline begins by employing
a multi-view latent diffusion model that generates consistent side and back views
of the head from a single facial input. These generated views then serve as input
to a GS-LRM reconstructor, which produces a comprehensive 3D representation using
Gaussian splats. To train our system, we develop a dataset of multi-view renderings
using synthetic 3D human head assets. The diffusion-based multi-view generator
is trained exclusively on synthetic head images, while the GS-LRM reconstructor
undergoes initial training on Objaverse followed by fine-tuning on synthetic head
data. FaceLift excels at preserving identity and maintaining view consistency
across views. Despite being trained solely on synthetic data, FaceLift demonstrates
remarkable generalization to real-world images. Through extensive qualitative
and quantitative evaluations, we show that FaceLift outperforms state-of-the-art
methods in 3D head reconstruction, highlighting its practical applicability and
robust performance on real-world images. In addition to single image reconstruction,
FaceLift supports video inputs for 4D novel view synthesis and seamlessly integrates
with 2D reanimation techniques to enable 3D facial animation.
project_page: https://www.wlyu.me/FaceLift/
paper: https://arxiv.org/pdf/2412.17812.pdf
code: null
video: https://huggingface.co/wlyu/FaceLift/resolve/main/videos/website_video.mp4
tags:
- Avatar
- Feed-Forward
- Project
- Video
thumbnail: assets/thumbnails/lyu2024facelift.jpg
publication_date: '2024-12-23T18:59:49+00:00'
- id: shao2024gausim
title: 'GauSim: Registering Elastic Objects into Digital World by Gaussian Simulator'
authors: Yidi Shao, Mu Huang, Chen Change Loy, Bo Dai
year: '2024'
abstract: In this work, we introduce GauSim, a novel neural network-based simulator
designed to capture the dynamic behaviors of real-world elastic objects represented
through Gaussian kernels. Unlike traditional methods that treat kernels as particles
within particle-based simulations, we leverage continuum mechanics, modeling each
kernel as a continuous piece of matter to account for realistic deformations without
idealized assumptions. To improve computational efficiency and fidelity, we employ
a hierarchical structure that organizes kernels into Center of Mass Systems (CMS)
with explicit formulations, enabling a coarse-to-fine simulation approach. This
structure significantly reduces computational overhead while preserving detailed
dynamics. In addition, GauSim incorporates explicit physics constraints, such
as mass and momentum conservation, ensuring interpretable results and robust,
physically plausible simulations. To validate our approach, we present a new dataset,
READY, containing multi-view videos of real-world elastic deformations. Experimental
results demonstrate that GauSim achieves superior performance compared to existing
physics-driven baselines, offering a practical and accurate solution for simulating
complex dynamic behaviors. Code and model will be released.
project_page: https://www.mmlab-ntu.com/project/gausim/index.html
paper: https://arxiv.org/pdf/2412.17804.pdf
code: null
video: null
tags:
- Dynamic
- Physics
- Project
thumbnail: assets/thumbnails/shao2024gausim.jpg
publication_date: '2024-12-23T18:58:17+00:00'
- id: jin2024activegs
title: 'ActiveGS: Active Scene Reconstruction using Gaussian Splatting'
authors: Liren Jin, Xingguang Zhong, Yue Pan, Jens Behley, Cyrill Stachniss, Marija
Popović
year: '2024'
abstract: 'Robotics applications often rely on scene reconstructions to enable downstream
tasks. In this work, we tackle the challenge of actively building an accurate
map of an unknown scene using an on-board RGB-D camera. We propose a hybrid map
representation that combines a Gaussian splatting map with a coarse voxel map,
leveraging the strengths of both representations: the high-fidelity scene reconstruction
capabilities of Gaussian splatting and the spatial modelling strengths of the
voxel map. The core of our framework is an effective confidence modelling technique
for the Gaussian splatting map to identify under-reconstructed areas, while utilising
spatial information from the voxel map to target unexplored areas and assist in
collision-free path planning. By actively collecting scene information in under-reconstructed
and unexplored areas for map updates, our approach achieves superior Gaussian
splatting reconstruction results compared to state-of-the-art approaches. Additionally,
we demonstrate the applicability of our active scene reconstruction framework
in the real world using an unmanned aerial vehicle.
'
project_page: null
paper: https://arxiv.org/pdf/2412.17769.pdf
code: null
video: null
tags:
- Meshing
- Robotics
- SLAM
thumbnail: assets/thumbnails/jin2024activegs.jpg
publication_date: '2024-12-23T18:29:03+00:00'
date_source: arxiv
- id: gao2024cosurfgscollaborative
title: 'CoSurfGS: Collaborative 3D Surface Gaussian Splatting with Distributed Learning
for Large Scene Reconstruction'
authors: Yuanyuan Gao, Yalun Dai, Hao Li, Weicai Ye, Junyi Chen, Danpeng Chen, Dingwen
Zhang, Tong He, Guofeng Zhang, Junwei Han
year: '2024'
abstract: 3D Gaussian Splatting (3DGS) has demonstrated impressive performance in
scene reconstruction. However, most existing GS-based surface reconstruction methods
focus on 3D objects or limited scenes. Directly applying these methods to large-scale
scene reconstruction will pose challenges such as high memory costs, excessive
time consumption, and lack of geometric detail, which makes it difficult to implement
in practical applications. To address these issues, we propose a multi-agent collaborative
fast 3DGS surface reconstruction framework based on distributed learning for large-scale
surface reconstruction. Specifically, we develop local model compression (LMC)
and model aggregation schemes (MAS) to achieve high-quality surface representation
of large scenes while reducing GPU memory consumption. Extensive experiments on
Urban3d, MegaNeRF, and BlendedMVS demonstrate that our proposed method can achieve
fast and scalable high-fidelity surface reconstruction and photorealistic rendering.
project_page: https://gyy456.github.io/CoSurfGS/
paper: https://arxiv.org/pdf/2412.17612.pdf
code: null
video: null
tags:
- Distributed
- Large-Scale
- Meshing
- Project
thumbnail: assets/thumbnails/gao2024cosurfgscollaborative.jpg
publication_date: '2024-12-23T14:31:15+00:00'
- id: gui2024balanced
title: 'Balanced 3DGS: Gaussian-wise Parallelism Rendering with Fine-Grained Tiling'
authors: Hao Gui, Lin Hu, Rui Chen, Mingxiao Huang, Yuxin Yin, Jin Yang, Yong Wu
year: '2024'
abstract: '3D Gaussian Splatting (3DGS) is increasingly attracting attention in
both academia and industry owing to its superior visual quality and rendering
speed. However, training a 3DGS model remains a time-intensive task, especially
in load imbalance scenarios where workload diversity among pixels and Gaussian
spheres causes poor renderCUDA kernel performance. We introduce Balanced 3DGS,
a Gaussian-wise parallel rendering approach with fine-grained tiling for the 3DGS
training process, effectively solving load-imbalance issues. First, we innovatively
introduce the inter-block dynamic workload distribution technique to map workloads
to Streaming Multiprocessor (SM) resources within a single GPU dynamically, which
constitutes the foundation of load balancing. Second, we are the first to propose
the Gaussian-wise parallel rendering technique to significantly reduce workload
divergence inside a warp, which serves as a critical component in addressing load
imbalance. Based on the above two methods, we further creatively put forward the
fine-grained combined load balancing technique to uniformly distribute workload
across all SMs, which boosts the forward renderCUDA kernel performance by up to
7.52x. Besides, we present a self-adaptive render kernel selection strategy during
the 3DGS training process based on different load-balance situations, which effectively
improves training efficiency.
'
project_page: null
paper: https://arxiv.org/pdf/2412.17378.pdf
code: null
video: null
tags:
- Acceleration
thumbnail: assets/thumbnails/gui2024balanced.jpg
publication_date: '2024-12-23T08:26:30+00:00'
- id: jambon2024interactive
title: Interactive Scene Authoring with Specialized Generative Primitives
authors: Clément Jambon, Changwoon Choi, Dongsu Zhang, Olga Sorkine-Hornung, Young
Min Kim
year: '2024'
abstract: 'Generating high-quality 3D digital assets often requires expert knowledge
of complex design tools. We introduce Specialized Generative Primitives, a generative
framework that allows non-expert users to author high-quality 3D scenes in a seamless,
lightweight, and controllable manner. Each primitive is an efficient generative
model that captures the distribution of a single exemplar from the real world.
With our framework, users capture a video of an environment, which we turn into
a high-quality and explicit appearance model thanks to 3D Gaussian Splatting.
Users then select regions of interest guided by semantically-aware features. To
create a generative primitive, we adapt Generative Cellular Automata to single-exemplar
training and controllable generation. We decouple the generative task from the
appearance model by operating on sparse voxels and we recover a high-quality output
with a subsequent sparse patch consistency step. Each primitive can be trained
within 10 minutes and used to author new scenes interactively in a fully compositional
manner. We showcase interactive sessions where various primitives are extracted
from real-world scenes and controlled to create 3D assets and scenes in a few
minutes. We also demonstrate additional capabilities of our primitives: handling
various 3D representations to control generation, transferring appearances, and
editing geometries.
'
project_page: null
paper: https://arxiv.org/pdf/2412.16253.pdf
code: null
video: null
tags:
- Editing
- World Generation
thumbnail: assets/thumbnails/jambon2024interactive.jpg
publication_date: '2024-12-20T04:39:50+00:00'
- id: shen2024solidgs
title: 'SolidGS: Consolidating Gaussian Surfel Splatting for Sparse-View Surface
Reconstruction'
authors: Zhuowen Shen, Yuan Liu, Zhang Chen, Zhong Li, Jiepeng Wang, Yongqing Liang,
Zhengming Yu, Jingdong Zhang, Yi Xu, Scott Schaefer, Xin Li, Wenping Wang
year: '2024'
abstract: 'Gaussian splatting has achieved impressive improvements for both novel-view
synthesis and surface reconstruction from multi-view images. However, current
methods still struggle to reconstruct high-quality surfaces from only sparse view
input images using Gaussian splatting. In this paper, we propose a novel method
called SolidGS to address this problem. We observed that the reconstructed geometry
can be severely inconsistent across multi-views, due to the property of Gaussian
function in geometry rendering. This motivates us to consolidate all Gaussians
by adopting a more solid kernel function, which effectively improves the surface
reconstruction quality. With the additional help of geometrical regularization
and monocular normal estimation, our method achieves superior performance on
sparse-view surface reconstruction compared to all the Gaussian splatting methods and
neural field methods on the widely used DTU, Tanks-and-Temples, and LLFF datasets.
'
project_page: https://mickshen7558.github.io/projects/SolidGS/
paper: https://arxiv.org/pdf/2412.15400.pdf
code: null
video: null
tags:
- Meshing
- Project
- Sparse
thumbnail: assets/thumbnails/shen2024solidgs.jpg
publication_date: '2024-12-19T21:04:43+00:00'
date_source: arxiv
- id: saito2024squeezeme
title: 'SqueezeMe: Efficient Gaussian Avatars for VR'
authors: Shunsuke Saito, Stanislav Pidhorskyi, Igor Santesteban, Forrest Iandola,
Divam Gupta, Anuj Pahuja, Nemanja Bartolovic, Frank Yu, Emanuel Garbin, Tomas
Simon
year: '2024'
abstract: "Gaussian Splatting has enabled real-time 3D human avatars with unprecedented\
\ levels of visual quality. While previous methods require a desktop GPU for real-time\
\ inference of a single avatar, we aim to squeeze multiple Gaussian avatars onto\
\ a portable virtual reality headset with real-time drivable inference. We begin\
\ by training a previous work, Animatable Gaussians, on a high quality dataset\
\ captured with 512 cameras. The Gaussians are animated by controlling a base set\
\ of Gaussians with linear blend skinning (LBS) motion and then further adjusting\
\ the Gaussians with a neural network decoder to correct their appearance. When\
\ deploying the model on a Meta Quest 3 VR headset, we find two major computational\
\ bottlenecks: the decoder and the rendering. To accelerate the decoder, we train\
\ the Gaussians in UV-space instead of pixel-space, and we distill the decoder\
\ to a single neural network layer. Further, we discover that neighborhoods of\
\ Gaussians can share a single corrective from the decoder, which provides an\
\ additional speedup. To accelerate the rendering, we develop a custom pipeline\
\ in Vulkan that runs on the mobile GPU. Putting it all together, we run 3 Gaussian\
\ avatars concurrently at 72 FPS on a VR headset. \n"
project_page: https://forresti.github.io/squeezeme.
paper: https://arxiv.org/pdf/2412.15171.pdf
code: null
video: null
tags:
- Avatar
- Dynamic
- Project
thumbnail: assets/thumbnails/saito2024squeezeme.jpg
publication_date: '2024-12-19T18:46:55+00:00'
date_source: arxiv
- id: lu2024turbogs
title: 'Turbo-GS: Accelerating 3D Gaussian Fitting for High-Quality Radiance Fields'
authors: Tao Lu, Ankit Dhiman, R Srinath, Emre Arslan, Angela Xing, Yuanbo Xiangli,
R Venkatesh Babu, Srinath Sridhar
year: '2024'
abstract: 'Novel-view synthesis is an important problem in computer vision with
applications in 3D reconstruction, mixed reality, and robotics. Recent methods
like 3D Gaussian Splatting (3DGS) have become the preferred method for this task,
providing high-quality novel views in real time. However, the training time of
a 3DGS model is slow, often taking 30 minutes for a scene with 200 views. In contrast,
our goal is to reduce the optimization time by training for fewer steps while
maintaining high rendering quality. Specifically, we combine the guidance from
both the position error and the appearance error to achieve a more effective densification.
To balance the rate between adding new Gaussians and fitting old Gaussians, we
develop a convergence-aware budget control mechanism. Moreover, to make the densification
process more reliable, we selectively add new Gaussians from mostly visited regions.
With these designs, we reduce the Gaussian optimization steps to one-third of
the previous approach while achieving a comparable or even better novel view rendering
quality. To further facilitate the rapid fitting of 4K resolution images, we introduce
a dilation-based rendering technique. Our method, Turbo-GS, speeds up optimization
for typical scenes and scales well to high-resolution (4K) scenarios on standard
datasets. Through extensive experiments, we show that our method is significantly
faster in optimization than other methods while retaining quality. Project page:
https://ivl.cs.brown.edu/research/turbo-gs.
'
project_page: https://ivl.cs.brown.edu/research/turbo-gs
paper: https://arxiv.org/pdf/2412.13547v1.pdf
code: null
video: null
tags:
- Acceleration
- Densification
- Project
thumbnail: assets/thumbnails/lu2024turbogs.jpg
publication_date: '2024-12-18T06:46:40+00:00'
date_source: arxiv
- id: sun2024realtime
title: Real-time Free-view Human Rendering from Sparse-view RGB Videos using Double
Unprojected Textures
authors: Guoxing Sun, Rishabh Dabral, Heming Zhu, Pascal Fua, Christian Theobalt,
Marc Habermann
year: '2024'
abstract: 'Real-time free-view human rendering from sparse-view RGB inputs is a
challenging task due to the sensor scarcity and the tight time budget. To ensure
efficiency, recent methods leverage 2D CNNs operating in texture space to learn
rendering primitives. However, they either jointly learn geometry and appearance,
or completely ignore sparse image information for geometry estimation, significantly
harming visual quality and robustness to unseen body poses. To address these issues,
we present Double Unprojected Textures, which at the core disentangles coarse
geometric deformation estimation from appearance synthesis, enabling robust and
photorealistic 4K rendering in real-time. Specifically, we first introduce a novel
image-conditioned template deformation network, which estimates the coarse deformation
of the human template from a first unprojected texture. This updated geometry
is then used to apply a second and more accurate texture unprojection. The resulting
texture map has fewer artifacts and better alignment with input views, which benefits
our learning of finer-level geometry and appearance represented by Gaussian splats.
We validate the effectiveness and efficiency of the proposed method in quantitative
and qualitative experiments, which significantly surpasses other state-of-the-art
methods.
'
project_page: https://vcai.mpi-inf.mpg.de/projects/DUT/
paper: https://arxiv.org/pdf/2412.13183v1.pdf
code: null
video: https://vcai.mpi-inf.mpg.de/projects/DUT/videos/main_video.mp4
tags:
- Avatar
- Project
- Sparse
- Texturing
- Video
thumbnail: assets/thumbnails/sun2024realtime.jpg
publication_date: '2024-12-17T18:57:38+00:00'
date_source: arxiv
- id: weiss2024gaussian
title: 'Gaussian Billboards: Expressive 2D Gaussian Splatting with Textures'
authors: Sebastian Weiss, Derek Bradley
year: '2024'
abstract: 'Gaussian Splatting has recently emerged as the go-to representation for
reconstructing and rendering 3D scenes. The transition from 3D to 2D Gaussian
primitives has further improved multi-view consistency and surface reconstruction
accuracy. In this work we highlight the similarity between 2D Gaussian Splatting
(2DGS) and billboards from traditional computer graphics. Both use flat semi-transparent
2D geometry that is positioned, oriented and scaled in 3D space. However, 2DGS
uses a solid color per splat and an opacity modulated by a Gaussian distribution,
whereas billboards are more expressive, modulating the color with a uv-parameterized
texture. We propose to unify these concepts by presenting Gaussian Billboards,
a modification of 2DGS to add spatially-varying color achieved using per-splat
texture interpolation. The result is a mixture of the two representations, which
benefits from both the robust scene optimization power of 2DGS and the expressiveness
of texture mapping. We show that our method can improve the sharpness and quality
of the scene representation in a wide range of qualitative and quantitative evaluations
compared to the original 2DGS implementation.
'
project_page: null
paper: https://arxiv.org/pdf/2412.12734v1.pdf
code: null
video: null
tags:
- 2DGS
- Texturing
thumbnail: assets/thumbnails/weiss2024gaussian.jpg
publication_date: '2024-12-17T09:57:04+00:00'
date_source: arxiv
- id: zhang2024pansplat
title: 'PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting'
authors: Cheng Zhang, Haofei Xu, Qianyi Wu, Camilo Cruz Gambardella, Dinh Phung,
Jianfei Cai
year: '2024'
abstract: 'With the advent of portable 360° cameras, panorama has gained significant
attention in applications like virtual reality (VR), virtual tours, robotics,
and autonomous driving. As a result, wide-baseline panorama view synthesis has
emerged as a vital task, where high resolution, fast inference, and memory efficiency
are essential. Nevertheless, existing methods are typically constrained to lower
resolutions (512 × 1024) due to demanding memory and computational requirements.
In this paper, we present PanSplat, a generalizable, feed-forward approach that
efficiently supports resolution up to 4K (2048 × 4096). Our approach features
a tailored spherical 3D Gaussian pyramid with a Fibonacci lattice arrangement,
enhancing image quality while reducing information redundancy. To accommodate
the demands of high resolution, we propose a pipeline that integrates a hierarchical
spherical cost volume and Gaussian heads with local operations, enabling two-step
deferred backpropagation for memory-efficient training on a single A100 GPU. Experiments
demonstrate that PanSplat achieves state-of-the-art results with superior efficiency
and image quality across both synthetic and real-world datasets. Code will be
available at https://github.com/chengzhag/PanSplat.
'
project_page: https://chengzhag.github.io/publication/pansplat/
paper: https://arxiv.org/pdf/2412.12096v1.pdf
code: null
video: https://youtu.be/R3qIzL77ZSc
tags:
- 360 degree
- Feed-Forward
- Project
- Video
- World Generation
thumbnail: assets/thumbnails/zhang2024pansplat.jpg
publication_date: '2024-12-16T18:59:45+00:00'
date_source: arxiv
- id: taubner2024cap4d
title: 'CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View
Diffusion Models'
authors: Felix Taubner, Ruihang Zhang, Mathieu Tuli, David B. Lindell
year: '2024'
abstract: 'Reconstructing photorealistic and dynamic portrait avatars from images
is essential to many applications including advertising, visual effects, and virtual
reality. Depending on the application, avatar reconstruction involves different
capture setups and constraints − for example, visual effects studios use camera
arrays to capture hundreds of reference images, while content creators may seek
to animate a single portrait image downloaded from the internet. As such, there
is a large and heterogeneous ecosystem of methods for avatar reconstruction. Techniques
based on multi-view stereo or neural rendering achieve the highest quality results,
but require hundreds of reference images. Recent generative models produce convincing
avatars from a single reference image, but their visual fidelity still lags behind multi-view
techniques. Here, we present CAP4D: an approach that uses a morphable multi-view
diffusion model to reconstruct photoreal 4D (dynamic 3D) portrait avatars from
any number of reference images (i.e., one to 100) and animate and render them
in real time. Our approach demonstrates state-of-the-art performance for single-,
few-, and multi-image 4D portrait avatar reconstruction, and takes steps to bridge
the gap in visual fidelity between single-image and multi-view reconstruction
techniques.'
project_page: https://felixtaubner.github.io/cap4d/
paper: https://arxiv.org/pdf/2412.12093
code: null
video: null
tags:
- Avatar
- Project
thumbnail: assets/thumbnails/taubner2024cap4d.jpg
publication_date: '2024-12-16T18:58:51+00:00'
- id: liang2024wonderland
title: 'Wonderland: Navigating 3D Scenes from a Single Image'
authors: Hanwen Liang, Junli Cao, Vidit Goel, Guocheng Qian, Sergei Korolev, Demetri
Terzopoulos, Konstantinos N. Plataniotis, Sergey Tulyakov, Jian Ren
year: '2024'
abstract: 'This paper addresses a challenging question: How can we efficiently create
high-quality, wide-scope 3D scenes from a single arbitrary image? Existing methods
face several constraints, such as requiring multi-view data, time-consuming per-scene
optimization, low visual quality in backgrounds, and distorted reconstructions
in unseen areas. We propose a novel pipeline to overcome these limitations. Specifically,
we introduce a large-scale reconstruction model that uses latents from a video
diffusion model to predict 3D Gaussian Splattings for the scenes in a feed-forward
manner. The video diffusion model is designed to create videos precisely following
specified camera trajectories, allowing it to generate compressed video latents
that contain multi-view information while maintaining 3D consistency. We train
the 3D reconstruction model to operate on the video latent space with a progressive
training strategy, enabling the efficient generation of high-quality, wide-scope,
and generic 3D scenes. Extensive evaluations across various datasets demonstrate
that our model significantly outperforms existing methods for single-view 3D scene
generation, particularly with out-of-domain images. For the first time, we demonstrate
that a 3D reconstruction model can be effectively built upon the latent space
of a diffusion model to realize efficient 3D scene generation.
'
project_page: https://snap-research.github.io/wonderland/
paper: https://arxiv.org/pdf/2412.12091v1.pdf
code: null
video: null
tags:
- Feed-Forward
- Project
- Sparse
- World Generation
thumbnail: assets/thumbnails/liang2024wonderland.jpg
publication_date: '2024-12-16T18:58:17+00:00'
date_source: arxiv
- id: huang2024deformable
title: Deformable Radial Kernel Splatting
authors: Yi-Hua Huang, Ming-Xian Lin, Yang-Tian Sun, Ziyi Yang, Xiaoyang Lyu, Yan-Pei
Cao, Xiaojuan Qi
year: '2024'
abstract: 'Recently, Gaussian splatting has emerged as a robust technique for representing
3D scenes, enabling real-time rasterization and high-fidelity rendering. However,
Gaussians'' inherent radial symmetry and smoothness constraints limit their ability
to represent complex shapes, often requiring thousands of primitives to approximate
detailed geometry. We introduce Deformable Radial Kernel (DRK), which extends
Gaussian splatting into a more general and flexible framework. Through learnable
radial bases with adjustable angles and scales, DRK efficiently models diverse
shape primitives while enabling precise control over edge sharpness and boundary
curvature. Given DRK''s planar nature, we further develop accurate ray-primitive
intersection computation for depth sorting and introduce efficient kernel culling
strategies for improved rasterization efficiency. Extensive experiments demonstrate
that DRK outperforms existing methods in both representation efficiency and rendering
quality, achieving state-of-the-art performance while dramatically reducing primitive
count.
'
project_page: https://yihua7.github.io/DRK-web/
paper: https://arxiv.org/pdf/2412.11752v1.pdf
code: null
video: null
tags:
- Optimization
- Project
- Rendering
thumbnail: assets/thumbnails/huang2024deformable.jpg
publication_date: '2024-12-16T13:11:02+00:00'
date_source: arxiv
- id: liang2024supergseg
title: 'SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians'
authors: Siyun Liang, Sen Wang, Kunyi Li, Michael Niemeyer, Stefano Gasperini, Nassir
Navab, Federico Tombari
year: '2024'
abstract: '3D Gaussian Splatting has recently gained traction for its efficient
training and real-time rendering. While the vanilla Gaussian Splatting representation
is mainly designed for view synthesis, more recent works investigated how to extend
it with scene understanding and language features. However, existing methods lack
a detailed comprehension of scenes, limiting their ability to segment and interpret
complex structures. To this end, we introduce SuperGSeg, a novel approach that
fosters cohesive, context-aware scene representation by disentangling segmentation
and language field distillation. SuperGSeg first employs neural Gaussians to learn
instance and hierarchical segmentation features from multi-view images with the
aid of off-the-shelf 2D masks. These features are then leveraged to create a sparse
set of what we call Super-Gaussians. Super-Gaussians facilitate the distillation
of 2D language features into 3D space. Through Super-Gaussians, our method enables
high-dimensional language feature rendering without extreme increases in GPU memory.
Extensive experiments demonstrate that SuperGSeg outperforms prior works on both
open-vocabulary object localization and semantic segmentation tasks.
'
project_page: https://supergseg.github.io/
paper: https://arxiv.org/pdf/2412.10231.pdf
code: null
video: null
tags:
- Language Embedding
- Project
- Segmentation
thumbnail: assets/thumbnails/liang2024supergseg.jpg
publication_date: '2024-12-13T16:01:19+00:00'
date_source: arxiv
- id: tang2024gaf
title: 'GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view
Diffusion'
authors: Jiapeng Tang, Davide Davoli, Tobias Kirschstein, Liam Schoneveld, Matthias
Nießner
year: '2024'
abstract: We propose a novel approach for reconstructing animatable 3D Gaussian
avatars from monocular videos captured by commodity devices like smartphones.
Photorealistic 3D head avatar reconstruction from such recordings is challenging
due to limited observations, which leaves unobserved regions under-constrained
and can lead to artifacts in novel views. To address this problem, we introduce
a multi-view head diffusion model, leveraging its priors to fill in missing regions
and ensure view consistency in Gaussian splatting renderings. To enable precise
viewpoint control, we use normal maps rendered from FLAME-based head reconstruction,
which provides pixel-aligned inductive biases. We also condition the diffusion
model on VAE features extracted from the input image to preserve details of facial
identity and appearance. For Gaussian avatar reconstruction, we distill multi-view
diffusion priors by using iteratively denoised images as pseudo-ground truths,
effectively mitigating over-saturation issues. To further improve photorealism,
we apply latent upsampling to refine the denoised latent before decoding it into
an image. We evaluate our method on the NeRSemble dataset, showing that GAF outperforms
the previous state-of-the-art methods in novel view synthesis and novel expression
animation. Furthermore, we demonstrate higher-fidelity avatar reconstructions
from monocular videos captured on commodity devices.
project_page: https://tangjiapeng.github.io/projects/GAF/
paper: https://arxiv.org/pdf/2412.10209
code: null
video: https://www.youtube.com/embed/QuIYTljvhygE
tags:
- Avatar
- Project
- Video
thumbnail: assets/thumbnails/tang2024gaf.jpg
publication_date: '2024-12-13T15:31:22+00:00'
- id: park2024splinegs
title: 'SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians
from Monocular Video'
authors: Jongmin Park, Minh-Quan Viet Bui, Juan Luis Gonzalez Bello, Jaeho Moon,