Three-dimensional Gaussian Splatting (3DGS) represents a significant advance in 3D reconstruction, yet its training process remains limited in efficiency and speed. To address these challenges, Junyi Wu, Jiaming Xu, and Jinhao Li, along with Yongkang Zhou from Shanghai Jiao Tong University, Jiayi Pan from Infinigence-AI, and Xingyang Li, present BalanceGS, an algorithm and system co-design. The research tackles inefficiencies arising from uneven Gaussian distribution and fragmented memory access during training, introducing techniques to automatically balance point distributions, dynamically allocate workload, and optimize memory access patterns. The team demonstrates that BalanceGS achieves a 1.44x training speedup on A100 GPUs while maintaining reconstruction quality, a step towards real-time 3D reconstruction and broader applications of Gaussian Splatting technology.

Traditional 3DGS pipelines suffer from uneven distribution of Gaussians, imbalanced computation, and fragmented memory access, all of which hinder training speed. To overcome these challenges, the scientists engineered a heuristic workload-sensitive Gaussian density control method that dynamically balances point distributions, removing up to 80% of redundant Gaussians in dense regions while simultaneously filling gaps in sparse areas. This approach employs statistical density thresholding, calculating thresholds based on global density statistics to adapt to scene-wide variations without manual tuning.
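As a rough illustration of statistical density thresholding, the sketch below derives sparse and dense cutoffs from the global mean and standard deviation of per-point neighbor counts. The function names, the brute-force neighbor search, the `radius` parameter, and the `mean ± k·std` rule are all assumptions made for illustration; the article does not specify the exact statistic BalanceGS uses.

```python
import math
import statistics


def local_density(points, radius):
    """Count neighbors within `radius` of each point (brute force, O(n^2))."""
    counts = []
    for i, p in enumerate(points):
        n = sum(1 for j, q in enumerate(points)
                if i != j and math.dist(p, q) <= radius)
        counts.append(n)
    return counts


def density_thresholds(densities, k=1.0):
    """Assumed rule: cutoffs at mean - k*std and mean + k*std of global density."""
    mean = statistics.fmean(densities)
    std = statistics.pstdev(densities)
    return mean - k * std, mean + k * std


def classify(densities, k=1.0):
    """Label each point 'sparse', 'dense', or 'ok' relative to the global cutoffs."""
    low, high = density_thresholds(densities, k)
    return ["sparse" if d < low else "dense" if d > high else "ok"
            for d in densities]


if __name__ == "__main__":
    pts = [(0, 0, 0), (0.1, 0, 0), (0, 0.1, 0), (0.1, 0.1, 0), (0.05, 0.05, 0),
           (2, 0, 0), (2.5, 0, 0), (3, 0, 0), (10, 10, 10)]
    print(classify(local_density(pts, radius=0.6), k=0.5))
```

Because the cutoffs come from scene-wide statistics rather than fixed constants, the same code adapts to scenes with very different overall densities, which is the property the paragraph above attributes to the method.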

In low-density regions, new points are generated using a Gaussian-based method, perturbed by random factors to avoid clustering, and added iteratively until the desired density is reached. Conversely, in high-density regions, redundant points are merged based on opacity weighting and dynamically defined merging radii, reducing computational overhead while preserving visual fidelity. Global density statistics are recalculated periodically so that the density control adapts to evolving point distributions. To further improve efficiency, the team developed similarity-based Gaussian sampling and merging, replacing a static processing approach with adaptive workload distribution.
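The two operations described above, merging in dense regions and gap filling in sparse ones, can be sketched as follows. This is a minimal illustration under stated assumptions: the dictionary layout, the helper names `merge_by_opacity` and `fill_sparse`, the Gaussian jitter with standard deviation `sigma`, and the rule for combining opacities (treating them as independent alpha layers) are hypothetical and not taken from the paper.

```python
import random


def merge_by_opacity(gaussians):
    """Merge a cluster into one Gaussian, weighting positions by opacity.

    Each Gaussian is a dict with 'pos' (x, y, z) and 'opacity' in (0, 1].
    """
    total = sum(g["opacity"] for g in gaussians)
    pos = tuple(
        sum(g["opacity"] * g["pos"][axis] for g in gaussians) / total
        for axis in range(3)
    )
    # Assumption: combine opacities as independent alpha layers,
    # i.e. 1 - prod(1 - alpha_i). The paper may use a different rule.
    transmittance = 1.0
    for g in gaussians:
        transmittance *= 1.0 - g["opacity"]
    return {"pos": pos, "opacity": 1.0 - transmittance}


def fill_sparse(seed, count, sigma, rng):
    """Create `count` new points around `seed`, jittered to avoid clustering."""
    new_points = []
    for _ in range(count):
        pos = tuple(c + rng.gauss(0.0, sigma) for c in seed["pos"])
        new_points.append({"pos": pos, "opacity": seed["opacity"]})
    return new_points


if __name__ == "__main__":
    cluster = [{"pos": (0.0, 0.0, 0.0), "opacity": 0.25},
               {"pos": (4.0, 0.0, 0.0), "opacity": 0.75}]
    print(merge_by_opacity(cluster))
    print(fill_sparse(cluster[0], count=3, sigma=0.1, rng=random.Random(0)))
```

Weighting by opacity pulls the merged point towards the Gaussians that contribute most to the rendered image, which is one plausible way to preserve visual fidelity while reducing the point count.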

This system allows processing units to dynamically handle variable numbers of Gaussians based on local cluster density, addressing significant workload imbalances observed in standard 3DGS implementations. The method incorporates color quantization, grouping, and opacity accumulation to optimize processing. Finally, the scientists implemented a reordering-based memory access mapping strategy that restructures RGB storage, enabling batch loading and accelerating the training process. Experiments demonstrate that this approach speeds up training.

Researchers identified three key inefficiencies in the conventional 3DGS pipeline: uneven Gaussian density, imbalanced computational workload, and fragmented memory access. These issues stem from the way Gaussians are distributed, processed, and stored during training. To address these challenges, the team developed a co-designed algorithm-system solution. First, a heuristic workload-sensitive density control was implemented, automatically balancing Gaussian distributions by removing up to 80% of redundant Gaussians in dense regions and filling gaps in sparse areas.
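The color quantization, grouping, and opacity accumulation mentioned above can be pictured with a small sketch. It assumes that "similar" Gaussians are those whose RGB values fall into the same quantization bin and that opacity is accumulated by summation per group; both choices, along with the names `quantize` and `group_by_color` and the `levels` parameter, are illustrative assumptions rather than details confirmed by the article.

```python
from collections import defaultdict


def quantize(rgb, levels=8):
    """Map each channel in [0, 1] to one of `levels` integer bins."""
    return tuple(min(int(c * levels), levels - 1) for c in rgb)


def group_by_color(gaussians, levels=8):
    """Group (rgb, opacity) pairs by quantized color and accumulate opacity.

    Returns {quantized_rgb: {"count": n, "opacity": summed_opacity}}.
    """
    groups = defaultdict(lambda: {"count": 0, "opacity": 0.0})
    for rgb, opacity in gaussians:
        key = quantize(rgb, levels)
        groups[key]["count"] += 1
        groups[key]["opacity"] += opacity
    return dict(groups)


if __name__ == "__main__":
    sample = [((0.9, 0.1, 0.1), 0.5),
              ((0.95, 0.12, 0.05), 0.25),
              ((0.1, 0.1, 0.9), 0.5)]
    for key, stats in group_by_color(sample).items():
        print(key, stats)
```

Group sizes computed this way give a per-cluster workload estimate, so a scheduler could assign more Gaussians to a processing unit where groups are large and fewer where they are small, instead of a fixed count per unit.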

This adaptive densification strategy reduces computational waste and improves rendering quality. Second, a similarity-based Gaussian sampling and merging technique replaces a static processing approach with a dynamic workload distribution, allowing processing units to handle variable numbers of Gaussians based on local cluster density. Finally, a reordering-based memory access mapping strategy restructures RGB storage, enabling batch loading into shared memory. This approach overcomes limitations of traditional storage layouts, reducing color memory access time and improving data locality.

Extensive experiments demonstrate that BalanceGS achieves a 1.44x training speedup on a high-performance GPU, with negligible degradation in reconstruction quality as measured by standard image quality metrics. The team observed a 1.33x speedup in overall training time, and a reduction in Gaussian density deviation. Measurements confirm that the new memory access strategy reduces access time, and the adaptive density control improves computational efficiency.

Researchers developed a novel framework that addresses inefficiencies in the conventional 3DGS training pipeline, specifically those related to density allocation, computational workload, and memory access. The team introduced heuristic workload-sensitive density control, which automatically balances the distribution of Gaussians by reducing redundancy in dense areas and filling gaps in sparse regions. Furthermore, they implemented similarity-based Gaussian sampling and merging, dynamically distributing computational load based on local cluster density, and a reordering-based memory access strategy to improve data loading speeds.
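The idea behind the reordering-based memory access strategy can be shown with a CPU-side analogy. The sketch below assumes that each Gaussian's color is consumed by one tile (or processing unit) and that the colors are initially stored in arbitrary order; sorting them by tile makes each tile's colors contiguous so they can be fetched as a single batch. The names `reorder_by_tile` and `load_tile` and the tile-based layout are hypothetical, and the actual BalanceGS implementation targets GPU shared memory rather than Python lists.

```python
def reorder_by_tile(colors, tile_ids):
    """Sort colors so each tile's entries are contiguous.

    Returns the reordered list and a {tile: (start, end)} offset table.
    """
    # sorted() is stable, so the original order is preserved within a tile.
    order = sorted(range(len(colors)), key=lambda i: tile_ids[i])
    reordered = [colors[i] for i in order]
    offsets = {}
    for pos, i in enumerate(order):
        tile = tile_ids[i]
        start, _ = offsets.get(tile, (pos, pos))
        offsets[tile] = (start, pos + 1)
    return reordered, offsets


def load_tile(reordered, offsets, tile):
    """Fetch all colors for one tile as a single contiguous slice."""
    start, end = offsets[tile]
    return reordered[start:end]


if __name__ == "__main__":
    colors = ["c0", "c1", "c2", "c3", "c4"]
    tile_ids = [2, 0, 2, 1, 0]
    reordered, offsets = reorder_by_tile(colors, tile_ids)
    print(reordered, offsets)
    print(load_tile(reordered, offsets, 2))
```

On a GPU, the contiguous slice corresponds to a coalesced load that a thread block could copy into shared memory in one pass, in place of many scattered reads.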

Extensive experiments demonstrate that BalanceGS achieves a 1.44x speedup in training compared to standard 3DGS, while maintaining high rendering quality. This improvement stems from optimizations at the algorithmic, system, and mapping levels, resulting in more efficient use of GPU resources. While the current work focuses on enhancing training speed, the researchers acknowledge that further investigation into adaptive strategies for varying scene complexities could yield additional performance gains.