3R-GS: Best Practice in Optimizing Camera Poses Along with 3DGS

1Texas A&M University, 2Hong Kong University (HKU), 3Hong Kong University of Science and Technology (HKUST)
Figure 1: We propose 3R-GS, a robust method for reconstructing high-quality 3D Gaussians and camera poses from the imperfect cameras output by MASt3R. Our method outperforms naive joint optimization of camera poses and 3DGS by a large margin.

Abstract

3D Gaussian Splatting (3DGS) has revolutionized neural rendering with its efficiency and quality, but like many novel view synthesis methods, it heavily depends on accurate camera poses from Structure-from-Motion (SfM) systems. Although recent SfM pipelines have made impressive progress, it remains an open question how to simultaneously improve their robustness in challenging conditions (e.g., textureless scenes) and the precision of their camera parameter estimates.

We present 3R-GS, a 3D Gaussian Splatting framework that bridges this gap by jointly optimizing 3D Gaussians and camera parameters, starting from MASt3R-SfM, an SfM pipeline built on large reconstruction priors. We note that naively performing joint 3D Gaussian and camera optimization faces two challenges: sensitivity to the quality of the SfM initialization, and limited capacity for global optimization, leading to suboptimal reconstruction results.

Our 3R-GS overcomes these issues by incorporating optimized practices, enabling robust scene reconstruction even with imperfect camera registration. Extensive experiments demonstrate that 3R-GS delivers high-quality novel view synthesis and precise camera pose estimation while remaining computationally efficient.

Video

This video demonstrates the novel view synthesis results of our reconstructed scenes. Our method significantly outperforms naive joint optimization of camera poses and 3DGS in both rendering quality and camera pose estimation. For camera pose estimation results, please refer to Figure 4.


Method

Figure 2: Overview of our 3R-GS method for joint optimization of camera poses and 3D Gaussians.

We propose 3R-GS to jointly optimize 3D Gaussians and camera poses from imperfect initial estimates provided by MASt3R-SfM. Our approach addresses two key challenges:

  • Challenge 1: Sensitivity to initialization - 3DGS optimization is highly sensitive to the initial point clouds and camera poses.
  • Challenge 2: Inefficient pose optimization - Standard 3DGS lacks mechanisms for efficient camera pose refinement.

Our solution consists of three key components:

  1. MCMC-based pose optimization for improved robustness
  2. MLP-based global pose refinement for correlated camera adjustments
  3. Rendering-free geometric constraints using epipolar geometry
Figure 3: Motivations for three components in 3R-GS.

1. MCMC-based Pose Optimization

Problem: Gaussian primitives have limited adaptability to poor initialization, as rendering gradients only affect a small local region, preventing primitives from escaping local minima.

Solution: We adopt 3DGS-MCMC, which introduces noise-guided exploration to help Gaussians escape local minima and improve convergence:

\[G \leftarrow G + a \cdot \nabla_G \log p(G) + b \cdot \eta\]

Here a and b scale the log-posterior gradient and the Gaussian exploration noise η, respectively, trading off exploitation against exploration. This approach eliminates the need for heuristic-based densification and pruning in 3DGS, simplifying joint optimization with camera poses.
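As a minimal sketch (not the authors' implementation), the noise-guided update above can be written as a stochastic-gradient-Langevin-style step. The function name `sgld_step` and the toy 1-D target below are our own illustration:

```python
import numpy as np

def sgld_step(params, grad_log_p, lr, noise_scale, rng):
    """One noise-guided update: gradient ascent on log p(G) plus Gaussian
    exploration noise (a = lr, b = noise_scale, eta ~ N(0, I))."""
    eta = rng.standard_normal(params.shape)
    return params + lr * grad_log_p + noise_scale * eta

# Toy usage: push a 1-D "Gaussian parameter" toward the mode of
# p = N(0, 1), whose log-density gradient is simply -x.
rng = np.random.default_rng(0)
x = np.array([5.0])
for _ in range(200):
    grad = -x
    x = sgld_step(x, grad, lr=0.05, noise_scale=0.01, rng=rng)
```

The injected noise lets parameters jump out of shallow local minima that a pure gradient step would get stuck in.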

2. MLP-Based Global Pose Refinement

Problem: Multiple cameras often share common drift errors, but standard optimization treats them independently, potentially distorting correct local relative poses.

Solution: We employ an MLP-based global pose refiner to predict correlated pose corrections:

\[\Delta T_i = R_{MLP}(z_i)\]

Here z_i is a learned embedding for camera i. Because the MLP is shared across all views, it captures global relationships among cameras, enabling more consistent and accurate pose adjustments.
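A minimal sketch of such a shared refiner, assuming per-camera embeddings z_i and a 6-DoF (se(3)-style) correction output; all shapes, weights, and names here are illustrative, not the paper's actual architecture:

```python
import numpy as np

def mlp_pose_refiner(z, W1, b1, W2, b2):
    """Shared 2-layer MLP: maps a per-camera embedding z_i to a 6-DoF
    pose correction (3 rotation + 3 translation parameters).
    The same weights serve every camera, so corrections are correlated."""
    h = np.maximum(W1 @ z + b1, 0.0)  # ReLU hidden layer
    return W2 @ h + b2                # Delta T_i in 6-DoF coordinates

# Hypothetical shapes: 16-d embeddings, 32 hidden units, 6-d output.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((32, 16)) * 0.1, np.zeros(32)
W2, b2 = rng.standard_normal((6, 32)) * 0.1, np.zeros(6)

# One embedding per camera; a single shared MLP refines all of them.
embeddings = rng.standard_normal((10, 16))
corrections = np.stack([mlp_pose_refiner(z, W1, b1, W2, b2)
                        for z in embeddings])
```

Because a single set of weights produces every correction, a drift shared by many cameras can be absorbed by the MLP instead of distorting each camera's pose independently.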

3. Rendering-Free Geometric Constraint

Problem: Standard depth-based geometric constraints for camera optimization require rendering multiple views, which is computationally prohibitive in 3DGS.

Solution: We propose a rendering-free geometric constraint using epipolar distances between image correspondences:

\[L_{geo} = \frac{1}{|E|} \sum_{(n,m)\in E} \frac{1}{|M^{n,m}|} \sum_{(x_i,x'_i)\in M^{n,m}} \text{conf}_i \cdot d(x_i, x'_i)\]

Here E is the set of image pairs, M^{n,m} the set of correspondences between views n and m, conf_i the match confidence, and d(x_i, x'_i) the distance from x'_i to the epipolar line of x_i. This enables efficient and globally-informed camera optimization without additional rendering overhead.
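A minimal NumPy sketch of the confidence-weighted epipolar term for a single image pair, assuming a fundamental matrix F computed from the current pose estimates; the helper name `epipolar_loss` is our own:

```python
import numpy as np

def epipolar_loss(F, x, xp, conf):
    """Confidence-weighted mean point-to-epipolar-line distance.
    F: 3x3 fundamental matrix between views n and m,
    x, xp: (N, 3) homogeneous matched points, conf: (N,) confidences."""
    lines = x @ F.T                               # epipolar line l' = F x
    num = np.abs(np.sum(lines * xp, axis=1))      # |x'^T F x|
    den = np.linalg.norm(lines[:, :2], axis=1)    # line normalization
    return np.mean(conf * num / den)

# Sanity check: pure translation along the x-axis (calibrated cameras)
# gives F = [t]_x with t = (1, 0, 0); matches keep the same row (v),
# so the loss should vanish for perfect correspondences.
F = np.array([[0., 0., 0.], [0., 0., -1.], [0., 1., 0.]])
x  = np.array([[0.2, 0.3, 1.0], [0.5, -0.1, 1.0]])
xp = np.array([[0.4, 0.3, 1.0], [0.9, -0.1, 1.0]])
loss = epipolar_loss(F, x, xp, conf=np.ones(2))
```

Since the loss depends only on the poses (through F) and the precomputed matches, the gradient with respect to the cameras never touches the renderer.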

Results

Our experiments demonstrate that 3R-GS outperforms naive joint optimization approaches in both novel view synthesis quality and camera pose accuracy. Below are some qualitative results:

Figure 4: Visualization of camera pose registration.
Figure 5: Results for novel view synthesis.

BibTeX

@misc{huang20253rgsbestpracticeoptimizing,
      title={3R-GS: Best Practice in Optimizing Camera Poses Along with 3DGS}, 
      author={Zhisheng Huang and Peng Wang and Jingdong Zhang and Yuan Liu and Xin Li and Wenping Wang},
      year={2025},
      eprint={2504.04294},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2504.04294}, 
}