No Calibration, No Depth, No Problem: Cross-Sensor View Synthesis with 3D Consistency
CVPR 2026
- Bosch Research North America & Bosch Center for AI (BCAI)
Overview
In multi-modal learning from different sensors, such as RGB paired with thermal cameras, Near-Infrared (NIR) cameras, or Synthetic Aperture Radar (SAR), most prior works assume that paired data exist and focus only on designing networks to fuse the multi-modal features. In real-world applications, however, especially in robotics and autonomous driving, we often encounter scenarios where perfectly aligned pairs do not exist. To establish cross-sensor correspondences, traditional pipelines require laborious calibration and depth estimation, which are costly and error-prone. In this work, we present the first scalable data processing framework that aligns views directly from raw sensor sequences.
Method
We build the framework starting from its smallest component: matched keypoints across RGB-X sensors. We then use these keypoints as anchors to densify the images via the proposed Confidence-Aware Densification and Fusion (CADF) module and a self-filtering mechanism. Finally, we consolidate the densified pairs into 3D Gaussian Splatting (3DGS) to further refine the cross-sensor alignment and enable novel view synthesis.
We build a match-densify-consolidate style framework for view synthesis.
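The match-densify-consolidate idea can be sketched in a few lines. The code below is a minimal, illustrative stand-in, not the paper's implementation: the matcher returns synthetic correspondences, densification is reduced to confidence filtering plus a global least-squares affine fit, and consolidation is reduced to warping points into the X sensor's frame (the real pipeline uses a learned matcher, the CADF module, and 3DGS). All function names and shapes are assumptions.

```python
import numpy as np

def match_keypoints(rgb, x_img, n_matches=64, seed=0):
    """Stage 1 (stand-in): sparse cross-sensor keypoint matching.
    Here we fabricate correspondences with small random offsets;
    a real system would run a cross-modal feature matcher."""
    rng = np.random.default_rng(seed)
    h, w = rgb.shape[:2]
    pts_rgb = rng.uniform([0, 0], [w, h], size=(n_matches, 2))
    pts_x = pts_rgb + rng.normal(0.0, 1.0, size=(n_matches, 2))
    conf = rng.uniform(0.5, 1.0, size=n_matches)
    return pts_rgb, pts_x, conf

def densify(pts_rgb, pts_x, conf, tau=0.7):
    """Stage 2 (stand-in): confidence-aware densification with
    self-filtering, reduced to keeping confident anchors and
    fitting one global affine map dst ~ [src, 1] @ params."""
    keep = conf >= tau
    src, dst = pts_rgb[keep], pts_x[keep]
    A = np.hstack([src, np.ones((len(src), 1))])
    params, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return params  # (3, 2) affine parameters

def consolidate(params, pts_rgb):
    """Stage 3 (stand-in): consolidate aligned pairs, reduced to
    warping RGB points into the X sensor's frame; the paper
    instead optimizes a shared 3DGS scene."""
    A = np.hstack([pts_rgb, np.ones((len(pts_rgb), 1))])
    return A @ params

# Toy RGB-thermal pair.
rgb = np.zeros((480, 640, 3))
thermal = np.zeros((480, 640))
pts_rgb, pts_x, conf = match_keypoints(rgb, thermal)
warp = densify(pts_rgb, pts_x, conf)
aligned = consolidate(warp, pts_rgb)
print(aligned.shape)  # one warped 2D point per original keypoint
```

The affine fit here is only a placeholder for the dense, per-pixel alignment that CADF produces; it shows where each stage's inputs and outputs plug together, not how the alignment is actually computed.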
Video Comparison
RGB-Thermal
[Video comparison panels: Ours-RGB, Ours-Thermal, MINIMA, XoFTR, LoFTR, LightGlue, StyleBooth]
RGB-NIR
[Video comparison panels: Ours-RGB, Ours-NIR, MINIMA, XoFTR, LoFTR, LightGlue, PixNext]
RGB-Normal
[Video comparison panels: Ours-RGB, Ours-Normal, MINIMA]