No Calibration, No Depth, No Problem: Cross-Sensor View Synthesis with 3D Consistency
CVPR 2026
- Bosch Research North America & Bosch Center for AI (BCAI)
Overview
In multi-modal learning from different sensors, such as RGB paired with thermal cameras, Near-Infrared (NIR) cameras, or Synthetic Aperture Radar (SAR), most prior works assume that paired data exist and focus only on designing networks to fuse the multi-modal features. In real-world applications, however, especially in robotics and autonomous driving, we often encounter scenarios where perfectly aligned pairs do not exist. To establish cross-sensor correspondences, traditional pipelines require laborious calibration and depth estimation, which are costly and error-prone. In this work, we present the first scalable data processing framework that aligns views directly from raw sensor sequences.
Method
We build the framework starting from its smallest component: matched keypoints across RGB-X sensors. We then use these keypoints as anchors to densify the images via the proposed Confidence-Aware Densification and Fusion (CADF) module and a self-filtering mechanism. Finally, we consolidate the densified pairs into 3D Gaussian Splatting (3DGS) to further refine the cross-sensor alignment and enable novel view synthesis.
We build a match-densify-consolidate style framework for view synthesis.
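The match-densify-consolidate idea can be sketched in a few lines. The code below is a minimal, illustrative stand-in, not the paper's implementation: the matcher returns synthetic correspondences, densification is reduced to confidence filtering plus a global least-squares affine fit, and consolidation is reduced to warping points into the X sensor's frame (the real pipeline uses a learned matcher, the CADF module, and 3DGS). All function names and shapes are assumptions.

```python
import numpy as np

def match_keypoints(rgb, x_img, n_matches=64, seed=0):
    """Stage 1 (stand-in): sparse cross-sensor keypoint matching.
    Here we fabricate correspondences with small random offsets;
    a real system would run a cross-modal feature matcher."""
    rng = np.random.default_rng(seed)
    h, w = rgb.shape[:2]
    pts_rgb = rng.uniform([0, 0], [w, h], size=(n_matches, 2))
    pts_x = pts_rgb + rng.normal(0.0, 1.0, size=(n_matches, 2))
    conf = rng.uniform(0.5, 1.0, size=n_matches)
    return pts_rgb, pts_x, conf

def densify(pts_rgb, pts_x, conf, tau=0.7):
    """Stage 2 (stand-in): confidence-aware densification with
    self-filtering, reduced to keeping confident anchors and
    fitting one global affine map dst ~ [src, 1] @ params."""
    keep = conf >= tau
    src, dst = pts_rgb[keep], pts_x[keep]
    A = np.hstack([src, np.ones((len(src), 1))])
    params, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return params  # (3, 2) affine parameters

def consolidate(params, pts_rgb):
    """Stage 3 (stand-in): consolidate aligned pairs, reduced to
    warping RGB points into the X sensor's frame; the paper
    instead optimizes a shared 3DGS scene."""
    A = np.hstack([pts_rgb, np.ones((len(pts_rgb), 1))])
    return A @ params

# Toy RGB-thermal pair.
rgb = np.zeros((480, 640, 3))
thermal = np.zeros((480, 640))
pts_rgb, pts_x, conf = match_keypoints(rgb, thermal)
warp = densify(pts_rgb, pts_x, conf)
aligned = consolidate(warp, pts_rgb)
print(aligned.shape)  # one warped 2D point per original keypoint
```

The affine fit here is only a placeholder for the dense, per-pixel alignment that CADF produces; it shows where each stage's inputs and outputs plug together, not how the alignment is actually computed.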
Video Comparison
RGB-Thermal
[Video comparison panels: Ours-RGB, Ours-Thermal, MINIMA, XoFTR, LoFTR, LightGlue, StyleBooth]
RGB-NIR
[Video comparison panels: Ours-RGB, Ours-NIR, MINIMA, XoFTR, LoFTR, LightGlue, PixNext]
RGB-Normal
[Video comparison panels: Ours-RGB, Ours-Normal, MINIMA]