Novel view synthesis (NVS) evaluation on KITScenes Multimodal, combining standard photometric metrics with a map-based geometric fidelity test. Traffic sign recall at lateral offsets probes whether synthesized views preserve 3D structure, a property that PSNR and SSIM alone do not reveal.
Stay tuned for the KITScenes Multimodal Challenges!
Community leaderboard coming soon.
Preview the dataset on HuggingFace.

ReconDrive is evaluated on the KITScenes NVS benchmark (140 sequences, 216 windows) under three protocols: held-out camera novel view synthesis, ego-view reconstruction, and ego-view novel view synthesis.
| Method | Protocol | Description | PSNR (dB) ↑ | SSIM ↑ | LPIPS ↓ |
|---|---|---|---|---|---|
| ReconDrive | Held-out Cam NVS | Novel view from a withheld camera | 23.51 | 0.783 | 0.318 |
| ReconDrive | Ego Recon | Ego-view reconstruction (training views) | 32.42 | 0.951 | 0.073 |
| ReconDrive | Ego NVS | Ego-view novel view synthesis | 22.61 | 0.678 | 0.352 |
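For reference, PSNR compares a rendered image with the ground-truth frame through their mean squared error. The sketch assumes float images scaled to [0, 1]; the benchmark's exact preprocessing (resizing, cropping, color handling) is not stated, so it is not guaranteed to reproduce the table values. SSIM and LPIPS rely on windowed statistics and a learned network respectively, and are usually taken from libraries (for example `skimage.metrics.structural_similarity` for SSIM) rather than hand-written.

```python
import numpy as np


def psnr(rendered: np.ndarray, reference: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of equal shape.

    Assumes pixel values lie in [0, max_val]; the benchmark's own
    preprocessing is not specified here.
    """
    if rendered.shape != reference.shape:
        raise ValueError("images must have the same shape")
    diff = rendered.astype(np.float64) - reference.astype(np.float64)
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


# A uniform error of 0.1 gives MSE = 0.01, i.e. 20 dB.
ref = np.zeros((4, 4, 3))
print(psnr(ref + 0.1, ref))
```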
Photometric metrics (PSNR, SSIM, LPIPS) measure image fidelity along the original trajectory and do not capture geometric consistency at novel lateral poses.
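A novel lateral pose can be produced by translating the camera along its own right axis while keeping its orientation. This sketch assumes 4×4 camera-to-world matrices in an OpenCV-style camera frame (x right, y down, z forward), so a positive offset moves the camera to its right; the benchmark's actual pose format and sign convention for the −3 m to +3 m offsets are not stated and may differ. `shift_camera_laterally` is a hypothetical helper, not part of any released code.

```python
import numpy as np

# Lateral offsets used in the traffic sign recall test, in metres.
OFFSETS_M = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]


def shift_camera_laterally(cam_to_world: np.ndarray, offset_m: float) -> np.ndarray:
    """Return a copy of a 4x4 camera-to-world pose moved along the camera's x-axis.

    Hypothetical helper. Assumes an OpenCV-style camera frame, so column 0 of
    the rotation block is the camera's right direction in world coordinates.
    The orientation is left unchanged; only the camera centre moves.
    """
    shifted = cam_to_world.copy()
    right_axis = cam_to_world[:3, 0]  # camera x-axis expressed in world coordinates
    shifted[:3, 3] = cam_to_world[:3, 3] + offset_m * right_axis
    return shifted


# Placeholder pose for illustration; a real run would use the front camera pose.
front_cam_to_world = np.eye(4)
novel_poses = {off: shift_camera_laterally(front_cam_to_world, off) for off in OFFSETS_M}
```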
Traffic sign recall on the front camera at seven lateral offsets (−3 m to +3 m). "Photo" is the detector's recall on the real photograph (upper bound). The sharp degradation at ±1 m and beyond exposes the failure of current NVS methods to maintain 3D structural integrity, a limitation invisible to photometric metrics.
| Method | Resolution | Photo ↑ | −3 m | −2 m | −1 m | 0 m | +1 m | +2 m | +3 m |
|---|---|---|---|---|---|---|---|---|---|
| ReconDrive | Low: 280 × 518 px (model scale) | 19.7 | 4.1 (−79.2%) | 6.7 (−66.0%) | 11.4 (−42.1%) | 18.2 (−7.6%) | 11.0 (−44.2%) | 5.5 (−72.1%) | 3.7 (−81.2%) |
| ReconDrive | High: 1600 × 2844 px (sensor scale) | 21.6 | 3.4 (−84.3%) | 5.5 (−74.5%) | 9.5 (−56.0%) | 15.6 (−27.8%) | 9.4 (−56.5%) | 4.6 (−78.7%) | 3.0 (−86.1%) |
The relative drop versus photo recall is given next to each value. At ±3 m, recall falls by 79% to 86%, showing that current generalizable NVS methods cannot maintain structural integrity at the lateral translations that matter for autonomous driving simulation.
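Recall at each offset is the fraction of map-annotated signs that the detector finds in the rendered view, and the relative drop compares it with the photo recall. This sketch assumes the map signs have already been projected into 2D boxes for the shifted camera, and that a detection counts as a hit at IoU ≥ 0.5 under greedy one-to-one matching; these matching rules are hypothetical and not taken from the benchmark. The final loop recomputes the low-resolution row's percentages from the table values.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def sign_recall(gt_boxes: List[Box], detections: List[Box], iou_thr: float = 0.5) -> float:
    """Fraction of ground-truth signs matched by a detection.

    Hypothetical protocol: each ground-truth box greedily takes its best
    remaining detection, and each detection can be used at most once.
    """
    if not gt_boxes:
        return float("nan")
    unmatched = list(detections)
    hits = 0
    for gt in gt_boxes:
        best = max(unmatched, key=lambda d: iou(gt, d), default=None)
        if best is not None and iou(gt, best) >= iou_thr:
            unmatched.remove(best)
            hits += 1
    return hits / len(gt_boxes)


def relative_drop(recall: float, photo_recall: float) -> float:
    """Percentage change of rendered-view recall relative to real-photo recall."""
    return 100.0 * (recall - photo_recall) / photo_recall


# Low-resolution row: recall per lateral offset (m) and the photo upper bound.
photo = 19.7
low = {-3: 4.1, -2: 6.7, -1: 11.4, 0: 18.2, 1: 11.0, 2: 5.5, 3: 3.7}
for off, r in low.items():
    print(f"{off:+d} m: {r:4.1f} ({relative_drop(r, photo):+.1f}%)")
```

Because the relative drop is a ratio, it does not depend on whether recall is reported as a fraction or a percentage.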