Multimodal E2E Driving

End-to-end trajectory prediction on 200 nine-second samples from the KITScenes Multimodal test-e2e split. All metrics are evaluated at the 3-second horizon and combine displacement error with map-grounded safety metrics that use our HD maps and LiDAR occupancy layer.

Leaderboard

Stay tuned for the KITScenes Multimodal Challenges!

Community leaderboard coming soon.

Preview the dataset on Hugging Face.

Paper Results

ADE and FDE follow standard protocols. Drivable-surface survival, collision-free rate, and centerline distance leverage KITScenes HD maps with a LiDAR-based occupancy layer. Bold = best; underlined = second-best.

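Column groups: ADE @3 s ↓ spans avg through nominal; Survival / Tracking @3 s spans Drv. surv., Coll.-free, and CL dist.

| Model | FDE ↓ (m) | avg | selected | constr. | overtake | inters. | night | nominal | Drv. surv. ↑ | Coll.-free ↑ | CL dist. ↓ (m) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Camera-based | | | | | | | | | | | |
| UniAD | 4.85 | 2.43 | 3.37 | 1.96 | 2.27 | 2.29 | 4.87 | 2.26 | 55.5 | 80.9 | 0.84 |
| DMAD | 4.49 | 2.30 | 3.59 | 1.78 | 2.27 | 2.06 | 5.23 | 2.09 | 58.4 | 85.0 | 0.59 |
| SSR (non-temp.) | 7.57 | 3.97 | 6.36 | 2.07 | 4.50 | 3.06 | 8.54 | 3.96 | 65.9 | 78.0 | 0.68 |
| SSR (temporal) | 5.05 | 2.49 | 4.25 | 2.49 | 2.30 | 2.59 | 5.16 | 1.99 | 67.6 | 79.8 | 0.78 |
| Generative | | | | | | | | | | | |
| Epona (AR, 10) | 7.70 | 3.62 | 4.31 | 5.47 | 3.93 | 3.25 | 6.51 | 3.44 | 63.0 | 81.5 | 0.62 |
| Epona (AR, 100) | 6.04 | 2.86 | 3.57 | 4.27 | 3.24 | 2.56 | 5.48 | 2.63 | 57.2 | 82.1 | 0.66 |
| Epona (SS, 10) | 3.98 | 1.99 | 2.71 | 2.57 | 2.14 | 1.85 | 4.43 | 1.73 | 81.5 | 97.7 | 0.46 |
| Epona (SS, 100) | 3.99 | 1.97 | 2.63 | 2.67 | 2.17 | 1.83 | 4.41 | 1.71 | 78.6 | 98.3 | 0.47 |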

SSR: non-temporal uses only the current keyframe; temporal aggregates BEV features across multiple frames. Epona: AR = autoregressive rollout; SS = single-step prediction. The numbers 10 and 100 are the diffusion denoising steps. Predicted 3 s trajectories are evaluated at the 3-second horizon.

Metrics

ADE / FDE

↓ lower is better

Average / Final Displacement Error in metres at the 3-second prediction horizon (standard protocol).
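
As an illustration, the two displacement errors could be computed as in the minimal sketch below. It assumes the common convention that ADE is the mean Euclidean distance over all waypoints up to the horizon and FDE is the distance at the last waypoint. The function name `displacement_errors` and the waypoint layout are hypothetical and not taken from the KITScenes evaluation code.

```python
import math


def displacement_errors(pred, gt):
    """Return (ADE, FDE) in metres.

    pred, gt: equally long sequences of (x, y) waypoints in metres,
    assumed to cover the trajectory up to the evaluation horizon.
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and equally long")
    # Per-waypoint Euclidean (L2) distance between prediction and ground truth.
    dists = [math.hypot(px - gx, py - gy) for (px, py), (gx, gy) in zip(pred, gt)]
    ade = sum(dists) / len(dists)  # mean over all waypoints
    fde = dists[-1]                # error at the final waypoint only
    return ade, fde


# Toy example: the prediction drifts 1 m, then 2 m, from the ground truth.
ade, fde = displacement_errors([(0, 0), (1, 0), (2, 0)], [(0, 0), (1, 1), (2, 2)])
print(ade, fde)  # 1.0 2.0
```

Averaging these per-sample values over the evaluated samples would give table-level numbers, assuming a plain mean across samples.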

Drv. surv. / Coll.-free

↑ higher is better

Drivable-surface survival rate and collision-free rate, computed with HD map + LiDAR occupancy.
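
The exact definitions are not spelled out here, so the following is only a sketch of one plausible reading: a sample "survives" if every predicted waypoint up to the horizon passes a per-point check, and the rate is the percentage of surviving samples. The name `survival_rate` and the predicate `on_lane` are hypothetical. A real evaluation would replace the predicate with a lookup in the HD map or LiDAR occupancy layer.

```python
def survival_rate(trajectories, is_ok):
    """Percentage of trajectories whose every waypoint satisfies is_ok(x, y).

    trajectories: list of trajectories, each a list of (x, y) waypoints.
    is_ok: hypothetical predicate, e.g. "point lies on drivable surface"
           or "point lies in a cell not occupied by an obstacle".
    """
    if not trajectories:
        raise ValueError("need at least one trajectory")
    survived = sum(all(is_ok(x, y) for x, y in traj) for traj in trajectories)
    return 100.0 * survived / len(trajectories)


# Toy stand-in for a drivable-surface lookup: a straight road with |y| <= 2 m.
def on_lane(x, y):
    return abs(y) <= 2.0


trajs = [
    [(0.0, 0.0), (5.0, 1.0)],  # stays on the road
    [(0.0, 0.0), (5.0, 3.0)],  # leaves the road at the second waypoint
]
print(survival_rate(trajs, on_lane))  # 50.0
```

Under the same assumption, a collision-free rate could reuse the function with a predicate that checks the occupancy layer instead of the drivable surface.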

CL dist.

↓ lower is better

Centerline distance: mean lateral deviation from the nearest HD map centerline in metres.
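
A minimal sketch of such a measure is given below, assuming centerlines are polylines of (x, y) vertices in metres and that the deviation is averaged over the predicted waypoints. Both are assumptions, and the function names are hypothetical. The distance to the nearest point on a polyline is used as the lateral deviation.

```python
import math


def point_to_segment(p, a, b):
    """Shortest distance from point p to the segment a-b (all (x, y) tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)  # degenerate segment
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def centerline_distance(traj, centerlines):
    """Mean distance from each waypoint to its nearest centerline polyline."""
    per_point = [
        min(
            point_to_segment(p, a, b)
            for line in centerlines
            for a, b in zip(line, line[1:])
        )
        for p in traj
    ]
    return sum(per_point) / len(per_point)


# Two parallel toy centerlines at y = 0 and y = 4.
lanes = [[(0, 0), (10, 0)], [(0, 4), (10, 4)]]
# Waypoint distances to the nearest lane: 1 m, 1 m, 0 m.
print(centerline_distance([(1, 1), (5, 3), (9, 0)], lanes))
```

Taking the minimum over all polylines means a lane change switches the reference centerline automatically, which is one simple way to interpret "nearest".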

KIT, FZI, TU Delft, UC3M, UPM, University of Toronto