CaAD · arXiv 2026
Causality-Aware End-to-End Autonomous Driving
via Ego-Centric Joint Scene Modeling
Modeling interaction-critical futures with ego-centric joint scene hypotheses for safer closed-loop planning.
Bench2Drive Visualization Comparison
Closed-loop Bench2Drive videos compare HiP-AD and CaAD under the same interaction-critical scenes.
TL;DR: CaAD models ego-agent causal dependencies through ego-centric joint scene representations, then aligns the ego policy with planning-oriented closed-loop feedback.
Motivation
In interactive driving, an ego trajectory is only meaningful together with the surrounding agents that respond to it. A merge may be feasible only when a nearby vehicle yields, and an overtake may be safe only when other agents maintain compatible motions. Existing end-to-end planners often predict ego and agent futures as marginal outputs, so their trajectories can be individually plausible but scene-inconsistent when evaluated together.
Method
Marginal-Joint Interaction
CaAD starts from decoded ego and agent embeddings, then introduces joint-mode embeddings that form compact mode-wise token sequences. Agent-Mode Attention refines these embeddings so that each joint mode carries scene information specific to each entity, whether the ego vehicle or a surrounding agent.
Interaction-Relevant Agents
Instead of coupling every actor, CaAD selects agents whose marginal futures may collide with the ego spatial path. This focuses joint supervision on agents that matter for the ego maneuver and leaves distant actors to standard marginal forecasting.
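The selection step above can be illustrated with a minimal sketch. It assumes that "may collide" is approximated by a time-agnostic distance check between the ego spatial path and each marginal agent mode. The function names, the 2.0 m radius, and the waypoint-level check are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Path = List[Point]


def min_waypoint_distance(path_a: Path, path_b: Path) -> float:
    """Smallest Euclidean distance between any pair of waypoints."""
    return min(math.dist(p, q) for p in path_a for q in path_b)


def select_relevant_agents(
    ego_path: Path,
    agent_modes: Dict[str, List[Path]],
    radius: float = 2.0,  # hypothetical proximity threshold in meters
) -> List[str]:
    """Keep agents with at least one marginal mode that comes near the ego path."""
    relevant = []
    for agent_id, modes in agent_modes.items():
        if any(min_waypoint_distance(ego_path, m) <= radius for m in modes):
            relevant.append(agent_id)
    return relevant


# Toy scene: one agent cuts toward the ego lane, one stays far away.
ego_path = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
agent_modes = {
    "merging_car": [[(5.0, 6.0), (5.0, 3.0), (5.0, 1.0)]],
    "distant_car": [[(50.0, 50.0), (55.0, 50.0)]],
}
relevant = select_relevant_agents(ego_path, agent_modes)
```

In this toy scene only the merging car would receive joint supervision; the distant car would stay on the marginal forecasting path.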
Ego-Centric Mode Assignment
The joint mode with the lowest ego trajectory error is selected, and that same ego-selected mode then supervises the responses of the relevant agents. This avoids all-actor winner-takes-all assignments, which can let irrelevant agents dominate the choice of scene mode.
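A small sketch of this assignment rule follows. It assumes average displacement error (ADE) as the ego trajectory error and a plain sum of ADEs as the loss; both choices, along with all function names, are illustrative assumptions rather than the paper's exact formulation.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = Sequence[Point]


def ade(pred: Trajectory, gt: Trajectory) -> float:
    """Average displacement error between a prediction and ground truth."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def ego_centric_mode(ego_modes: List[Trajectory], ego_gt: Trajectory) -> int:
    """Index of the joint mode whose ego trajectory has the lowest error."""
    errors = [ade(m, ego_gt) for m in ego_modes]
    return errors.index(min(errors))


def joint_loss(
    ego_modes: List[Trajectory],
    agent_modes: Dict[str, List[Trajectory]],
    ego_gt: Trajectory,
    agent_gts: Dict[str, Trajectory],
) -> Tuple[int, float]:
    """Supervise ego and relevant agents under the same ego-selected mode."""
    k = ego_centric_mode(ego_modes, ego_gt)
    loss = ade(ego_modes[k], ego_gt)
    # Agents are scored in mode k even if another mode would fit them better.
    loss += sum(ade(agent_modes[a][k], agent_gts[a]) for a in agent_gts)
    return k, loss


ego_modes = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (0.0, 1.0)]]
ego_gt = [(0.0, 0.0), (1.0, 0.0)]
agent_modes = {"a": [[(5.0, 0.0), (5.0, 1.0)], [(5.0, 0.0), (6.0, 0.0)]]}
agent_gts = {"a": [(5.0, 0.0), (6.0, 0.0)]}
mode, loss = joint_loss(ego_modes, agent_modes, ego_gt, agent_gts)
```

Here the agent's own best-fitting mode is index 1, but the ego-selected mode 0 is the one supervised, so the agent loss is nonzero and its gradient would push mode 0 toward the agent response that actually accompanied the ego maneuver.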
Causality-Aware Policy Alignment
A GRPO-style post-training stage samples ego trajectories under the learned joint scene modes and scores them with planning-oriented feedback. RL updates only the ego policy, while surrounding agent forecasting remains supervised for stability.
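The core of a GRPO-style update can be sketched as follows: rewards for a group of sampled ego trajectories are normalized against the group mean and standard deviation, and the resulting advantages weight a clipped policy-ratio objective. This is a generic sketch of the technique, not the paper's implementation; the reward values, the clip range of 0.2, and the omission of any KL regularization term are assumptions made for brevity.

```python
import math
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against the mean and std of its sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate_loss(
    new_logps: List[float],
    old_logps: List[float],
    advantages: List[float],
    clip: float = 0.2,  # hypothetical clip range
) -> float:
    """PPO-style clipped objective over ego trajectory samples (no KL term)."""
    terms = []
    for new, old, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new - old)
        clipped = max(1.0 - clip, min(1.0 + clip, ratio))
        terms.append(min(ratio * adv, clipped * adv))
    return -sum(terms) / len(terms)


# Hypothetical planning feedback for four sampled ego trajectories,
# e.g. 1.0 for a safe rollout and 0.0 for one that ends in a collision.
rewards = [1.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)
logps = [-1.2, -0.8, -1.0, -0.9]
loss = clipped_surrogate_loss(logps, logps, advantages)
```

Only the ego policy's log-probabilities enter this objective, which matches the description above: the agent forecasting heads would keep their supervised losses and receive no RL gradient.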
Results
CaAD improves closed-loop planning on both Bench2Drive and NAVSIM. The gains are most closely tied to interaction-critical behavior: joint-causal scene modeling gives the ego policy a more coherent view of the future scene, and causality-aware policy alignment further shifts its decisions toward safer outcomes.
BibTeX
@article{moon2026caad,
  title={Causality-Aware End-to-End Autonomous Driving via Ego-Centric Joint Scene Modeling},
  author={Moon, Seokha and Lee, Minseung and Seo, Joon and Kim, Jinkyu and Lee, Jungbeom},
  journal={arXiv preprint arXiv:2605.13646},
  year={2026}
}