CaAD · arXiv 2026

Causality-Aware End-to-End Autonomous Driving
via Ego-Centric Joint Scene Modeling

Modeling interaction-critical futures with ego-centric joint scene hypotheses for safer closed-loop planning.

Korea University, Kakao Mobility

Author's Email: shmoon96@korea.ac.kr

Bench2Drive Visualization Comparison

Closed-loop Bench2Drive videos compare HiP-AD and CaAD under the same interaction-critical scenes.

TL;DR CaAD models ego-agent causal dependencies through ego-centric joint scene representations, then aligns the ego policy with planning-oriented closed-loop feedback.

Overview of CaAD compared with prior marginal end-to-end autonomous driving methods

Joint Scene Modes

Builds ego-conditioned scene hypotheses instead of relying only on loosely coupled marginal futures.

Causal Policy Alignment

Refines the stochastic ego policy using feedback computed against traffic and map context.

Closed-Loop Gains

Reaches 87.53 Driving Score and 71.81 Success Rate on Bench2Drive and 91.1 PDMS on NAVSIM, with coherent interaction reasoning.

Motivation

In interactive driving, an ego trajectory is only meaningful together with the surrounding agents that respond to it. A merge may be feasible only when a nearby vehicle yields, and an overtake may be safe only when other agents maintain compatible motions. Existing end-to-end planners often predict ego and agent futures as marginal outputs, so their trajectories can be individually plausible but scene-inconsistent when evaluated together.

Key idea. CaAD represents each future as an ego-centric joint scene mode: the ego plan and the futures of interaction-relevant agents are decoded under the same hypothesis.

Method

CaAD overall architecture
Overall architecture of CaAD. Marginal agent embeddings preserve actor-specific evidence, while ego-centric joint-mode embeddings organize interaction hypotheses for coupled ego-agent prediction.
Comparison of all-actor and ego-centric winner-takes-all mode selection
Ego-centric winner-takes-all selects the joint mode using the ego trajectory, then supervises relevant agent responses under the selected ego mode.
01

Marginal-Joint Interaction

CaAD starts from decoded ego and agent embeddings, then introduces joint-mode embeddings that form compact mode-wise token sequences. Agent-Mode Attention refines these embeddings so that each joint mode carries scene information specific to each entity, whether ego or agent.

02

Interaction-Relevant Agents

Instead of coupling every actor, CaAD selects agents whose marginal futures may collide with the ego spatial path. This focuses joint supervision on agents that matter for the ego maneuver and leaves distant actors to standard marginal forecasting.
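The selection step can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the function name `select_relevant_agents`, the point-to-point distance test, and the `radius` threshold are hypothetical. The sketch ignores time and checks only whether any predicted agent waypoint comes close to the ego spatial path.

```python
import math

def select_relevant_agents(ego_path, agent_futures, radius=2.0):
    """Return indices of agents whose marginal futures come within `radius`
    metres of any point on the ego spatial path (time is ignored).

    ego_path:      list of (x, y) waypoints
    agent_futures: per agent, a list of candidate trajectories,
                   each a list of (x, y) waypoints
    radius:        hypothetical proximity threshold, not from the paper
    """
    relevant = []
    for idx, modes in enumerate(agent_futures):
        near_ego_path = any(
            math.dist(p, q) <= radius
            for traj in modes
            for p in traj
            for q in ego_path
        )
        if near_ego_path:
            relevant.append(idx)
    return relevant

# Ego drives straight along x = 0.
ego_path = [(0.0, float(y)) for y in range(0, 21, 2)]
agent_futures = [
    [[(5.0, 10.0), (3.0, 10.0), (1.0, 10.0)]],   # agent 0 cuts toward the ego path
    [[(30.0, 0.0), (30.0, 5.0), (30.0, 10.0)]],  # agent 1 stays far away
]
relevant = select_relevant_agents(ego_path, agent_futures)
```

In this toy scene only agent 0 would receive joint supervision; agent 1 would be left to marginal forecasting.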

03

Ego-Centric Mode Assignment

The joint mode is selected by ego trajectory error alone, and that same mode then supervises the responses of the relevant agents. This avoids all-actor winner-takes-all assignment, in which irrelevant agents can dominate the choice of scene mode.
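The difference between the two assignment schemes can be shown with a toy sketch. The names (`ade`, `ego_centric_wta`, `all_actor_wta`) and the use of average displacement error as the loss are assumptions for illustration; the paper's actual loss terms may differ.

```python
import math

def ade(pred, gt):
    """Average displacement error between two equal-length trajectories."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)

def ego_centric_wta(ego_modes, agent_modes, ego_gt, agent_gt, relevant):
    """Pick the mode with the lowest ego error, then add the error of the
    relevant agents decoded under that same mode."""
    ego_err = [ade(m, ego_gt) for m in ego_modes]
    k = min(range(len(ego_err)), key=ego_err.__getitem__)
    loss = ego_err[k] + sum(ade(agent_modes[k][a], agent_gt[a]) for a in relevant)
    return k, loss

def all_actor_wta(ego_modes, agent_modes, ego_gt, agent_gt):
    """Baseline: pick the mode with the lowest error summed over all actors."""
    total = [
        ade(ego_modes[k], ego_gt)
        + sum(ade(pred, gt) for pred, gt in zip(agent_modes[k], agent_gt))
        for k in range(len(ego_modes))
    ]
    return min(range(len(total)), key=total.__getitem__)

ego_gt = [(0, 0), (0, 1), (0, 2)]
agent_gt = [[(2, 0), (2, 1), (2, 2)],      # agent 0: next to the ego
            [(50, 0), (50, 1), (50, 2)]]   # agent 1: far away, irrelevant
ego_modes = [[(0, 0), (0, 1), (0, 2)],     # mode 0: exact ego match
             [(1, 0), (1, 1), (1, 2)]]     # mode 1: ego off by 1 m
agent_modes = [
    [[(2, 0), (2, 1), (3, 2)], [(55, 0), (55, 1), (55, 2)]],  # mode 0
    [[(2, 0), (2, 1), (2, 2)], [(50, 0), (50, 1), (50, 2)]],  # mode 1
]

k_ego, loss = ego_centric_wta(ego_modes, agent_modes, ego_gt, agent_gt, relevant=[0])
k_all = all_actor_wta(ego_modes, agent_modes, ego_gt, agent_gt)
```

Here the distant agent 1 is badly predicted under mode 0, so the all-actor rule picks mode 1 even though mode 0 matches the ego trajectory exactly. The ego-centric rule keeps mode 0 and supervises only agent 0 under it.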

04

Causality-Aware Policy Alignment

A GRPO-style post-training stage samples ego trajectories under the learned joint scene modes and scores them with planning-oriented feedback. RL updates only the ego policy, while surrounding agent forecasting remains supervised for stability.
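As a rough sketch of the group-relative scoring used in GRPO-style training, the snippet below samples a small group of ego trajectories, scores each one, and normalizes the rewards within the group. The reward function `planning_reward`, its collision radius, and its penalty value are hypothetical stand-ins for the paper's planning-oriented feedback, which is not specified on this page.

```python
import math
import statistics

def planning_reward(ego_traj, agent_trajs, collision_radius=1.5, penalty=10.0):
    """Toy reward (an assumption): forward progress, minus a penalty if the
    ego comes within `collision_radius` of any agent at the same timestep."""
    progress = math.dist(ego_traj[0], ego_traj[-1])
    collided = any(
        math.dist(e, a) < collision_radius
        for agent in agent_trajs
        for e, a in zip(ego_traj, agent)
    )
    return progress - (penalty if collided else 0.0)

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Agent forecast is held fixed: one stationary vehicle at (0, 4).
agent_trajs = [[(0.0, 4.0), (0.0, 4.0), (0.0, 4.0)]]

# Three sampled ego trajectories for the same scene.
samples = [
    [(0.0, 0.0), (0.0, 2.0), (0.0, 4.0)],  # drives into the agent
    [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)],  # slows down behind it
    [(0.0, 0.0), (2.0, 2.0), (2.0, 4.0)],  # steers around it
]
rewards = [planning_reward(s, agent_trajs) for s in samples]
advantages = group_relative_advantages(rewards)
```

In a full training loop these advantages would weight the log-probabilities of the sampled ego trajectories only. The agent forecasts enter as fixed inputs to the reward, which mirrors the statement that RL updates only the ego policy.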

Results

87.53 Bench2Drive Driving Score
71.81 Bench2Drive Success Rate
91.1 NAVSIM PDMS

CaAD improves closed-loop planning on both Bench2Drive and NAVSIM. The gains are most visible in interaction-critical behavior: joint-causal scene modeling gives the ego policy a more coherent view of the future scene, and causality-aware policy alignment further shifts decisions toward safer outcomes.

Qualitative Bench2Drive visualization results from CaAD
Bench2Drive qualitative examples show that CaAD can produce more coherent interaction-aware behavior in scenes where baseline planners become stuck or collide after failing to reason about nearby agents.
NAVSIM benchmark result
On NAVSIM, CaAD produces trajectories that avoid objects posing a collision risk while staying close to the ground-truth driving behavior.
Bench2Drive closed-loop and interaction-critical ability comparisons.
Bench2Drive-mini ablation study.
NAVSIM closed-loop planning comparison.

BibTeX

@article{moon2026caad,
  title={Causality-Aware End-to-End Autonomous Driving via Ego-Centric Joint Scene Modeling},
  author={Moon, Seokha and Lee, Minseung and Seo, Joon and Kim, Jinkyu and Lee, Jungbeom},
  journal={arXiv preprint arXiv:2605.13646},
  year={2026}
}