CaAD · arXiv 2026

Causality-Aware End-to-End Autonomous Driving
via Ego-Centric Joint Scene Modeling

Modeling interaction-critical futures with ego-centric joint scene hypotheses for safer closed-loop planning.

Seokha Moon¹, Minseung Lee¹, Joon Seo¹, Jinkyu Kim¹,², Jungbeom Lee¹

¹Korea University ²Kakao Mobility

Author's Email: shmoon96@korea.ac.kr

Paper: arXiv · Code: coming soon

Bench2Drive Visualization Comparison

Closed-loop Bench2Drive videos compare HiP-AD and CaAD under the same interaction-critical scenes.

Scenes 1 to 3: side-by-side videos of HiP-AD and CaAD.

TL;DR CaAD models ego-agent causal dependencies through ego-centric joint scene representations, then aligns the ego policy with planning-oriented closed-loop feedback.

Joint Scene Modes

Builds ego-conditioned scene hypotheses instead of relying only on loosely coupled marginal futures.

Causal Policy Alignment

Refines the stochastic ego policy using feedback computed against traffic and map context.

Closed-Loop Gains

Reaches an 87.53 Driving Score and a 71.81 Success Rate on Bench2Drive and a 91.1 PDMS on NAVSIM, with coherent interaction reasoning.

Motivation

In interactive driving, an ego trajectory is only meaningful together with the surrounding agents that respond to it. A merge may be feasible only when a nearby vehicle yields, and an overtake may be safe only when other agents maintain compatible motions. Existing end-to-end planners often predict ego and agent futures as marginal outputs, so their trajectories can be individually plausible but scene-inconsistent when evaluated together.

Key idea. CaAD represents each future as an ego-centric joint scene mode: the ego plan and the futures of interaction-relevant agents are decoded under the same hypothesis.

Method

Overall architecture of CaAD. Marginal agent embeddings preserve actor-specific evidence, while ego-centric joint-mode embeddings organize interaction hypotheses for coupled ego-agent prediction.

Comparison of all-actor and ego-centric winner-takes-all mode selection

Ego-centric winner-takes-all selects the joint mode using the ego trajectory, then supervises relevant agent responses under the selected ego mode.

Marginal-Joint Interaction

CaAD starts from decoded ego and agent embeddings, then introduces joint-mode embeddings that form compact mode-wise token sequences. Agent-Mode Attention refines these embeddings so that, within each joint mode, every token carries scene information specific to its own entity (the ego or one agent).

Interaction-Relevant Agents

Instead of coupling every actor, CaAD selects agents whose marginal futures may collide with the ego spatial path. This focuses joint supervision on agents that matter for the ego maneuver and leaves distant actors to standard marginal forecasting.
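The selection step above can be illustrated with a minimal sketch. It assumes a simple time-agnostic proximity test between each agent's marginal future and the ego path; the function name `select_relevant_agents` and the `radius` threshold are hypothetical and not taken from the paper, which may use a different collision criterion.

```python
import math


def select_relevant_agents(ego_path, agent_futures, radius=2.0):
    """Return indices of agents whose predicted futures come near the ego path.

    ego_path: list of (x, y) waypoints along the ego spatial path.
    agent_futures: one list of (x, y) predicted positions per agent.
    radius: hypothetical proximity threshold in metres (an assumption).
    """
    relevant = []
    for idx, future in enumerate(agent_futures):
        # Time-agnostic check: any agent waypoint close to any ego waypoint.
        close = any(
            math.dist(p, q) <= radius
            for p in future
            for q in ego_path
        )
        if close:
            relevant.append(idx)
    return relevant


# Ego drives straight along +y; one agent crosses its path, one stays far away.
ego_path = [(0.0, float(y)) for y in range(0, 21, 2)]
crossing = [(10.0 - 2.0 * t, 10.0) for t in range(6)]
distant = [(30.0, float(y)) for y in range(0, 12, 2)]

print(select_relevant_agents(ego_path, [crossing, distant]))  # [0]
```

Only the crossing agent is kept for joint supervision; the distant one would be left to marginal forecasting.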

Ego-Centric Mode Assignment

The selected joint mode is chosen by ego trajectory error, then the same ego-selected mode supervises relevant agent responses. This avoids all-actor winner-takes-all assignments that can let irrelevant agents dominate the scene mode.
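To make the contrast concrete, here is a small sketch of both assignment rules. It assumes average displacement error (ADE) as the trajectory error and plain summed ADE as the supervision signal; both are illustrative choices, and all function names are hypothetical rather than the authors' implementation.

```python
import math


def ade(pred, gt):
    """Average displacement error between two equal-length trajectories."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def ego_centric_wta(ego_modes, agent_modes, ego_gt, agent_gt, relevant):
    """Pick the joint mode by ego error only, then score relevant agents under it.

    ego_modes[k]: ego trajectory of joint mode k.
    agent_modes[k][a]: trajectory of agent a under joint mode k.
    relevant: indices of interaction-relevant agents.
    """
    k_star = min(range(len(ego_modes)), key=lambda k: ade(ego_modes[k], ego_gt))
    ego_loss = ade(ego_modes[k_star], ego_gt)
    agent_loss = sum(ade(agent_modes[k_star][a], agent_gt[a]) for a in relevant)
    return k_star, ego_loss, agent_loss


def all_actor_wta(ego_modes, agent_modes, ego_gt, agent_gt):
    """Contrast case: pick the mode minimizing the summed error of every actor."""
    def total(k):
        agents = sum(ade(agent_modes[k][a], agent_gt[a]) for a in range(len(agent_gt)))
        return ade(ego_modes[k], ego_gt) + agents
    return min(range(len(ego_modes)), key=total)


# Toy scene: agent 0 is interaction-relevant, agent 1 is far away.
ego_gt = [(0.0, 0.0), (0.0, 1.0)]
agent_gt = [[(5.0, 0.0), (5.0, 1.0)], [(50.0, 0.0), (50.0, 1.0)]]
ego_modes = [
    [(0.0, 0.0), (0.0, 1.0)],   # mode 0: ego matches ground truth
    [(0.0, 0.0), (1.0, 1.0)],   # mode 1: ego is off by 1 m at the last step
]
agent_modes = [
    [[(5.0, 0.0), (5.0, 2.0)], [(50.0, 0.0), (60.0, 1.0)]],  # mode 0: distant agent is badly off
    [[(5.0, 0.0), (5.0, 1.0)], [(50.0, 0.0), (50.0, 1.0)]],  # mode 1: agents match ground truth
]

print(ego_centric_wta(ego_modes, agent_modes, ego_gt, agent_gt, relevant=[0]))  # (0, 0.0, 0.5)
print(all_actor_wta(ego_modes, agent_modes, ego_gt, agent_gt))                  # 1
```

In this toy case the all-actor rule picks mode 1 because the distant agent's error dominates the sum, even though mode 0 has the better ego plan. The ego-centric rule keeps mode 0 and supervises only the relevant agent under it.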

Causality-Aware Policy Alignment

A GRPO-style post-training stage samples ego trajectories under the learned joint scene modes and scores them with planning-oriented feedback. RL updates only the ego policy, while surrounding agent forecasting remains supervised for stability.
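The group-relative scoring at the core of GRPO-style training can be sketched as follows. This is a simplified illustration under stated assumptions: the reward values are placeholders for the planning-oriented feedback, the surrogate loss omits the clipped probability ratio and KL regularizer that full GRPO formulations typically include, and the function names are hypothetical.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of sampled ego trajectories."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps keeps the division defined when every sample gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


def ego_policy_loss(log_probs, advantages):
    """Policy-gradient surrogate over ego samples only.

    Agent forecasting heads are not part of this loss; in the setup described
    above they stay under supervised training.
    """
    return -sum(lp * a for lp, a in zip(log_probs, advantages)) / len(log_probs)


# Placeholder rewards for four ego trajectories sampled in the same scene.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)

# Placeholder log-probabilities of those samples under the ego policy.
log_probs = [-1.2, -0.8, -1.0, -1.0]
loss = ego_policy_loss(log_probs, advantages)
```

Samples that score above the group mean get positive advantages and are made more likely; samples below the mean are pushed down. Because the baseline is the group mean, no separate value network is needed.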

Results

Bench2Drive Driving Score: 87.53

Bench2Drive Success Rate: 71.81

NAVSIM PDMS: 91.1

CaAD improves closed-loop planning on both Bench2Drive and NAVSIM. The gains are consistent with better interaction-critical behavior: joint-causal scene modeling gives the ego policy a more coherent future scene, and causality-aware policy alignment further shifts decisions toward safer outcomes.

Qualitative Bench2Drive visualization results from CaAD

Bench2Drive qualitative examples show that CaAD can produce more coherent interaction-aware behavior in scenes where baseline planners become stuck or collide after failing to reason about nearby agents.

On NAVSIM, CaAD produces trajectories that avoid collision-related objects while staying close to the ground-truth driving behavior.

Table: Bench2Drive closed-loop and interaction-critical ability comparisons.

Table: Bench2Drive-mini ablation study.

Table: NAVSIM closed-loop planning comparison.

BibTeX

@article{moon2026caad,
  title={Causality-Aware End-to-End Autonomous Driving via Ego-Centric Joint Scene Modeling},
  author={Moon, Seokha and Lee, Minseung and Seo, Joon and Kim, Jinkyu and Lee, Jungbeom},
  journal={arXiv preprint arXiv:2605.13646},
  year={2026}
}