Warping Distortion
Past voxel features must be aligned to the current ego frame, and interpolation can blur boundaries or introduce artifacts.
Real-time dense voxel streaming for accurate 3D occupancy prediction with distortion-aware temporal aggregation and dynamic-object query injection.
TL;DR: StreamOcc keeps dense voxel features in a recurrent streaming buffer, rectifies propagated features with StreamAgg, and injects dynamic-object semantics with QueryAgg to improve accuracy under real-time constraints.
Dense voxel representations preserve fine-grained 3D spatial structure, but multi-frame dense fusion is expensive. Streaming avoids repeatedly processing all historical frames, yet naive dense voxel streaming creates interpolation artifacts during warping and weakens dynamic-object features when image evidence is projected into voxel space.
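To make the interpolation artifact concrete, here is a minimal sketch of warping a single row of voxel values under a pure translation. It is a simplified 1D illustration, not the project's warping code: the function name `warp_1d`, the zero padding outside the grid, and the toy occupancy row are all assumptions made for the example.

```python
import math


def warp_1d(features, shift):
    """Resample a 1D row of voxel values after the ego moved `shift` voxels.

    Output cell i reads the old grid at position i + shift with linear
    interpolation. Samples that fall outside the old grid count as 0.0.
    (Hypothetical helper for illustration only.)
    """
    n = len(features)
    warped = []
    for i in range(n):
        src = i + shift
        lo = math.floor(src)
        frac = src - lo
        left = features[lo] if 0 <= lo < n else 0.0
        right = features[lo + 1] if 0 <= lo + 1 < n else 0.0
        warped.append((1.0 - frac) * left + frac * right)
    return warped


# A sharp occupied segment surrounded by free space.
occupancy = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]

sharp = warp_1d(occupancy, 1.0)    # whole-voxel shift: edges stay at 0 or 1
blurred = warp_1d(occupancy, 0.5)  # half-voxel shift: edge cells become 0.5

print(sharp)
print(blurred)
```

A whole-voxel shift only moves the segment, while a half-voxel shift leaves the two edge cells at 0.5. Those in-between values are the kind of boundary blur that accumulates if a streamed buffer is warped frame after frame without correction.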
Distant, occluded, and overlapping agents often lose fine-grained semantics during image-to-voxel projection.
Practical 3D occupancy needs strong spatial detail without the memory and latency costs of repeated dense history processing.
Propagated voxel features are motion-warped into the current ego frame, then corrected with adaptive residual refinement so temporal accumulation stays spatially consistent.
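The warp-then-refine loop described above can be sketched as a small recurrent update. This is only a toy under stated assumptions: the text does not say how the residual or its weighting is computed, so `refine` below uses a hand-written gate (larger where the warped buffer disagrees with the current frame) as a stand-in for what would presumably be a learned module. The names `warp_1d`, `refine`, `stream`, and the `sharpness` parameter are hypothetical.

```python
import math


def warp_1d(features, shift):
    """Linearly resample a 1D voxel row at positions i + shift (zero padded)."""
    n = len(features)
    warped = []
    for i in range(n):
        lo = math.floor(i + shift)
        frac = (i + shift) - lo
        left = features[lo] if 0 <= lo < n else 0.0
        right = features[lo + 1] if 0 <= lo + 1 < n else 0.0
        warped.append((1.0 - frac) * left + frac * right)
    return warped


def refine(warped, current, sharpness=4.0):
    """Hypothetical stand-in for adaptive residual refinement.

    The residual is the disagreement between the current frame and the
    warped buffer. The gate is 0 where they agree and approaches 1 as the
    disagreement grows, so only distorted cells are corrected.
    """
    refined = []
    for w, c in zip(warped, current):
        residual = c - w
        gate = 1.0 - math.exp(-sharpness * abs(residual))
        refined.append(w + gate * residual)
    return refined


def stream(frames, shifts):
    """Recurrent update: keep one buffer, warp it, then refine it per frame."""
    buffer = list(frames[0])
    for current, shift in zip(frames[1:], shifts):
        buffer = refine(warp_1d(buffer, shift), current)
    return buffer


frames = [
    [0.0, 0.0, 1.0, 1.0, 0.0, 0.0],  # previous frame
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],  # current frame
]
state = stream(frames, shifts=[0.5])
print(state)
```

In this toy run the half-voxel warp leaves two cells at 0.5, and the gated residual pulls each of them back toward the value in the current frame while leaving the cells that already agree untouched. Only one buffer is carried between frames, which is the point of streaming as opposed to re-fusing all history.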
Instance-level queries capture dynamic-object semantics from image space and selectively inject them into occupied voxel regions instead of re-aggregating image features everywhere.
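As an illustration of selective injection, the sketch below adds query embeddings only to voxels flagged as occupied and leaves free space untouched. Everything about the interface is assumed for the example: the function name `inject_queries`, the representation of a query as a `(center, embedding)` pair, and the distance-based softmax weighting are hypothetical choices, not the mechanism used by QueryAgg.

```python
import math


def inject_queries(voxel_feats, occupied, voxel_centers, queries, temperature=1.0):
    """Add query embeddings to occupied voxels only (hypothetical sketch).

    voxel_feats:   list of feature vectors, one per voxel
    occupied:      list of bools, one per voxel
    voxel_centers: list of (x, y, z) voxel centers
    queries:       list of (center, embedding) pairs for dynamic objects
    """
    out = []
    for feat, occ, center in zip(voxel_feats, occupied, voxel_centers):
        if not occ or not queries:
            out.append(list(feat))  # free space is copied through unchanged
            continue
        # Softmax over negative squared distance to each query center,
        # so nearby objects dominate the injected semantics.
        logits = [-math.dist(center, q_center) ** 2 / temperature
                  for q_center, _ in queries]
        peak = max(logits)
        weights = [math.exp(value - peak) for value in logits]
        total = sum(weights)
        injected = list(feat)
        for weight, (_, embedding) in zip(weights, queries):
            for k, value in enumerate(embedding):
                injected[k] += (weight / total) * value
        out.append(injected)
    return out


centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
feats = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
occupied = [True, False, True]
queries = [
    ((0.0, 0.0, 0.0), [1.0, 0.0]),  # e.g. an object near the first voxel
    ((5.0, 0.0, 0.0), [0.0, 1.0]),  # e.g. an object near the third voxel
]

result = inject_queries(feats, occupied, centers, queries)
print(result)
```

The unoccupied middle voxel keeps its original features, while each occupied voxel receives mostly the embedding of its nearest query. Restricting the update to occupied voxels is what keeps the cost well below re-aggregating image features over the full grid.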
SOTA Results: Occ3D-nuScenes / SurroundOcc-benchmark / RayIoU
@misc{moon2025streamocc,
title={Streaming Dense Voxel Representations for 3D Occupancy Prediction},
author={Moon, Seokha and Baek, Janghyun and Jeong, Yujin and Chae, Daewon and Kim, Giseop and Lee, Jungbeom and Kim, Jinkyu and Choi, Sunwook},
year={2025},
eprint={2503.22087},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2503.22087}
}