Our research introduces VisionTrap, a novel method that significantly enhances trajectory prediction for autonomous vehicles by integrating visual cues from surround-view cameras and textual descriptions generated by Vision-Language Models. Additionally, we release the nuScenes-Text dataset, which augments the nuScenes dataset with rich textual descriptions to support further research.
In the realm of autonomous driving, accurately predicting the future trajectories of road agents is crucial for ensuring safety and efficiency. Traditional trajectory prediction methods primarily rely on past trajectories and high-definition (HD) maps. While these inputs provide valuable information, they often miss out on essential contextual cues such as the intentions of pedestrians, road conditions, and dynamic interactions between agents.
Our approach integrates both modalities: a Visual Semantic Encoder extracts scene-level context from the surround-view camera images, while a Text-driven Guidance Module injects semantic cues from the generated textual descriptions. Leveraging both visual and textual cues in this way substantially improves the accuracy and reliability of trajectory predictions for autonomous driving.
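To make the fusion idea concrete, here is a minimal sketch of how trajectory, visual, and textual features could be combined in a single predictor. It is not the VisionTrap architecture itself; all module names, feature dimensions, and the simple concatenation-based fusion are illustrative assumptions.

import torch
import torch.nn as nn

class MultimodalTrajectoryPredictor(nn.Module):
    """Illustrative fusion of trajectory, visual, and textual features.

    All dimensions and module names are hypothetical; the actual
    VisionTrap model may differ substantially.
    """

    def __init__(self, traj_dim=64, vis_dim=256, txt_dim=512, hidden=128, horizon=12):
        super().__init__()
        # Encode past agent positions (x, y per timestep).
        self.traj_encoder = nn.GRU(input_size=2, hidden_size=traj_dim, batch_first=True)
        # Project per-agent visual features from surround-view cameras.
        self.vis_proj = nn.Linear(vis_dim, hidden)
        # Project text embeddings (e.g., from a frozen language/VLM text encoder).
        self.txt_proj = nn.Linear(txt_dim, hidden)
        self.traj_proj = nn.Linear(traj_dim, hidden)
        # Concatenation-based fusion followed by a trajectory decoder.
        self.fuse = nn.Sequential(nn.Linear(hidden * 3, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, horizon * 2)  # future (x, y) per step
        self.horizon = horizon

    def forward(self, past_traj, vis_feat, txt_feat):
        # past_traj: (B, T, 2), vis_feat: (B, vis_dim), txt_feat: (B, txt_dim)
        _, h = self.traj_encoder(past_traj)
        traj_h = self.traj_proj(h[-1])  # (B, hidden)
        fused = self.fuse(torch.cat(
            [traj_h, self.vis_proj(vis_feat), self.txt_proj(txt_feat)], dim=-1))
        out = self.decoder(fused)                 # (B, horizon * 2)
        return out.view(-1, self.horizon, 2)      # predicted future positions

# Example usage with random tensors.
model = MultimodalTrajectoryPredictor()
pred = model(torch.randn(4, 8, 2), torch.randn(4, 256), torch.randn(4, 512))
print(pred.shape)  # torch.Size([4, 12, 2])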
We demonstrate the effectiveness of VisionTrap by comparing trajectory predictions with and without the Visual Semantic Encoder and Text-driven Guidance Module. The examples below show how incorporating visual and textual data significantly improves prediction accuracy.
The nuScenes-Text dataset enriches the nuScenes dataset with detailed annotations for every object in each frame, providing three versions of descriptions from surround camera views. We removed location-specific information such as 'left', 'right', or 'away from the ego car' to prevent confusion and refined the descriptions using an LLM for clarity. These annotations capture agent behaviors, semantic features, and environmental conditions.
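As a rough illustration of how such per-object annotations might be consumed, the snippet below loads a hypothetical JSON file and indexes the three description variants by sample and instance token. The file layout and field names (sample_token, instance_token, descriptions) are assumptions for illustration and may not match the released data format.

import json

def load_text_annotations(path):
    """Return a mapping from (sample_token, instance_token) to the
    object's description variants. Field names are hypothetical."""
    with open(path, "r") as f:
        records = json.load(f)
    annotations = {}
    for rec in records:
        key = (rec["sample_token"], rec["instance_token"])
        # Each object is assumed to carry its three refined descriptions.
        annotations[key] = rec["descriptions"]
    return annotations

# Example (assuming such a file exists locally):
# ann = load_text_annotations("nuscenes_text/train.json")
# print(ann[("<sample_token>", "<instance_token>")])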
If you use our code or data, please cite:
@article{moon2024visiontrap,
  title={VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions},
  author={Moon, Seokha and Woo, Hyun and Park, Hongbeen and Jung, Haeji and Mahjourian, Reza and Chi, Hyung-gun and Lim, Hyerin and Kim, Sangpil and Kim, Jinkyu},
  journal={arXiv preprint arXiv:2407.12345},
  year={2024}
}