SPACeR: Self-Play Anchoring with Centralized Reference Models
Wei-Jer Chang, Akshay Rangesh, Kevin Joseph, Matthew Strong, Masayoshi Tomizuka, Yihan Hu, Wei Zhan
- 发表年份
- 2025
- 访问权限
- 开放获取
摘要
Developing autonomous vehicles (AVs) requires not only safety and efficiency, but also realistic, human-like behaviors that are socially aware and predictable. Achieving this requires sim agent policies that are human-like, fast, and scalable in multi-agent settings. Recent progress in imitation learning with large diffusion-based or tokenized models has shown that behaviors can be captured directly from human driving data, producing realistic policies. However, these models are computationally expensive, slow during inference, and struggle to adapt in reactive, closed-loop scenarios. In contrast, self-play reinforcement learning (RL) scales efficiently and naturally captures multi-agent interactions, but it often relies on heuristics and reward shaping, and the resulting policies can diverge from human norms. We propose SPACeR, a framework that leverages a pretrained tokenized autoregressive motion model as a centralized reference policy to guide decentralized self-play. The reference model provides likelihood rewards and KL divergence, anchoring policies to the human driving distribution while preserving RL scalability. Evaluated on the Waymo Sim Agents Challenge, our method achieves competitive performance with imitation-learned policies while being up to 10x faster at inference and 50x smaller in parameter size than large generative models. In addition, we demonstrate in closed-loop ego planning evaluation tasks that our sim agents can effectively measure planner quality with fast and scalable traffic simulation, establishing a new paradigm for testing autonomous driving policies.
关键词
相关论文
The Organization of Behavior
D. O. Hebb
2005
Fractional Brownian Motions, Fractional Noises and Applications
Benoît B. Mandelbrot, John W. Van Ness
1968
Review of deep learning: concepts, CNN architectures, challenges, applications, future directions
Laith Alzubaidi, Jinglan Zhang, Amjad J. Humaidi 等 10 位作者
2021
A guide to deep learning in healthcare
Andre Esteva, Alexandre Robicquet, Bharath Ramsundar 等 10 位作者
2018