2025 Research Wrap-Up
Seven research contributions from this fall spanning heterogeneous datasets, latent reasoning, constrained trajectory diffusion, and robust driving policies
As the year draws to a close, I wanted to write a short review of some of the exciting work from my students and collaborators released in the last three months.
1. Agility Meets Stability: Versatile Humanoid Control with Heterogeneous Data
Humanoid controllers, which need to track a specified motion on the robot hardware, typically face a tradeoff between agility and stability. While human motion capture (MoCap) data provides rich, agile behaviors to train on, it often lacks the physical compatibility with the robot body required for teaching extreme balance. Conversely, controllers designed for stability need lots of hand-engineered regularizers and are often too rigid for dynamic tasks.
We introduce AMS (Agility Meets Stability), the first framework to unify dynamic motion tracking and extreme balance in a single policy. Our key insight lies in leveraging heterogeneous data sources: human MoCap for agile behaviors and physically constrained synthetic balance motions for stability.
Key Result: A single policy demonstrates agile skills like dancing and running alongside extreme balance tasks like the “Ip Man’s Squat” on a real Unitree G1 humanoid.
✍️ Authors: Yixuan Pan*, Ruoyi Qiao*, Li Chen, Kashyap Chitta, Liang Pan, Haoguang Mai, Qingwen Bu, Cunyuan Zheng, Hao Zhao, Ping Luo, Hongyang Li
🔗 Links: Paper | Project Page
2. ReSim: Reliable World Simulation for Autonomous Driving
This paper follows the same recipe as AMS: complementing real data with synthetic data. ReSim uses a mix of real YouTube driving videos and synthetic data (from the CARLA simulator) to address a critical flaw in current world models.
World models, which simulate the future outcomes of a given policy, often struggle with hazardous or non-expert behaviors. Because these behaviors are rare in training sets, which are primarily composed of safe, error-free driving, the models cannot accurately predict what happens during and after a failure.
ReSim builds a reliable world model by enriching real-world human demonstrations with diverse non-expert data (e.g., collisions and off-road driving) from a simulator. We also introduce a Video2Reward module that estimates numerical rewards directly from ReSim’s simulated future.
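To make the data-mixing idea concrete, here is a minimal sketch of drawing training batches from two sources at a fixed ratio. This is not ReSim's actual data pipeline: the function name `sample_mixed_batch`, the `sim_fraction` parameter, and the clip identifiers are all hypothetical, chosen only for illustration.

```python
import random

def sample_mixed_batch(real_clips, sim_clips, batch_size, sim_fraction, rng):
    """Draw one batch mixing real and simulated clips.

    sim_fraction is an assumed knob: the share of the batch taken from the
    simulator (e.g. collisions, off-road driving). The rest comes from real
    expert demonstrations.
    """
    n_sim = round(batch_size * sim_fraction)
    n_real = batch_size - n_sim
    batch = [("real", rng.choice(real_clips)) for _ in range(n_real)]
    batch += [("sim", rng.choice(sim_clips)) for _ in range(n_sim)]
    rng.shuffle(batch)  # avoid a fixed real-then-sim ordering inside the batch
    return batch

# Usage with placeholder clip names (hypothetical data).
rng = random.Random(0)
batch = sample_mixed_batch(
    real_clips=["real_000", "real_001", "real_002"],
    sim_clips=["sim_collision_000", "sim_offroad_000"],
    batch_size=8,
    sim_fraction=0.25,
    rng=rng,
)
```

With `batch_size=8` and `sim_fraction=0.25`, each batch contains two simulated clips and six real ones. How the ratio should be set in practice is an empirical question that this sketch does not answer.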
Key Result: Both our world model and Video2Reward module show clear signs of “sim2real” transfer: situations seen only in simulation like collisions are correctly simulated and identified as low-reward behaviors, even when applied to real data. We show below how heterogeneous data is the key: both ReSim without the simulated training data and Vista, our previous model, cannot simulate collisions.
✍️ Authors: Jiazhi Yang, Kashyap Chitta, Shenyuan Gao, Long Chen, Yuqian Shao, Xiaosong Jia, Hongyang Li, Andreas Geiger, Xiangyu Yue, Li Chen
🔗 Links: Paper | Project Page
3. LCDrive: Latent Chain-of-Thought World Modeling for End-to-End Driving
Like ReSim, this next model, LCDrive, can generate an “imagined” future scene rollout. It then translates the observed outcome of the imagined rollout into a planned trajectory to drive along. It is much more efficient than ReSim: rather than operating in high-dimensional pixel space to generate a video, the rollout occurs entirely within an abstract latent space, akin to recent latent reasoning models.
Vision-Language-Action (VLA) models are among the mainstream approaches for autonomous driving today. Many use natural language for “Chain-of-Thought” (CoT) reasoning, but text is an inefficient representation for capturing the nuanced spatiotemporal dynamics of objects in a driving scene.
We present LCDrive, which expresses its CoT in a “latent language”. The model reasons by interleaving action-proposal tokens and world model tokens defined in a learned latent space. This allows the model to “imagine” possible outcomes before acting.
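The interleaving pattern can be sketched as a simple generation loop. This is only a structural illustration under my own assumptions, not LCDrive's architecture: `propose_action` and `predict_world` are hypothetical stand-ins for the learned model heads, and the integer token ids are placeholders.

```python
from typing import Callable, List, Tuple

Token = Tuple[str, int]  # (kind, id); kind is "ctx", "action", or "world"

def latent_chain_of_thought(
    context: List[Token],
    propose_action: Callable[[List[Token]], int],
    predict_world: Callable[[List[Token]], int],
    num_rounds: int,
    world_tokens_per_round: int,
) -> List[Token]:
    """Interleave action-proposal tokens with latent world-model tokens.

    Each round first proposes an action, then rolls the (latent) world
    forward a few tokens conditioned on everything generated so far.
    """
    seq = list(context)
    for _ in range(num_rounds):
        seq.append(("action", propose_action(seq)))
        for _ in range(world_tokens_per_round):
            seq.append(("world", predict_world(seq)))
    return seq

# Toy stand-ins for the learned components (purely hypothetical).
toy_propose = lambda seq: len(seq) % 4
toy_world = lambda seq: (len(seq) * 7) % 16

reasoning = latent_chain_of_thought(
    context=[("ctx", 0), ("ctx", 1)],
    propose_action=toy_propose,
    predict_world=toy_world,
    num_rounds=2,
    world_tokens_per_round=2,
)
kinds = [kind for kind, _ in reasoning]
```

Here `kinds` comes out as two context tokens followed by the repeating pattern action, world, world. The final trajectory would then be decoded from this sequence, a step the sketch leaves out.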
Key Result: LCDrive achieves faster inference and better trajectory quality compared to both non-reasoning and text-based reasoning baselines. In the example below, the text CoT reasoning model does not anticipate the possibility of a collision. Our latent CoT reasoning model correctly anticipates and avoids it, while using far fewer tokens.
✍️ Authors: Shuhan Tan, Kashyap Chitta, Yuxiao Chen, Ran Tian, Yurong You, Yan Wang, Wenjie Luo, Yulong Cao, Philipp Krähenbühl, Marco Pavone, Boris Ivanovic
🔗 Links: Paper
4. VaVAM-ECO: Endpoint Constrained Trajectory Optimization for Driving Foundation Models
While LCDrive focuses on generating efficient latent rollouts, our work with VaVAM explores how to extract more robust driving behavior from latent world models. VaVAM is an amazing open-source driving model that learns to predict a latent representation of the future during training, but discards it at inference for efficiency. Instead, it uses flow matching to directly output a driving trajectory during inference.
Flow matching with limited compute often leads to “unstable” trajectories with suboptimal intermediate waypoints that hinder comfort and safety in closed-loop simulations.
We introduce Endpoint Constrained Optimization (ECO), a lightweight post-processing framework. It keeps the model-predicted endpoint fixed to leverage the learned semantic understanding, while using classical priors to refine intermediate waypoints for comfort and safety.
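To give a feel for endpoint-constrained refinement, here is a minimal sketch that smooths the interior waypoints of a 2D trajectory while leaving the endpoint untouched. It is not the ECO implementation. The cost (squared second differences plus a term that stays close to the predicted waypoints), the weights, the step size, and the choice to also pin the first waypoint are all assumptions of mine, standing in for the classical priors mentioned above.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def roughness(traj: List[Point]) -> float:
    """Sum of squared second differences, a discrete proxy for acceleration."""
    total = 0.0
    for i in range(1, len(traj) - 1):
        for d in range(2):
            acc = traj[i - 1][d] - 2.0 * traj[i][d] + traj[i + 1][d]
            total += acc * acc
    return total

def refine_with_fixed_endpoints(
    traj: List[Point],
    smooth_weight: float = 1.0,  # assumed weight on the smoothness prior
    fit_weight: float = 0.1,     # assumed weight on staying near the prediction
    step: float = 0.02,          # gradient step, small enough to be stable here
    iters: int = 500,
) -> List[Point]:
    """Gradient descent on interior waypoints; first and last stay fixed."""
    n = len(traj)
    ref = [list(p) for p in traj]
    cur = [list(p) for p in traj]
    for _ in range(iters):
        grad = [[0.0, 0.0] for _ in range(n)]
        for d in range(2):
            for i in range(1, n - 1):
                acc = cur[i - 1][d] - 2.0 * cur[i][d] + cur[i + 1][d]
                grad[i - 1][d] += 2.0 * smooth_weight * acc
                grad[i][d] -= 4.0 * smooth_weight * acc
                grad[i + 1][d] += 2.0 * smooth_weight * acc
            for i in range(n):
                grad[i][d] += 2.0 * fit_weight * (cur[i][d] - ref[i][d])
        # Update only interior waypoints so the predicted endpoint is preserved.
        for i in range(1, n - 1):
            for d in range(2):
                cur[i][d] -= step * grad[i][d]
    return [(p[0], p[1]) for p in cur]

# A deliberately jittery trajectory (made-up numbers).
predicted = [(0.0, 0.0), (1.0, 0.5), (2.0, -0.5), (3.0, 0.5), (4.0, -0.5), (5.0, 0.0)]
refined = refine_with_fixed_endpoints(predicted)
```

The refined trajectory ends exactly where the model predicted, but its intermediate waypoints zigzag less. A real system would add further terms (e.g., kinematic limits or obstacle margins), which are omitted here.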
Key Result: VaVAM-ECO ranked 1st on the official HUGSIM leaderboard, winning the 2025 ICCV HUGSIM challenge! We show an example scene of this model driving in HUGSIM below.
✍️ Authors: Brayden Zhang, Mahsa Golchoubian, Igor Gilitschenski, Boris Ivanovic, Kashyap Chitta
🔗 Links: Paper | HUGSIM Benchmark
5. OMEGA: Optimization-Guided Diffusion for Interactive Scene Generation
Similar to VaVAM-ECO, OMEGA combines the flexibility of diffusion models with the rigor of classical optimization. However, instead of using optimization as a post-processing step, OMEGA uses it in alternating steps during the diffusion process to create interactive evaluation scenarios for driving policies.
Evaluating autonomous driving requires scenarios that are diverse yet physically plausible. Generative models alone often produce violations of physical constraints or lack precise behavioral controllability.
OMEGA is a training-free framework that enforces structural consistency by alternating between diffusion and optimization. It can be plugged in on top of an existing diffusion-based traffic simulator, allowing background agents to intelligently “challenge” the ego-vehicle in safety-critical scenarios.
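The alternation between sampling and optimization can be illustrated with a toy loop. This is not OMEGA's algorithm: the "denoiser" below is a hand-written stand-in rather than a trained diffusion model, the scene is just the 1D positions of two agents, and the only constraint is an assumed minimum gap between them.

```python
import random

def toy_denoise_step(x, t, rng):
    """Hypothetical stand-in for one reverse-diffusion step.

    Pulls the sample halfway toward a pretend 'learned' mode and adds noise
    that shrinks as t goes from 1 to 0. The mode itself violates the gap
    constraint, mimicking a generative model that ignores physics.
    """
    mode = [0.0, 1.0]
    noise_scale = 0.1 * t
    return [xi + 0.5 * (mi - xi) + rng.gauss(0.0, noise_scale)
            for xi, mi in zip(x, mode)]

def project_min_gap(x, min_gap):
    """Optimization step: nearest point (in L2) with x[1] - x[0] >= min_gap."""
    gap = x[1] - x[0]
    if gap >= min_gap:
        return list(x)
    shift = (min_gap - gap) / 2.0
    return [x[0] - shift, x[1] + shift]

def sample_with_alternation(num_steps=10, min_gap=2.0, seed=0):
    """Alternate a denoising step with a constraint-enforcing step."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]  # start from noise
    for k in range(num_steps, 0, -1):
        t = k / num_steps
        x = toy_denoise_step(x, t, rng)
        x = project_min_gap(x, min_gap)
    return x

scene = sample_with_alternation()
```

Because the constraint is enforced after every denoising step rather than once at the end, the sample never strays far from the feasible set. No retraining of the (here, fake) denoiser is needed, which is the sense in which such a scheme is training-free.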
Key Result: Given a dataset of regular driving, our approach can generate 5× more near-collision scenarios than encountered in the regular driving distribution, all while maintaining the overall scene realism. We show some examples below, with the most interactive vehicle highlighted in yellow.
✍️ Authors: Shihao Li, Naisheng Ye, Tianyu Li, Kashyap Chitta, Tuo An, Peng Su, Boyang Wang, Haiou Liu, Chen Lv, Hongyang Li
🔗 Links: Paper | Project Page
6. LEAD: Minimizing Learner-Expert Asymmetry in End-to-End Driving
Interactive synthetic data (like that from OMEGA) is most valuable when accompanied by an “expert” policy to label the data with corresponding safe driving behaviors. However, we observe that existing experts in simulators like CARLA are often not ideal teachers for sensor-based students.
Standard experts suffer from Learner-Expert Asymmetry, relying on noise-free ground truth input data (e.g., precise actions of other vehicles) to make aggressive maneuvers with little safety margin that the student cannot reproduce when using its own limited sensors.
LEAD “de-privileges” the expert to align its inputs more with the student’s. By ensuring the expert’s demonstrations are “teachable” (e.g., slowing down in low visibility), we create a much more effective training signal for imitation learning.
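As a rough illustration of what de-privileging could look like, the sketch below restricts the expert's view to agents a sensor could plausibly see and caps its speed by how far it can see. None of this is taken from the LEAD code: the `Agent` fields, the sensor-range filter, and the stopping-distance rule v = sqrt(2·a·d) with the given deceleration and cruise speed are my own assumptions.

```python
from dataclasses import dataclass
from math import hypot, sqrt
from typing import List

@dataclass
class Agent:
    x: float               # position relative to the ego vehicle, in meters
    y: float
    occluded: bool = False  # hypothetical flag: hidden from the student's sensors

def visible_agents(agents: List[Agent], sensor_range: float) -> List[Agent]:
    """Keep only agents the student could plausibly observe."""
    return [a for a in agents
            if not a.occluded and hypot(a.x, a.y) <= sensor_range]

def safe_speed(visibility_m: float, max_decel: float = 4.0,
               cruise: float = 15.0) -> float:
    """Highest speed (m/s) from which the car can stop within its visibility.

    Uses the constant-deceleration stopping distance d = v^2 / (2a).
    """
    return min(cruise, sqrt(2.0 * max_decel * max(visibility_m, 0.0)))

# Ground-truth scene (made-up numbers) versus what the expert is allowed to use.
scene = [Agent(5.0, 0.0), Agent(60.0, 0.0), Agent(3.0, 4.0, occluded=True)]
expert_inputs = visible_agents(scene, sensor_range=50.0)
foggy_speed = safe_speed(8.0)    # 8 m of visibility
clear_speed = safe_speed(100.0)  # capped at the cruise speed
```

With these numbers the expert plans around one agent instead of three and slows to 8 m/s when it can only see 8 m ahead, which is behavior a sensor-based student has a chance of imitating.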
Key Result: LEAD delivers the most significant performance leap across all CARLA benchmarks in recent years, doubling the scores of the next-best method on the most challenging settings.
✍️ Authors: Long Nguyen, Micha Fauth, Bernhard Jaeger, Daniel Dauner, Maximilian Igl, Andreas Geiger, Kashyap Chitta
🔗 Links: Paper | Project Page
7. Beyond Behavior Cloning in Autonomous Driving: a Survey of Closed-Loop Training Techniques
Despite significant progress on benchmarks, the results from LEAD reveal that current models struggle to recover once they drift into challenging, out-of-distribution states. This survey provides a high-level outline of the various ways the community is training better driving policies to overcome this limitation.
Behavior cloning, the most common way of training driving policies today, is prone to a wide variety of challenges commonly referred to as the “open-loop/closed-loop gap”.
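A toy example helps show why a small open-loop error can hide a large closed-loop failure. Everything below is a made-up 1D setting, not an experiment from the survey: the state is the lateral offset from the lane center, the expert steers back toward zero, and the cloned policy is assumed to imitate the expert (with a small bias) only near the states it was trained on.

```python
def expert_action(x: float) -> float:
    """Expert steers back toward the lane center (x = 0)."""
    return -0.5 * x

def cloned_action(x: float, support: float = 0.2, bias: float = 0.15) -> float:
    """Cloned policy: close to the expert inside the training support only.

    Outside the support it has never seen recovery data, so (by assumption)
    it outputs just its bias instead of a corrective action.
    """
    if abs(x) < support:
        return expert_action(x) + bias
    return bias

def open_loop_error(expert_states) -> float:
    """Mean action error measured on states the expert visited."""
    errors = [abs(cloned_action(x) - expert_action(x)) for x in expert_states]
    return sum(errors) / len(errors)

def closed_loop_final_offset(steps: int, x0: float = 0.0) -> float:
    """Roll out the cloned policy on its own states: x <- x + action."""
    x = x0
    for _ in range(steps):
        x = x + cloned_action(x)
    return x

# The expert starts centered and stays centered, so its states are all zero.
ol_error = open_loop_error([0.0] * 20)
cl_offset = closed_loop_final_offset(20)
```

On the expert's states the cloned policy is off by only 0.15 per step, yet after 20 steps of driving on its own it has drifted nearly 3 units from the lane center, because its errors push it into states where it no longer knows how to recover.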
This comprehensive survey provides a taxonomy of closed-loop training techniques that can overcome this gap. It explores three critical axes: action generation, environment response modeling, and the evolving training objectives that bridge imitation and reinforcement learning.
✍️ Authors: Peter Karkus*, Maximilian Igl*, Yuxiao Chen, Kashyap Chitta, Jef Packer, Bertrand Douillard, Ran Tian, Alexander Naumann, Guillermo Garcia-Cobo, Shuhan Tan, Alperen Degirmenci, Alexander Popov, Nikolai Smolyanskiy, Urs Muller, Boris Ivanovic, Marco Pavone
🔗 Links: Paper
Looking Ahead
As we move into 2026, I think research on these themes will continue to mature. Heterogeneous data is a powerful tool across many domains, and world models will become significantly more efficient, unlocking new applications in closed-loop training. I hope to be back soon with more updates on these topics, more technical deep dives, and a few exciting announcements!
Happy New Year!