Welcome to VAGEN Documentation!¶

Introduction¶

VAGEN is a multi-turn reinforcement learning framework designed for training Visual Language Model (VLM) agents efficiently.

Document Structure¶

Quick Strat¶

Installation and Run Experiment: Get VAGEN up and running

Configurations¶

General Configuration: Understanding VAGEN's configuration system
Algorithm Configuration: Configure different algorithms

Environments¶

Create your Own Environment: Build custom environments
Create your Own Service: Scale your training infrastructure

Comparison of Algorithms¶

Feature	PPO & GRPO	VAGEN-Base	VAGEN-Full
Sequence Structure	Single response	Multiple turn interaction	Multiple turn interaction
LM output	No special structure	`<think>...</think><ans>...</ans>`	`<think>...</think><ans>...</ans><eoa>`
Discounting	Single discount rate	Single discount rate	Bi-level discounting
Optimization	All tokens equally	All tokens equally	Selective token optimization

Citation¶

If you find VAGEN useful, we appreciate it if you could cite our work at:

@misc{wang2025vagen,
  title={Reinforcing Visual State Reasoning for Multi-Turn VLM Agents},
  author={Kangrui Wang* and Pingyue Zhang* and Zihan Wang* and Yaning Gao* and Linjie Li* and Qineng Wang and Hanyang Chen and Chi Wan and Yiping Lu and Zhengyuan Yang and Lijuan Wang and Ranjay Krishna and Jiajun Wu and Li Fei-Fei and Yejin Choi and Manling Li},
  year={2025},
  url={https://arxiv.org/abs/2510.16907}
}

License¶

Licensed under the MIT License.