How to Create New Environments¶
This guide explains how to create new environments for VAGEN using Blackjack as an example. Understanding the BaseEnv interface is key to building effective LLM training environments.
For the specific rules of Blackjack, please refer to Blackjack Rules. For details of the gym-formatted Blackjack environment, please refer to Blackjack-v0.
Directory Structure¶
vagen/env/blackjack/
├── env.py # BlackjackEnv - main environment wrapper
├── env_config.py # BlackjackEnvConfig - configuration class
├── prompt.py # Prompt templates and format configurations
├── blackjack.py # Core gym environment (standard gym interface)
└── __init__.py # Environment registration
File Responsibilities:
- blackjack.py (optional): Your core game logic, usually a standard gym environment exposing step and reset
- env_config.py: Configuration parameters and settings for your environment
- env.py: VAGEN wrapper that bridges LLM responses to your game logic
- prompt.py: System prompts and LLM interaction format definitions
- __init__.py: Registration info to make your environment discoverable by VAGEN
Understanding BaseEnv Interface¶
VAGEN environments inherit from BaseEnv, which defines the contract between your game logic and the LLM training system. Here's what each required method does:
Core Methods Overview¶
step(llm_raw_response) - The heart of LLM interaction
- Takes the raw text response from the LLM (e.g., "<think>I should hit</think><answer>Hit</answer>")
- Parses it to extract valid actions (e.g., ["Hit"])
- Executes actions in your game
- Returns the next observation, reward, completion status, and metrics
reset(seed) - Initialize a new episode (seeds are read from the train/test parquet files)
- Resets the game to starting state
- Uses seed for reproducible episodes
- Returns initial observation for the LLM
system_prompt() - Define the LLM's role
- Returns the system prompt that tells the LLM what game it's playing
- Includes rules, available actions, and formatting instructions
close() - Clean up resources
- Called when the environment is no longer needed
compute_reward() - Optional final reward
- Usually returns 0.0 since step rewards are accumulated
- Use only if you need extra reward at episode end
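The contract described above can be pictured as an abstract base class. The sketch below is an illustration only: `BaseEnvSketch` is a hypothetical name, the type hints are assumptions, and VAGEN's actual BaseEnv may differ in signatures and defaults.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional, Tuple

Obs = Dict[str, Any]
Info = Dict[str, Any]


class BaseEnvSketch(ABC):
    """Illustrative stand-in for the interface described above (not VAGEN's source)."""

    @abstractmethod
    def step(self, llm_raw_response: str) -> Tuple[Obs, float, bool, Info]:
        """Parse the raw LLM text, act, and return (obs, reward, done, info)."""

    @abstractmethod
    def reset(self, seed: Optional[int] = None) -> Tuple[Obs, Info]:
        """Start a new episode and return (obs, info)."""

    @abstractmethod
    def system_prompt(self) -> str:
        """Return the rules, available actions, and formatting instructions."""

    @abstractmethod
    def close(self) -> None:
        """Release any resources held by the environment."""

    def compute_reward(self) -> float:
        """Optional end-of-episode reward; 0.0 when step rewards already cover it."""
        return 0.0
```

Making the four core methods abstract means an environment that forgets one of them fails at construction time rather than partway through a rollout.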
Key Data Structures¶
Observation Format¶
Every observation must follow this structure:
{
    'obs_str': "You see <image> showing your cards. The dealer shows <image>.",
    'multi_modal_data': {
        '<image>': [player_cards_image, dealer_card_image],
        '<audio>': [shuffle_sound]  # optional
    }
}
The number of <image> placeholders in obs_str must match the length of the image list.
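A small check can catch placeholder mismatches before an observation reaches the training system. This is a minimal sketch; `check_image_placeholders` is a hypothetical helper, not part of VAGEN, and it only enforces the `<image>` rule stated above.

```python
from typing import Any, Dict


def check_image_placeholders(obs: Dict[str, Any]) -> None:
    """Raise ValueError if <image> placeholders and the image list disagree."""
    media = obs.get("multi_modal_data") or {}  # tolerate a missing key or None
    images = media.get("<image>", [])
    count = obs["obs_str"].count("<image>")
    if count != len(images):
        raise ValueError(
            f"obs_str contains {count} <image> placeholder(s) "
            f"but {len(images)} image(s) were provided"
        )


# Two placeholders and two images: passes silently.
check_image_placeholders({
    "obs_str": "You see <image> showing your cards. The dealer shows <image>.",
    "multi_modal_data": {"<image>": ["player_cards.png", "dealer_card.png"]},
})
```

Strings stand in for image objects here; the check only counts list entries, so it works the same with real images.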
Info Dictionary¶
Provides metrics and context for training:
{
    "metrics": {
        'success': bool,              # Did LLM complete the task?
        'action_is_effective': bool,  # Did action change game state meaningfully?
        'action_is_valid': bool,      # Was action format correct?
    },
    "llm_raw_response": str,  # Original LLM response
    "llm_response": dict,     # Parsed action structure
}
Implementation Components¶
Your environment needs three main files:
1. Environment Configuration (env_config.py)¶
Defines all parameters for your environment:
from dataclasses import dataclass

from vagen.env.base.base_env_config import BaseEnvConfig


@dataclass
class BlackjackEnvConfig(BaseEnvConfig):
    env_name: str = "blackjack"
    render_mode: str = "vision"  # "text" or "vision"
    natural: bool = False  # Game-specific parameter
    # Blackjack allows only one action per step, since each action
    # requires interaction with the dealer.
    max_actions_per_step: int = 1
    prompt_format: str = "free_think"

    def config_id(self) -> str:
        return f"BlackjackEnvConfig(mode={self.render_mode},format={self.prompt_format})"
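As a quick check of how the configuration behaves, the sketch below instantiates it with different settings and prints the resulting ids. To keep the example self-contained, `BaseEnvConfig` is replaced by an empty stand-in dataclass, which is an assumption; the real base class may define additional fields.

```python
from dataclasses import dataclass


@dataclass
class BaseEnvConfig:
    """Empty stand-in for vagen.env.base.base_env_config.BaseEnvConfig."""


@dataclass
class BlackjackEnvConfig(BaseEnvConfig):
    env_name: str = "blackjack"
    render_mode: str = "vision"
    natural: bool = False
    max_actions_per_step: int = 1
    prompt_format: str = "free_think"

    def config_id(self) -> str:
        return f"BlackjackEnvConfig(mode={self.render_mode},format={self.prompt_format})"


default_cfg = BlackjackEnvConfig()
text_cfg = BlackjackEnvConfig(render_mode="text", prompt_format="grounding")

print(default_cfg.config_id())  # BlackjackEnvConfig(mode=vision,format=free_think)
print(text_cfg.config_id())     # BlackjackEnvConfig(mode=text,format=grounding)

# `natural` is not part of the id string, so these two configs share an id.
assert BlackjackEnvConfig(natural=True).config_id() == default_cfg.config_id()
```

If the id is meant to tell configurations apart, it may be worth including every field that changes environment behavior in the string.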
2. Environment Implementation (env.py)¶
The main environment class implementing BaseEnv:
from vagen.env.base.base_env import BaseEnv

from .blackjack import BlackjackEnv as GymBlackjackEnv  # Your gym environment
from .env_config import BlackjackEnvConfig


class BlackjackEnv(BaseEnv):
    # Maps action names in the LLM's answer to gym action integers.
    # Assumed mapping (gym's Blackjack uses 0 = stick, 1 = hit); adjust to your game.
    ACTION_LOOKUP = {"Stand": 0, "Hit": 1}

    def __init__(self, config: BlackjackEnvConfig):
        self.config = config
        self.gym_env = GymBlackjackEnv()  # Your underlying game
        # Initialize parsers (self.parse_func) and prompt functions...

    def step(self, llm_raw_response: str):
        # 1. Parse LLM response to extract actions
        parsed = self.parse_func(llm_raw_response)
        actions = parsed['actions']

        # 2. Execute actions in your game
        reward = 0
        done = False
        if actions and actions[0] in self.ACTION_LOOKUP:
            action_int = self.ACTION_LOOKUP[actions[0]]
            _, reward, done, _ = self.gym_env.step(action_int)

        # 3. Create metrics
        metrics = {
            "turn_metrics": {
                "action_is_valid": len(actions) > 0,
                "action_is_effective": reward != 0,
            },
            "traj_metrics": {
                "success": done and reward > 0,
            }
        }

        # 4. Generate next observation
        obs = self._render()
        info = {
            "metrics": metrics,
            "llm_raw_response": llm_raw_response,
            "llm_response": parsed,
        }
        return obs, reward, done, info

    def reset(self, seed=None):
        self.gym_env.reset(seed=seed)
        return self._render(init_obs=True), {}

    def system_prompt(self):
        return "You are a Blackjack player. Actions: Hit, Stand. Goal: Beat dealer without busting."

    def compute_reward(self):
        # No extra end-of-episode reward; step rewards are already accumulated
        return 0.0

    def _render(self, init_obs=False):
        # Generate observation based on render_mode
        if self.config.render_mode == 'vision':
            image = self.gym_env.render()
            return {
                'obs_str': "Current game state: <image>",
                'multi_modal_data': {'<image>': [image]}
            }
        else:
            text_desc = f"Your hand: {self.gym_env.player_sum}, Dealer: {self.gym_env.dealer_card}"
            return {'obs_str': text_desc, 'multi_modal_data': None}

    def close(self):
        self.gym_env.close()
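The step() method above relies on a parse function that turns raw LLM text into a dictionary with an 'actions' list. A minimal sketch for the free_think format follows; `parse_free_think`, the comma separator, and the `think` and `format_correct` keys are assumptions for illustration, not VAGEN's actual parser.

```python
import re
from typing import Any, Dict

# Matches <think>...</think> followed by <answer>...</answer>, across newlines.
FREE_THINK_RE = re.compile(r"<think>(.*?)</think>\s*<answer>(.*?)</answer>", re.DOTALL)


def parse_free_think(response: str, max_actions: int = 1, sep: str = ",") -> Dict[str, Any]:
    """Extract the reasoning text and up to max_actions actions from a response."""
    match = FREE_THINK_RE.search(response)
    if match is None:
        # Malformed response: no actions, so step() can mark the turn invalid.
        return {"think": "", "actions": [], "format_correct": False}
    think = match.group(1).strip()
    answer = match.group(2).strip()
    actions = [part.strip() for part in answer.split(sep) if part.strip()]
    return {"think": think, "actions": actions[:max_actions], "format_correct": True}


parsed = parse_free_think("<think>I should hit</think><answer>Hit</answer>")
print(parsed["actions"])  # ['Hit']
```

Truncating to `max_actions` mirrors the `max_actions_per_step` setting in the config; environments with richer action spaces could raise the limit and split the answer on their own separator.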
3. Prompt Templates (prompt.py)¶
Defines how the LLM interacts with your environment:
def system_prompt(**kwargs):
    return """You are a Blackjack player.
Goal: Get closer to 21 than dealer without going over.
Actions: "Hit" (take card), "Stand" (keep hand)"""


def init_observation_template(observation="", **kwargs):
    return f"[Initial Hand]: {observation}\nDecide: Hit or Stand?"


# Format configurations for different reasoning types
FORMAT_CONFIGS = {
    "free_think": {
        "format": "<think>...</think><answer>...</answer>",
        "example": "<think>I have 16, dealer shows 10. Risky but need to improve.</think><answer>Hit</answer>"
    },
    "grounding": {
        "format": "<think><observation>...</observation><reasoning>...</reasoning></think><answer>...</answer>",
        "example": "<think><observation>Hand: 16, dealer: 10</observation><reasoning>Tough spot, hitting gives chance</reasoning></think><answer>Hit</answer>"
    }
}


def format_prompt_generator(format_type):
    def prompt_function(**kwargs):
        config = FORMAT_CONFIGS[format_type]
        return f"Respond in format: {config['format']}\ne.g. {config['example']}"
    return prompt_function


format_prompt = {fmt: format_prompt_generator(fmt) for fmt in FORMAT_CONFIGS}
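One way these pieces might fit together is to look up the format function by the config's prompt_format and append its output to the game rules. The sketch below repeats a trimmed FORMAT_CONFIGS so it runs on its own; `build_system_prompt` is a hypothetical helper, and how VAGEN actually combines the prompts is not specified here.

```python
FORMAT_CONFIGS = {
    "free_think": {
        "format": "<think>...</think><answer>...</answer>",
        "example": "<think>I have 16, dealer shows 10.</think><answer>Hit</answer>",
    },
}


def format_prompt_generator(format_type):
    def prompt_function(**kwargs):
        config = FORMAT_CONFIGS[format_type]
        return f"Respond in format: {config['format']}\ne.g. {config['example']}"
    return prompt_function


format_prompt = {fmt: format_prompt_generator(fmt) for fmt in FORMAT_CONFIGS}


def build_system_prompt(rules: str, prompt_format: str) -> str:
    """Hypothetical helper: append the format instructions to the game rules."""
    return rules + "\n" + format_prompt[prompt_format]()


print(build_system_prompt("You are a Blackjack player.", "free_think"))
```

Keeping the format instructions in a lookup table lets the same environment be trained with different reasoning formats by changing only prompt_format in the config.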
Environment Registration¶
Register your environment in __init__.py:
from .env import BlackjackEnv
from .env_config import BlackjackEnvConfig

BLACKJACK_ENV_INFO = {
    "env_cls": BlackjackEnv,
    "config_cls": BlackjackEnvConfig,
    "description": "Classic Blackjack card game"
}

# Add to vagen/env/__init__.py:
# REGISTERED_ENV["blackjack"] = BLACKJACK_ENV_INFO
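To illustrate why the registry entry stores both classes, the sketch below shows how a lookup by name could build a config and then an environment. `make_env`, `DemoEnv`, and `DemoConfig` are hypothetical stand-ins so the example runs without VAGEN installed; the real training system may construct environments differently.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class DemoConfig:
    """Stand-in for BlackjackEnvConfig."""
    render_mode: str = "vision"


class DemoEnv:
    """Stand-in for BlackjackEnv."""

    def __init__(self, config: DemoConfig):
        self.config = config


REGISTERED_ENV: Dict[str, Dict[str, Any]] = {
    "blackjack": {
        "env_cls": DemoEnv,
        "config_cls": DemoConfig,
        "description": "Classic Blackjack card game",
    }
}


def make_env(name: str, **config_kwargs: Any):
    """Hypothetical factory: look up a registry entry, build the config, then the env."""
    if name not in REGISTERED_ENV:
        raise KeyError(f"Unknown environment {name!r}; registered: {sorted(REGISTERED_ENV)}")
    entry = REGISTERED_ENV[name]
    config = entry["config_cls"](**config_kwargs)
    return entry["env_cls"](config)


env = make_env("blackjack", render_mode="text")
print(type(env).__name__, env.config.render_mode)  # DemoEnv text
```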
Testing Your Environment¶
from vagen.env.blackjack import BlackjackEnv, BlackjackEnvConfig

# Basic functionality test
config = BlackjackEnvConfig(render_mode="text")
env = BlackjackEnv(config)

# Test reset
obs, info = env.reset(seed=42)
print("Initial obs:", obs['obs_str'])

# Test step with LLM response
response = "<think>Let me be conservative</think><answer>Stand</answer>"
next_obs, reward, done, info = env.step(response)

# Metrics are grouped per turn and per trajectory, as built in step()
metrics = info['metrics']
print("Valid action:", metrics['turn_metrics']['action_is_valid'])
print("Effective action:", metrics['turn_metrics']['action_is_effective'])
print("Success:", metrics['traj_metrics']['success'])

env.close()
Common Patterns¶
Multi-modal environments: Include images/audio in multi_modal_data
Text-only environments: Set multi_modal_data = None in observations
Multi-step episodes: Track progress in self.step_count or similar
Complex action spaces: Parse multiple actions from LLM response
Custom metrics: Add domain-specific metrics to the metrics dictionary
The key is understanding that step() bridges the gap between raw LLM text and your game logic, while maintaining consistent observation and metric formats for the training system.