
# reward-surface-wrapper

A simple, extendable API for computing and visualizing filter-normalized reward surfaces for reinforcement learning agents.

This library provides a simple interface for generating visualizations of the loss/reward landscape around a trained agent's parameters, using the filter normalization technique described in *Visualizing the Loss Landscape of Neural Nets* (Li et al., 2018).
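In brief, filter normalization rescales a random direction so that each filter (each first-axis slice of a weight tensor) has the same norm as the corresponding filter of the trained weights, which makes surface plots comparable across layers and architectures. A minimal NumPy sketch of the idea, not this library's actual API (`filter_normalize` is an illustrative name):

```python
import numpy as np

def filter_normalize(direction, weights):
    """Rescale each filter (first-axis slice) of `direction` so its norm
    matches the norm of the corresponding filter in `weights`."""
    normalized = []
    for d, w in zip(direction, weights):
        d = d.copy()
        if d.ndim <= 1:
            # Biases / 1-D parameters: match the whole-vector norm instead.
            d *= np.linalg.norm(w) / (np.linalg.norm(d) + 1e-10)
        else:
            for i in range(d.shape[0]):
                d[i] *= np.linalg.norm(w[i]) / (np.linalg.norm(d[i]) + 1e-10)
        normalized.append(d)
    return normalized

# Example: normalize a random direction against a toy weight matrix.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 8))]
direction = [rng.normal(size=(4, 8))]
nd = filter_normalize(direction, weights)
```

After normalization, each row of `nd[0]` has (up to floating-point error) the same norm as the matching row of `weights[0]`.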

Some of the code for applying filter normalization was adapted from reward-surfaces. That library is more feature-rich but also more complex, and it has not been actively maintained for some time.

## Features

- **Simple API**: A straightforward abstract base class `AgentWrapper` that can be extended for any RL framework.
- **Stable Baselines 3 integration**: Comes with a pre-built `SB3Wrapper` for easy use with Stable Baselines 3 agents.
- **Configuration with Hydra**: Manage configurations and hyperparameter sweeps using Hydra.
- **Weights & Biases logging**: SB3 callbacks for logging surfaces and evaluation metrics to W&B.

## Getting Started

Clone the GitHub repository:

```shell
git clone https://github.com/ALRhub/reward_surface_alr
```

Then install the package into your environment:

```shell
pip install reward_surface_alr/
```

The core of the library is the `AgentWrapper` abstract class. To visualize reward surfaces for your own agent, implement three methods:

1. `get_weights()`: Return the agent's model weights as a list of numpy-like arrays.
2. `initialize(weights)`: Create a new agent instance from a given set of weights.
3. `evaluate()`: Run an evaluation loop and return a dictionary of metrics (e.g., mean reward).
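To make the contract concrete, here is a toy wrapper for a linear "policy" that implements the three required methods. This is an illustrative sketch only: the class name and internals are invented, and it does not subclass the package's real `AgentWrapper` so it stays self-contained.

```python
import numpy as np

class ToyLinearAgentWrapper:
    """Toy stand-in showing the three-method contract of AgentWrapper."""

    def __init__(self, weights=None):
        rng = np.random.default_rng(0)
        # A single 4x2 weight matrix plays the role of the policy network.
        self.weights = weights if weights is not None else [rng.normal(size=(4, 2))]

    def get_weights(self):
        # Return the model parameters as a list of numpy arrays.
        return [w.copy() for w in self.weights]

    def initialize(self, weights):
        # Build a fresh agent instance from a given set of weights.
        return ToyLinearAgentWrapper(weights=weights)

    def evaluate(self):
        # Stand-in for a real rollout loop: return a metrics dictionary.
        fake_mean_reward = float(np.sum(self.weights[0]))
        return {"Mean Reward": fake_mean_reward}
```

With these three methods in place, the surface computation can perturb `get_weights()` along two filter-normalized directions, rebuild an agent via `initialize(...)`, and score each grid point with `evaluate()`.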

### Example with Stable Baselines 3

A wrapper for SB3 is already implemented. You can compute a reward surface in just a few lines:

```python
from stable_baselines3 import PPO
from wrapper.agents.agent_sb import SB3Wrapper

# 1. Train or load your SB3 agent
model = PPO("MlpPolicy", "CartPole-v1").learn(10000)

# 2. Wrap the agent
agent_wrapper = SB3Wrapper(model)

# 3. Define evaluation parameters and compute the surface
eval_kwargs = {
    "eval_episodes": 20,
    "eval_steps": 500,
    "target": "reward",
    "aggregate": "mean",
}
offsets, surface_dict = agent_wrapper.compute_surface(
    eval_kwargs=eval_kwargs,
    grid_size=25,
)

# 4. Plot the surface
mean_reward_surface = surface_dict["Mean Reward"]
fig = agent_wrapper.plot_surface(offsets, mean_reward_surface, z_label="Mean Reward")
fig.show()
```

## Demos

The `demos/` folder contains a notebook demonstrating how to use the wrapper to generate reward surfaces after training.

`hydra_main.py` implements a script that logs reward surfaces (and model checkpoints) during training via SB3 callbacks; Hydra MULTIRUN sweeps are supported out of the box.

NOTE: the config structure is adapted from the ALR Hydra Cluster Example.

```shell
python hydra_main.py --config-name=local
```
