Releases: NovaSky-AI/SkyRL

SkyRL: v0.2.0

23 Apr 00:11

Highlights

VLM Support: SkyRL now supports VLM training, both through the Tinker API and via the Python entrypoint. We've validated stable training on single- and multi-turn datasets with both text-only environments and multi-modal environment outputs. Get started here: https://docs.skyrl.ai/docs/tutorials/vision_language_rl

New inference refactor centralizing on HTTP: We've implemented a new HTTP-based refactor (in inference_servers/) for inference with vLLM. This standardizes all inference interactions over HTTP and integrates vllm-router as a high-performance router for generation requests. We also support prefill-decode disaggregation, which lets users squeeze out more performance in multi-turn async RL use cases. The new inference codepath is now the default; to use the legacy codepath (inference_engines/), set _SKYRL_USE_NEW_INFERENCE=0. The inference_engines/ codepath will be removed in the next release.
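
As a minimal illustration (only the flag name comes from the note above; the rest is generic), a Python launcher can opt back into the legacy codepath by setting the environment variable before SkyRL is imported:

```python
# Illustrative sketch: select the legacy inference_engines/ codepath via
# the _SKYRL_USE_NEW_INFERENCE flag. Must be set before SkyRL is imported;
# "0" selects the legacy codepath (the new HTTP codepath is the default).
import os

os.environ["_SKYRL_USE_NEW_INFERENCE"] = "0"

# ...then import and launch training through your usual entrypoint.
```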

vLLM native weight syncing API integration: SkyRL's new inference servers implementation uses vLLM's native weight syncing APIs: https://docs.vllm.ai/en/latest/training/weight_transfer/

Step-wise training improvements: We've made a number of fixes to the step-wise training implementation, addressing correctness issues (#1492), implementing support for fully async training (#1536), and adding prefix-aware merging to avoid redundant forward passes (#1532).
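
To illustrate the prefix-merging idea (a minimal sketch, not SkyRL's implementation): when step-wise samples extend a shared conversation prefix, a single forward pass over the maximal sequence covers all of them, so any sequence that is a prefix of another can be dropped.

```python
# Sketch of prefix-aware merging: drop any sequence that is a strict
# prefix of another, so only maximal sequences need a forward pass.

def merge_prefixes(sequences):
    """Keep only sequences that are not a prefix of a longer kept sequence."""
    kept = []
    for seq in sorted(sequences, key=len, reverse=True):
        if not any(kept_seq[: len(seq)] == seq for kept_seq in kept):
            kept.append(seq)
    return kept

steps = [
    [1, 2, 3],           # turn 1
    [1, 2, 3, 4, 5],     # turn 2 extends turn 1
    [1, 2, 3, 4, 5, 6],  # turn 3 extends turn 2
]
merged = merge_prefixes(steps)
naive_tokens = sum(len(s) for s in steps)    # 14 tokens across 3 passes
merged_tokens = sum(len(s) for s in merged)  # 6 tokens in 1 pass
```

Sequences that genuinely diverge (no prefix relation) are kept separately, so nothing is lost when conversations branch.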

R3 support: SkyRL now supports R3 (Rollout Routing Replay) for stabilizing training with MoE models. Due to current vLLM limitations, this is restricted to cases where the vLLM engine fits within a single node.
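
A hedged sketch of the idea behind Rollout Routing Replay: record which experts each token was routed to during rollout, then replay the same routing during the training forward pass so rollout and training see consistent MoE computation. All names below are illustrative, not SkyRL's actual interfaces.

```python
# Conceptual sketch of Rollout Routing Replay (R3); not SkyRL's code.

def route_topk(scores, k=2):
    """Pick the top-k expert indices for one token from router scores."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]

# Rollout: record the routing decision for each token.
rollout_scores = [[0.1, 0.7, 0.2], [0.5, 0.4, 0.1]]
replay_buffer = [route_topk(s) for s in rollout_scores]

# Training: replay the recorded indices instead of re-routing, even if
# the updated policy's router scores would now pick different experts.
train_scores = [[0.6, 0.3, 0.1], [0.2, 0.3, 0.5]]
replayed = [replay_buffer[t] for t in range(len(train_scores))]
```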

Nemotron 3 and Qwen 3.5 support: SkyRL now supports Nemotron 3 and Qwen 3.5 models. Nemotron 3 is supported in the FSDP and Megatron backends, while Qwen 3.5 is supported in FSDP, Megatron and Jax backends.

What's Changed

  • [trivial] Fix comments numbering and make some code more concise in trainer.py by @CharlieFRuan in #1283
  • [train][1/N] Native Weight Sync API: NCCL by @hao-aaron in #1271
  • [ci] Fix CI from broken test imports from #1271 by @erictang000 in #1290
  • [fix] Fix cuda ipc weight sync after #1271 by @erictang000 in #1292
  • fix paths for instruction comments to match current location by @linde in #1294
  • [CI] Skip FlashRL integration test in CI and fix failing generation test for new inference codepath by @SumanthRH in #1301
  • Skip output_router_logits for granitemoehybrid models by @eltonjohnfanboy in #1295
  • WIP: Restore PR changes lost during skyrl-train deprecation by @tyler-griggs in #1310
  • [chore] Update skyrl and skyrl-gym versions after 0.1.0 release by @SumanthRH in #1312
  • [lint] Add isort to pre-commit by @SumanthRH in #1267
  • [examples][bug] fix silent eval max generate length not overriding by @erictang000 in #1317
  • [docs] Add explicit eval_sampling_params.max_generate_length by @SumanthRH in #1318
  • [vllm] enable mp distributed executor backend (no multi-node engines) by @erictang000 in #1300
  • [train][2/N] Native Weight Syncing APIs: IPC by @hao-aaron in #1291
  • [algorithm][generator] change overlong filtering to use stop reasons over checking eos token by @erictang000 in #1319
  • Add rollout_is policy loss by @SamuelGabriel in #1314
  • [AsyncRL] Use keep mode for pause and resume by @hao-aaron in #1179
  • [skyrl][inference] Fix port collision when ports are allocated. by @nithinvc in #1302
  • R3 PR: Rollout Routing Replay by @erictang000 in #1273
  • [megatron] enable bucketed weight sync for non-colocated nccl weight sync in megatron by @erictang000 in #1324
  • [fix] Fix placement group bundle ordering for inference engines by @SumanthRH in #1308
  • [train][fix] Fix concurrency limitations in the new inference codepath by @SumanthRH in #1320
  • [megatron][lora] Fix megatron lora weight syncing not initializing buckets correctly by @erictang000 in #1330
  • [train] Add worker_process_setup_hook to set mp start method to spawn by @SumanthRH in #1333
  • [CI] Fix test_inference_engines_generation after vllm 0.16.0 upgrade; Use the correct GSM8k path for test_generator_multi_turn_gsm8k_router_replay by @SumanthRH in #1339
  • [train] Make TrainingInputBatch to PAD only to left, hence response tensors be right-aligned by @CharlieFRuan in #1285
  • Revert "[train] Add worker_process_setup_hook to set mp start method to spawn" by @SumanthRH in #1344
  • [Docs] Add docs on agent integration and step-wise training by @CharlieFRuan in #1347
  • [Docs] Small update on docs by @CharlieFRuan in #1348
  • [train] Add validation for step-wise GeneratorOutput by @CharlieFRuan in #1281
  • [megatron] rebuild weight conversion tasks per sync to prevent stale PP-collective caches with bucketing by @erictang000 in #1345
  • [StepWise] Trivial fix to avg_response_length metric by @CharlieFRuan in #1351
  • [CI] Make MultiItemDataset a global variable after switch to spawn by @SumanthRH in #1346
  • [train] Add support for LoRA in the new inference codepath by @SumanthRH in #1329
  • [bug][algorithm] remove incorrect torch.no_grad() for kl in loss (use_kl_loss=True) by @erictang000 in #1353
  • [transformers] set return dict false for transformers v5 compatibility by @erictang000 in #1325
  • [skyrl][tx] Move ModelInput token extraction to backends by @nithinvc in #1352
  • [tx] Fuse the projection matrices for Qwen3 by @pcmoritz in #1341
  • [tx] Fuse the projection matrices for Qwen 3.5 by @pcmoritz in #1362
  • Add CodeScout project to README by @CharlieFRuan in #1364
  • [train] Patch vLLM v0.16.0 sleep mode to properly free model weights by @CharlieFRuan in #1365
  • [tx] Optimize the decode performance by @pcmoritz in #1363
  • [skyrl] Add ImageChunk and ImageAssetPointerChunk types by @nithinvc in #1361
  • [train] Enable support for the mp backend with the new inference codepath by @SumanthRH in #1355
  • Fix loss_fn_outputs right-aligned slicing in Tinker API path by @CharlieFRuan in #1367
  • [bug] Move server creation and server start in the same thread by @hao-aaron in #1375
  • [router replay] downcast expert router indices to uint8/int16 to reduce space by @erictang000 in #1378
  • [train] Fix double-serialization of TensorBatch in pickle by @erictang000 in #1379
  • Bump vLLM to 0.18 by @hao-aaron in #1374
  • [train] Use a shared semaphore for all generate requests with RemoteInferenceClient; Move tokenization to client by @SumanthRH in #1381
  • [async] Add search r1 fully async script by @CharlieFRuan in #1386
  • [train] Add vLLMRouter in new inference codepath by @SumanthRH in #1385
  • [trainer] refactor dispatch_from_staged to individually serialize DP chunks to avoid materializing whole batch on all workers by @erictang000 in #1376
  • [SkyRL] Introduce /render endpoint to the new http inference client by @nithinvc in #1373
  • [async] Add DAPO fully async script by @CharlieFRuan in #1390
  • [examples] update command paths to include train/ in examples by @erictang000 in #1395
  • [bug] fix sleep bugs by @hao-aaron in #1383
  • [train][2/N] Support for Megatron PP + CP for R3 by @devpatelio in #1335
  • [train] Multi-modal inputs support in FSDP2 by @nithinvc in #1331
  • [bug] Fix weight sync with DP > 1 in non-colocated setups by @SumanthRH in #1399
  • [train] Make vLLMRouter...

SkyRL: v0.1.0

11 Mar 05:04

Highlights

New unified package: This is the first release of the unified skyrl package, combining the skyrl-train and skyrl-tx packages. The unified package brings together the FSDP, Megatron and Jax backends under the Tinker API, while retaining user-facing "frontend" interfaces (e.g., BasePPOTrainer, SkyRLGymGenerator) from the skyrl-train and skyrl-tx packages. For details about the migration, please refer to #1145.

Improved API documentation: We've revamped the API documentation pages for SkyRL. The new pages can be found here: https://docs.skyrl.ai/api-ref

Pythonic Configs: The skyrl-train backend has now fully migrated to pythonic dataclasses, replacing the older YAML-based interface. The configuration hierarchy has also been updated, and the CLI no longer relies on Hydra. Please refer to the documentation for the new configuration hierarchy: https://docs.skyrl.ai/docs/api-ref/skyrl/config

SGLang is no longer supported: SkyRL no longer supports the SGLang inference engine, unifying on vLLM.

vLLM 0.16.0 upgrade: This release updates vLLM to 0.16.0.

Qwen 3.5 experimental support: SkyRL now has experimental support for Qwen 3.5 models. This is currently limited to the Jax backend.

What's Changed

SkyRL-Train: v0.4.0

13 Feb 17:48
332c7cb

Highlights

Tinker API Integration: SkyRL now fully implements the Tinker API, a simple training and sampling API introduced by Thinking Machines Lab. Any training script written against the Tinker API can run locally on your own GPUs using SkyRL's backends with zero code changes. See the Tinker API docs to get started.

Supported Tinker features include:

  • Supervised fine-tuning (cross_entropy loss) and RL training (importance_sampling loss)
  • LoRA and full-parameter fine-tuning
  • Sampling with logprobs via colocated vLLM inference engines
  • FSDP2 and Megatron training backends
  • Lazy inference engine initialization for SFT-only workloads
  • Ephemeral and persistent weight sync modes
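
The call pattern is roughly "create a training client, call forward_backward with a named loss, then take an optimizer step". The stand-in below is NOT the real Tinker or SkyRL client; the class and method names are assumptions used only to show the shape of that pattern.

```python
# Self-contained mock illustrating the Tinker-style call pattern
# described above. Not the real client; names are illustrative.

class MockTrainingClient:
    def __init__(self, base_model):
        self.base_model = base_model
        self.steps = 0

    def forward_backward(self, batch, loss_fn):
        # The release notes mention cross_entropy (SFT) and
        # importance_sampling (RL) losses.
        assert loss_fn in ("cross_entropy", "importance_sampling")
        # Pretend "loss": count tokens so the example is runnable.
        return {"loss": sum(len(example) for example in batch), "loss_fn": loss_fn}

    def optim_step(self):
        self.steps += 1

client = MockTrainingClient(base_model="Qwen/Qwen3-4B")  # model name illustrative
out = client.forward_backward([[1, 2, 3], [4, 5]], loss_fn="cross_entropy")
client.optim_step()
```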

Repo Reorganization: The skyrl-tx and skyrl-train packages are being unified into a single skyrl/ folder. The existing packages remain fully functional and will be migrated to new paths shortly.

Megatron Backend for Tinker: The Megatron strategy is now fully supported for Tinker workloads, including RL training with loss_fn_outputs passthrough.

HTTP Inference Integration: A new HTTP-based inference server integration (feature-flagged) enables decoupled inference engine deployments.

Pythonic Configs: Introduced configuration dataclasses as an alternative to YAML-only configuration, with migration of tests to the new system.

Off-Policy Correction Refactor: Refactored truncated importance sampling (TIS) into a more comprehensive off-policy correction config with support for token-level and sequence-level ratio types.
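
The two ratio types mentioned above differ only in where the truncation is applied. A minimal sketch of the math (illustrative functions on per-token logprobs, not SkyRL's config or implementation):

```python
# Token-level vs. sequence-level importance ratios for off-policy
# correction (truncated importance sampling). Illustrative only.
import math

def token_level_ratios(logp_new, logp_old, clip=2.0):
    """Per-token ratio pi_new/pi_old, truncated at `clip`."""
    return [min(math.exp(n - o), clip) for n, o in zip(logp_new, logp_old)]

def sequence_level_ratio(logp_new, logp_old, clip=2.0):
    """One ratio for the whole sequence: exp(sum of logprob diffs), truncated."""
    return min(math.exp(sum(logp_new) - sum(logp_old)), clip)

logp_old = [-1.0, -0.5, -2.0]
logp_new = [-0.9, -0.6, -1.0]
tok = token_level_ratios(logp_new, logp_old)    # per-token weights
seq = sequence_level_ratio(logp_new, logp_old)  # single sequence weight
```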

Harbor Integration: Upstream Harbor integration for evaluation, with Modal support and configurable rate limiting.

Documentation: Migrated documentation to fumadocs, with comprehensive Tinker API docs including quickstart, architecture, cookbook scripts, and configuration pages.

New Model Support (TX):

  • DeepSeekV3 implementation with expert parallelism
  • GLM-4.7 Flash support
  • Qwen3 stacked weights optimization

What's Changed

  • [tx] Add experimental SkyRL-train backend that supports SFT by @pcmoritz in #871
  • Add sampling support for Tinker SkyRL backend by @pcmoritz in #999
  • Add checkpointing support for Tinker SkyRL backend by @pcmoritz in #992
  • Unify Megatron and FSDP training interfaces with forward_backward + optim_step by @pcmoritz in #901
  • Implement forward-only pass and populate metrics by @tyler-griggs in #1046
  • Emit loss_fn_outputs with logprobs for RL losses in forward_backward by @tyler-griggs in #1047
  • [tx] Lazy inference engine initialization by @tyler-griggs in #1069
  • Support colocate_all=False in Tinker backend by @tyler-griggs in #1097
  • [skyrl-train] Return loss_fn_outputs for megatron backend to support tinker RL by @erictang000 in #1102
  • [tx][megatron] making megatron skyrl-train worker usable as TX backend by @erictang000 in #1067
  • [tx][train][merge] make the skyrl folder standalone by @erictang000 in #1084
  • [WIP][skyrl] Create new skyrl folder combining tx + train by @erictang000 in #1068
  • [skyrl-train] Add SFT support via forward_backward(loss_fn="cross_entropy") by @pcmoritz in #961
  • Add set_lr() for dynamic learning rate updates from Tinker by @pcmoritz in #978
  • Fix placement group creation in SkyRL-Train backend by @pcmoritz in #1010
  • [skyrl-train][inference] HTTP Inference Integration (Feature-Flagged) 4/N by @CharlieFRuan in #931
  • [skyrl-train][inference] Inference Server Refactor (1/N) by @CharlieFRuan in #899
  • [skyrl-train][refactor] Inference Server Refactor -- RemoteInferenceClient 2/N by @CharlieFRuan in #904
  • [train] Pythonic Configs 1/N - Introduce configuration dataclasses by @CharlieFRuan in #1001
  • [skyrl-train] Refactor TIS to use more comprehensive off policy correction config by @erictang000 in #849
  • [train][Harbor][1/N] Upstream Harbor integration by @CharlieFRuan in #923
  • [Harbor] Add Modal support and bump Harbor version by @CharlieFRuan in #1022
  • [Harbor] Add rate limit for trials/sec and max concurrency by @CharlieFRuan in #1074
  • [tx] DeepseekV3 implementation by @pcmoritz in #889
  • [tx] Add support for GLM-4.7 Flash by @pcmoritz in #1023
  • [tx] Stack weights — Qwen3 by @pcmoritz in #1079
  • [tx] Add EP axis to deepseek by @pcmoritz in #993
  • [tx] chunked logprobs computation for memory efficiency by @pcmoritz in #902
  • [skyrl-train] Add example for 235B LoRA training with Megatron on 4 H100 nodes by @erictang000 in #1000
  • [train] Enable RayPrometheusStatLogger for async vLLM engine by @CharlieFRuan in #900
  • [train][OpenAI] Add generator.served_model_name for /chat/completions by @CharlieFRuan in #970
  • [train] Enable custom chat template for get_response_ids_and_loss_mask_from_messages by @CharlieFRuan in #981
  • [train][vllm] Add enable_log_requests and max_log_len support by @tyler-griggs in #1071
  • [tx] Use WAL mode for sqlite by @pcmoritz in #1054
  • Increase busy timeout for sqlite to avoid database is locked error by @pcmoritz in #1105
  • [tx] Gracefully handle stale save_weights_for_sampler requests on engine restart by @pcmoritz in #1073
  • Migrate documentation to fumadocs by @tyler-griggs in #941
  • Add Tinker integration documentation by @tyler-griggs in #1050
  • [agent] Add YouCom search engine by @caoshiyi in #803

Full Changelog: skyrl_train-v0.3.0...skyrl_train-v0.4.0

SkyRL-Train: v0.3.0

03 Dec 17:07

Highlights

Asynchronous training: We now support fully asynchronous training in SkyRL, enabling higher throughput for agentic RL: https://skyrl.readthedocs.io/en/latest/tutorials/fully_async.html
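
The core pattern can be sketched with stdlib asyncio: a rollout producer keeps generating trajectories while the trainer consumes them, with a bounded queue limiting how stale the data can get. This illustrates the concept only, not SkyRL's implementation.

```python
# Conceptual sketch of fully-async RL training: generation and training
# run concurrently, coupled only by a bounded queue.
import asyncio

async def rollout_worker(queue, n):
    for i in range(n):
        await queue.put(f"trajectory-{i}")  # generation never waits for a full sync step
    await queue.put(None)                   # sentinel: no more rollouts

async def trainer(queue):
    steps = 0
    while (traj := await queue.get()) is not None:
        steps += 1                          # one training step per trajectory
    return steps

async def main():
    queue = asyncio.Queue(maxsize=4)        # bounds off-policy staleness
    _, steps = await asyncio.gather(rollout_worker(queue, 8), trainer(queue))
    return steps

steps = asyncio.run(main())
```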

Dependency Upgrades:

  • Upgraded vLLM to 0.11.0, Ray to 2.51.1
  • Megatron: Migrated from mbridge to the newer Megatron-Bridge library. The latter is expected to have more active development and support from NVIDIA.

The updated installation instructions can be found here.

Recipes: We've consolidated a list of end-to-end recipes with SkyRL here for reference runs on math, Text2SQL and search tasks.

SkyRL on Managed Platforms: Guides for running SkyRL on managed platforms such as Anyscale, Runpod and SkyPilot can be found here.

Miscellaneous: Support for GPT-OSS, integration with Pytorch's OpenEnv, support for IPv6 clusters, and more!

What's Changed

SkyRL-Train: v0.2.0

13 Oct 18:12
1ed499c

Highlights

This release contains 163 commits from 22 contributors, including 11 new contributors!

Megatron Backend: SkyRL now has full support for the Megatron training backend with 5D parallelism and strong support for large-scale MoE training. Learn more in our Megatron guide and examples.

LoRA Support: SkyRL now supports LoRA training with the FSDP backend and vLLM inference engine. Learn more in our LoRA guide and examples. We will continue aggressively improving LoRA support and performance, tracked in #449.
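
The idea behind LoRA in one sketch: instead of updating a full weight matrix W, train a low-rank pair (A, B) and apply W + (alpha/r) * B @ A. Pure-Python matrices are used here for illustration; real training operates on framework tensors.

```python
# Rank-1 LoRA update on a 3x3 identity base weight, for illustration.

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

d, r, alpha = 3, 1, 2.0                     # hidden size, LoRA rank, scaling
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen base
B = [[1.0], [0.0], [0.0]]                   # d x r, trained
A = [[0.0, 0.5, 0.0]]                       # r x d, trained
delta = matmul(B, A)                        # low-rank update B @ A
W_adapted = [[W[i][j] + (alpha / r) * delta[i][j] for j in range(d)] for i in range(d)]
```

Only d*r + r*d parameters are trained instead of d*d, which is why LoRA pairs well with large base models.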

OpenAI API Compatibility: SkyRL has standardized around the OpenAI API for inference, which means agents and agent scaffolds can call into the inference engine over the OpenAI API. SkyRL manages the inference engines and provides a base_url for an OpenAI-API-compatible endpoint.
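
In practice, "OpenAI API compatible" means an agent only needs the provided base_url and a standard /chat/completions payload. The base_url and model name below are placeholders; the payload shape follows the OpenAI chat completions format.

```python
# Sketch of an OpenAI-style chat completions request against a SkyRL-
# provided base_url. Values are placeholders for illustration.
import json

def build_chat_request(base_url, model, messages):
    url = f"{base_url}/chat/completions"
    body = json.dumps({"model": model, "messages": messages})
    return url, body

url, body = build_chat_request(
    "http://localhost:8000/v1",                    # placeholder base_url
    "my-policy",                                   # illustrative served model name
    [{"role": "user", "content": "2 + 2 = ?"}],
)
```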

Integrations: Building on top of our standardization on OpenAI APIs, we integrated several popular environment and agentic projects. A couple highlights include:

What's Changed

SkyRL-Train: v0.1.0

20 Aug 01:20
6c50026

What's Changed
