Releases: NovaSky-AI/SkyRL

SkyRL: v0.2.0

23 Apr 00:11

Highlights

VLM Support: SkyRL now supports VLM training, both through the Tinker API and via the Python entrypoint. We've validated stable training on single- and multi-turn datasets with both text-only environments and multi-modal environment outputs. Get started here: https://docs.skyrl.ai/docs/tutorials/vision_language_rl

New inference refactor centralizing on HTTP: We've implemented a new HTTP-based refactor (in inference_servers/) for inference with vLLM. This standardizes all inference interactions over HTTP and integrates vllm-router as a high-performance router for generation requests. We also support prefill-decode disaggregation, which lets users squeeze out more performance in multi-turn async RL use cases. The new inference codepath is now the default; to use the legacy codepath (inference_engines/), set _SKYRL_USE_NEW_INFERENCE=0. The inference_engines/ codepath will be removed in the next release.
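
As a minimal illustration (only the flag name comes from the note above; the rest is generic), a Python launcher can opt back into the legacy codepath by setting the environment variable before SkyRL is imported:

```python
# Illustrative sketch: select the legacy inference_engines/ codepath via
# the _SKYRL_USE_NEW_INFERENCE flag. Must be set before SkyRL is imported;
# "0" selects the legacy codepath (the new HTTP codepath is the default).
import os

os.environ["_SKYRL_USE_NEW_INFERENCE"] = "0"

# ...then import and launch training through your usual entrypoint.
```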

vLLM native weight syncing API integration: SkyRL's new inference servers implementation uses vLLM's native weight syncing APIs: https://docs.vllm.ai/en/latest/training/weight_transfer/

Step-wise training improvements: We've made a number of fixes to the step-wise training implementation, addressing correctness issues (#1492), implementing support for fully async training (#1536), and adding prefix-aware merging to avoid redundant forward passes (#1532).
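
To illustrate the prefix-merging idea (a minimal sketch, not SkyRL's implementation): when step-wise samples extend a shared conversation prefix, a single forward pass over the maximal sequence covers all of them, so any sequence that is a prefix of another can be dropped.

```python
# Sketch of prefix-aware merging: drop any sequence that is a strict
# prefix of another, so only maximal sequences need a forward pass.

def merge_prefixes(sequences):
    """Keep only sequences that are not a prefix of a longer kept sequence."""
    kept = []
    for seq in sorted(sequences, key=len, reverse=True):
        if not any(kept_seq[: len(seq)] == seq for kept_seq in kept):
            kept.append(seq)
    return kept

steps = [
    [1, 2, 3],           # turn 1
    [1, 2, 3, 4, 5],     # turn 2 extends turn 1
    [1, 2, 3, 4, 5, 6],  # turn 3 extends turn 2
]
merged = merge_prefixes(steps)
naive_tokens = sum(len(s) for s in steps)    # 14 tokens across 3 passes
merged_tokens = sum(len(s) for s in merged)  # 6 tokens in 1 pass
```

Sequences that genuinely diverge (no prefix relation) are kept separately, so nothing is lost when conversations branch.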

R3 support: SkyRL now supports R3 (Rollout Routing Replay) for stabilizing training with MoE models. Due to current vLLM limitations, this is restricted to cases where the vLLM engine fits within a single node.
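
A hedged sketch of the idea behind Rollout Routing Replay: record which experts each token was routed to during rollout, then replay the same routing during the training forward pass so rollout and training see consistent MoE computation. All names below are illustrative, not SkyRL's actual interfaces.

```python
# Conceptual sketch of Rollout Routing Replay (R3); not SkyRL's code.

def route_topk(scores, k=2):
    """Pick the top-k expert indices for one token from router scores."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]

# Rollout: record the routing decision for each token.
rollout_scores = [[0.1, 0.7, 0.2], [0.5, 0.4, 0.1]]
replay_buffer = [route_topk(s) for s in rollout_scores]

# Training: replay the recorded indices instead of re-routing, even if
# the updated policy's router scores would now pick different experts.
train_scores = [[0.6, 0.3, 0.1], [0.2, 0.3, 0.5]]
replayed = [replay_buffer[t] for t in range(len(train_scores))]
```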

Nemotron 3 and Qwen 3.5 support: SkyRL now supports Nemotron 3 and Qwen 3.5 models. Nemotron 3 is supported in the FSDP and Megatron backends, while Qwen 3.5 is supported in FSDP, Megatron and Jax backends.

What's Changed

  • [trivial] Fix comments numbering and make some code more concise in trainer.py by @CharlieFRuan in #1283
  • [train][1/N] Native Weight Sync API: NCCL by @hao-aaron in #1271
  • [ci] Fix CI from broken test imports from #1271 by @erictang000 in #1290
  • [fix] Fix cuda ipc weight sync after #1271 by @erictang000 in #1292
  • fix paths for instruction comments to match current location by @linde in #1294
  • [CI] Skip FlashRL integration test in CI and fix failing generation test for new inference codepath by @SumanthRH in #1301
  • Skip output_router_logits for granitemoehybrid models by @eltonjohnfanboy in #1295
  • WIP: Restore PR changes lost during skyrl-train deprecation by @tyler-griggs in #1310
  • [chore] Update skyrl and skyrl-gym versions after 0.1.0 release by @SumanthRH in #1312
  • [lint] Add isort to pre-commit by @SumanthRH in #1267
  • [examples][bug] fix silent eval max generate length not overriding by @erictang000 in #1317
  • [docs] Add explicit eval_sampling_params.max_generate_length by @SumanthRH in #1318
  • [vllm] enable mp distributed executor backend (no multi-node engines) by @erictang000 in #1300
  • [train][2/N] Native Weight Syncing APIs: IPC by @hao-aaron in #1291
  • [algorithm][generator] change overlong filtering to use stop reasons over checking eos token by @erictang000 in #1319
  • Add rollout_is policy loss by @SamuelGabriel in #1314
  • [AsyncRL] Use keep mode for pause and resume by @hao-aaron in #1179
  • [skyrl][inference] Fix port collision when ports are allocated. by @nithinvc in #1302
  • R3 PR: Rollout Routing Replay by @erictang000 in #1273
  • [megatron] enable bucketed weight sync for non-colocated nccl weight sync in megatron by @erictang000 in #1324
  • [fix] Fix placement group bundle ordering for inference engines by @SumanthRH in #1308
  • [train][fix] Fix concurrency limitations in the new inference codepath by @SumanthRH in #1320
  • [megatron][lora] Fix megatron lora weight syncing not initializing buckets correctly by @erictang000 in #1330
  • [train] Add worker_process_setup_hook to set mp start method to spawn by @SumanthRH in #1333
  • [CI] Fix test_inference_engines_generation after vllm 0.16.0 upgrade; Use the correct GSM8k path for test_generator_multi_turn_gsm8k_router_replay by @SumanthRH in #1339
  • [train] Make TrainingInputBatch to PAD only to left, hence response tensors be right-aligned by @CharlieFRuan in #1285
  • Revert "[train] Add worker_process_setup_hook to set mp start method to spawn" by @SumanthRH in #1344
  • [Docs] Add docs on agent integration and step-wise training by @CharlieFRuan in #1347
  • [Docs] Small update on docs by @CharlieFRuan in #1348
  • [train] Add validation for step-wise GeneratorOutput by @CharlieFRuan in #1281
  • [megatron] rebuild weight conversion tasks per sync to prevent stale PP-collective caches with bucketing by @erictang000 in #1345
  • [StepWise] Trivial fix to avg_response_length metric by @CharlieFRuan in #1351
  • [CI] Make MultiItemDataset a global variable after switch to spawn by @SumanthRH in #1346
  • [train] Add support for LoRA in the new inference codepath by @SumanthRH in #1329
  • [bug][algorithm] remove incorrect torch.no_grad() for kl in loss (use_kl_loss=True) by @erictang000 in #1353
  • [transformers] set return dict false for transformers v5 compatibility by @erictang000 in #1325
  • [skyrl][tx] Move ModelInput token extraction to backends by @nithinvc in #1352
  • [tx] Fuse the projection matrices for Qwen3 by @pcmoritz in #1341
  • [tx] Fuse the projection matrices for Qwen 3.5 by @pcmoritz in #1362
  • Add CodeScout project to README by @CharlieFRuan in #1364
  • [train] Patch vLLM v0.16.0 sleep mode to properly free model weights by @CharlieFRuan in #1365
  • [tx] Optimize the decode performance by @pcmoritz in #1363
  • [skyrl] Add ImageChunk and ImageAssetPointerChunk types by @nithinvc in #1361
  • [train] Enable support for the mp backend with the new inference codepath by @SumanthRH in #1355
  • Fix loss_fn_outputs right-aligned slicing in Tinker API path by @CharlieFRuan in #1367
  • [bug] Move server creation and server start in the same thread by @hao-aaron in #1375
  • [router replay] downcast expert router indices to uint8/int16 to reduce space by @erictang000 in #1378
  • [train] Fix double-serialization of TensorBatch in pickle by @erictang000 in #1379
  • Bump vLLM to 0.18 by @hao-aaron in #1374
  • [train] Use a shared semaphore for all generate requests with RemoteInferenceClient; Move tokenization to client by @SumanthRH in #1381
  • [async] Add search r1 fully async script by @CharlieFRuan in #1386
  • [train] Add vLLMRouter in new inference codepath by @SumanthRH in #1385
  • [trainer] refactor dispatch_from_staged to individually serialize DP chunks to avoid materializing whole batch on all workers by @erictang000 in #1376
  • [SkyRL] Introduce /render endpoint to the new http inference client by @nithinvc in #1373
  • [async] Add DAPO fully async script by @CharlieFRuan in #1390
  • [examples] update command paths to include train/ in examples by @erictang000 in #1395
  • [bug] fix sleep bugs by @hao-aaron in #1383
  • [train][2/N] Support for Megatron PP + CP for R3 by @devpatelio in #1335
  • [train] Multi-modal inputs support in FSDP2 by @nithinvc in #1331
  • [bug] Fix weight sync with DP > 1 in non-colocated setups by @SumanthRH in #1399
  • [train] Make vLLMRouter...

SkyRL: v0.1.0

11 Mar 05:04

Highlights

New unified package: This is the first release of the unified skyrl package, combining the skyrl-train and skyrl-tx packages. The unified package brings together the FSDP, Megatron and Jax backends under the Tinker API, while retaining user-facing "frontend" interfaces (e.g., BasePPOTrainer, SkyRLGymGenerator) from the skyrl-train and skyrl-tx packages. For details about the migration, please refer to #1145.

Improved API documentation: We've revamped the API documentation pages for SkyRL. The new pages can be found here: https://docs.skyrl.ai/api-ref

Pythonic Configs: The skyrl-train backend has now fully migrated to pythonic dataclasses, replacing the older YAML-based interface. The configuration hierarchy has also been updated, and the CLI no longer relies on Hydra. Please refer to the documentation for the new configuration hierarchy: https://docs.skyrl.ai/docs/api-ref/skyrl/config

SGLang is no longer supported: SkyRL no longer supports the SGLang inference engine, unifying on vLLM.

vLLM 0.16.0 upgrade: This release updates vLLM to 0.16.0.

Qwen 3.5 experimental support: SkyRL now has experimental support for Qwen 3.5 models. This is currently limited to the Jax backend.

What's Changed

SkyRL-Train: v0.4.0

13 Feb 17:48
332c7cb

Highlights

Tinker API Integration: SkyRL now fully implements the Tinker API, a simple training and sampling API introduced by Thinking Machines Lab. Any training script written against the Tinker API can run locally on your own GPUs using SkyRL's backends with zero code changes. See the Tinker API docs to get started.

Supported Tinker features include:

  • Supervised fine-tuning (cross_entropy loss) and RL training (importance_sampling loss)
  • LoRA and full-parameter fine-tuning
  • Sampling with logprobs via colocated vLLM inference engines
  • FSDP2 and Megatron training backends
  • Lazy inference engine initialization for SFT-only workloads
  • Ephemeral and persistent weight sync modes
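
The call pattern is roughly "create a training client, call forward_backward with a named loss, then take an optimizer step". The stand-in below is NOT the real Tinker or SkyRL client; the class and method names are assumptions used only to show the shape of that pattern.

```python
# Self-contained mock illustrating the Tinker-style call pattern
# described above. Not the real client; names are illustrative.

class MockTrainingClient:
    def __init__(self, base_model):
        self.base_model = base_model
        self.steps = 0

    def forward_backward(self, batch, loss_fn):
        # The release notes mention cross_entropy (SFT) and
        # importance_sampling (RL) losses.
        assert loss_fn in ("cross_entropy", "importance_sampling")
        # Pretend "loss": count tokens so the example is runnable.
        return {"loss": sum(len(example) for example in batch), "loss_fn": loss_fn}

    def optim_step(self):
        self.steps += 1

client = MockTrainingClient(base_model="Qwen/Qwen3-4B")  # model name illustrative
out = client.forward_backward([[1, 2, 3], [4, 5]], loss_fn="cross_entropy")
client.optim_step()
```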

Repo Reorganization: The skyrl-tx and skyrl-train packages are being unified into a single skyrl/ folder. The existing packages remain fully functional and will be migrated to new paths shortly.

Megatron Backend for Tinker: The Megatron strategy is now fully supported for Tinker workloads, including RL training with loss_fn_outputs passthrough.

HTTP Inference Integration: A new HTTP-based inference server integration (feature-flagged) enables decoupled inference engine deployments.

Pythonic Configs: Introduced configuration dataclasses as an alternative to YAML-only configuration, with migration of tests to the new system.

Off-Policy Correction Refactor: Refactored truncated importance sampling (TIS) into a more comprehensive off-policy correction config with support for token-level and sequence-level ratio types.
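
The two ratio types mentioned above differ only in where the truncation is applied. A minimal sketch of the math (illustrative functions on per-token logprobs, not SkyRL's config or implementation):

```python
# Token-level vs. sequence-level importance ratios for off-policy
# correction (truncated importance sampling). Illustrative only.
import math

def token_level_ratios(logp_new, logp_old, clip=2.0):
    """Per-token ratio pi_new/pi_old, truncated at `clip`."""
    return [min(math.exp(n - o), clip) for n, o in zip(logp_new, logp_old)]

def sequence_level_ratio(logp_new, logp_old, clip=2.0):
    """One ratio for the whole sequence: exp(sum of logprob diffs), truncated."""
    return min(math.exp(sum(logp_new) - sum(logp_old)), clip)

logp_old = [-1.0, -0.5, -2.0]
logp_new = [-0.9, -0.6, -1.0]
tok = token_level_ratios(logp_new, logp_old)    # per-token weights
seq = sequence_level_ratio(logp_new, logp_old)  # single sequence weight
```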

Harbor Integration: Upstream Harbor integration for evaluation, with Modal support and configurable rate limiting.

Documentation: Migrated documentation to fumadocs, with comprehensive Tinker API docs including quickstart, architecture, cookbook scripts, and configuration pages.

New Model Support (TX):

  • DeepSeekV3 implementation with expert parallelism
  • GLM-4.7 Flash support
  • Qwen3 stacked weights optimization

What's Changed

  • [tx] Add experimental SkyRL-train backend that supports SFT by @pcmoritz in #871
  • Add sampling support for Tinker SkyRL backend by @pcmoritz in #999
  • Add checkpointing support for Tinker SkyRL backend by @pcmoritz in #992
  • Unify Megatron and FSDP training interfaces with forward_backward + optim_step by @pcmoritz in #901
  • Implement forward-only pass and populate metrics by @tyler-griggs in #1046
  • Emit loss_fn_outputs with logprobs for RL losses in forward_backward by @tyler-griggs in #1047
  • [tx] Lazy inference engine initialization by @tyler-griggs in #1069
  • Support colocate_all=False in Tinker backend by @tyler-griggs in #1097
  • [skyrl-train] Return loss_fn_outputs for megatron backend to support tinker RL by @erictang000 in #1102
  • [tx][megatron] making megatron skyrl-train worker usable as TX backend by @erictang000 in #1067
  • [tx][train][merge] make the skyrl folder standalone by @erictang000 in #1084
  • [WIP][skyrl] Create new skyrl folder combining tx + train by @erictang000 in #1068
  • [skyrl-train] Add SFT support via forward_backward(loss_fn="cross_entropy") by @pcmoritz in #961
  • Add set_lr() for dynamic learning rate updates from Tinker by @pcmoritz in #978
  • Fix placement group creation in SkyRL-Train backend by @pcmoritz in #1010
  • [skyrl-train][inference] HTTP Inference Integration (Feature-Flagged) 4/N by @CharlieFRuan in #931
  • [skyrl-train][inference] Inference Server Refactor (1/N) by @CharlieFRuan in #899
  • [skyrl-train][refactor] Inference Server Refactor -- RemoteInferenceClient 2/N by @CharlieFRuan in #904
  • [train] Pythonic Configs 1/N - Introduce configuration dataclasses by @CharlieFRuan in #1001
  • [skyrl-train] Refactor TIS to use more comprehensive off policy correction config by @erictang000 in #849
  • [train][Harbor][1/N] Upstream Harbor integration by @CharlieFRuan in #923
  • [Harbor] Add Modal support and bump Harbor version by @CharlieFRuan in #1022
  • [Harbor] Add rate limit for trials/sec and max concurrency by @CharlieFRuan in #1074
  • [tx] DeepseekV3 implementation by @pcmoritz in #889
  • [tx] Add support for GLM-4.7 Flash by @pcmoritz in #1023
  • [tx] Stack weights — Qwen3 by @pcmoritz in #1079
  • [tx] Add EP axis to deepseek by @pcmoritz in #993
  • [tx] chunked logprobs computation for memory efficiency by @pcmoritz in #902
  • [skyrl-train] Add example for 235B LoRA training with Megatron on 4 H100 nodes by @erictang000 in #1000
  • [train] Enable RayPrometheusStatLogger for async vLLM engine by @CharlieFRuan in #900
  • [train][OpenAI] Add generator.served_model_name for /chat/completions by @CharlieFRuan in #970
  • [train] Enable custom chat template for get_response_ids_and_loss_mask_from_messages by @CharlieFRuan in #981
  • [train][vllm] Add enable_log_requests and max_log_len support by @tyler-griggs in #1071
  • [tx] Use WAL mode for sqlite by @pcmoritz in #1054
  • Increase busy timeout for sqlite to avoid database is locked error by @pcmoritz in #1105
  • [tx] Gracefully handle stale save_weights_for_sampler requests on engine restart by @pcmoritz in #1073
  • Migrate documentation to fumadocs by @tyler-griggs in #941
  • Add Tinker integration documentation by @tyler-griggs in #1050
  • [agent] Add YouCom search engine by @caoshiyi in #803

Full Changelog: skyrl_train-v0.3.0...skyrl_train-v0.4.0

SkyRL-Train: v0.3.0

03 Dec 17:07

Highlights

Asynchronous training: We now support fully asynchronous training in SkyRL, enabling higher throughput for agentic RL: https://skyrl.readthedocs.io/en/latest/tutorials/fully_async.html
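
The core pattern can be sketched with stdlib asyncio: a rollout producer keeps generating trajectories while the trainer consumes them, with a bounded queue limiting how stale the data can get. This illustrates the concept only, not SkyRL's implementation.

```python
# Conceptual sketch of fully-async RL training: generation and training
# run concurrently, coupled only by a bounded queue.
import asyncio

async def rollout_worker(queue, n):
    for i in range(n):
        await queue.put(f"trajectory-{i}")  # generation never waits for a full sync step
    await queue.put(None)                   # sentinel: no more rollouts

async def trainer(queue):
    steps = 0
    while (traj := await queue.get()) is not None:
        steps += 1                          # one training step per trajectory
    return steps

async def main():
    queue = asyncio.Queue(maxsize=4)        # bounds off-policy staleness
    _, steps = await asyncio.gather(rollout_worker(queue, 8), trainer(queue))
    return steps

steps = asyncio.run(main())
```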

Dependency Upgrades:

  • Upgraded vLLM to 0.11.0, Ray to 2.51.1
  • Megatron: Migrated from mbridge to the newer Megatron-Bridge library. The latter is expected to have more active development and support from NVIDIA.

The updated installation instructions can be found here.

Recipes: We've consolidated a list of end-to-end recipes with SkyRL here for reference runs on math, Text2SQL and search tasks.

SkyRL on Managed Platforms: Guides for running SkyRL on managed platforms such as Anyscale, Runpod and SkyPilot can be found here.

Miscellaneous: Support for GPT-OSS, integration with Pytorch's OpenEnv, support for IPv6 clusters, and more!

What's Changed

SkyRL-Train: v0.2.0

13 Oct 18:12
1ed499c

Highlights

This release contains 163 commits from 22 contributors, including 11 new contributors!

Megatron Backend: SkyRL now has full support for the Megatron training backend with 5D parallelism and strong support for large-scale MoE training. Learn more in our Megatron guide and examples.

LoRA Support: SkyRL now supports LoRA training with the FSDP backend and vLLM inference engine. Learn more in our LoRA guide and examples. We will continue aggressively improving LoRA support and performance, tracked in #449.
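
The idea behind LoRA in one sketch: instead of updating a full weight matrix W, train a low-rank pair (A, B) and apply W + (alpha/r) * B @ A. Pure-Python matrices are used here for illustration; real training operates on framework tensors.

```python
# Rank-1 LoRA update on a 3x3 identity base weight, for illustration.

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

d, r, alpha = 3, 1, 2.0                     # hidden size, LoRA rank, scaling
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen base
B = [[1.0], [0.0], [0.0]]                   # d x r, trained
A = [[0.0, 0.5, 0.0]]                       # r x d, trained
delta = matmul(B, A)                        # low-rank update B @ A
W_adapted = [[W[i][j] + (alpha / r) * delta[i][j] for j in range(d)] for i in range(d)]
```

Only d*r + r*d parameters are trained instead of d*d, which is why LoRA pairs well with large base models.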

OpenAI API Compatibility: SkyRL has standardized around the OpenAI API for inference, which means agents and agent scaffolds can call into the inference engine over the OpenAI API. SkyRL manages the inference engines and provides a base_url for an OpenAI-API-compatible endpoint.
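
In practice, "OpenAI API compatible" means an agent only needs the provided base_url and a standard /chat/completions payload. The base_url and model name below are placeholders; the payload shape follows the OpenAI chat completions format.

```python
# Sketch of an OpenAI-style chat completions request against a SkyRL-
# provided base_url. Values are placeholders for illustration.
import json

def build_chat_request(base_url, model, messages):
    url = f"{base_url}/chat/completions"
    body = json.dumps({"model": model, "messages": messages})
    return url, body

url, body = build_chat_request(
    "http://localhost:8000/v1",                    # placeholder base_url
    "my-policy",                                   # illustrative served model name
    [{"role": "user", "content": "2 + 2 = ?"}],
)
```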

Integrations: Building on top of our standardization on OpenAI APIs, we integrated several popular environment and agentic projects. A couple highlights include:

What's Changed

SkyRL-Train: v0.1.0

20 Aug 01:20
6c50026

What's Changed
