[bug][train] Fix max_seq_len calculation #1303

Merged
SumanthRH merged 9 commits into NovaSky-AI:main from tamoghnokandar:fix_max_seq_len
Apr 17, 2026

Conversation

@tamoghnokandar
Contributor

@tamoghnokandar commented Mar 10, 2026

Fixes #1154

Summary

This PR removes the implicit max_seq_len heuristic calculation and requires users to set it explicitly when using trainer.algorithm.loss_reduction=seq_mean_token_sum_norm.
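For context, a minimal pure-Python sketch of what a Dr. GRPO-style `seq_mean_token_sum_norm` reduction does with `max_seq_len` (the function name and input shapes are illustrative assumptions, not SkyRL's actual implementation):

```python
# Illustrative sketch only -- not SkyRL's implementation. Each sequence's
# token losses are summed and divided by the FIXED constant max_seq_len
# (not the per-sequence token count), then averaged across sequences.
# With the implicit default removed, that constant must now be chosen
# explicitly by the user.
def seq_mean_token_sum_norm(token_losses, loss_masks, max_seq_len):
    per_seq = [
        sum(l * m for l, m in zip(losses, mask)) / max_seq_len
        for losses, mask in zip(token_losses, loss_masks)
    ]
    return sum(per_seq) / len(per_seq)
```

Because the denominator is a constant rather than each sequence's length, the choice of `max_seq_len` directly scales the loss, which is why an implicit heuristic is risky.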

Changes

  • Removed the automatic trainer.algorithm.max_seq_len default from SkyRLTrainConfig.__post_init__
  • Added config validation that requires trainer.algorithm.max_seq_len to be explicitly set when loss_reduction == "seq_mean_token_sum_norm"
  • Updated config comments and docs to reflect the new behavior and explain that max_seq_len must be chosen based on the user’s intended sequence-length normalization budget
  • Updated tests to cover:
    • max_seq_len remaining None by default
    • explicit max_seq_len values being preserved
    • validate_cfg() failing when seq_mean_token_sum_norm is used without max_seq_len
    • validate_cfg() continuing to allow token_mean and sequence_mean without max_seq_len
    • validate_cfg() passing when seq_mean_token_sum_norm is used with an explicit max_seq_len

Testing

  • Updated tests/train/test_config.py for the new behavior
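The validated behavior can be sketched standalone as follows (`AlgorithmConfig` and `validate_algorithm_cfg` are illustrative stand-ins, not the actual SkyRL classes):

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative stand-ins for the real SkyRL config/validation code.
@dataclass
class AlgorithmConfig:
    loss_reduction: str = "token_mean"
    max_seq_len: Optional[int] = None  # stays None by default now

def validate_algorithm_cfg(cfg: AlgorithmConfig) -> None:
    # The new rule: seq_mean_token_sum_norm requires an explicit max_seq_len.
    if cfg.loss_reduction == "seq_mean_token_sum_norm" and cfg.max_seq_len is None:
        raise ValueError(
            "max_seq_len must be set explicitly for seq_mean_token_sum_norm"
        )

# token_mean and sequence_mean still validate without max_seq_len:
validate_algorithm_cfg(AlgorithmConfig("token_mean"))
validate_algorithm_cfg(AlgorithmConfig("sequence_mean"))
# seq_mean_token_sum_norm passes only with an explicit value:
validate_algorithm_cfg(AlgorithmConfig("seq_mean_token_sum_norm", max_seq_len=1536))
```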


gemini-code-assist[bot]

This comment was marked as resolved.

tamoghnokandar and others added 2 commits March 9, 2026 18:04
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Contributor

@devin-ai-integration[bot] left a comment


Devin Review found 3 new potential issues.

View 5 additional findings in Devin Review.


Comment on lines +279 to +285
if cfg.trainer.algorithm.loss_reduction == "seq_mean_token_sum_norm":
    if cfg.trainer.algorithm.max_seq_len is None:
        raise ValueError(
            "`trainer.algorithm.max_seq_len` must be set explicitly when "
            "`trainer.algorithm.loss_reduction='seq_mean_token_sum_norm'`. "
            "Choose the total sequence-length normalization constant for your setup; "
            "this often matches the model context window / vLLM `max_model_len` when appropriate."
        )
Contributor


🔴 Breaking change: Dr. GRPO example script fails because auto-calculated max_seq_len fallback was removed

The PR removes the max_seq_len auto-calculation from SkyRLTrainConfig.__post_init__ (skyrl/train/config/config.py:713-722 on LEFT) and adds a hard validation check requiring it to be set explicitly when loss_reduction='seq_mean_token_sum_norm'. However, the official Dr. GRPO example script at examples/train/algorithms/drgrpo/run_drgrpo_gsm8k.sh:15,23 uses LOSS_REDUCTION="seq_mean_token_sum_norm" but never passes trainer.algorithm.max_seq_len. This script previously worked because __post_init__ auto-computed max_seq_len = max_input_length + max_generate_length. Now it will fail with a ValueError at validation time.

Same issue in skyrl-agent example

skyrl-agent/examples/run_skyrl/run_skyrl_swe.sh:67 also sets trainer.algorithm.loss_reduction="seq_mean_token_sum_norm" without setting max_seq_len, so it will also fail.

Prompt for agents
Two example scripts need to be updated to explicitly pass trainer.algorithm.max_seq_len now that the auto-calculation fallback has been removed:

1. examples/train/algorithms/drgrpo/run_drgrpo_gsm8k.sh: Add a line like trainer.algorithm.max_seq_len=1536 (512 + 1024, matching max_prompt_length + max_generate_length from the script) to the uv run command.

2. skyrl-agent/examples/run_skyrl/run_skyrl_swe.sh: Add a line like trainer.algorithm.max_seq_len=40768 (8000 + 32768, matching max_prompt_length + max_generate_length from the script) to the uv run command.

Both scripts use loss_reduction=seq_mean_token_sum_norm and will now fail the new validation check at skyrl/train/utils/utils.py:279-285 without this fix.
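The suggested values are just the prompt budget plus the generation budget, mirroring the removed fallback; a quick check of the two numbers above (script names and lengths are taken from this comment):

```python
# max_seq_len = max_prompt_length + max_generate_length, i.e. the same sum
# the removed __post_init__ fallback used to compute implicitly.
def explicit_max_seq_len(max_prompt_length: int, max_generate_length: int) -> int:
    return max_prompt_length + max_generate_length

print(explicit_max_seq_len(512, 1024))    # run_drgrpo_gsm8k.sh -> 1536
print(explicit_max_seq_len(8000, 32768))  # run_skyrl_swe.sh -> 40768
```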

Member


@tamoghnokandar this is important - can you grep for all usages of seq_mean_token_sum_norm in our example scripts and ensure that max_seq_len is explicitly passed in now (calculate it based on the generation and input lengths in the script).

Member


LOSS_REDUCTION="seq_mean_token_sum_norm"

LOSS_REDUCTION="seq_mean_token_sum_norm"

trainer.algorithm.loss_reduction="seq_mean_token_sum_norm" \

Comment thread skyrl/train/utils/utils.py
Comment thread skyrl/train/config/config.py
@SumanthRH self-assigned this Mar 19, 2026
@SumanthRH self-requested a review March 19, 2026 19:20
Member

@SumanthRH left a comment


@tamoghnokandar can you merge the latest changes from main? We've had some important updates, especially bf243b8

@tamoghnokandar
Contributor Author

Done!

Member


Can you also add a note in the loss_type docstring that max_seq_len is required for seq_mean_token_sum_norm now?

https://github.com/tamoghnokandar/SkyRL/blob/9c44257041101fe52893ad554bff026d49527b58/skyrl/train/config/config.py#L351

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

@tamoghnokandar
Contributor Author

@SumanthRH Ready to be merged

Member

@SumanthRH left a comment


One small fix: Can we ensure that we update this script as well now with an explicit max_seq_len argument?

trainer.algorithm.loss_reduction="seq_mean_token_sum_norm" \

Member

@CharlieFRuan left a comment


This is great, thank you!

generator.inference_engine.http_endpoint_port=8000 \
generator.sampling_params.max_generate_length=4096 \
trainer.algorithm.max_seq_len=$MAX_MODEL_LEN \
generator.inference_engine.engine_init_kwargs.max_model_len=$MAX_MODEL_LEN \
Member


Hmm, why constrain the inference engine's max model len?

Member


This is just a generation entrypoint, so it doesn't really matter. For training this is needed to limit the total context window size for the model, which should match $MAX_MODEL_LEN. For token-in-token-out, that is essentially the maximum sequence length (padded or not)

@SumanthRH enabled auto-merge (squash) April 17, 2026 06:22
@SumanthRH merged commit 25e2d41 into NovaSky-AI:main Apr 17, 2026
5 of 6 checks passed


Development

Successfully merging this pull request may close these issues.

[bug][train] max_seq_len auto-calculation is incorrect for multi-turn in seq_mean_token_sum_norm
