
[Torch 2.10 Megatron GRPO] The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. #8317

@BierOne

Description


Checklist

  • I have searched existing issues, and this is a new bug report.

Bug Description

Hi, training GRPO (Qwen3.5 35b-a3b) with the latest image produces the following warning:

Image: modelscope-registry.cn-hangzhou.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.8.1-py311-torch2.10.0-vllm0.17.0-modelscope1.34.0-swift4.0.1

The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient.

Could this affect training accuracy? (It currently looks like an issue specific to torch 2.10.0 + CUDA 12.8; see the referenced error report.)

How to Reproduce

training_options=" \
    --model_type qwen3_5_moe \
    --freeze_llm false \
    --freeze_vit true \
    --freeze_aligner true \
    --add_non_thinking_prefix true \
    --loss_scale ignore_empty_think \
    --decoder_first_pipeline_num_layers 24 \
    --steps_per_generation 8 \
    --micro_batch_size 1 \
    --global_batch_size 64 \
    --num_generations 8"
    # --mtp_num_layers ${NUM_MTP_LAYER} \
    # --mtp_loss_scaling_factor 0.1 \
# VLLM options
vllm_options=" \
    --use_vllm true \
    --vllm_mode colocate \
    --vllm_tensor_parallel_size 16 \
    --vllm_gpu_memory_utilization 0.5 \
    --vllm_max_model_len 20480"
# Common training options
common_training_options=" \
    --rlhf_type grpo \
    --loss_type sapo \
    --max_length 16384 \
    --max_completion_length 4086 \
    --lr 1e-5 \
    --lr_warmup_fraction 0.05 \
    --min_lr 1e-6 \
    --train_type full \
    --reward_funcs format \
    --tau_pos 1 \
    --tau_neg 1.05 \
    --epsilon 0.2 \
    --epsilon_high 0.2 \
    --beta 0.001 \
    --finetune true \
    --packing false \
    --padding_free true \
    --dynamic_sample false \
    --num_train_epochs 2 \
    --overlong_filter false \
    --importance_sampling_level token"
    # > Note: if `overlong_filter` is enabled, the kl and clip_ratio metrics filter out overlong samples
    # --external_plugins ${CUSTOM_WORK_DIR}/scripts/base/grpo/latest_plugin.py \

# Checkpoint options
checkpoint_options=" \
    --recompute_granularity full \
    --recompute_method uniform \
    --recompute_num_layers 1 \
    --attention_backend flash \
    --save_strategy epoch \
    --save_steps 500 \
    --logging_steps 1 \
    --log_completions true \
    --dataloader_num_workers 32 \
    --output_dir ${OUTPUT_DIR}/${OUT_NAME} \
    --no_save_optim \
    --no_save_rng"
# Offload options
offload_options=" \
    --offload_bridge true \
    --sleep_level 2 \
    --offload_model true \
    --offload_optimizer true \
    --optimizer_cpu_offload true"
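For reference, the option groups above would typically be concatenated into a single launch command. A minimal sketch, assuming the ms-swift Megatron entrypoint `megatron rlhf` and an 8-GPU node (both are assumptions, not taken from this report; abbreviated stand-ins replace the full option strings defined above):

```shell
#!/bin/sh
# Hypothetical assembly of the option groups into one launch command.
# The `megatron rlhf` entrypoint and NPROC_PER_NODE=8 are assumptions.

# Abbreviated stand-ins for the full option groups defined above:
training_options="--model_type qwen3_5_moe --micro_batch_size 1"
vllm_options="--use_vllm true --vllm_mode colocate"
common_training_options="--rlhf_type grpo --loss_type sapo"
checkpoint_options="--save_strategy epoch --logging_steps 1"
offload_options="--offload_model true --offload_optimizer true"

launch_cmd="NPROC_PER_NODE=8 megatron rlhf ${training_options} ${vllm_options} ${common_training_options} ${checkpoint_options} ${offload_options}"
echo "${launch_cmd}"
```

The grouping itself is just shell string concatenation; the environment variable and entrypoint should be checked against the ms-swift documentation for the installed version.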

Additional Information

No response

Labels: bug