Training Framework Configuration Support for LLaDA-MoE #9

Description

@maomaocun

Can we configure custom training for LLaDA-MoE variants, for example by adding MoE-specific YAML params (e.g., expert routing) and VeOmni integration?
On a related note, during my experiments with LLaDA-MoE following the paper's exact settings, I'm seeing the z-loss (noise prediction component) rise steadily as training iterations progress.

Image
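
For reference, here is a minimal sketch of what the z-loss would compute if it is the router z-loss commonly used in MoE training, i.e. the mean over tokens of the squared log-sum-exp of each token's expert logits. That reading is an assumption, since the issue does not say how the z-loss is defined in this codebase. The function name `router_z_loss` and the sample logits are hypothetical and not taken from the LLaDA-MoE or VeOmni code.

```python
import math


def router_z_loss(router_logits):
    """Mean over tokens of (log-sum-exp of that token's expert logits) squared.

    router_logits: list of per-token lists, one logit per expert.
    Assumed definition; the repository may compute its z-loss differently.
    """
    total = 0.0
    for logits in router_logits:
        # Subtract the max before exponentiating for numerical stability.
        m = max(logits)
        lse = m + math.log(sum(math.exp(x - m) for x in logits))
        total += lse ** 2
    return total / len(router_logits)


# Hypothetical logits: 2 tokens, 4 experts each.
small = [[0.1, -0.2, 0.0, 0.3], [0.0, 0.1, -0.1, 0.2]]
# Same routing preferences, but with 10x larger logit magnitudes.
large = [[10.0 * x for x in row] for row in small]

print(router_z_loss(small), router_z_loss(large))
```

Under this assumed definition the value grows with the magnitude of the router logits, so a steadily rising curve would suggest the logits are drifting upward. Logging this term separately from the main training loss could help confirm whether that is what the plot shows.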
