Skip to content

[recipe] OTTL wire-up for tracecore.alert.training_step_stalled (pattern #7) #365

Description

@trilamsr

Pattern-7 detector consumes tracecore.alert.training_step_stalled.no_progress_seconds bridge attribute. OTTL metrics→logs recipe (RFC-0014 convention, mirroring pattern-5 pcie_rate_collapse shape) is the integration gap.

Scope: OTTL stanza that converts step-duration metric stall into the training-step-stalled log record. Cross-link from docs/patterns/07-dataloader-hang.md.

Tracked from: PR #346 Open Q#1.

Metadata

Metadata

Assignees

No one assigned

    Labels

    documentationImprovements or additions to documentationrc1-prepv1.0-rc1 preparation tasks per docs/v1-rc1-operational-gaps.md

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions