7 changes: 6 additions & 1 deletion module/receiver/ncclfrreceiver/README.md
@@ -44,9 +44,14 @@ you can exercise the full receiver pipeline on a laptop:
```bash
mkdir -p /tmp/nccl-fr
tracecore failure-inject nccl-hang --out /tmp/nccl-fr/rank-0.pkl
-tracecore collect --config module/receiver/ncclfrreceiver/example_config.yaml
+./_build/tracecore --config=module/receiver/ncclfrreceiver/example_config.yaml
```

+> The legacy `tracecore collect` subcommand was removed in
+> [RFC-0013 PR-A2](../../../docs/rfcs/0013-distro-first-pivot.md);
+> the OCB-assembled binary takes `--config=` directly. Build via
+> `make build` from the repo root.

The receiver picks up the synthesized hang dump within `poll_interval`
and emits two records (a completed all-reduce + a started one that
never completes) via the configured exporter.
8 changes: 6 additions & 2 deletions module/receiver/ncclfrreceiver/example_config.yaml
@@ -26,10 +26,14 @@ receivers:
hw_id: gpu-0

 exporters:
-  stdoutexporter:
+  # RFC-0013 PR-A2 retired the in-tree `stdoutexporter`; the upstream
+  # `debug` exporter is the post-pivot successor. `verbosity: detailed`
+  # prints every attribute, matching the old stdoutexporter output.
+  debug:
+    verbosity: detailed

 service:
   pipelines:
     logs:
       receivers: [nccl_fr]
-      exporters: [stdoutexporter]
+      exporters: [debug]
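
The config change above touches two places that have to stay in sync: the key under `exporters:` and the name listed in the `logs` pipeline. The following is a minimal Python sketch of a consistency check for that rename. It is not part of tracecore or the OpenTelemetry Collector; the helper name `undefined_pipeline_components` is hypothetical, and the config is written as a plain dict (receiver settings elided) rather than parsed from `example_config.yaml`.

```python
# Hypothetical helper for illustration only; not a tracecore or Collector API.
def undefined_pipeline_components(config):
    """Return sorted (pipeline, kind, name) triples for every receiver or
    exporter that a pipeline references but the config never defines."""
    missing = []
    pipelines = config.get("service", {}).get("pipelines", {})
    for pipeline_name, pipeline in pipelines.items():
        for kind in ("receivers", "exporters"):
            defined = config.get(kind) or {}
            for name in pipeline.get(kind, []):
                if name not in defined:
                    missing.append((pipeline_name, kind, name))
    return sorted(missing)


# The config as it looks after this change (receiver settings elided).
post_pivot = {
    "receivers": {"nccl_fr": {}},
    "exporters": {"debug": {"verbosity": "detailed"}},
    "service": {
        "pipelines": {
            "logs": {"receivers": ["nccl_fr"], "exporters": ["debug"]},
        }
    },
}

# A half-migrated variant: the exporter was renamed, but the pipeline
# still points at the retired `stdoutexporter`.
half_migrated = {
    **post_pivot,
    "service": {
        "pipelines": {
            "logs": {"receivers": ["nccl_fr"], "exporters": ["stdoutexporter"]},
        }
    },
}

print(undefined_pipeline_components(post_pivot))     # []
print(undefined_pipeline_components(half_migrated))  # [('logs', 'exporters', 'stdoutexporter')]
```

The sketch only shows why both hunks in this file belong to the same change: the rename under `exporters:` and the pipeline reference have to move together.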