[CI Failure Doctor] CI Failure Investigation - Run #36032 #16125

@github-actions

🏥 CI Failure Investigation - Run #36032

Summary

The Integration: CLI Completion & Other job fails because TestMCPRegistryClient_LiveGetServer now hits the live MCP registry, and the registry is returning 503 (upstream connect error or disconnect/reset before headers, with a delayed connect failure). As a result, the test cannot retrieve io.github.netdata/mcp-server.

Failure Details

  • Run: 22068117409
  • Commit: 5e5b9d282752b1430867cdc76a09603348c08d4c
  • Trigger: push

Root Cause Analysis

  1. TestMCPRegistryClient_LiveGetServer connects to the live MCP registry while exercising GetServer. The registry returned 503 (upstream connect error or disconnect/reset before headers), and the latest retry reported delayed connect error: Connection refused, so the subtests cannot complete.
  2. Both subtests (get_github_server and get_nonexistent_server) assert on specific output but receive the same 503. The tests treat that response as a failure instead of skipping or using a mock.

Failed Jobs and Errors

  • Integration: CLI Completion & Other: TestMCPRegistryClient_LiveGetServer/get_github_server
    • mcp_registry_live_test.go:141: GetServer failed for 'io.github.netdata/mcp-server': MCP registry returned status 503: upstream connect error or disconnect/reset before headers. retried and the latest reset reason: remote connection failure, transport failure reason: delayed connect error: Connection refused
  • Integration: CLI Completion & Other: TestMCPRegistryClient_LiveGetServer/get_nonexistent_server
    • mcp_registry_live_test.go:175: Expected error to contain 'not found in registry', got: MCP registry returned status 503: upstream connect error or disconnect/reset before headers. retried and the latest reset reason: remote connection failure, transport failure reason: delayed connect error: Connection refused

Investigation Findings

  • Running go test -v -tags integration ./pkg/cli -run TestMCPRegistryClient_LiveGetServer against the live registry reproduces the 503/delayed connect error: the test requests io.github.netdata/mcp-server from the registry, and the registry is currently refusing connections.
  • The integration suite therefore fails at the package level: the run detects the panic/failure and aborts, logging that no individual test passed cleanly.

Recommended Actions

  • Guard TestMCPRegistryClient_LiveGetServer (and similar MCP live tests) so that 5xx/delayed-connect responses are skipped or stubbed instead of failing the suite, e.g., detect the 503 and mark the test as skipped when the registry is unreachable.
  • Replace the live MCP dependency in CI with a stub or canned response when possible so transient outages do not break the workflow.
  • Rerun the integration job after MCP connectivity is restored to confirm there are no additional regressions.
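
The skip guard from the first recommendation could look roughly like the sketch below. It assumes the client surfaces outages only through the error text seen in the failing log ("returned status 503", "Connection refused"); the helper names isUnavailableMessage and isRegistryUnavailable are hypothetical and do not exist in the repository. If the client exposes a typed error or a status code, matching on that would be more robust than matching strings.

```go
package main

import (
	"regexp"
	"strings"
)

// status5xx matches the "returned status 5xx" wording seen in the failing log.
var status5xx = regexp.MustCompile(`returned status 5\d\d\b`)

// isUnavailableMessage reports whether an error message looks like a registry
// outage (any 5xx status or a refused connection) rather than a real regression.
// Hypothetical helper: it relies on the log wording, not on a documented API.
func isUnavailableMessage(msg string) bool {
	return status5xx.MatchString(msg) ||
		strings.Contains(strings.ToLower(msg), "connection refused")
}

// isRegistryUnavailable applies the same check to an error value.
func isRegistryUnavailable(err error) bool {
	return err != nil && isUnavailableMessage(err.Error())
}

// Possible use inside the live test, right after the GetServer call:
//
//	if isRegistryUnavailable(err) {
//		t.Skipf("MCP registry unavailable, skipping live test: %v", err)
//	}
```

With a guard like this, a "not found in registry" error still fails or passes the assertion as before, while a 503 marks the subtest as skipped.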

Prevention Strategies

  • Do not call production MCP services directly from CI without handling known failure modes (503s, connection refused, etc.); mark the tests as flaky or skip them when the service is down.
  • Use local stubs or recorded fixtures for MCP responses in GitHub Actions so network availability does not gate the whole suite.
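
A local stub of the kind described above can be built with the standard net/http/httptest package. The sketch below is illustrative only: the route /servers/known, the JSON body, and the helper names newStubRegistry and fetch are assumptions, not the real registry API or code from the repository. It also assumes the client under test can be pointed at a custom base URL.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// newStubRegistry starts a local stand-in for the MCP registry.
// The route and response body are hypothetical placeholders.
func newStubRegistry() *httptest.Server {
	mux := http.NewServeMux()
	// Canned response for one known server entry.
	mux.HandleFunc("/servers/known", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"name":"io.github.netdata/mcp-server"}`)
	})
	// Any other path behaves like a missing server.
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		http.Error(w, "not found in registry", http.StatusNotFound)
	})
	return httptest.NewServer(mux)
}

// fetch performs a GET and returns the status code and response body.
func fetch(url string) (int, string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return resp.StatusCode, "", err
	}
	return resp.StatusCode, string(body), nil
}
```

A test would call newStubRegistry, defer Close on the returned server, and pass its URL field to the client in place of the production registry address, so both the found and not-found paths run without network access.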

AI Team Self-Improvement

  • When generating tests that talk to MCP or other external services, guard them with explicit skip/retry logic and document that 5xx/delayed connect errors should not be treated as regressions.
  • Prefer mocking remote MCP responses in CI workflows so the tests stay deterministic even if the upstream service is temporarily unreachable.

🩺 Diagnosis provided by CI Failure Doctor

To install this workflow, run gh aw add githubnext/agentics/workflows/ci-doctor.md@ea350161ad5dcc9624cf510f134c6a9e39a6f94d. View source at https://github.com/githubnext/agentics/tree/ea350161ad5dcc9624cf510f134c6a9e39a6f94d/workflows/ci-doctor.md.

  • expires on Feb 17, 2026, 3:26 PM UTC
