1 change: 0 additions & 1 deletion components/receivers/pyspy/README.md
@@ -60,7 +60,6 @@ When something goes wrong, the receiver bumps one of these `IncError` kinds and
| `faulthandler_missing` | Helper hello said `unsupported: true` | Workload runtime must be CPython 3.3+; PyPy/MicroPython unsupported |
| `uds_dir_permission_denied` | Receiver can't `stat`/read `uds_dir` at Start | Check container UID + mount permissions |
| `helper_oom_mid_dump` | Helper replied with `reason="MemoryError"` | Workload near OOM; lower `max_threads_per_dump` or raise pod limit |
| `sidecar_uid_drift` | Workload UID ≠ sidecar UID (Phase 4 chart guards this) | Align `runAsUser` between containers |

## CI gates

10 changes: 1 addition & 9 deletions components/receivers/pyspy/RUNBOOK.md
@@ -30,7 +30,7 @@ Degraded mode never restarts the receiver; absent data is the operator-visible s

**Likely causes.**
- Helper crashed and left a stale socket inode.
- Sidecar/workload UID drift (see `sidecar_uid_drift` for the typed variant).
- Sidecar/workload UID drift: when the collector and helper containers run as different UIDs, the collector cannot reach the helper-bound `0700` UDS.
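
A minimal Go sketch of how a collector could separate the UID-drift cause from a stale socket when dialing the UDS, assuming the drift surfaces as `EACCES` from `connect()` and a socket with no listener surfaces as `ECONNREFUSED`. The names `classifyDialError` and `dialError`, and the returned labels, are hypothetical and are not the receiver's actual code; the simulated errors only mimic the wrapping that `net.Dial` applies.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"syscall"
)

// dialError builds an error shaped like the one net.Dial returns when
// connect() fails: *net.OpError wrapping *os.SyscallError wrapping an errno.
func dialError(errno syscall.Errno) error {
	return &net.OpError{Op: "dial", Net: "unix", Err: os.NewSyscallError("connect", errno)}
}

// Simulated failures used by main; no real socket is touched.
var (
	errDenied  = dialError(syscall.EACCES)
	errRefused = dialError(syscall.ECONNREFUSED)
	errMissing = dialError(syscall.ENOENT)
)

// classifyDialError maps a Unix-socket dial error to a coarse,
// hypothetical label. errors.Is walks the wrap chain down to the errno.
func classifyDialError(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, syscall.EACCES):
		return "permission_denied" // candidate for UID drift
	case errors.Is(err, syscall.ECONNREFUSED):
		return "stale_socket" // inode exists, nobody is listening
	case errors.Is(err, syscall.ENOENT):
		return "socket_missing"
	default:
		return "other"
	}
}

func main() {
	for _, err := range []error{nil, errDenied, errRefused, errMissing} {
		fmt.Println(classifyDialError(err))
	}
}
```

With the typed kind removed, a check like this would only inform the investigation; the receiver still reports the generic not-listening posture for that PID.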

**Investigation.**
1. `stat <uds_dir>/pyspy.<pid>.sock` - confirm file exists.
@@ -123,14 +123,6 @@ Degraded mode never restarts the receiver; absent data is the operator-visible s

**Remediation.** Align directory permissions or `securityContext.runAsUser` between helper and receiver. Distinct from `target_not_attached` (directory readable but empty).

## kind=sidecar_uid_drift

**Trigger.** Helper bound the UDS with the workload's UID and mode `0700`; sidecar collector has a different UID and gets `EACCES` on `connect()`. Receiver enters `target_not_listening` posture for that PID; this kind distinguishes it from a generic stale socket.

**Investigation.** Compare `runAsUser` in the helper container's `securityContext` vs the collector's. The Helm chart (Phase 4) defaults both from one variable; manual installs need explicit alignment.

**Remediation.** Set both containers to the same `runAsUser`, or set the workload's helper to mode `0770` and group both UIDs into the same `runAsGroup`.

## kind=panic

**Trigger.** A panic was recovered inside `internal/safe.Call`.
7 changes: 3 additions & 4 deletions components/receivers/pyspy/factory.go
@@ -8,7 +8,6 @@ import (

"github.com/tracecoreai/tracecore/internal/consumer"
"github.com/tracecoreai/tracecore/internal/pipeline"
"github.com/tracecoreai/tracecore/internal/selftelemetry"
)

// ComponentType is the canonical receiver-factory ID. Centralized
@@ -56,11 +55,11 @@ func (*factory) CreateLogs(ctx context.Context, set pipeline.CreateSettings, cfg
}
r := newReceiver(set, c, next)
if set.Telemetry.MeterProvider != nil {
if rt, err := selftelemetry.NewReceiver(set.ID, set.Telemetry.MeterProvider); err == nil {
if rt, err := newSelfTelemetry(set.ID, set.Telemetry.MeterProvider); err == nil {
r.telemetry = rt
} else {
selftelemetry.RecordInitError(ctx, set.Telemetry.MeterProvider,
"receiver", set.ID.String(), selftelemetry.ReasonInstrumentRegister)
recordInitError(ctx, set.Telemetry.MeterProvider,
"receiver", set.ID.String(), reasonInstrumentRegister)
if set.Telemetry.Logger != nil {
set.Telemetry.Logger.Warn("pyspy self-telemetry init failed; using noop", "err", err)
}
22 changes: 11 additions & 11 deletions components/receivers/pyspy/factory_test.go
@@ -11,38 +11,38 @@ import (
"github.com/tracecoreai/tracecore/internal/pipeline"
)

func TestFactory_Type(t *testing.T) {
func TestPyspy_Type(t *testing.T) {
require.Equal(t, ComponentType, Factory.Type().String())
}

func TestFactory_DefaultConfigValidates(t *testing.T) {
func TestPyspy_DefaultConfigValidates(t *testing.T) {
cfg := Factory.CreateDefaultConfig()
require.NoError(t, cfg.Validate())
_, ok := cfg.(*Config)
require.True(t, ok, "default config must be *Config")
}

// TestFactory_CreateMetricsReturnsErrSignalNotSupported pins that
// TestPyspy_CreateMetricsReturnsErrSignalNotSupported pins that
// the metrics signal returns the canonical sentinel; the pipeline
// runtime uses errors.Is to surface a clear operator message.
func TestFactory_CreateMetricsReturnsErrSignalNotSupported(t *testing.T) {
func TestPyspy_CreateMetricsReturnsErrSignalNotSupported(t *testing.T) {
_, err := Factory.CreateMetrics(context.Background(), pipeline.CreateSettings{}, defaultConfig(), nil)
require.ErrorIs(t, err, pipeline.ErrSignalNotSupported,
"CreateMetrics must return pipeline.ErrSignalNotSupported")
}

func TestFactory_CreateTracesReturnsErrSignalNotSupported(t *testing.T) {
func TestPyspy_CreateTracesReturnsErrSignalNotSupported(t *testing.T) {
_, err := Factory.CreateTraces(context.Background(), pipeline.CreateSettings{}, defaultConfig(), nil)
require.ErrorIs(t, err, pipeline.ErrSignalNotSupported,
"CreateTraces must return pipeline.ErrSignalNotSupported")
}

// TestFactory_CreateLogsReturnsRealReceiver pins that the logs
// TestPyspy_CreateLogsReturnsRealReceiver pins that the logs
// signal actually constructs a working Receiver. The Phase 3
// pprof-dictionary emission path may move this registration to
// CreateProfiles per RFC-0009 §6 footnote; until then logs is the
// registered signal.
func TestFactory_CreateLogsReturnsRealReceiver(t *testing.T) {
func TestPyspy_CreateLogsReturnsRealReceiver(t *testing.T) {
r, err := Factory.CreateLogs(context.Background(),
pipeline.CreateSettings{ID: pipeline.MustNewID(pipeline.MustNewType(ComponentType), "")},
defaultConfig(),
@@ -51,23 +51,23 @@ func TestFactory_CreateLogsReturnsRealReceiver(t *testing.T) {
require.NotNil(t, r)
}

// TestFactory_CreateLogsRejectsWrongConfigType pins the runtime's
// TestPyspy_CreateLogsRejectsWrongConfigType pins the runtime's
// type-safety expectation: a factory handed a config of the wrong
// type returns an error naming the actual type so the operator can
// chase the misconfigured pipeline.
func TestFactory_CreateLogsRejectsWrongConfigType(t *testing.T) {
func TestPyspy_CreateLogsRejectsWrongConfigType(t *testing.T) {
type wrongConfig struct{ pipeline.Config }
_, err := Factory.CreateLogs(context.Background(), pipeline.CreateSettings{}, &wrongConfig{}, nil)
require.Error(t, err)
require.ErrorContains(t, err, "pyspy")
require.ErrorContains(t, err, "config type")
}

// TestNewFactory_ReturnsTheSamePackageVar pins that NewFactory()
// TestPyspy_NewFactory_ReturnsTheSamePackageVar pins that NewFactory()
// is the codegen seam: tools/components-gen emits calls to it, and
// it must return the package-private Factory var. Otherwise an
// external consumer (rare) could end up with a stale factory
// instance that didn't pick up package-level wiring.
func TestNewFactory_ReturnsTheSamePackageVar(t *testing.T) {
func TestPyspy_NewFactory_ReturnsTheSamePackageVar(t *testing.T) {
require.Same(t, Factory, NewFactory())
}
75 changes: 75 additions & 0 deletions components/receivers/pyspy/fake_telemetry_test.go
@@ -0,0 +1,75 @@
// SPDX-License-Identifier: Apache-2.0

package pyspy

import (
"sync"
"sync/atomic"
"time"
)

// fakeTelemetry is a test-only captor for the receiver-scoped
// selfTelemetry surface. Replaces the v0.1.x dependency on
// `internal/selftelemetry.CapturingReceiver` so the receiver package
// stays decoupled from internal/* (PR-F unblock). Methods match the
// selfTelemetry interface 1:1; the captured-state accessors are the
// surface pyspy_test.go asserts against.
type fakeTelemetry struct {
emissions atomic.Int64
latency atomic.Int32
activity atomic.Int32

mu sync.Mutex
errorKinds []kind
degradedSet []bool
}

func newFakeTelemetry() *fakeTelemetry { return &fakeTelemetry{} }

func (f *fakeTelemetry) IncError(k kind) {
f.mu.Lock()
f.errorKinds = append(f.errorKinds, k)
f.mu.Unlock()
}

func (f *fakeTelemetry) IncEmissions(n int64) {
if n < 0 {
return
}
f.emissions.Add(n)
}

func (f *fakeTelemetry) ObserveLatency(time.Duration) { f.latency.Add(1) }

func (f *fakeTelemetry) SetDegraded(b bool) {
f.mu.Lock()
f.degradedSet = append(f.degradedSet, b)
f.mu.Unlock()
}

func (f *fakeTelemetry) MarkActivity() { f.activity.Add(1) }

// Errors returns a snapshot of every IncError kind in call order so
// tests can assert "kindX was recorded" or "kindX recorded exactly N
// times". Returned slice is safe to mutate.
func (f *fakeTelemetry) Errors() []kind {
f.mu.Lock()
defer f.mu.Unlock()
out := make([]kind, len(f.errorKinds))
copy(out, f.errorKinds)
return out
}

// DegradedTransitions returns a snapshot of every SetDegraded value in
// call order — tests assert "first transition was true" + "degraded
// did not flip back to false during Phase 1".
func (f *fakeTelemetry) DegradedTransitions() []bool {
f.mu.Lock()
defer f.mu.Unlock()
out := make([]bool, len(f.degradedSet))
copy(out, f.degradedSet)
return out
}

// Verify fakeTelemetry satisfies the interface at compile time.
var _ selfTelemetry = (*fakeTelemetry)(nil)
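
A minimal, self-contained sketch of the two patterns this file relies on: the compile-time interface assertion and the copy-on-read snapshot accessor. It assumes a one-method stand-in interface (`errorCounter`) and a reduced captor type (`captor`); both are hypothetical simplifications, not the real `selfTelemetry` surface.

```go
package main

import (
	"fmt"
	"sync"
)

type kind string

// errorCounter is a hypothetical one-method stand-in for selfTelemetry.
type errorCounter interface {
	IncError(k kind)
}

// captor records every IncError call in order, guarded by a mutex.
type captor struct {
	mu    sync.Mutex
	kinds []kind
}

func (c *captor) IncError(k kind) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.kinds = append(c.kinds, k)
}

// Errors returns a copy, so callers can mutate it without touching
// the captor's internal slice.
func (c *captor) Errors() []kind {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := make([]kind, len(c.kinds))
	copy(out, c.kinds)
	return out
}

// Compile-time check: the build fails if *captor stops satisfying
// errorCounter, e.g. after a method signature changes.
var _ errorCounter = (*captor)(nil)

// countOf reports how many times want appears in ks.
func countOf(ks []kind, want kind) int {
	n := 0
	for _, k := range ks {
		if k == want {
			n++
		}
	}
	return n
}

func main() {
	c := &captor{}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.IncError("dump_failed")
		}()
	}
	wg.Wait()

	snap := c.Errors()
	snap[0] = "mutated" // only the snapshot changes

	fmt.Println(countOf(c.Errors(), "dump_failed")) // prints 4
}
```

The snapshot copy is what lets a test assert "recorded exactly N times" while the code under test is still calling `IncError` from other goroutines.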
37 changes: 22 additions & 15 deletions components/receivers/pyspy/kinds.go
@@ -2,29 +2,37 @@

package pyspy

import "github.com/tracecoreai/tracecore/internal/selftelemetry"
// kind is a low-cardinality error-class identifier. Mirrors the
// internal/selftelemetry.Kind type so the migration is mechanical;
// receiver-local because the canonical-Kind enforcement that the
// internal package owned moves into RFC-0013 PR-I's submodule.
type kind string

// Degraded-mode Kind values for RFC-0009 §Degraded modes. Operator
// semantics live in RUNBOOK.md; only the wire strings are stable here.
// Adding a kind: declare it, document in RUNBOOK.md, and add a
// prometheus-alerts.example.yaml entry if it has an actionable threshold.
const (
kindTargetNotAttached selftelemetry.Kind = "target_not_attached"
kindTargetNotListening selftelemetry.Kind = "target_not_listening"
kindTargetGone selftelemetry.Kind = "target_gone"
kindDumpOverlap selftelemetry.Kind = "dump_overlap"
kindProtocolVersion selftelemetry.Kind = "protocol_version"
kindDumpFailed selftelemetry.Kind = "dump_failed"
kindFrameTooLarge selftelemetry.Kind = "frame_too_large"
kindParseError selftelemetry.Kind = "parse_error"
kindFaulthandlerMissing selftelemetry.Kind = "faulthandler_missing"
kindUDSDirPermissionDenied selftelemetry.Kind = "uds_dir_permission_denied"
kindHelperOOMMidDump selftelemetry.Kind = "helper_oom_mid_dump"
kindSidecarUIDDrift selftelemetry.Kind = "sidecar_uid_drift"
kindTargetNotAttached kind = "target_not_attached"
kindTargetNotListening kind = "target_not_listening"
kindTargetGone kind = "target_gone"
kindDumpOverlap kind = "dump_overlap"
kindProtocolVersion kind = "protocol_version"
kindDumpFailed kind = "dump_failed"
kindFrameTooLarge kind = "frame_too_large"
kindParseError kind = "parse_error"
kindFaulthandlerMissing kind = "faulthandler_missing"
kindUDSDirPermissionDenied kind = "uds_dir_permission_denied"
kindHelperOOMMidDump kind = "helper_oom_mid_dump"
// kindPanic mirrors the canonical selftelemetry.KindPanic. The
// receiver-scoped type has no canonical-Kind enforcement of its own
// (that lived in the deleted internal package), so the panic kind is
// declared locally for the lifecycle's onPanic callback to tick.
kindPanic kind = "panic"
)

// allKinds enforces parity with RFC-0009 §Degraded modes via kinds_test.go.
var allKinds = []selftelemetry.Kind{
var allKinds = []kind{
kindTargetNotAttached,
kindTargetNotListening,
kindTargetGone,
@@ -36,5 +44,4 @@ var allKinds = []selftelemetry.Kind{
kindFaulthandlerMissing,
kindUDSDirPermissionDenied,
kindHelperOOMMidDump,
kindSidecarUIDDrift,
}
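
A hedged sketch of the kind of parity check that a declared-kinds list supports: compare the declared slice against a documented set and report drift in both directions. The helper `diffKinds` and the sample values are hypothetical; the real assertion lives in `kinds_test.go` and may be structured differently.

```go
package main

import (
	"fmt"
	"sort"
)

type kind string

// diffKinds returns the names that are documented but not declared
// (missing) and declared but not documented (extra), both sorted.
func diffKinds(declared []kind, documented map[string]struct{}) (missing, extra []string) {
	seen := make(map[string]struct{}, len(declared))
	for _, k := range declared {
		seen[string(k)] = struct{}{}
		if _, ok := documented[string(k)]; !ok {
			extra = append(extra, string(k))
		}
	}
	for name := range documented {
		if _, ok := seen[name]; !ok {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	sort.Strings(extra)
	return missing, extra
}

func main() {
	// Sample data only: one kind dropped from the declared list while
	// the documented set still names it, and one undocumented kind.
	declared := []kind{"target_gone", "dump_failed"}
	documented := map[string]struct{}{
		"target_gone":       {},
		"sidecar_uid_drift": {},
	}
	missing, extra := diffKinds(declared, documented)
	fmt.Println("missing:", missing, "extra:", extra)
}
```

A two-way diff like this is why removing a kind touches the declaration, the list, and the expected set in the test together: dropping it from only one side shows up as `missing` or `extra`.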
1 change: 0 additions & 1 deletion components/receivers/pyspy/kinds_test.go
@@ -27,7 +27,6 @@ func TestKinds_AllRFC0009DegradedModesCovered(t *testing.T) {
"faulthandler_missing": {},
"uds_dir_permission_denied": {},
"helper_oom_mid_dump": {},
"sidecar_uid_drift": {},
}
got := map[string]struct{}{}
for _, k := range allKinds {