Background
Appendix D defines the observability conventions for OpenFeature telemetry hooks — attribute names, event naming, hook lifecycle placement, and metadata mappings. Multiple SDK-contrib repositories already ship OTel hook implementations:
Each has its own isolated unit tests, but there is no shared, cross-SDK test harness that verifies compliance with Appendix D — analogous to how flagd-testbed provides Gherkin suites plus a Docker container for validating flagd provider behavior across all SDKs.
An audit of current implementations reveals multiple spec-compliance gaps that a shared harness would catch automatically. Notably, some core SDKs have also defined their own telemetry helpers (Go SDK, Python SDK, .NET SDK) with varying degrees of alignment to the spec.
Inconsistencies Found
1. Attribute Names — Out of Sync with Current OTel Semconv
Appendix D defines the current attribute names following OTel semconv renames in v1.32–v1.34. The table below shows the full picture across hook implementations and core SDK telemetry helpers:
| Attribute (Appendix D) | Java hook | Go hook | Python hook | .NET hook | Python SDK core | .NET SDK core | Go SDK core |
|---|---|---|---|---|---|---|---|
| `feature_flag.key` | ✅ | ✅ | ✅ | ✅ (as `"key"` ⚠️) | ✅ | ✅ | ✅ |
| `feature_flag.result.variant` | ❌ old `feature_flag.variant` | ✅ | ❌ old `feature_flag.variant` | ❌ old `feature_flag.variant` | ❌ old `feature_flag.variant` | ✅ | ✅ |
| `feature_flag.result.reason` | ❌ not emitted in traces | ✅ | ❌ not emitted | ❌ unnamespaced `reason` ⚠️ | ❌ old `feature_flag.evaluation.reason` | ✅ | ✅ |
| `feature_flag.result.value` | ❌ not emitted | ✅ | ❌ not emitted | ❌ not emitted | ❌ not emitted | ✅ | ✅ |
| `feature_flag.provider.name` | ❌ old `feature_flag.provider_name` | ✅ | ❌ old `feature_flag.provider_name` | ❌ old `provider_name` ⚠️ | ❌ old `feature_flag.provider_name` | ✅ | ✅ |
| `feature_flag.context.id` | ❌ not emitted | ✅ | ❌ not emitted | ❌ not emitted | ✅ | ✅ | ✅ |
| `feature_flag.set.id` | ❌ not emitted | ❌ not emitted | ❌ not emitted | ❌ not emitted | ✅ | ✅ | ✅ |
| `feature_flag.version` | ❌ not emitted | ❌ not emitted | ❌ not emitted | ❌ not emitted | ✅ | ✅ | ✅ |
| `error.type` | ❌ not emitted | ✅ | ❌ not emitted | ❌ not emitted | ✅ | ✅ | ✅ |
| `error.message` | ❌ not emitted | ✅ | ❌ not emitted | ❌ not emitted | ❌ old `feature_flag.evaluation.error.message` | ✅ | ✅ |

⚠️ = the .NET contrib `MetricsHook` uses entirely unqualified keys (`"key"`, `"provider_name"`, `"variant"`, `"reason"`) rather than the `feature_flag.*` namespace.
A particularly striking disconnect: the .NET SDK core (`TelemetryConstants.cs`) defines correct, up-to-date attribute names, while the .NET contrib hook (`TracingHook.cs` / `MetricsHook.cs`) in the same ecosystem still uses the old ones. This shows how quickly implementations drift from the spec without automated validation.
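As a sketch of the kind of check a harness could automate, the renames above can be encoded as a deprecated-to-current map and applied to a captured span's attributes (the `find_violations` helper and its wording are hypothetical, not part of any SDK):

```python
# Sketch: detect deprecated feature-flag attribute keys on an emitted span.
# The rename map reflects the semconv changes listed in the table above;
# the helper itself is an illustrative assumption, not an existing API.

DEPRECATED_TO_CURRENT = {
    "feature_flag.variant": "feature_flag.result.variant",
    "feature_flag.evaluation.reason": "feature_flag.result.reason",
    "feature_flag.provider_name": "feature_flag.provider.name",
    "feature_flag.evaluation.error.message": "error.message",
}

def find_violations(attributes: dict) -> list[str]:
    """Return a message per deprecated key found in a span's attributes."""
    return [
        f"{old} should be {new}"
        for old, new in DEPRECATED_TO_CURRENT.items()
        if old in attributes
    ]

# Example: a span emitted by an out-of-date hook.
span_attributes = {
    "feature_flag.key": "my-flag",
    "feature_flag.variant": "on",           # deprecated name
    "feature_flag.provider_name": "flagd",  # deprecated name
}
print(find_violations(span_attributes))  # flags both deprecated keys
```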
2. Span / Log Event Name
Appendix D and the spec's Hook Lifecycle guidance recommend using `feature_flag.evaluation` as the event name. Implementations diverge:
| SDK | Event name |
|---|---|
| Java hook | `"feature_flag"` ❌ |
| Go hook | `"feature_flag.evaluation"` ✅ |
| Python hook | `"feature_flag"` ❌ |
| .NET hook | `"feature_flag"` (ActivityEvent) ❌ |
| Go SDK core | `"feature_flag.evaluation"` ✅ |
3. Hook Lifecycle Stage for Emitting Telemetry
Appendix D states:
"The finally hook stage is where telemetry signals are emitted with complete evaluation details."
| SDK | Stage used |
|---|---|
| Java `TracesHook` | `after` ❌ |
| Go `traceHook` | `finally` ✅ |
| Python `TracingHook` | `after` ❌ |
| .NET `TracingHook` | `AfterAsync` ❌ |
Using `after` skips the error path — an evaluation that throws will never emit a trace event in Java, Python, or .NET.
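The difference can be seen in a minimal simulation of the hook stages (hypothetical code, not the OpenFeature SDK API — the stage ordering follows the spec's before/after/error/finally lifecycle):

```python
# Minimal sketch of the hook lifecycle: telemetry emitted in `after`
# misses failed evaluations, while `finally` sees every evaluation.

def evaluate(resolve, hooks):
    """Run a flag evaluation through simplified hook stages."""
    for h in hooks:
        h.get("before", lambda: None)()
    try:
        value = resolve()
        for h in hooks:
            h.get("after", lambda: None)()    # skipped when resolve() raises
        return value
    except Exception:
        for h in hooks:
            h.get("error", lambda: None)()
        raise
    finally:
        for h in hooks:
            h.get("finally", lambda: None)()  # always runs

events = []
after_hook = {"after": lambda: events.append("after-span")}
finally_hook = {"finally": lambda: events.append("finally-span")}

try:
    evaluate(lambda: 1 / 0, [after_hook, finally_hook])  # provider throws
except ZeroDivisionError:
    pass

print(events)  # only the finally-stage hook emitted: ['finally-span']
```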
4. OTel Semconv Version Drift — Even Within a Single Repo
In `go-sdk-contrib`, `metrics.go` imports `semconv/v1.34.0` while `traces.go` imports `semconv/v1.37.0`. This kind of intra-repo drift is invisible without a contract-based harness.
5. Metrics — No Formal Spec Coverage
Java, Go, and .NET all ship a `MetricsHook` emitting:

- `feature_flag.evaluation_active_count`
- `feature_flag.evaluation_requests_total`
- `feature_flag.evaluation_success_total`
- `feature_flag.evaluation_error_total`
These names are consistent across SDKs, but Appendix D defines no metrics conventions. Python has no metrics hook at all. Without a normative spec for metric names, dimensions, and instrument types (Counter vs. UpDownCounter), future implementations will diverge.
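The instrument-type question can be illustrated with a minimal sketch. The classes below are illustrative stand-ins, not the OTel API, and the mapping of metric names to instrument kinds is an assumption a future spec would need to pin down:

```python
# Sketch of the Counter vs. UpDownCounter distinction, using the metric
# names the MetricsHook implementations already emit. Hypothetical classes.

class Counter:
    """Monotonic: only increases (e.g. feature_flag.evaluation_requests_total)."""
    def __init__(self):
        self.value = 0
    def add(self, n):
        if n < 0:
            raise ValueError("counters are monotonic")
        self.value += n

class UpDownCounter:
    """May decrease (e.g. feature_flag.evaluation_active_count, which should
    go up when an evaluation starts and back down when it completes)."""
    def __init__(self):
        self.value = 0
    def add(self, n):
        self.value += n

requests = Counter()
active = UpDownCounter()

# One evaluation: active goes up around the evaluation, requests only up.
active.add(1)
requests.add(1)
active.add(-1)

print(requests.value, active.value)  # 1 0
```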
Proposal: Shared OTel Hook Test Harness
The flagd-testbed model is a proven pattern:
- A central repo defines language-agnostic Gherkin feature files
- Each SDK imports it as a git submodule and implements step definitions using its language's test tooling
- Automated CI in each SDK repo gates PRs on compliance
We propose the same approach for OTel hooks, extending the spec repo itself (since Appendix B already hosts `evaluation.feature`, `hooks.feature`, etc.) with OTel-specific Gherkin suites:

- `specification/assets/gherkin/otel-traces-hook.feature`
- `specification/assets/gherkin/otel-metrics-hook.feature`
Each scenario would:
- Set up a minimal in-memory OTel exporter (spans or metrics)
- Invoke a flag evaluation through the hook under test
- Assert the correct current attribute keys and values are emitted
- Assert deprecated/renamed attribute names are NOT present
- Cover all hook lifecycle stages (`before`, `after`, `error`, `finally`)
- Cover error scenarios (`error.type`, `error.message`)
- Cover metadata → attribute mapping (`feature_flag.context.id`, `feature_flag.set.id`, `feature_flag.version`)
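A step definition behind those assertions might look like the following sketch, where a plain dict stands in for a span captured by an in-memory exporter (the `check_span` helper and the exact required/forbidden sets are hypothetical):

```python
# Hypothetical step-definition logic: assert current attribute keys are
# present and deprecated ones are absent on a captured span's attributes.

REQUIRED_KEYS = {
    "feature_flag.key",
    "feature_flag.result.variant",
    "feature_flag.result.reason",
    "feature_flag.provider.name",
}
FORBIDDEN_KEYS = {
    "feature_flag.variant",
    "feature_flag.evaluation.reason",
    "feature_flag.provider_name",
}

def check_span(attributes: dict) -> list[str]:
    """Return one failure message per missing current key or present deprecated key."""
    failures = [f"missing {k}" for k in sorted(REQUIRED_KEYS - attributes.keys())]
    failures += [f"deprecated {k}" for k in sorted(FORBIDDEN_KEYS & attributes.keys())]
    return failures

compliant = {
    "feature_flag.key": "my-flag",
    "feature_flag.result.variant": "on",
    "feature_flag.result.reason": "TARGETING_MATCH",
    "feature_flag.provider.name": "flagd",
}
print(check_span(compliant))  # []
```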
Each SDK's OTel hook would consume these feature files and implement step definitions using its language's OTel test SDK:
- Java: `OpenTelemetryExtension` (JUnit 5)
- Go: `tracetest.NewInMemoryExporter` / `metric.NewManualReader`
- Python: `opentelemetry-sdk` in-memory exporters
- .NET: `AddInMemoryExporter`
Why This Matters
OTel semconv for feature flags has seen 5 breaking attribute renames between v1.32 and v1.34. Without a shared test harness, SDKs silently fall behind — and users instrumenting multiple services in different languages get inconsistent telemetry. Dashboards, alerts, and correlation queries built on one SDK's attribute names silently break when querying data from another SDK.
Appendix D already contains the normative rules. Gherkin suites would make those rules machine-verifiable across every SDK implementation, closing the loop between spec and implementation — exactly as flagd-testbed does for provider conformance.
Related