Mobile App Insights: small slice (Mac Catalyst) #165
- Merge wash-observability.md from decisions/inbox/ to decisions.md
- Delete inbox file
- Add orchestration log: 2026-04-19T22:36:52Z-wash.md
- Add session log: 2026-04-19-azure-error-visibility.md
- Append observability note to wash/history.md for cross-agent visibility

Audit outcome: Container logs flow correctly; App Insights unconfigured; no global exception handler; AI endpoints silent on failures; /health unmapped. Wash proposes four-part fix (App Insights, exception middleware, AI endpoint logging, /health) awaiting Captain approval (~1 day). Immediate workaround provided: CLI tail + KQL query against law-3ovvqiybthkb6.
- Merge wash-mobile-observability.md from inbox → decisions.md
- Update decisions.md with full mobile App Insights plan
- Append cross-agent note to Kaylee history (Blazor JS error bridge opportunity)
- Update Wash history with planning context

Awaiting Captain decisions on:
1. One vs. two App Insights resources
2. Connection string embedding OK
3. App Store submission timeline (PrivacyInfo.xcprivacy)
4. Marketing site scope

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Orchestration log: 2026-04-20T02-42-17Z-wash.md
  * Wash answered Captain's 3 follow-up questions on mobile observability
  * Findings: ONE App Insights resource (shared MAUI+API), embed connection string, reject TinyInsights.Maui
- Session log: 2026-04-20T02-42-17Z-mobile-appinsights-qa.md (brief summary)
- Merged decision inbox → decisions.md (now 54.5KB)
  * wash-mobile-appinsights-answers.md merged with full QA rationale
  * Deleted inbox file after merge
- Tasks 4-7: no-op (no cross-agent work, no archiving needed, history.md already summarized)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Wires Azure Monitor OpenTelemetry exporter into the existing MAUI OTel
pipeline and subscribes to MauiExceptions.UnhandledException so unhandled
crashes land in Application Insights.
Changes:
- MauiServiceDefaults: add Azure.Monitor.OpenTelemetry.Exporter 1.7.0,
bump OpenTelemetry.Extensions.Hosting/Exporter.OTLP/Instrumentation.Http
to 1.15.x to satisfy the transitive floor. AddAzureMonitor{Log,Metric,
Trace}Exporter is gated on #if !DEBUG + a non-empty connection string so
simulator/dev runs stay silent. Sets a stable service name
SentenceStudio.Mobile.<Platform> so App Insights cloud_RoleName
identifies the client clearly.
- SentenceStudioAppBuilder.InitializeApp: one subscriber on
MauiExceptions.UnhandledException that critical-logs the exception and
best-effort ForceFlush's LoggerProvider/TracerProvider/MeterProvider
(3s budget) before the process dies.
- appsettings.Production.json: AzureMonitor:ConnectionString for the
sstudio-mobile-ai App Insights resource in rg-sstudio-prod.
- MacCatalyst/MauiProgram.cs: #if DEBUG guard the DevFlow usings so
Release builds don't fail on the debug-only package references
(pre-existing bug, unblocks Release validation).
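
A minimal sketch of the gated exporter wiring described in the first bullet, assuming the `Azure.Monitor.OpenTelemetry.Exporter` and `OpenTelemetry.Extensions.Hosting` packages are referenced. The class name, method name, and `serviceName` parameter are hypothetical; the `AzureMonitor:ConnectionString` key and the `#if !DEBUG` gate come from the commit message. The real `MauiServiceDefaults` code may be organized differently.

```csharp
using Azure.Monitor.OpenTelemetry.Exporter;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;
using OpenTelemetry.Logs;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

// Hypothetical helper; not the project's actual extension method.
public static class AzureMonitorWiringSketch
{
    public static void Configure(
        IServiceCollection services,
        ILoggingBuilder logging,
        IConfiguration config,
        string serviceName) // e.g. "SentenceStudio.Mobile.MacCatalyst"
    {
        // The service name becomes cloud_RoleName in App Insights.
        var otel = services.AddOpenTelemetry()
            .ConfigureResource(r => r.AddService(serviceName));

#if !DEBUG
        // Release builds only, and only when a connection string is configured,
        // so simulator/dev runs never export to Azure Monitor.
        var connectionString = config["AzureMonitor:ConnectionString"];
        if (!string.IsNullOrWhiteSpace(connectionString))
        {
            logging.AddOpenTelemetry(o =>
                o.AddAzureMonitorLogExporter(x => x.ConnectionString = connectionString));
            otel.WithMetrics(m =>
                m.AddAzureMonitorMetricExporter(x => x.ConnectionString = connectionString));
            otel.WithTracing(t =>
                t.AddAzureMonitorTraceExporter(x => x.ConnectionString = connectionString));
        }
#endif
    }
}
```

The compile-time gate and the runtime check are both needed: a Release build with no connection string configured still skips the exporters.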
OUT of scope (deferred to full plan): Blazor WebView JS bridge, Android
linker preserve, iOS linker preserve, PrivacyInfo.xcprivacy, Windows,
custom processors.
Validated on Mac Catalyst Release with a forced InvalidOperationException;
record appeared in App Insights within ~5 minutes with cloud_RoleName=
SentenceStudio.Mobile.MacCatalyst. Server-side companion still pending
so client spans will be orphaned until the API also emits to App Insights.
Refs: .squad/decisions.md wash-mobile-observability, wash-mobile-appinsights-answers
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three issues flagged on the crash-handling hot path, all fixed:
1. (HIGH) DeviceInfo.Platform was read inside ConfigureOpenTelemetry while
the host builder was still configuring — pre-MauiApp.Build() timing is
unsafe for MAUI Essentials and could produce cloud_RoleName="Unknown".
Added platformName parameter to AddMauiServiceDefaults /
ConfigureOpenTelemetry; each platform head now passes the literal
("MacCatalyst" / "iOS" / "Android") from its MauiProgram.cs where
per-TFM context is unambiguous. Windows head doesn't call
AddMauiServiceDefaults today so no change needed there.
2. (MEDIUM) Serial ForceFlush across logger/tracer/meter at 3000ms each
risked a 9s worst case, exceeding the ~5-10s iOS watchdog on crash
paths. Replaced with three Task.Run calls (each with its own 2500ms
internal timeout) wrapped in a single Task.WaitAll with a 3000ms
outer deadline. Hard 3s wall-time ceiling regardless of per-provider
stalls. All exceptions swallowed.
3. (MEDIUM) MauiExceptions.UnhandledException subscription wasn't
idempotent — hot reload / re-init could double-wire the handler.
Gated behind Interlocked.Exchange on a static int flag so concurrent
init paths can't race, and repeated InitializeApp() calls won't
double-fire the crash handler.
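
The bounded flush from item 2 can be sketched with BCL types only. This is an illustration under assumptions, not the project's code: `BoundedFlush` and `FlushAll` are hypothetical names, and each `Func<int, bool>` stands in for a provider's `ForceFlush(timeoutMilliseconds)` call.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper illustrating the parallel-bounded-flush pattern.
public static class BoundedFlush
{
    // Starts every flush in parallel and waits at most outerDeadlineMs in total.
    // Returns true only if all flush tasks finished in time. Never throws.
    public static bool FlushAll(
        int perProviderTimeoutMs,
        int outerDeadlineMs,
        params Func<int, bool>[] flushers)
    {
        try
        {
            var tasks = new Task[flushers.Length];
            for (var i = 0; i < flushers.Length; i++)
            {
                var flush = flushers[i]; // capture a per-iteration copy
                tasks[i] = Task.Run(() =>
                {
                    // A failing provider must not prevent the others from flushing.
                    try { flush(perProviderTimeoutMs); }
                    catch { /* swallowed: we are already on a crash path */ }
                });
            }

            // One wall-clock ceiling for all providers combined.
            return Task.WaitAll(tasks, outerDeadlineMs);
        }
        catch
        {
            return false;
        }
    }
}
```

With the numbers from the commit message, a call might look like `BoundedFlush.FlushAll(2500, 3000, ms => loggerProvider.ForceFlush(ms), ms => tracerProvider.ForceFlush(ms), ms => meterProvider.ForceFlush(ms))`, where the three provider variables are assumed to be resolved beforehand. A stalled provider then costs at most the outer 3000 ms; the other two are not queued behind it.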
Validation: clean Release build + SENTENCESTUDIO_CRASH_TEST=1 reproduced
the forced InvalidOperationException. App Insights confirmed arrival at
2026-04-21T20:22:36Z with cloud_RoleName=SentenceStudio.Mobile.MacCatalyst.
Also captured the parallel-bounded-flush pattern and Interlocked-gated
subscription in .squad/skills/maui-azure-monitor/SKILL.md.
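
For reference, the Interlocked-gated subscription can be sketched as follows. `CrashHandlerGate` and `SubscribeOnce` are hypothetical names chosen for illustration; only the `Interlocked.Exchange` on a static int flag is taken from the description above.

```csharp
using System;
using System.Threading;

// Hypothetical helper illustrating a once-only event subscription.
public static class CrashHandlerGate
{
    private static int _wired; // 0 = not subscribed yet, 1 = subscribed

    // Runs `subscribe` at most once per process, even with concurrent callers.
    // Returns true only for the call that actually performed the subscription.
    public static bool SubscribeOnce(Action subscribe)
    {
        // Exchange returns the previous value, so exactly one caller observes 0.
        if (Interlocked.Exchange(ref _wired, 1) != 0)
            return false;

        subscribe();
        return true;
    }
}
```

Inside `InitializeApp()` this could be used as `CrashHandlerGate.SubscribeOnce(() => MauiExceptions.UnhandledException += OnUnhandled);`, where `OnUnhandled` is an assumed handler name. Repeated or racing calls then leave a single handler attached.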
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Review fixes applied (commit
* server-appinsights: wire Azure Monitor OpenTelemetry exporters into API (close mobile↔API correlation loop)

Adds server-side Application Insights via Azure.Monitor.OpenTelemetry.Exporter into SentenceStudio.ServiceDefaults so API, WebApp, Workers, and Marketing all tag themselves with distinct cloud_RoleName values and export to the same sstudio-mobile-ai resource the mobile client already uses (PR #165). With HttpClient instrumentation on the client and AspNetCore instrumentation on the server, W3C traceparent propagates automatically and requests join to dependencies on operation_Id.

Critical MAUI-safety pivot: Azure.Monitor.OpenTelemetry.AspNetCore transitively requires Microsoft.AspNetCore.App, which has no runtime pack for maccatalyst-*/ios-*/android-* RIDs. ServiceDefaults is consumed by every MAUI head via AppLib, so the AspNetCore variant breaks MAUI builds (NETSDK1082). Using the lower-level Exporter package (same one mobile uses) with the three AddAzureMonitor{Log,Metric,Trace}Exporter calls in shared defaults keeps MAUI buildable; AspNetCore instrumentation is added only to the API csproj and wired from Program.cs.

OTel stack bumped to 1.15.x across ServiceDefaults and AppLib to stay aligned with MauiServiceDefaults and clear NU1605 downgrade errors in web projects.

Azure Monitor wiring gated #if !DEBUG so local aspire run keeps streaming to the Aspire dashboard via OTLP without dual-exporting.

AzureMonitor:ConnectionString committed to Api/appsettings.Production.json (write-only key, same approach mobile slice used; ingestion capped at 0.5 GB/day).

Companion skill at .squad/skills/aspnetcore-azure-monitor/SKILL.md documents the MAUI-safe server pattern.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* api: global unhandled-exception handler for AppInsights exceptions table

ASP.NET Core OTel instrumentation tags exceptions on the request span but does NOT produce rows in App Insights' exceptions table; that table is populated only from ILogger records carrying an Exception. Wire UseExceptionHandler as the first middleware, log via a named 'UnhandledException' ILogger, and write a ProblemDetails 500. The OTel log exporter ships the record so KQL like exceptions | where cloud_RoleName == 'SentenceStudio.Api' surfaces unhandled controller / minimal-API throws.

Placement: before UseAuthentication so exceptions in auth handlers and custom middleware are also caught.

Smoke-validated locally by temporarily adding /__debug/boom (removed before commit): HTTP 500 + application/problem+json body, fail: UnhandledException[0] log line with full stack, no process crash.

Also:
- aspnetcore-azure-monitor SKILL: replace 'no middleware needed' claim with the correct pattern; add az monitor app-insights component billing update recipe for daily-cap management.
- wash history: learnings for the cap raise CLI and the span-vs-exceptions-row distinction.

Companion: 0.5 GB/day → 2 GB/day cap raise on sstudio-mobile-ai (done via az CLI, read-back archived in PR #166 body).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
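
A rough sketch of the global exception handler described in the second commit, assuming an ASP.NET Core minimal-API `Program.cs` (Web SDK, .NET 6 or later). The log message template, the problem title, and the `/` endpoint are illustrative assumptions; the logger category `UnhandledException`, the placement ahead of authentication, and the ProblemDetails 500 response come from the commit message.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// Registered first so it also covers throws from later middleware,
// including authentication handlers.
app.UseExceptionHandler(errorApp => errorApp.Run(async context =>
{
    var error = context.Features.Get<IExceptionHandlerFeature>()?.Error;

    // Logging with the Exception attached is what produces a row in the
    // App Insights exceptions table once a log exporter is configured.
    var logger = context.RequestServices
        .GetRequiredService<ILoggerFactory>()
        .CreateLogger("UnhandledException");
    logger.LogError(error, "Unhandled exception for {Method} {Path}",
        context.Request.Method, context.Request.Path);

    // Writes a 500 with an application/problem+json body.
    await Results.Problem(
            title: "An unexpected error occurred.",
            statusCode: StatusCodes.Status500InternalServerError)
        .ExecuteAsync(context);
}));

// app.UseAuthentication() and the rest of the pipeline would follow here.
app.MapGet("/", () => "ok");

app.Run();
```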
…tion

PR #165 (mobile) + PR #166 (server) shipped the App Insights pipeline end to end, but the mobile↔API correlation join in App Insights was returning zero rows. Every server request had `operation_Id == operation_ParentId`, i.e., no `traceparent` header was arriving from the device.

Diagnosis (see #171):
- `OpenTelemetry.Instrumentation.Http`'s `AddHttpClientInstrumentation()` was already on the MAUI TracerProvider since commit 216a2da and a trim-disabled Release build on DX24 produced the same zero-span result, so neither registration nor trimming was the problem.
- Mobile logs had empty `operation_Id` across the board, confirming no ambient `Activity` ever existed on the device.
- Root cause (tracked in #171): MAUI's `MauiApp` doesn't run `IHostedService` instances, so the `TelemetryHostedService` that would normally materialize the TracerProvider and attach its listeners never runs. Logs work because they hook `ILoggerFactory` synchronously; the tracer path needs the hosted-service startup.

This PR:
- Adds `ApiActivityHandler`, a `DelegatingHandler` that starts a `Client` Activity per outbound API call using a dedicated `ActivitySource` (`SentenceStudio.Mobile.HttpClient`). With an Activity current, HttpClient's built-in `DiagnosticsHandler` auto-injects the W3C `traceparent` header.
- Registers the new ActivitySource on the mobile TracerProvider via `.AddSource(...)` in `MauiServiceDefaults.Extensions` so the spans actually export.
- Wires the handler onto every API-bound HttpClient: CoreSync's `HttpClientToServer`, the auth client, the four typed API clients, and `VersionCheckService`. The handler is placed FIRST in the chain so the span wraps the full operation including auth token attachment.
- Hardens `OpenTelemetryInitializer` to call `GetRequiredService<T>()` instead of the nullable `GetService<T>()` for all three providers, so a misregistration fails loudly at startup instead of silently breaking telemetry at runtime.

Out of scope (explicitly):
- Root-cause fix for the IHostedService gap: tracked in #171.
- The raw `new HttpClient()` in `SentenceStudio.Shared/Services/AiService.cs:93`, which bypasses `HttpClientFactory` entirely. Separate refactor.
- The KQL in `docs/deploy-runbook.md` is still wrong (joins requests to requests; should be dependencies to requests). Separate doc PR.

Verification: Mac Catalyst Debug + Release both build clean. Post-merge verification will be an iOS publish to DX24 + KQL query for non-empty `operation_ParentId` on server requests.
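
A minimal sketch of a handler in the spirit of `ApiActivityHandler`, using only BCL types. The class name `ApiActivityHandlerSketch`, the span name, and the tag key are assumptions for illustration; the `ActivitySource` name and the `Client` activity kind come from the description above. The project's actual handler may record more detail.

```csharp
using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the project's ApiActivityHandler.
public sealed class ApiActivityHandlerSketch : DelegatingHandler
{
    // Source name taken from the PR description; it must also be registered
    // on the TracerProvider via AddSource(...) for the spans to export.
    public static readonly ActivitySource Source =
        new ActivitySource("SentenceStudio.Mobile.HttpClient");

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // StartActivity returns null when no listener samples this source,
        // hence the null-conditional calls below.
        using var activity = Source.StartActivity(
            "HTTP " + request.Method.Method, ActivityKind.Client);
        try
        {
            var response = await base.SendAsync(request, cancellationToken)
                .ConfigureAwait(false);
            activity?.SetTag("http.response.status_code", (int)response.StatusCode);
            return response;
        }
        catch (Exception ex)
        {
            // Mark the span as failed, then let the caller see the exception.
            activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
            throw;
        }
    }
}
```

Because the handler sits first in the chain, the Activity stays current while inner handlers (for example, auth token attachment) run, so their work is covered by the same span.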
…ile↔API correlation (#172)

* fix(mobile): wrap API HttpClients with ApiActivityHandler for correlation

* fix(mobile): use Activity.AddException for OTel-conformant exception recording

Code review feedback on #172: exceptions should be recorded as Activity events (via AddException/RecordException), not raw tags. Emits the standard OTel 'exception' event with type/message/stacktrace, which surfaces in App Insights' exception timeline rather than being tag-only.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tion (#173)

PR #172 got mobile HttpClient dependency spans emitting with operation_Id, but the correlation join against API requests still returned zero rows: the API saw every incoming request without a traceparent header and started a fresh operation_Id.

Root cause: HttpClient's built-in DiagnosticsHandler only injects traceparent automatically when an OTel-style ActivityListener is attached to "System.Net.Http". On MAUI the listener never attaches because OpenTelemetry's TelemetryHostedService (which wires listeners to the TracerProvider) relies on IHostedService, and MauiApp doesn't run hosted services (issue #171).

Fix: have ApiActivityHandler explicitly call DistributedContextPropagator.Current.Inject(...) on the outbound request headers after starting its Activity. Guards against double-injection if a caller or a resilience retry already set traceparent.

This is the user-space workaround to #171. Framework fix is still desirable but now lower priority.

Verification plan: re-run the App Insights correlation join; expect requests | join dependencies on operation_Id to return > 0 rows for the mobile role name.

Refs: #165 #166 #172 #171
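
The explicit injection described in the fix can be sketched as below, assuming .NET 6 or later where `DistributedContextPropagator` is available. `TraceContextInjector` and its `Inject` method are hypothetical names; the real change lives inside `ApiActivityHandler` and may differ in detail.

```csharp
using System.Diagnostics;
using System.Net.Http;

// Hypothetical helper showing manual W3C trace-context injection.
public static class TraceContextInjector
{
    public static void Inject(Activity activity, HttpRequestMessage request)
    {
        if (activity == null) return;

        // Guard against double-injection, e.g. when a caller or a
        // resilience retry already set the header on this request.
        if (request.Headers.Contains("traceparent")) return;

        // The propagator decides which headers to emit (traceparent, and
        // tracestate/baggage when present); the callback only copies them.
        DistributedContextPropagator.Current.Inject(
            activity,
            request,
            (carrier, key, value) =>
            {
                if (carrier is HttpRequestMessage message &&
                    !message.Headers.Contains(key))
                {
                    message.Headers.TryAddWithoutValidation(key, value);
                }
            });
    }
}
```

Inside a `DelegatingHandler` this would be called right after `StartActivity(...)` and before `base.SendAsync(...)`, so the server sees the client span as its parent.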
Ships the first-increment (~3 hour) mobile Application Insights slice per Captain's path 1 go approval. Wires `Azure.Monitor.OpenTelemetry.Exporter` into the existing MAUI OTel pipeline, subscribes to `MauiExceptions.UnhandledException`, and embeds the connection string in `appsettings.Production.json` (write-only ingestion key; embedding is the documented pattern, per the decisions memo `wash-mobile-appinsights-answers`).

In scope

- App Insights resource `sstudio-mobile-ai` in `rg-sstudio-prod`, workspace-linked to `law-3ovvqiybthkb6`, daily cap 0.5 GB.
- `Azure.Monitor.OpenTelemetry.Exporter` 1.7.0 added to `SentenceStudio.MauiServiceDefaults`. Bumped `OpenTelemetry.Extensions.Hosting` / `Exporter.OTLP` / `Instrumentation.Http` to 1.15.x to satisfy Azure Monitor's transitive floor.
- `cloud_RoleName` set explicitly via `ResourceBuilder.AddService("SentenceStudio.Mobile.<DeviceInfo.Platform>")`.
- `SentenceStudioAppBuilder.InitializeApp` subscribes once to `MauiExceptions.UnhandledException` → `ILogger.LogCritical` → best-effort `ForceFlush(3000)` on all three OTel providers before the process dies.
- `appsettings.Production.json` updated with `AzureMonitor:ConnectionString`.
- `MacCatalyst/MauiProgram.cs` had unguarded `using Microsoft.Maui.DevFlow.*` even though those packages are `Condition='$(Configuration)'=='Debug'`. Wrapped the usings in `#if DEBUG`.

Out of scope (deferred to full plan)

- Blazor WebView JS bridge (`window.onerror`, `unhandledrejection`)
- Android / iOS linker `preserve` directives for Release link-time trimming
- `PrivacyInfo.xcprivacy` (Captain confirmed no App Store submission planned)

Validation

Built Release Mac Catalyst, launched with `SENTENCESTUDIO_CRASH_TEST=1`, temp thread threw `InvalidOperationException` 10s after launch. Waited 4 minutes.

KQL query:

Result (truncated):

The forced exception (`AppInsights pipeline validation…`) is there, plus bonus evidence the pipeline is already catching caught-and-logged exceptions from startup code paths (EF NativeAOT model-build, HelpKit presenter). `cloud_RoleName` is correct. Temp validation code reverted before commit.

`operation_Id` is empty because we throw from a bare `Thread`, not inside an OTel-instrumented activity. Real crashes inside an HttpClient span will populate it and correlate to the API once the server-side companion ships.

Secret management

Committed the connection string to `appsettings.Production.json` (which is already a tracked file containing service-discovery endpoints). Rationale, per `.squad/decisions.md` `wash-mobile-appinsights-answers`:

If Captain prefers an env-var / non-committed override path later, that is easy to add (the config system reads env vars already).

Follow-ups before full rollout

- Server-side companion on `SentenceStudio.Api` so W3C `traceparent` correlates mobile → API spans.

Resource

/subscriptions/a25bc5f2-e641-47b9-89a8-5e5fd428d9d6/resourceGroups/rg-sstudio-prod/providers/microsoft.insights/components/sstudio-mobile-ai
74e94530-d17f-404a-8726-b7266724b70f

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>