RUM-14471: Auto-detect interstitial activities and forward TTID#3184
RUM-14471: Auto-detect interstitial activities and forward TTID#3184hamorillo wants to merge 18 commits intohector.morilloprieto/RUM-14471from
Conversation
d848ef2 to
7b69d29
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7b69d29ae6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
features/dd-sdk-android-rum/src/main/kotlin/com/datadog/android/rum/internal/RumFeature.kt
Outdated
Show resolved
Hide resolved
features/dd-sdk-android-rum/src/main/kotlin/com/datadog/android/rum/internal/RumFeature.kt
Outdated
Show resolved
Hide resolved
This comment has been minimized.
This comment has been minimized.
7b69d29 to
060f6f3
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## hector.morilloprieto/RUM-14471 #3184 +/- ##
==================================================================
+ Coverage 71.30% 71.33% +0.03%
==================================================================
Files 930 930
Lines 34455 34497 +42
Branches 5814 5829 +15
==================================================================
+ Hits 24566 24606 +40
+ Misses 8248 8244 -4
- Partials 1641 1647 +6
🚀 New features to boost your workflow:
|
eeb86be to
a9af87b
Compare
6550faf to
126249f
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 126249f396
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| pendingStartupScenario = forwarded | ||
| listener.onAppStartupRetargeted(forwarded) | ||
| } else { | ||
| pendingStartupScenario = null |
There was a problem hiding this comment.
Preserve pending startup when no retarget candidate exists yet
Clearing pendingStartupScenario as soon as the startup activity is destroyed drops the original startup context when the replacement activity is created later (for example, navigation posted to the main thread or triggered by an async callback after finish()). In that flow, the next tracked activity is treated as a new warm start, so TTID is no longer measured from the original app-start time and the forwarding feature silently under-reports launch time.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
I think this is probably a valid issue. Suppose we have AuthActivity that starts MainActivity in AuthActivity.onCreate. I don't know if there is any guarantee that MainActivity.onCreate will be called before AuthActivity.onDestroy, especially in the case of using handler.post as described in the comment.
I wonder if we could do something like this:
- When we detect the launch we remember the
RumStartupScenarioaspendingScenarioinsideRumAppStartupDetectorImpl. We calllistener.onAppStartupDetected(scenario)and subscribe to first frame. - If another
Activityis created while we havependingScenario, we call some new callbacklistener.onNextActivityCreated(pendingScenario, nextActivity)and there subscribe to its first frame. - In
onFirstFrameDrawnwe check thatRumAppStartupDetectorstill has a pending scenario (maybe we need a getterRumAppStartupDetector.getPendingScenario) and if this is the same scenario that when we subscribed, then we callrumMonitor.sendTTIDEventand also clean the pending scenario using some new methodRumAppStartupDetector.clearPendingScenario. - The
ActivityinsideRumStartupScenariomight be different from theActivitythat actually drew the frame, so we can create a field inRumTTIDInfocalled let's sayfirstFrameDrawActivity: WeakReference<Activity>. So probablyforwardTowill not be needed anymore.
These aren't the hard requirements, just some thoughts from me that you might find useful. And I might be wrong somewhere, didn't extensively check.
There was a problem hiding this comment.
In the current code, the else branch no longer clears pendingStartupScenario. Instead, it records awaitingRetargetSinceNs = timeProvider().nanoTime, preserving the startup scenario for deferred retarget. When the next tracked activity is created (in onBeforeActivityCreated), it checks whether the elapsed time is within a 1-second timeout and, if so, forwards the preserved scenario via onAppStartupRetargeted. If the timeout expires, the stale scenario is discarded and normal startup detection takes over. This handles the async interstitial pattern (e.g., startActivity() + finish() posted via Handler.post) where the replacement activity's onCreate fires after the interstitial's onDestroy.
I chose this approach instead of subscribing to first frame for every new activity while a pending scenario exists because I think it's simpler and less risky — it avoids increasing coupling between RumAppStartupDetectorImpl and the listener, and eliminates the possibility of duplicate TTID events if multiple activities draw their first frame while the scenario is still pending. That said, happy to discuss if you think the alternative would be better.
There was a problem hiding this comment.
When the next tracked activity is created (in onBeforeActivityCreated), it checks whether the elapsed time is within a 1-second timeout
IMHO if these kinds of time based heuristics can be avoided (another solution that works in more cases exists), they should be avoided. What if in AuthActivity.onCreate we have postDelayed(2.seconds, { startActivity(MainActivity) })?
it avoids increasing coupling between RumAppStartupDetectorImpl and the listener
I don't understand why coupling (if I understood correctly what you meant) here is a bad thing. Also you already introduced notifyStartupTTIDReported, I don't see how it is different from my approach in terms of coupling.
and eliminates the possibility of duplicate TTID events if multiple activities draw their first frame while the scenario is still pending
Part 3 in my comment takes care of this case.
There was a problem hiding this comment.
Those are fair points. In fact, the time-based solution is what I especially don’t like about my proposed approach.
I’ve probably been too focused on my previous idea. Let me dig deeper into your initial suggestion, as it clearly has some benefits if we can come up with a good implementation.
I’ll pause this PR while I explore the other approach.
Thanks again for your insight, @aleksandr-gringauz!
| pendingStartupScenario = forwarded | ||
| listener.onAppStartupRetargeted(forwarded) | ||
| } else { | ||
| pendingStartupScenario = null |
There was a problem hiding this comment.
I think this is probably a valid issue. Suppose we have AuthActivity that starts MainActivity in AuthActivity.onCreate. I don't know if there is any guarantee that MainActivity.onCreate will be called before AuthActivity.onDestroy, especially in the case of using handler.post as described in the comment.
I wonder if we could do something like this:
- When we detect the launch we remember the
RumStartupScenarioaspendingScenarioinsideRumAppStartupDetectorImpl. We calllistener.onAppStartupDetected(scenario)and subscribe to first frame. - If another
Activityis created while we havependingScenario, we call some new callbacklistener.onNextActivityCreated(pendingScenario, nextActivity)and there subscribe to its first frame. - In
onFirstFrameDrawnwe check thatRumAppStartupDetectorstill has a pending scenario (maybe we need a getterRumAppStartupDetector.getPendingScenario) and if this is the same scenario that when we subscribed, then we callrumMonitor.sendTTIDEventand also clean the pending scenario using some new methodRumAppStartupDetector.clearPendingScenario. - The
ActivityinsideRumStartupScenariomight be different from theActivitythat actually drew the frame, so we can create a field inRumTTIDInfocalled let's sayfirstFrameDrawActivity: WeakReference<Activity>. So probablyforwardTowill not be needed anymore.
These aren't the hard requirements, just some thoughts from me that you might find useful. And I might be wrong somewhere, didn't extensively check.
| activity = activity, | ||
| callback = callback | ||
| ) | ||
| if (wasForwarded && activity.window.peekDecorView()?.isLaidOut == true) { |
There was a problem hiding this comment.
If the first frame is already drawn, it means that we didn't manage to properly capture the time when it happend. How can it be that here the first frame is already drawn?
Introduce AppStartupActivityPredicate to allow customers to exclude interstitial Activities (e.g., auth screens) that launch another Activity in onCreate() and finish themselves. This fixes TTID not being reported when such Activities never draw a frame.
Use WeakHashMap to cache shouldTrackStartup() result at activity creation and reuse it at destruction, ensuring symmetric counter increment/decrement. This prevents counter corruption when predicates depend on mutable state (e.g., activity flags, runtime toggles) that may change between lifecycle callbacks.
…interstitial activities from TTID measurement
Problem: When running multiple startup tests together, Datadog.stopInstance() in afterActivityFinished() triggered async cleanup operations that didn't complete before the next test initialized. This caused RumSessionScopeStartupManager state (ttidSentForSession = true) to leak between tests, preventing TTID from being reported in subsequent tests. Solution: Added @after tearDown() method in RumTest base class that: 1. Calls Datadog.stopInstance() to trigger cleanup 2. Calls InstrumentationRegistry.getInstrumentation().waitForIdleSync() to ensure main thread becomes idle before proceeding The waitForIdleSync() approach is superior to sleep() because it waits for actual completion rather than an arbitrary timeout. This ensures: - All pending UI operations complete - Main thread work finishes - Async cleanup operations have time to complete - Proper test isolation The double-stop (once in @after, once in afterActivityFinished) is safe due to Datadog.stopInstance() being idempotent. Verified: Both AppStartupActivityPredicateTest and AppStartupMainActivityDirectTest pass reliably when run together.
Added viewArguments parameter to ExpectedVitalAppLaunchEvent to allow verifying view attributes in vital app launch events. Updated startup tests to verify that TTID and TTFD events originate from MainContentActivity by checking the view name attribute using type-safe class references.
…king behavior and use cases
…rImpl and add safe methods to "detetk_custom_safe_call.yml"
a9af87b to
5538b89
Compare
When an interstitial activity (e.g., a splash screen) calls startActivity() + finish() in onCreate() without ever calling setContentView(), the OnDrawListener never fires and TTID is never reported. This change adds automatic detection of this pattern: when the startup activity is destroyed without TTID being reported and another tracked activity exists, the TTID measurement is "retargeted" to that next activity while preserving the original startup initialTime. The core mechanism lives in RumAppStartupDetectorImpl, which tracks a pendingStartupScenario and a startupTTIDReported flag. On activity destruction, if the startup activity was destroyed before drawing, the detector looks for the next tracked activity and calls the new onAppStartupRetargeted listener callback (which re-subscribes to the first-frame draw without re-sending AppStartEvent). This supports chained interstitials (A -> B -> C) where multiple non-drawing activities forward in sequence. For forwarded activities that have already drawn their first frame, the callback is invoked directly instead of subscribing to OnDrawListener. RumStartupScenario subclasses are now data classes with a forwardTo() method that creates a copy with the new activity reference while preserving all timing fields. RumTTIDInfo gains a wasForwarded field reported in startup telemetry. Integration tests cover three scenarios: auto-forwarding with a single non-drawing interstitial, chained auto-forwarding through two non-drawing activities, and the predicate-based exclusion of a drawing splash activity. The AppStartupActivityPredicateTest was updated to use a DrawingSplashActivity that actually renders content, properly testing the predicate for activities that draw but should be excluded from TTID.
…y has drawn Check isLaidOut on the decor view rather than just its existence when deciding whether to report TTID immediately for forwarded activities. peekDecorView() becomes non-null right after setContentView(), before the first frame is rendered, which causes TTID to be under-reported. isLaidOut ensures the view has completed at least one layout pass, which is a much more reliable signal that the activity has actually rendered.
The previous WeakHashMap-backed set had non-deterministic iteration order, which meant that when retargeting TTID after an interstitial activity was destroyed, firstOrNull() could pick an arbitrary activity if multiple tracked activities were alive simultaneously. Replace it with a MutableList<WeakReference<Activity>> to guarantee insertion-order iteration while preserving weak-reference semantics. Stale references are cleaned up on every onActivityDestroyed call. Unlike WeakHashMap, entries whose referents are GC'd won't be removed automatically. This is acceptable because onActivityDestroyed always fires for every activity, and the cleanup there also prunes any stale references, so the list never accumulates orphaned entries in practice. The size check also changes from O(1) to O(n) since we now use count() instead of size. This is negligible in practice because the number of concurrently tracked startup activities is extremely small (typically 1-2 during app launch).
…transitions When an interstitial activity calls startActivity() + finish() asynchronously (e.g., via Handler.post), the next activity's onCreate may fire after the interstitial's onDestroy. The previous code cleared pendingStartupScenario in onDestroy when no next candidate existed, losing the startup context. The replacement activity was then treated as a new WarmAfterActivityDestroyed instead of a forwarded cold start. Fix: instead of clearing pendingStartupScenario, record the destroy timestamp (awaitingRetargetSinceNs). When the next tracked activity is created within a 1-second timeout, forward the preserved scenario via onAppStartupRetargeted. If the timeout expires, discard the stale scenario and fall through to normal startup detection. This approach was chosen over the alternative of subscribing to first frame for every new activity while a pending scenario exists (with a new onNextActivityCreated callback). That design would increase coupling between RumAppStartupDetectorImpl and the listener — the detector would need to know about frame subscriptions, and the listener would need to coordinate multiple concurrent subscriptions. It also raises the risk of sending duplicate TTID events if multiple activities draw their first frame while the scenario is still pending. By keeping the explicit single-target forwarding model, only one activity is ever subscribed at a time, eliminating that class of bugs.
7fc43d9 to
3783e55
Compare
…akiness Replace the isLaidOut branch in subscribeToFirstFrame with a single code path: always subscribe via rumFirstDrawTimeReporter, then invalidate the decor view for forwarded activities to ensure the OnDrawListener fires. The previous approach called onFirstFrameDrawn() directly with the current timestamp when isLaidOut was true, which reported an inaccurate TTID duration that included retarget overhead rather than the actual first draw time.
1709515 to
6cba3cf
Compare
| callback = callback | ||
| ) | ||
| if (wasForwarded) { | ||
| // The forwarded activity may have already drawn before retarget. |
There was a problem hiding this comment.
How is it possible that the frame is already drawn when we are here?
There was a problem hiding this comment.
Let’s imagine the following case:
A.onCreate → A.onStart → A.onResume → B.onCreate → B.onStart → B.onResume → ... → A.onDestroy
A doesn’t draw any frames, and it doesn’t finish until B has already been drawn. When A is destroyed, we move forward with the scenario so that B calls subscribeToFirstFrame, but it has already drawn its first frame.
I discovered this case while implementing the instrumented test, so I would say it’s a possible scenario.
| pendingStartupScenario = forwarded | ||
| listener.onAppStartupRetargeted(forwarded) | ||
| } else { | ||
| pendingStartupScenario = null |
There was a problem hiding this comment.
When the next tracked activity is created (in onBeforeActivityCreated), it checks whether the elapsed time is within a 1-second timeout
IMHO if these kinds of time based heuristics can be avoided (another solution that works in more cases exists), they should be avoided. What if in AuthActivity.onCreate we have postDelayed(2.seconds, { startActivity(MainActivity) })?
it avoids increasing coupling between RumAppStartupDetectorImpl and the listener
I don't understand why coupling (if I understood correctly what you meant) here is a bad thing. Also you already introduced notifyStartupTTIDReported, I don't see how it is different from my approach in terms of coupling.
and eliminates the possibility of duplicate TTID events if multiple activities draw their first frame while the scenario is still pending
Part 3 in my comment takes care of this case.
5538b89 to
faf2b5e
Compare
What does this PR do?
Note: This PR is targeting hector.morilloprieto/RUM-14471, as it is an iteration of it.
Automatically detects when the startup activity is destroyed without drawing a frame (e.g., splash screens that call
startActivity()+finish()inonCreate()) and forwards the TTID measurement to the next tracked activity, preserving the original startupinitialTime. Supports chained interstitials (A → B → C).Core changes:
RumAppStartupDetectorImpltracks apendingStartupScenarioand on activity destruction checks whether TTID was reported. If not, retargets to the next tracked activity via a newonAppStartupRetargetedlistener callback.RumFeaturere-subscribes to the first-frame draw on retarget without re-sendingAppStartEvent. For forwarded activities that already drew, the callback is invoked directly.RumStartupScenariosubclasses converted to data classes withforwardTo()method.RumTTIDInfo.wasForwardedreported in startup telemetry.Motivation
When an interstitial activity calls
startActivity()+finish()withoutsetContentView(), theOnDrawListenernever fires and TTID is never reported. The predicate-based solution (base branch) requires customer configuration. This adds a complementary "just works" solution that handles the common case automatically.How to test
Apply the attached patch to the branch:
The patch modifies the sample app to add a
SplashActivityas the launcher activity. This activity immediately callsstartActivity(NavActivity)+finish()without setting a content view, simulating an interstitial splash screen. It also sets telemetry sampling to 100% and reduces session inactivity timeout to 15 seconds for easier testing.Run the sample app and check the RUM dashboard — you should see TTID reported for
NavActivitywithwas_forwarded: truein the startup telemetry metric.Additional Notes
setAppStartupActivityPredicate) is still needed for activities that DO draw but should be excluded from TTID (e.g., a branded splash with a logo). Both mechanisms coexist.AppStartupActivityPredicateTestto use a newDrawingSplashActivitythat renders content, properly testing the predicate use case now that auto-forwarding handles non-drawing interstitials.Review checklist (to be filled by reviewers)