[DRAFT] OMPE-88188: Default explicit temporal data instead of TAA #5232

Draft
jmart-nv wants to merge 2 commits into isaac-sim:develop from jmart-nv:jmart/temporal-aa

@jmart-nv

Description

This is a two-part fix: disabling temporal AA (DLSS) by default, and enabling frame stacking for newton by default.

Temporal AA (DLSS): Fixes incorrectly inflated convergence on newton+rtx.

RTX uses DLSS anti-aliasing by default. DLSS uses temporal frame blending that encodes motion information into a single RGB observation, causing camera-based RL to learn from temporal artifacts that don't exist with real cameras in the same way.
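To make the mechanism concrete, here is a toy model. It assumes a TAA-style exponential history blend (`out = alpha * current + (1 - alpha) * history`), which is a simplification: DLSS's actual reconstruction is a learned network and is not modeled here. A single bright pixel that moves one step per frame leaves a fading trail, so one blended frame already encodes the direction of motion.

```python
def render_dot(width: int, pos: int) -> list[float]:
    """A 1-D 'image' with a single bright pixel at `pos` (what a real camera would see)."""
    return [1.0 if i == pos else 0.0 for i in range(width)]


def temporal_blend(current: list[float], history: list[float], alpha: float) -> list[float]:
    """Exponential history blend: a simplified stand-in for TAA-style accumulation."""
    return [alpha * c + (1.0 - alpha) * h for c, h in zip(current, history)]


def simulate(width: int = 8, steps: int = 4, alpha: float = 0.5) -> list[float]:
    """Move the dot one pixel right per frame and return the final blended frame."""
    history = render_dot(width, 0)
    for t in range(1, steps):
        history = temporal_blend(render_dot(width, t), history, alpha)
    return history


if __name__ == "__main__":
    # Nonzero values behind the dot form a trail that reveals past positions.
    print(simulate())
```

With the defaults, `simulate()` returns `[0.125, 0.125, 0.25, 0.5, 0.0, 0.0, 0.0, 0.0]`: the dot is at pixel 3, but pixels 0 to 2 still carry energy from earlier frames. That trail is a motion cue a real camera frame would not contain.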

This has been fixed by changing the default AA mode to FXAA, which is slower than DLSS but more accurately simulates a real camera. A runtime warning is also emitted when DLSS, DLAA, or TAA is enabled, since all three introduce temporal artifacts.
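A minimal sketch of such a check follows. It assumes a hypothetical helper `warn_if_temporal_aa` that receives the configured `antialiasing_mode` string. The PR's actual warning text and the point where it hooks into the pipeline are not shown here.

```python
from __future__ import annotations

import logging

logger = logging.getLogger(__name__)

# AA modes that blend information across frames (per the PR description).
TEMPORAL_AA_MODES = frozenset({"DLSS", "DLAA", "TAA"})


def warn_if_temporal_aa(antialiasing_mode: str | None) -> bool:
    """Hypothetical helper: warn if the AA mode introduces temporal artifacts.

    Returns True when a warning was emitted, so callers and tests can check it.
    None means "keep the renderer default"; this sketch does not resolve that default.
    """
    if antialiasing_mode in TEMPORAL_AA_MODES:
        logger.warning(
            "Anti-aliasing mode %r blends frames over time; camera observations will "
            "contain temporal artifacts. Consider 'FXAA' or 'Off' for RL training.",
            antialiasing_mode,
        )
        return True
    return False
```

Returning a bool rather than only logging keeps the check easy to unit-test without capturing log output.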

Added new unit tests to verify expected behavior. Also tested with full cartpole-camera training run.

Newton Frame Stacking: Provides explicit temporal data to newton.

In the previous commit, DLSS anti-aliasing was disabled by default because it provides implicit temporal data that does not accurately reflect how real cameras work. However, newton's energy-conserving physics solver provides little damping, so the policy needs temporal (velocity) information to compensate for it.

This commit provides that information explicitly via 2-frame stacking, enabled by default for newton-based camera tasks. This lets the policy infer velocity and supply the damping the solver lacks, so it converges at the same rate as on physx. This adds 36% GPU memory overhead, but the wall-clock overhead is negligible.
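The following toy example shows why the second frame matters. It uses a hypothetical 1-D "image" and an intensity-weighted centroid as a stand-in for whatever the policy network actually learns. A single frame yields position only, while the difference between two stacked frames yields velocity.

```python
def centroid(frame: list[float]) -> float:
    """Intensity-weighted pixel position of a 1-D frame."""
    total = sum(frame)
    return sum(i * v for i, v in enumerate(frame)) / total


def velocity_from_stack(prev_frame: list[float], curr_frame: list[float], dt: float) -> float:
    """Finite-difference velocity (pixels per second) from two consecutive frames."""
    return (centroid(curr_frame) - centroid(prev_frame)) / dt


prev = [0.0, 1.0, 0.0, 0.0, 0.0]  # object at pixel 1
curr = [0.0, 0.0, 0.0, 1.0, 0.0]  # object at pixel 3 one step later
v = velocity_from_stack(prev, curr, dt=0.5)  # (3 - 1) / 0.5 = 4.0 pixels per second
```

With a velocity estimate available, a controller can act like a damper, for example by including a term proportional to `-v`. That is the role the PR attributes to frame stacking on newton.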

The default for physx is still stack size = 1 (disabled) since physx has implicit damping built-in via its TGS solver.
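The per-backend defaults above can be summarized in a small sketch. `FrameStackDefaults` and `resolve_frame_stack` are hypothetical illustrations. In the PR, the defaults are provided through `MultiBackendTiledCameraCfg` presets, whose fields are not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FrameStackDefaults:
    """Hypothetical per-physics-backend frame_stack defaults (values from the PR description)."""

    newton: int = 2  # energy-conserving solver: stack 2 frames so the policy can infer velocity
    physx: int = 1   # TGS solver already damps; 1 means frame stacking is disabled


def resolve_frame_stack(
    physics: str,
    override: int | None = None,
    defaults: FrameStackDefaults = FrameStackDefaults(),
) -> int:
    """Return the user override if one is given, otherwise the backend default."""
    if override is not None:
        if override < 1:
            raise ValueError(f"frame_stack must be >= 1, got {override}")
        return override
    try:
        return getattr(defaults, physics)
    except AttributeError:
        raise ValueError(f"unknown physics backend: {physics!r}") from None
```

An explicit override always wins, so a task can still opt in to stacking on physx or opt out on newton (the latter would trigger the runtime warning described below).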

Implementation:

`TiledCameraCfg` now has a `frame_stack` field that controls a ring buffer of previous frames. These are concatenated along the channel dimension automatically in `DirectRLEnv`. The new `MultiBackendTiledCameraCfg` wraps the `MultiBackendRendererCfg` to provide defaults for the different physics presets. If newton physics is used without frame stacking, a runtime warning will now be emitted.
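Here is a minimal sketch of the ring-buffer idea, using NumPy arrays in place of torch tensors so it runs standalone. The class `FrameStackBuffer` and its methods are hypothetical. The sketch assumes a per-frame layout of `[num_envs, H, W, C]`, with frames stacked oldest-to-newest along the last (channel) axis. Freshly reset environments get every slot filled with their first frame.

```python
import numpy as np


class FrameStackBuffer:
    """Hypothetical ring buffer holding the last `stack` frames for each environment."""

    def __init__(self, stack: int, num_envs: int, frame_shape: tuple[int, ...]):
        self.stack = stack
        self.history = np.zeros((stack, num_envs, *frame_shape), dtype=np.float32)
        self.idx = stack - 1                             # slot holding the newest frame
        self.needs_init = np.ones(num_envs, dtype=bool)  # every env starts "just reset"

    def reset(self, env_ids) -> None:
        self.needs_init[env_ids] = True

    def push(self, frame: np.ndarray) -> np.ndarray:
        """Store `frame` ([num_envs, H, W, C]) and return [num_envs, H, W, C * stack]."""
        self.idx = (self.idx + 1) % self.stack
        self.history[self.idx] = frame
        init_ids = np.flatnonzero(self.needs_init)
        if init_ids.size:
            # Just-reset envs: fill every slot so no frame from the old episode leaks in.
            self.history[:, init_ids] = frame[init_ids]
            self.needs_init[init_ids] = False
        # The oldest frame sits right after the newest slot; walk forward from there.
        ordered = [self.history[(self.idx + 1 + i) % self.stack] for i in range(self.stack)]
        return np.concatenate(ordered, axis=-1)
```

Writing into a fixed-size ring and re-ordering on read avoids shifting every stored frame on each step. The cost moves to the concatenation, which allocates a fresh output per call.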

Updated existing tasks to use the new MultiBackendTiledCameraCfg and added 12 new tests.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Screenshots

Images: profiling_charts, convergence_final, frame_comparison_grid

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@github-actions bot added the isaac-lab label (Related to Isaac Lab team) on Apr 10, 2026

@kellyguo11 left a comment

Review Summary

Good two-part PR that addresses a real sim2real gap: (1) disabling temporal AA (DLSS) by default so RL policies don't learn from renderer-specific temporal artifacts, and (2) adding frame stacking for Newton to explicitly provide temporal information that Newton's energy-conserving integrator needs.

The implementation is solid — ring buffer approach for frame stacking is clean, the MultiBackendTiledCameraCfg preset pattern is a nice API improvement over the previous renderer_cfg=MultiBackendRendererCfg() approach, and the test coverage is thorough (12+ new tests covering frame stacking, ring buffer correctness, reset behavior, and AA mode warnings).

A few performance and correctness issues worth addressing below.

```python
self.renderer.write_output(self.render_data, name, single)
history = self._frame_history[name]
# For envs that just reset, fill all history slots with current frame
if needs_init.any():
```

should-fix: needs_init.any() triggers a GPU→CPU synchronization on every update step, for every output channel. In the common case (no environments were just reset), this is a wasted sync that stalls the pipeline.

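Consider tracking reset state with a CPU-side flag instead. Read the reset ids once per update, before the per-output loop. Clearing the mask inside that loop would leave every output after the first uninitialized:

```python
# In reset(), alongside setting the GPU-side self._frame_stack_needs_init mask:
self._frame_stack_needs_init_cpu = True  # plain Python bool; reading it never syncs

# Once per update, before the per-output loop:
init_ids = None
if self._frame_stack_needs_init_cpu:
    # nonzero() syncs, but only on steps that follow a reset
    init_ids = self._frame_stack_needs_init.nonzero(as_tuple=False).squeeze(-1)
    self._frame_stack_needs_init.zero_()
    self._frame_stack_needs_init_cpu = False

# Here, inside the per-output loop:
if init_ids is not None and len(init_ids) > 0:
    for i in range(self._frame_stack_size):
        history[i, init_ids] = single[init_ids]
```

This avoids the GPU→CPU sync entirely in the steady-state path.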
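Below is a runnable model of this pattern. It makes two assumptions for illustration: a NumPy array stands in for the GPU mask, and a counter stands in for GPU→CPU syncs. `ResetTracker` and `update_step` are hypothetical names. The model shows that steady-state steps never touch the mask, and that every output channel is initialized from the same per-step id list.

```python
import numpy as np


class ResetTracker:
    """Hypothetical model: a device-side reset mask plus a host-side bool.

    `mask_reads` counts how often the mask is inspected (a stand-in for GPU->CPU syncs).
    """

    def __init__(self, num_envs: int):
        self.mask = np.zeros(num_envs, dtype=bool)  # stands in for the GPU tensor
        self.pending = False                         # plain Python bool, free to read
        self.mask_reads = 0

    def reset(self, env_ids) -> None:
        self.mask[env_ids] = True
        self.pending = True

    def take_init_ids(self) -> np.ndarray:
        """Return and clear the reset ids; skips the mask entirely when nothing is pending."""
        if not self.pending:
            return np.empty(0, dtype=np.int64)
        self.mask_reads += 1
        ids = np.flatnonzero(self.mask)
        self.mask[:] = False
        self.pending = False
        return ids


def update_step(tracker: ResetTracker, histories: dict, frames: dict) -> None:
    ids = tracker.take_init_ids()  # read once per step, shared by every output channel
    for name, frame in frames.items():
        if ids.size:
            histories[name][:, ids] = frame[ids]
        # (the regular per-step ring-buffer write is omitted for brevity)
```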

```python
ordered = torch.cat(
    [
        history[(self._frame_stack_idx + 1 + i) % self._frame_stack_size]
        for i in range(self._frame_stack_size)
```

nit (perf): The torch.cat with a list comprehension is fine for frame_stack=2. However, every update allocates a fresh output tensor and issues one copy per stacked frame, so the cost grows linearly with frame_stack. Since the API allows arbitrary frame_stack values, it's worth either:

  • Documenting that values > 4 are not recommended for performance reasons, or
  • Using pre-allocated views with torch.narrow + in-place copies instead of cat

Not blocking for the current use case (frame_stack=2).
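Here is a sketch of the second option, written with NumPy so it runs standalone. Each slice assignment into `out` is an in-place copy into a view, the NumPy analogue of `out.narrow(-1, start, c).copy_(frame)` in torch. `stack_into` is a hypothetical helper, and the sketch assumes the `[stack, N, H, W, C]` history layout used above.

```python
import numpy as np


def stack_into(history: np.ndarray, newest: int, out: np.ndarray) -> np.ndarray:
    """Write the ring buffer into a pre-allocated output, oldest frame first.

    history: [stack, N, H, W, C]; out: [N, H, W, C * stack], allocated once and reused.
    """
    stack, c = history.shape[0], history.shape[-1]
    for i in range(stack):
        src = history[(newest + 1 + i) % stack]  # plain indexing: a view, no copy
        out[..., i * c:(i + 1) * c] = src        # in-place copy into a slice of `out`
    return out
```

One caveat of this design: because `out` is reused, any consumer that keeps the previous observation without copying it (a replay buffer, for example) would see it overwritten on the next step.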

```python
c,
c * frame_stack,
frame_stack,
attr_name,
```

suggestion: This `break` means only the first camera with `frame_stack > 1` adjusts the observation space. For single-camera tasks this is correct, but it silently ignores additional cameras.

Worth adding a comment like `# NOTE: Only the first camera with frame_stack > 1 is used to adjust observation_space. Multi-camera tasks should set observation_space explicitly.`

Also: the observation_space auto-adjust assumes the space is [H, W, C] from a single camera. If a task uses a different observation layout, the multiplication could corrupt the space dimensions. A guard like checking that c matches the camera's expected channel count would make this more robust.
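Here is a sketch of such a guard, using a hypothetical helper `stacked_obs_shape` that works on plain shape tuples. The sketch assumes the observation space is `[H, W, C]`. It multiplies the channel dimension only when the last axis matches the camera's channel count, and refuses otherwise instead of silently corrupting the space.

```python
from __future__ import annotations


def stacked_obs_shape(
    obs_shape: tuple[int, ...], camera_channels: int, frame_stack: int
) -> tuple[int, ...]:
    """Hypothetical guard for auto-adjusting an [H, W, C] observation shape."""
    if frame_stack <= 1:
        return obs_shape  # nothing to adjust
    if len(obs_shape) != 3 or obs_shape[-1] != camera_channels:
        raise ValueError(
            f"Cannot auto-adjust observation shape {obs_shape} for frame_stack={frame_stack}: "
            f"expected [H, W, {camera_channels}]. Set observation_space explicitly."
        )
    h, w, c = obs_shape
    return (h, w, c * frame_stack)
```

Raising rather than warning is a judgment call. It turns a silently wrong observation space into an immediate, explicit configuration error.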

```python
visualizer_intent = _compute_visualizer_intent(env_cfg)
_set_visualizer_intent_on_launcher_args(launcher_args, visualizer_intent)

# Warn when Newton physics is used with camera observations but no frame stacking.
```

should-fix: `_has_camera_without_frame_stack` checks `isinstance(node, CameraCfg)`, but `frame_stack` is only defined on `TiledCameraCfg`. A regular (non-tiled) `CameraCfg` therefore falls through to `getattr(node, "frame_stack", 1) <= 1`, which evaluates to `True`. This causes a false-positive warning even though frame stacking isn't applicable to non-tiled cameras.

Should be:

```python
from isaaclab.sensors import TiledCameraCfg


def _has_camera_without_frame_stack(node) -> bool:
    if not isinstance(node, TiledCameraCfg):
        return False
    return getattr(node, "frame_stack", 1) <= 1
```
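The false positive can be reproduced with minimal stand-in classes. These are not the real `isaaclab.sensors` classes. The sketch assumes, as the review describes, that `TiledCameraCfg` subclasses `CameraCfg` and is the only one of the two with a `frame_stack` field. The "buggy" function is reconstructed from the review's description, not copied from the PR.

```python
from dataclasses import dataclass


# Stand-ins for the real config classes (assumed hierarchy: TiledCameraCfg -> CameraCfg).
@dataclass
class CameraCfg:
    width: int = 64


@dataclass
class TiledCameraCfg(CameraCfg):
    frame_stack: int = 1


def has_camera_without_frame_stack_buggy(node) -> bool:
    # Any CameraCfg matches; a plain CameraCfg has no frame_stack, so getattr returns 1.
    return isinstance(node, CameraCfg) and getattr(node, "frame_stack", 1) <= 1


def has_camera_without_frame_stack_fixed(node) -> bool:
    # Only tiled cameras support frame stacking, so only they can "lack" it.
    if not isinstance(node, TiledCameraCfg):
        return False
    return node.frame_stack <= 1
```

Because the subclass check is used, `MultiBackendTiledCameraCfg` instances (a `TiledCameraCfg` subclass) would still be covered by the fixed version.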

```python
image quality.

This is set by the variable: ``/rtx/post/dlss/execMode``.
antialiasing_mode: Literal["Off", "FXAA", "DLSS", "TAA", "DLAA"] | None = "FXAA"
```

suggestion (breaking change): Changing the default from None (preserve renderer default, which was DLSS) to "FXAA" is a user-facing behavioral change — existing code that relied on the previous default will now get different rendering output and potentially different performance characteristics.

This is the right default for RL training, but it should be called out in the changelog since it affects all users, not just Newton users. The PR checklist has the changelog item unchecked — please add an entry noting the default AA mode change.

```python
# Step to build history
for _ in range(3):
    sim.step()
    camera.update(dt=0.01)
```

nit: import logging has no dependency on the simulator and can be moved to the top-level imports (before AppLauncher). Only isaaclab_tasks and pxr need to be deferred.

```python
import logging

import isaaclab_tasks  # noqa: F401
from isaaclab_tasks.utils import resolve_task_config
```

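suggestion: Manipulating sys.argv directly is fragile. If a test fails before restoring it, later tests in the same process see the modified argv. Consider using `unittest.mock.patch('sys.argv', [...])` as a context manager instead. It is cleaner and restores the original value on exit, even when the test raises. It still patches the process-global `sys.argv`, so on its own it does not make concurrent threads safe.

```python
import sys
from unittest.mock import patch

with patch('sys.argv', [sys.argv[0], 'presets=newton']):
    env_cfg, _ = resolve_task_config(...)
```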
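The restore-on-exception behavior can be checked in isolation. `read_preset` is a hypothetical stand-in for code that parses `presets=...` from `sys.argv`, and the `RuntimeError` simulates a failing test.

```python
from __future__ import annotations

import sys
from unittest.mock import patch


def read_preset() -> str | None:
    """Hypothetical stand-in for code that reads `presets=...` from sys.argv."""
    for arg in sys.argv[1:]:
        if arg.startswith("presets="):
            return arg.split("=", 1)[1]
    return None


def run_with_patched_argv() -> tuple[str | None, bool]:
    """Return (preset seen inside the patch, whether argv was restored afterwards)."""
    original = list(sys.argv)
    seen = None
    try:
        with patch("sys.argv", [sys.argv[0], "presets=newton"]):
            seen = read_preset()
            raise RuntimeError("simulated test failure")
    except RuntimeError:
        pass
    return seen, sys.argv == original
```

Code under the patch sees `presets=newton`, and `sys.argv` is back to its original value after the block exits via the exception.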

```python
provides this temporal information by concatenating consecutive frames along
the channel dimension, enabling the policy to infer velocity from pixel
differences between frames.
"""
```

suggestion: Nice API — MultiBackendTiledCameraCfg as a TiledCameraCfg subclass with embedded MultiBackendRendererCfg is cleaner than the previous pattern of separate renderer_cfg fields.

One thing to document: downstream code using type(cam) == TiledCameraCfg (exact type check) would now fail. isinstance checks still work. Might be worth a migration note in the PR description or changelog.
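A minimal illustration of the migration concern, using stand-in dataclasses rather than the real config classes:

```python
from dataclasses import dataclass


@dataclass
class TiledCameraCfg:  # stand-in for the real config class
    frame_stack: int = 1


@dataclass
class MultiBackendTiledCameraCfg(TiledCameraCfg):  # stand-in for the new subclass
    pass


cam = MultiBackendTiledCameraCfg(frame_stack=2)
exact_match = type(cam) == TiledCameraCfg          # False: exact type check misses subclasses
isinstance_match = isinstance(cam, TiledCameraCfg)  # True: subclass-aware check still works
```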

@fatimaanes self-requested a review on April 13, 2026 19:46