[DRAFT] OMPE-88188: Default explicit temporal data instead of TAA #5232
jmart-nv wants to merge 2 commits into isaac-sim:develop
kellyguo11 left a comment
Review Summary
Good two-part PR that addresses a real sim2real gap: (1) disabling temporal AA (DLSS) by default so RL policies don't learn from renderer-specific temporal artifacts, and (2) adding frame stacking for Newton to explicitly provide temporal information that Newton's energy-conserving integrator needs.
The implementation is solid — ring buffer approach for frame stacking is clean, the MultiBackendTiledCameraCfg preset pattern is a nice API improvement over the previous renderer_cfg=MultiBackendRendererCfg() approach, and the test coverage is thorough (12+ new tests covering frame stacking, ring buffer correctness, reset behavior, and AA mode warnings).
A few performance and correctness issues worth addressing below.
| self.renderer.write_output(self.render_data, name, single)
| history = self._frame_history[name]
| # For envs that just reset, fill all history slots with current frame
| if needs_init.any():
should-fix: needs_init.any() triggers a GPU→CPU synchronization on every update step, for every output channel. In the common case (no environments were just reset), this is a wasted sync that stalls the pipeline.
Consider tracking reset state with a CPU-side flag instead:
    # In reset():
    self._frame_stack_needs_init_cpu = True  # simple bool

    # Here:
    if self._frame_stack_needs_init_cpu:
        init_ids = self._frame_stack_needs_init.nonzero(as_tuple=False).squeeze(-1)
        if len(init_ids) > 0:
            for i in range(self._frame_stack_size):
                history[i, init_ids] = single[init_ids]
        self._frame_stack_needs_init.zero_()
        self._frame_stack_needs_init_cpu = False

This avoids the GPU→CPU sync entirely in the steady-state path.
| ordered = torch.cat(
|     [
|         history[(self._frame_stack_idx + 1 + i) % self._frame_stack_size]
|         for i in range(self._frame_stack_size)
nit (perf): The torch.cat with a list comprehension per update is fine for frame_stack=2, but it allocates a new output tensor and performs O(frame_stack) copies on every step. Since the API allows arbitrary frame_stack values, worth either:
- Documenting that values > 4 are not recommended for performance reasons, or
- Using pre-allocated views with torch.narrow + in-place copies instead of cat
Not blocking for the current use case (frame_stack=2).
| c,
| c * frame_stack,
| frame_stack,
| attr_name,
suggestion: This break means only the first camera with frame_stack > 1 adjusts the observation space. For single-camera tasks this is correct, but it silently ignores additional cameras.
Worth adding a comment like # NOTE: Only the first camera with frame_stack > 1 is used to adjust observation_space. Multi-camera tasks should set observation_space explicitly.
Also: the observation_space auto-adjust assumes the space is [H, W, C] from a single camera. If a task uses a different observation layout, the multiplication could corrupt the space dimensions. A guard like checking that c matches the camera's expected channel count would make this more robust.
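A minimal sketch of the guard suggested above, assuming the observation space is a plain `[H, W, C]` list. The function name `stacked_observation_shape` and its parameters are hypothetical and do not come from the PR; the point is only that the channel multiplication is refused when the layout does not match a single camera image.

```python
def stacked_observation_shape(obs_shape, camera_channels, frame_stack):
    """Return [H, W, C * frame_stack] for a single-camera [H, W, C] space.

    Hypothetical helper: raises instead of silently corrupting a space
    whose layout was not derived from the camera.
    """
    if frame_stack < 1:
        raise ValueError(f"frame_stack must be >= 1, got {frame_stack}")
    if len(obs_shape) != 3:
        raise ValueError(f"expected [H, W, C], got {list(obs_shape)}")
    h, w, c = obs_shape
    if c != camera_channels:
        # The last dimension is not this camera's channel count, so the
        # task author should set observation_space explicitly.
        raise ValueError(
            f"channel dim {c} does not match camera channels {camera_channels}"
        )
    return [h, w, c * frame_stack]


# Example: an RGB camera with 2-frame stacking.
print(stacked_observation_shape([100, 100, 3], camera_channels=3, frame_stack=2))
```

Raising is one possible policy; logging a warning and leaving the space untouched would be an equally reasonable choice for tasks with custom layouts.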
| visualizer_intent = _compute_visualizer_intent(env_cfg)
| _set_visualizer_intent_on_launcher_args(launcher_args, visualizer_intent)
|
| # Warn when Newton physics is used with camera observations but no frame stacking.
should-fix: _has_camera_without_frame_stack checks isinstance(node, CameraCfg) but frame_stack is only defined on TiledCameraCfg. A regular CameraCfg (non-tiled) will hit getattr(node, "frame_stack", 1) <= 1 → True, causing a false-positive warning even though frame stacking isn't applicable to non-tiled cameras.
Should be:
    from isaaclab.sensors import TiledCameraCfg

    def _has_camera_without_frame_stack(node) -> bool:
        if not isinstance(node, TiledCameraCfg):
            return False
        return getattr(node, "frame_stack", 1) <= 1

| image quality.
|
| This is set by the variable: ``/rtx/post/dlss/execMode``.
| antialiasing_mode: Literal["Off", "FXAA", "DLSS", "TAA", "DLAA"] | None = "FXAA"
suggestion (breaking change): Changing the default from None (preserve renderer default, which was DLSS) to "FXAA" is a user-facing behavioral change — existing code that relied on the previous default will now get different rendering output and potentially different performance characteristics.
This is the right default for RL training, but it should be called out in the changelog since it affects all users, not just Newton users. The PR checklist has the changelog item unchecked — please add an entry noting the default AA mode change.
| # Step to build history
| for _ in range(3):
|     sim.step()
|     camera.update(dt=0.01)
nit: import logging has no dependency on the simulator and can be moved to the top-level imports (before AppLauncher). Only isaaclab_tasks and pxr need to be deferred.
| import logging
|
| import isaaclab_tasks  # noqa: F401
| from isaaclab_tasks.utils import resolve_task_config
suggestion: Manipulating sys.argv directly is fragile: if the test fails before argv is restored, later tests in the same process see the modified value. Consider using unittest.mock.patch('sys.argv', [...]) as a context manager, which is cleaner and restores the original argv on exit even when the body raises. Note that patch still changes process-global state, so it does not by itself make the test thread-safe:

    from unittest.mock import patch

    with patch('sys.argv', [sys.argv[0], 'presets=newton']):
        env_cfg, _ = resolve_task_config(...)

| provides this temporal information by concatenating consecutive frames along
| the channel dimension, enabling the policy to infer velocity from pixel
| differences between frames.
| """
suggestion: Nice API — MultiBackendTiledCameraCfg as a TiledCameraCfg subclass with embedded MultiBackendRendererCfg is cleaner than the previous pattern of separate renderer_cfg fields.
One thing to document: downstream code using type(cam) == TiledCameraCfg (exact type check) would now fail. isinstance checks still work. Might be worth a migration note in the PR description or changelog.
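To illustrate the migration concern, here is a small sketch using stand-in classes. The two classes below are empty placeholders that only mirror the inheritance relationship described in this PR; they are not the real Isaac Lab config classes.

```python
class TiledCameraCfg:
    """Stand-in for the real config class (placeholder, no fields)."""


class MultiBackendTiledCameraCfg(TiledCameraCfg):
    """Stand-in for the new subclass introduced by this PR."""


cam = MultiBackendTiledCameraCfg()

# Exact type comparison does not match a subclass instance.
exact_match = type(cam) == TiledCameraCfg

# isinstance follows the inheritance chain, so it still matches.
subclass_match = isinstance(cam, TiledCameraCfg)

print(exact_match, subclass_match)
```

Downstream code that branches on the exact type would therefore need to switch to `isinstance` once tasks move to the subclass.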
Description
This is a two-part fix: disabling temporal AA (DLSS) by default, and enabling frame stacking for newton by default.
Temporal AA (DLSS): Fixes incorrectly inflated convergence on newton+rtx.
RTX uses DLSS anti-aliasing by default. DLSS uses temporal frame blending that encodes motion information into a single RGB observation, causing camera-based RL to learn from temporal artifacts that don't exist with real cameras in the same way.
This has been fixed by changing the default AA mode to FXAA, which is slower than DLSS, but more accurately simulates a real camera. Also added a runtime warning when DLSS, DLAA or TAA are enabled, as these all introduce temporal artifacts.
Added new unit tests to verify expected behavior. Also tested with full cartpole-camera training run.
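The runtime warning described above could look roughly like the following sketch. The set of temporal modes comes from the description; the function name `warn_if_temporal_aa`, the message text, and the use of the standard `warnings` module are assumptions, not the PR's actual code.

```python
import warnings

# Modes that blend information across frames, per the PR description.
TEMPORAL_AA_MODES = frozenset({"DLSS", "DLAA", "TAA"})


def warn_if_temporal_aa(antialiasing_mode):
    """Warn when the selected AA mode is temporal; return True if it warned.

    Hypothetical helper: `antialiasing_mode` is one of "Off", "FXAA",
    "DLSS", "TAA", "DLAA", or None (keep the renderer default).
    """
    if antialiasing_mode in TEMPORAL_AA_MODES:
        warnings.warn(
            f"Anti-aliasing mode '{antialiasing_mode}' blends consecutive frames; "
            "camera observations may contain temporal artifacts.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False


# FXAA is spatial only, so no warning is emitted for the new default.
warn_if_temporal_aa("FXAA")
```

In this sketch `None` does not warn, even though the renderer's own default may be temporal; whether that case should warn is a design decision the sketch does not settle.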
Newton Frame Stacking: Provides explicit temporal data to newton.
In the previous commit, DLSS anti-aliasing was disabled by default since it provides implicit temporal data that does not accurately reflect how real cameras work. However, with Newton's energy-conserving physics solver, the policy needs velocity information in order to compensate for the lack of damping.
This commit provides explicit temporal information via 2-frame stacking by default for Newton-based camera tasks, so that policies trained on Newton converge at the same rate as on PhysX. This adds 36% GPU memory overhead, but the wall-clock overhead is negligible.
The default for PhysX is still a stack size of 1 (disabled), since PhysX has implicit damping built in via its TGS solver.
Implementation:
`TiledCameraCfg` now has a `frame_stack` field that controls a ring buffer of previous frames. These are concatenated to the channel dimension automatically in `DirectRLEnv`. The new `MultiBackendTiledCameraCfg` wraps the `MultiBackendRendererCfg` to provide defaults for the different physics presets. If Newton physics is used without frame stacking, a runtime warning will now be emitted.
Updated existing tasks to use the new `MultiBackendTiledCameraCfg` and added 12 new tests.
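The ring-buffer behavior described above can be sketched in plain Python for a single environment. This is an illustration under stated assumptions, not the PR's implementation: the class `FrameStack` is hypothetical, frames are flat lists of channel values rather than GPU tensors, and `idx` is assumed to point at the most recently written slot.

```python
class FrameStack:
    """Ring buffer holding the last `size` frames (hypothetical helper)."""

    def __init__(self, size):
        if size < 1:
            raise ValueError(f"size must be >= 1, got {size}")
        self.size = size
        self.slots = [None] * size
        self.idx = size - 1      # slot written most recently
        self.needs_init = True   # set again on every reset

    def reset(self):
        """Mark the history as stale so the next frame fills every slot."""
        self.needs_init = True

    def push(self, frame):
        """Store `frame` and return all frames concatenated, oldest first."""
        if self.needs_init:
            # After a reset there is no valid history, so repeat the frame.
            self.slots = [list(frame) for _ in range(self.size)]
            self.needs_init = False
        self.idx = (self.idx + 1) % self.size
        self.slots[self.idx] = list(frame)
        # The slot after the newest one holds the oldest frame.
        ordered = [self.slots[(self.idx + 1 + i) % self.size] for i in range(self.size)]
        # Concatenate along the channel dimension.
        return [value for slot in ordered for value in slot]


stack = FrameStack(size=2)
first = stack.push([1, 2, 3])    # history filled with the first frame
second = stack.push([4, 5, 6])   # previous frame followed by the new one
print(first, second)
```

With RGB input and `size=2`, each returned observation has 6 channel values, matching the `C * frame_stack` channel count that the environment exposes to the policy.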
Type of change
Screenshots
Checklist
- pre-commit checks with ./isaaclab.sh --format
- config/extension.toml file
- CONTRIBUTORS.md or my name already exists there