perf(engine): adaptive subservice cadence + phase-stagger to cut CPU usage & spikes #92
Merged
…spikes
Add _quality_scale()/_effective_{face,reid,pose}_interval() so face/ReID/pose
cadences scale with the active quality tier (high→1.0×, balanced→1.25×,
low→2.0×), matching the existing detection throttle. Add _seed_subservice_phases()
called on the first inference-loop tick to stagger ReID (−0.5×period) and pose
(−0.33×period) off face so heavy appearance models stop coinciding every tick.
24 new tests in tests/test_cpu_governor.py; 103 total pass; ruff clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
8a158dc to d15bff8
TCVinNYC added a commit that referenced this pull request Jun 24, 2026
Second **pre-release** of the 2.2.0 reliability + performance roadmap, for real-hardware validation before the stable 2.2.0. Bumps `__version__` to `2.2.0-rc2`; on merge, tag `v2.2.0-rc2` triggers the pre-release installer build.

**New since rc1 (all merged, CI-green + multi-agent-reviewed):**
- #92 CPU subservice governor: fewer CPU spikes (phase-stagger) + cadence throttle under load
- #93 preview-rate-cap (~20fps)
- #90 live GPU-acceleration verdict in the Services panel
- #91 OpenVINO auto-install for Intel
- #94 scene-adaptive ReID threshold
- #95/#96 opt-in fused TargetAssociator (off by default)
- #97 batched detect_batch primitive

**Validate especially:** CPU usage/spikes with all subservices running (your priority), and `python -m autoptz --bench` / the Services-panel verdict on Intel-Mac+AMD. To try the new tracking logic, set `tracking.use_target_associator = true`.

Follow-ups after your validation: P4 coalescing scheduler, wire the associator ReID/pose cues + flip its default, int8 CPU quant, stable Win/Linux device binding.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Targets CPU usage and spikes when all subservices (detection, tracking, ReID, pose, face) run together.
Spikes: face and ReID both used a 0.25s interval seeded from 0, so they fired on the same appearance-thread ticks every 0.25s, a recurring double-model spike. A one-time `_seed_subservice_phases(now)` now offsets ReID by half a period and pose by a third, so the heavy appearance models alternate instead of coinciding.

Usage under load: the adaptive quality policy previously throttled only detection. Now face/ReID/pose cadence also scales with the cached `_quality_active` state via `_quality_scale()` (high 1.0x, balanced 1.25x, low 2.0x) and `_effective_{face,reid,pose}_interval()`, so the appearance subservices run less often when the machine is over budget, cutting sustained CPU. Subservice correctness is unchanged; only cadence/phase. Comparison semantics at all three gates preserved.

25 new tests (scale per quality tier, effective intervals, stagger offsets + staggered due-times, high-quality regression). No real threads.
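
For reviewers who want the idea in isolation, here is a minimal sketch of the two mechanisms described above. It is not the engine's code. The names `QUALITY_SCALE`, `effective_interval`, `SubservicePhases`, `seed_phases`, and `is_due` are hypothetical stand-ins. The `>=` due-check and the choice to seed face's last-run time at `now` are assumptions. The pose period is passed in as a parameter because its base interval is not stated here. Only the multipliers, the 0.25s face/ReID interval, and the half/third offsets come from the description.

```python
from dataclasses import dataclass

# Multipliers per quality tier, as listed in the description.
QUALITY_SCALE = {"high": 1.0, "balanced": 1.25, "low": 2.0}


def effective_interval(base_interval: float, quality: str) -> float:
    """Stretch a subservice's base interval by the active tier's multiplier."""
    return base_interval * QUALITY_SCALE[quality]


@dataclass
class SubservicePhases:
    """Last-run timestamps (seconds) for each appearance subservice."""
    last_face: float
    last_reid: float
    last_pose: float


def seed_phases(now: float, reid_period: float, pose_period: float) -> SubservicePhases:
    """One-time seeding on the first loop tick.

    Assumption: face is treated as having just run at `now`; ReID and pose
    are back-dated by half and a third of their own period, so their first
    due times fall between face ticks rather than on them.
    """
    return SubservicePhases(
        last_face=now,
        last_reid=now - 0.5 * reid_period,
        last_pose=now - pose_period / 3.0,
    )


def is_due(now: float, last_run: float, period: float) -> bool:
    """Assumed gate: run once at least one full period has elapsed."""
    return now - last_run >= period


if __name__ == "__main__":
    period = effective_interval(0.25, "high")  # 0.25s at the high tier
    phases = seed_phases(now=10.0, reid_period=period, pose_period=period)
    # ReID becomes due half a period after seeding, face a full period after.
    print(is_due(10.125, phases.last_reid, period))  # True
    print(is_due(10.125, phases.last_face, period))  # False
```

With equal 0.25s periods, ReID in this sketch first fires 0.125s after seeding and face 0.25s after, and they keep that half-period separation on every later tick. At the low tier the same base interval becomes 0.5s, halving how often each subservice runs.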
CPU-perf focus item. Real CPU savings (lower peak + lower avg under load) to be validated on the camera rig at the final RC.
🤖 Generated with Claude Code