OR-004 Load-Aware Routing #4

@ymote

Objective

Extend routing beyond the baseline least-loaded policy.

Scope

  • Use running request count, queue depth, waiting-token estimate, KV utilization, health, and decode-throughput estimate as routing inputs.
  • Respect model and capability constraints.
  • Respect worker drain state.
  • Return primary and alternative targets.
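
One way to read the scope items above is sketched below. It is illustrative only: the names `WorkerSnapshot`, `RoutingDecision`, `is_eligible`, `load_score`, and `route`, every field name, and the score weights are assumptions and do not come from this issue or from any existing ominix code. Constraints (model, capabilities, health, drain state, KV capacity) act as hard filters, the load signals feed a single score, and ties are broken on worker ID so the ranking does not depend on input order.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence, Tuple


@dataclass(frozen=True)
class WorkerSnapshot:
    # All field names are hypothetical; the issue only names the signals.
    worker_id: str
    model: str
    capabilities: FrozenSet[str]
    healthy: bool
    draining: bool
    running_requests: int
    queue_depth: int
    waiting_tokens: int          # estimated tokens still waiting to be processed
    kv_utilization: float        # 0.0 (empty) to 1.0 (full)
    decode_tokens_per_s: float   # decode throughput estimate


@dataclass(frozen=True)
class RoutingDecision:
    primary: Optional[str]           # None when no worker is eligible
    alternatives: Tuple[str, ...]
    reason: str


def is_eligible(w: WorkerSnapshot, model: str, required: FrozenSet[str]) -> bool:
    # Hard constraints: a worker failing any of these is never a target.
    return (
        w.model == model
        and required <= w.capabilities
        and w.healthy
        and not w.draining
        and w.kv_utilization < 1.0
    )


def load_score(w: WorkerSnapshot) -> float:
    # Lower is better. The weights are placeholders, not tuned values.
    backlog_seconds = w.waiting_tokens / max(w.decode_tokens_per_s, 1e-9)
    return (w.running_requests + w.queue_depth) + backlog_seconds + 10.0 * w.kv_utilization


def route(
    workers: Sequence[WorkerSnapshot],
    model: str,
    required: FrozenSet[str] = frozenset(),
    max_alternatives: int = 2,
) -> RoutingDecision:
    eligible = [w for w in workers if is_eligible(w, model, required)]
    if not eligible:
        return RoutingDecision(None, (), f"no eligible worker for model={model}")
    # Sorting on (score, worker_id) makes the ranking independent of input order.
    ranked = sorted(eligible, key=lambda w: (load_score(w), w.worker_id))
    best = ranked[0]
    alternatives = tuple(w.worker_id for w in ranked[1:1 + max_alternatives])
    reason = (
        f"lowest load score {load_score(best):.3f} "
        f"among {len(ranked)} eligible workers"
    )
    return RoutingDecision(best.worker_id, alternatives, reason)
```

Folding the signals into one scalar keeps the sketch short. A real policy might prefer a lexicographic ordering or per-signal thresholds, and the issue does not say which.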

Acceptance Criteria

  • Tests cover model mismatch, capability mismatch, unhealthy workers, full KV capacity, and worker drain.
  • Routing decisions include a reason string.
  • Policy produces deterministic output for a fixed worker snapshot.
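
The acceptance criteria could be exercised with a table-driven check along these lines. This is a self-contained sketch, not the project's test suite: `Worker`, `rejection_reason`, `pick`, and `check_acceptance_cases` are hypothetical stand-ins, and the reason strings are made up for illustration. It covers the five rejection cases and then checks that shuffling a fixed snapshot never changes the ranked output.

```python
import random
from dataclasses import dataclass, replace
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Worker:
    # Hypothetical minimal snapshot; only the fields these checks need.
    worker_id: str
    model: str = "m"
    capabilities: FrozenSet[str] = frozenset({"chat"})
    healthy: bool = True
    draining: bool = False
    kv_utilization: float = 0.0
    queue_depth: int = 0


def rejection_reason(w: Worker, model: str, required: FrozenSet[str]) -> Optional[str]:
    # Returns why a worker is excluded, or None if it is routable.
    if w.model != model:
        return "model mismatch"
    if not required <= w.capabilities:
        return "capability mismatch"
    if not w.healthy:
        return "unhealthy"
    if w.draining:
        return "draining"
    if w.kv_utilization >= 1.0:
        return "kv cache full"
    return None


def pick(workers: List[Worker], model: str, required: FrozenSet[str]) -> List[str]:
    ok = [w for w in workers if rejection_reason(w, model, required) is None]
    # Tie-break on worker_id so equal loads rank identically on every call.
    return [w.worker_id for w in sorted(ok, key=lambda w: (w.queue_depth, w.worker_id))]


def check_acceptance_cases() -> List[str]:
    base = Worker("w0")
    need = frozenset({"chat"})
    cases = {
        "model mismatch": replace(base, model="other"),
        "capability mismatch": replace(base, capabilities=frozenset()),
        "unhealthy": replace(base, healthy=False),
        "kv cache full": replace(base, kv_utilization=1.0),
        "draining": replace(base, draining=True),
    }
    for expected, worker in cases.items():
        assert rejection_reason(worker, "m", need) == expected
    assert rejection_reason(base, "m", need) is None

    # Determinism: any ordering of the same snapshot yields the same ranking.
    snapshot = [Worker(f"w{i}", queue_depth=i % 2) for i in range(6)]
    expected_order = pick(snapshot, "m", need)
    rng = random.Random(0)
    for _ in range(20):
        shuffled = snapshot[:]
        rng.shuffle(shuffled)
        assert pick(shuffled, "m", need) == expected_order
    return expected_order
```

Calling `check_acceptance_cases()` raises `AssertionError` on the first failing case and otherwise returns the ranked worker IDs for the fixed snapshot.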

Agent Notes

This policy is hardware-agnostic. Runtime-specific scheduling stays in ominix-runtime.
