
Properly increment metrics for /v1/infer #1236

Merged
tisnik merged 3 commits into lightspeed-core:main from samdoran:rlsapi-metrics
Mar 2, 2026

Conversation

Contributor

@samdoran samdoran commented Feb 27, 2026

Description

To record metrics correctly for the /v1/infer endpoint, call extract_token_usage().
Add provider and model labels to metrics.llm_calls_failures_total, matching the behavior of the other Counters.
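As a sketch of the labels change (assuming prometheus_client, which the metric naming suggests; the help text and label values here are illustrative, not the project's actual src/metrics code):

```python
# Illustrative sketch of a failure counter labeled by provider and model,
# assuming prometheus_client. Help text and label values are made up.
from prometheus_client import Counter

llm_calls_failures_total = Counter(
    "llm_calls_failures_total",
    "Total failed LLM calls.",
    ["provider", "model"],  # the labels this PR adds
)

# Each unique (provider, model) pair gets its own time series:
llm_calls_failures_total.labels(provider="openai", model="gpt-4o").inc()
```

An unlabeled Counter exposes a single series; with labels, failures can be broken down per provider/model pair at query time.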

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

  • Assisted-by Claude

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

pytest tests/integration/endpoints/test_rlsapi_v1_integration.py tests/unit/app/endpoints/test_rlsapi_v1.py

Summary by CodeRabbit

  • Improvements
    • Added token-usage tracking for LLM responses to surface consumption patterns.
    • Enhanced failure metrics to include provider and model identifiers for more granular monitoring.
    • Propagated provider/model context through inference failure paths to improve error context and logging.
  • Documentation
    • Clarified helper docstring to describe returned provider/model ordering.
  • Tests
    • Relaxed one metrics assertion to reflect the updated metrics representation.

Need to call extract_token_usage() in order to increment the metrics counter
Contributor

coderabbitai bot commented Feb 27, 2026

Walkthrough

Derives provider and model from the default model ID, records token-usage from LLM responses, and propagates provider/model into failure telemetry by converting the failure metric to a labeled Counter and passing provider/model through inference failure paths.

Changes

  • Metrics (src/metrics/__init__.py): Converted llm_calls_failures_total from an unlabeled Counter to a labeled Counter with labels ["provider", "model"].
  • Inference endpoint & telemetry (src/app/endpoints/rlsapi_v1.py): Added extract_provider_and_model_from_model_id() usage to derive provider/model, invoked extract_token_usage(...) in response handling, and updated the _record_inference_failure(...) signature and all calls to include provider and model for labeled failure telemetry.
  • Docs / Utils (src/utils/query.py): Updated docstring to state the return order is (provider, model).
  • Tests (tests/unit/app/endpoints/test_metrics.py): Removed an assertion expecting a specific Prometheus metric line in the metrics endpoint response.
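Assuming the Llama Stack provider/model identifier format mentioned in the diff, a hypothetical sketch of the helper and of the documented unpack order (the real helper lives in src/utils/query.py and may differ):

```python
def extract_provider_and_model_from_model_id(model_id: str) -> tuple[str, str]:
    """Split a 'provider/model' ID and return (provider, model), in that order.

    Hypothetical sketch; the real helper lives in src/utils/query.py.
    """
    provider, _, model = model_id.partition("/")
    return provider, model

# Unpack in the documented order: (provider, model), not (model, provider).
provider, model = extract_provider_and_model_from_model_id("openai/gpt-4o")
```

Getting the unpack order wrong at a call site silently swaps the metric labels, which is exactly the bug flagged in review below.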

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant API as rlsapi_v1
    participant LLM as LLM Provider
    participant Metrics as Prometheus/Telemetry

    Client->>API: POST /infer (request)
    API->>API: resolve default_model_id
    API->>API: extract_provider_and_model_from_model_id(default_model_id)
    API->>LLM: call provider with resolved model
    LLM-->>API: response (including usage)
    API->>API: extract_token_usage(response.usage, default_model_id)
    alt success
        API-->>Client: inference response
    else failure
        API->>Metrics: increment llm_calls_failures_total(provider, model)
        API->>API: _record_inference_failure(provider, model, ...)
        API-->>Client: error response
    end
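The extract_token_usage(response.usage, default_model_id) step above can be sketched as follows; the usage field names and the tally structure are assumptions, since the real helper records Prometheus counters rather than a dict:

```python
from collections import defaultdict

# Hypothetical per-model token tally standing in for Prometheus counters.
# The field names read from `usage` are assumptions, not the real payload.
token_totals: dict[str, int] = defaultdict(int)

def extract_token_usage(usage: dict[str, int], model_id: str) -> int:
    """Sum prompt and completion tokens and record them against model_id."""
    total = usage.get("prompt_tokens", 0) + usage.get("completion_tokens", 0)
    token_totals[model_id] += total
    return total

total = extract_token_usage(
    {"prompt_tokens": 12, "completion_tokens": 30}, "openai/gpt-4o"
)
```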

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

ok-to-test

Suggested reviewers

  • manstis
  • tisnik
  • umago
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'Properly increment metrics for /v1/infer' accurately describes the main objective: ensuring metrics are correctly recorded for the /v1/infer endpoint by adding token usage telemetry and provider/model labels to failure metrics.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/app/endpoints/rlsapi_v1.py (1)

253-262: ⚠️ Potential issue | 🟡 Minor

Update _record_inference_failure docstring to include new args.

model and provider were added to the function signature but are missing from Args:.

📝 Proposed fix
     Args:
         background_tasks: FastAPI background tasks for async event sending.
         infer_request: The original inference request.
         request: The FastAPI request object.
         request_id: Unique identifier for the request.
         error: The exception that caused the failure.
         start_time: Monotonic clock time when inference started.
+        model: Model identifier used for inference.
+        provider: Provider identifier used for inference.

As per coding guidelines, "All functions must have complete docstrings with brief descriptions" and "Follow Google Python docstring conventions for modules, classes, and functions with Parameters, Returns, Raises, and Attributes sections".

Inline comments:
In `@src/app/endpoints/rlsapi_v1.py`:
- Line 317: The tuple returned by extract_provider_and_model_from_model_id is
being unpacked in reverse; change the assignment at the call site so the first
value is assigned to provider and the second to model (i.e., provider, model =
extract_provider_and_model_from_model_id(model_id)) so downstream labels like
llm_calls_failures_total use the correct provider and model values.


ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9a6024a and 338b116.

📒 Files selected for processing (2)
  • src/app/endpoints/rlsapi_v1.py
  • src/metrics/__init__.py

Get the provider and model in order to pass that to _record_inference_failure.
Add model and provider labels to the Counter.
Contributor

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/app/endpoints/rlsapi_v1.py (1)

282-282: ⚠️ Potential issue | 🟡 Minor

Pipeline failure: too-many-locals (16/15).

The addition of model and provider variables pushed the local variable count over pylint's limit. Consider extracting some logic to reduce locals.

♻️ One approach: inline model_id extraction or extract a helper

Option 1 - Extract provider/model alongside model_id retrieval:

-def _get_default_model_id() -> str:
+def _get_default_model_config() -> tuple[str, str, str]:
-    """Get the default model ID from configuration.
+    """Get the default model ID, provider, and model from configuration.
 
-    Returns the model identifier in Llama Stack format (provider/model).
+    Returns:
+        Tuple of (model_id, provider, model).

Then in infer_endpoint:

-    model_id = _get_default_model_id()
-    provider, model = extract_provider_and_model_from_model_id(model_id)
+    model_id, provider, model = _get_default_model_config()

Option 2 - Suppress the pylint warning with a pragma if the complexity is acceptable.
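Option 1 can be sketched end to end as a self-contained example; the names follow the review's suggestion, and the hard-coded model ID stands in for the real configuration lookup:

```python
def _get_default_model_config() -> tuple[str, str, str]:
    """Return (model_id, provider, model); sketch of the review's Option 1.

    The hard-coded ID is a stand-in for the configured default model.
    """
    model_id = "openai/gpt-4o"  # stand-in for the configuration lookup
    provider, _, model = model_id.partition("/")
    return model_id, provider, model

# One call site, one unpack: three values without extra derivation locals.
model_id, provider, model = _get_default_model_config()
```

Folding the derivation into the helper keeps infer_endpoint's local count flat, which is the point of the pylint fix.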

♻️ Duplicate comments (1)
src/app/endpoints/rlsapi_v1.py (1)

317-317: ⚠️ Potential issue | 🔴 Critical

Provider/model destructuring is reversed.

The helper extract_provider_and_model_from_model_id returns (provider, model) per the relevant code snippet, but the assignment stores (model, provider). This will cause llm_calls_failures_total labels to be flipped.

🐛 Proposed fix
-    model, provider = extract_provider_and_model_from_model_id(model_id)
+    provider, model = extract_provider_and_model_from_model_id(model_id)
🧹 Nitpick comments (1)
src/app/endpoints/rlsapi_v1.py (1)

253-265: Docstring missing model and provider parameter descriptions.

The function signature now includes model: str and provider: str, but the Args section doesn't document them. As per coding guidelines, functions should have complete docstrings with Parameters sections.

📝 Proposed fix
     Args:
         background_tasks: FastAPI background tasks for async event sending.
         infer_request: The original inference request.
         request: The FastAPI request object.
         request_id: Unique identifier for the request.
         error: The exception that caused the failure.
         start_time: Monotonic clock time when inference started.
+        model: The model identifier for metrics labeling.
+        provider: The provider identifier for metrics labeling.

     Returns:
         The total inference time in seconds.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 338b116 and befedd4.

📒 Files selected for processing (3)
  • src/app/endpoints/rlsapi_v1.py
  • src/metrics/__init__.py
  • tests/unit/app/endpoints/test_metrics.py
💤 Files with no reviewable changes (1)
  • tests/unit/app/endpoints/test_metrics.py

Contributor

major commented Mar 2, 2026

Thanks, @samdoran

Contributor

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/app/endpoints/rlsapi_v1.py (1)

282-419: ⚠️ Potential issue | 🟠 Major

Address pylint too-many-locals in infer_endpoint (Line 282) before merge.

This is currently a pipeline failure and blocks merge. A minimal fix is to remove one local (e.g., inline inference_time).

Proposed minimal diff
-        inference_time = time.monotonic() - start_time
@@
-        inference_time,
+        time.monotonic() - start_time,
🧹 Nitpick comments (1)
src/app/endpoints/rlsapi_v1.py (1)

243-265: Complete _record_inference_failure docstring for new parameters.

model and provider were added to the signature but are missing from the Args section.

As per coding guidelines "All functions must have complete docstrings with brief descriptions."


ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between befedd4 and e239c46.

📒 Files selected for processing (2)
  • src/app/endpoints/rlsapi_v1.py
  • src/utils/query.py

Contributor

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/app/endpoints/rlsapi_v1.py (1)

243-265: ⚠️ Potential issue | 🟡 Minor

Incomplete docstring: missing model and provider parameter documentation.

The function signature was extended with model: str and provider: str parameters, but the docstring's Args section doesn't document them.

📝 Proposed fix
     Args:
         background_tasks: FastAPI background tasks for async event sending.
         infer_request: The original inference request.
         request: The FastAPI request object.
         request_id: Unique identifier for the request.
         error: The exception that caused the failure.
         start_time: Monotonic clock time when inference started.
+        model: The model identifier for metrics labeling.
+        provider: The provider identifier for metrics labeling.

     Returns:
         The total inference time in seconds.

As per coding guidelines: "All functions must have complete docstrings with brief descriptions" and "Follow Google Python docstring conventions for modules, classes, and functions with Parameters, Returns, Raises, and Attributes sections".


ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e239c46 and efb62c2.

📒 Files selected for processing (2)
  • src/app/endpoints/rlsapi_v1.py
  • src/utils/query.py

Contributor

@tisnik tisnik left a comment


LGTM

@tisnik tisnik merged commit 5e35cbc into lightspeed-core:main Mar 2, 2026
20 of 21 checks passed